
Project Number: MQP-SDO-208

Image Processing

A Major Qualifying Project Report

Submitted to The Faculty

of

Worcester Polytechnic Institute

In partial fulfillment of the requirements for the

Degree of Bachelor of Science

by

Manuel Alberto Reyes

Xinzhe Zhang

April, 2017

Approved:

Professor Sarah Olson

This report represents the work of WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review. For more information about the projects program at WPI, please see http://www.wpi.edu/academics/ugradstudies/project-learning.html


Abstract

Image processing is the set of processes which involve manipulating and extracting information from images with systematic methods. In this paper, we overview the mathematics behind a number of image processing techniques and their uses. We go into depth on the identification of face-like features in images and the expression of gradients in images through vector graphics. We conclude with unique implementations of those subjects in MATLAB programs.


Executive Summary

As humanity has moved into the information era and computers have become an integral part of how we live, the demands people have on visual forms of data have increased and new techniques to manage such data have been developed. The improvements we seek in managing visual data cover a wide range, including more efficient compression and sharing, automated identification and manipulation, and more powerful tools to manipulate and produce images and video. Computers have improved considerably over the years, but the resolution and frame rates we demand of our videos and images have grown with them, forcing us to develop better methods to manage our data. Furthermore, with the growth of the internet our need to share large amounts of data is also increasing, meaning that better methods for compression are necessary to keep up. With that same wealth of data made possible by the internet, new forms of pattern analysis are possible with technologies like machine learning. Among the people creating that data, many also want better tools to create and edit said data, often for professional purposes. Those tools are so important that virtually no professional entertainment project forgoes them anymore. To advance all varieties of these technologies, we explore the study of image processing.

In this paper we discuss a few different techniques covering the domains of object detection and image compression, along with some mathematical methods we use to implement those techniques. After explaining the algorithms in mathematical detail, we discuss our implementations in MATLAB and their effectiveness. Most algorithms we cover have potentially very complex implementations, but we have created basic versions of them for the purpose of understanding how they behave. Along with these descriptions we discuss how the algorithms could be expanded to include new functionality.

To cover general edge detection we use the Canny edge detection method, first proposed by John Canny. It identifies the edges in an image, defined as paths of high contrast in the image. This method can be easily applied to any image to prepare it for further manipulation. We also show how the Hough transform can be applied to the output of the Canny edge detector to parameterize simple shapes in the image such as lines.

We show a completely different approach to object detection with the Viola-Jones object detection method, which is typically used to identify human faces in images. This is approached by training an AI (Artificial Intelligence) system, such as a neural network, to identify when a basic measure of values in an image matches those of the kind of object to be identified. The measures are quick to compute, allowing for the efficient censoring of pedestrian faces in video with the use of a blurring method such as Gaussian blurring.

We also explore image compression with a thin plate spline based vector image compression method. Most popular image storage methods treat images as matrices, but a vector image system stores the image in terms of parametric functions. Vector based image formats often struggle to store color gradients well, so we address this with the use of thin plate splines. The basic implementation we show does not compress sharp edges in images well, but it demonstrates the untapped potential of vector image formats.

The techniques we explore in this paper are diverse and are not explored in great depth. We primarily intend to show how many different image processing methods can be expanded to have greater practical usage.


Acknowledgements

We would like to thank our project advisor Professor Sarah Olson for providing plentiful guidance and for being patient with us.


Contents

1 Introduction
  1.1 Terminology
  1.2 Edge Cases
  1.3 Review of Definitions

2 Matrix Convolution
  2.1 Using Kernels as filters
    2.1.1 Examples of Kernels
    2.1.2 Laplacian Kernel
  2.2 Image Gradients

3 Splines
  3.1 Parametric Splines
  3.2 Thin Plate Splines
  3.3 QR Decomposition

4 Canny Edge Detection
  4.1 Gaussian Blurring
  4.2 Gradient Map
  4.3 Nonmaximum Suppression
  4.4 Blob Analysis
  4.5 Hough Transform

5 Viola Jones object detection
  5.1 Filter - Haar Features
  5.2 Integral Image
  5.3 Adaptive Boost Training
  5.4 Cascading Classifiers
  5.5 Conclusion and Discussion

6 Vector Image Approximation
  6.1 Prior Work
  6.2 Automated TPS creation
  6.3 Results and Discussion

7 Appendix
  7.1 Tables
    7.1.1 TPS data
  7.2 MATLAB Code
    7.2.1 Code to compress images into the same size
    7.2.2 Integral Image code
    7.2.3 Haar Features
    7.2.4 Gaussian Blur code
    7.2.5 Face Detection by Viola-Jones Algorithm code
    7.2.6 Edge detection code
    7.2.7 Gradient Images
    7.2.8 Hough Transform MATLAB code
    7.2.9 Canny edge detection code


List of Figures

1.1 Gradient on the x axis
1.2 Gradient on the y axis
1.3 Gradient on the x and y axes

2.1 Example for Gaussian Blurring
2.2 Original Picture
2.3 Image after Sharpening
2.4 Example for Gaussian Edge Detection
2.5 Image gradient example

4.1 Canny edge detector example
4.2 Canny edge detector intermediate steps
4.3 Nonmaximum Suppression
4.4 Example of Hough transform method

5.1 An example when using OpenCV
5.2 An example of applying the Haar Features
5.3 An example of the integral image
5.4 Example for the integral image
5.5 Adaptive boosting separating all of the results into two sides
5.6 Giving more weight to the classifiers whose result is correct
5.7 The situation where all of the answers on the left side are wrong
5.8 The AdaBoost result
5.9 An example of how cascading works
5.10 An example of a cascading classifier
5.11 Result of the face detection
5.12 Result of face detection
5.13 Result of non-face detection
5.14 An error that the algorithm made

6.1 Example of simple raster to vector image method
6.2 Mesh gradients
6.3 TPS compression method on cat image
6.4 Comparison of TPS compression with different numbers of control points
6.5 TPS compression with different numbers of control points on the gradient image


List of Tables

5.1 Face detection results; detection on the single face images is all correct, while the detection rate on multiple face images is only 62%

6.1 Comparison of TPS control point selection methods
6.2 Detailed TPS compression speed on Boston image
6.3 Detailed TPS compression accuracy on Boston image

7.1 Comparison of TPS compression performance
7.2 Detailed TPS compression speed comparison
7.3 Detailed TPS compression accuracy comparison
7.4 Detailed TPS compression speed on gradient image
7.5 Detailed TPS compression accuracy on gradient image


Chapter 1

Introduction

In imaging science, image processing is the processing of images using mathematical operations, a form of signal processing (the process of extracting useful information from symbolic data) where the input is an image, a series of images, or a video, such as a photograph or video frame; the output of image processing may be images, video, a set of characteristics, or other parameters related to the image [12][22]. Modern image processing is mainly done with computers because a computer can process complex images and video significantly faster than a human can. Most images and video are now recorded in digital formats, so the study of image processing has become essential to the advancement of those formats. With advancements in image processing there have been many improvements to the formats in which we store images and video, the methods used to modify them, and the systems that identify complex features of images.

Images and videos in modern computers are displayed as grids of colored pixels, each containing three colored lights with brightness levels of integer values between 0 and 255. Storing images and videos as matrices of those values is generally very inefficient, so there are many storage formats for compressing images so that they take less data to store and share. Some image formats like .png can be lossless, but because images usually contain much more information than is necessary for their purpose, most image and video formats are lossy. Those lossy formats do lose considerable amounts of data, but in many cases the loss is not noticeable to human eyes.

Often when people discuss image and video processing they are referring to using commercial software to edit images and video. The practice existed before the creation of computers, but today digital images and videos are relatively easy to edit and there are many new techniques made possible by advancements in digital image processing. These techniques include changing color balances, removing blemishes, and splicing elements from different images and videos into one.

Through the use of computer vision, algorithms can identify particular elements of images without the help of humans. These algorithms are very complex and use artificial intelligence to identify the contents of an image. An object detection algorithm is one that detects objects of a particular class in an image, such as humans or buildings. Much object detection research has been aimed at identifying faces and pedestrians so they can be automatically censored to protect the privacy of those in the photo. There are many other applications in image retrieval and video surveillance [34].


1.1 Terminology

Later we will discuss a number of methods and techniques used on digital images for various purposes. For the purpose of explaining the methods we express bitmap images as matrices with dimensions matching the image, but the terms matrix and image may be used interchangeably.

Definition 1.1.1. Monochromatic Image Matrix A is an n×m matrix that corresponds to a greyscale n by m bitmap image and contains integer values between 0 and 255.

Often within a method the values of an image matrix may be manipulated so that they are no longer integers. In these cases the matrix may still be referred to as an image, and the matrices ultimately produced by an algorithm are assumed to have their values rounded to the nearest integer. For an image matrix A, A_{i,j} will be used to refer to the value of A in the (i, j) position.

Definition 1.1.2. Color Image Matrix C is an n×m×3 matrix that corresponds to an n by m bitmap image and is comprised of the component monochromatic image matrices R, G, and B of size n×m.

As they are easier to manipulate, color images will generally be expressed as the color channels R, G, and B. This representation is chosen because most modern computers express colored images in the same way and the methods we use are created for use on such computers. In many cases only the greyscale representation of an image will be used to simplify the problem for the purposes of explanation, because most algorithms applied to a greyscale image can also be applied to any of the three color channels of an image.

Definition 1.1.3. Greyscale Matrix W is an average of the color values of matrices R, G, and B such that for all i, j:

W_{i,j} = \frac{R_{i,j} + G_{i,j} + B_{i,j}}{3}

When discussing the error of an image, particularly in image approximation, it will refer to the mean squared error between some output image and its corresponding source image over all color channels. This can be thought of as a measure of the average amount of inaccuracy in color values between one image and another, with a value of 0 being the exact same image, 65025 being the exact opposite image, and 16129 being the average for two random images. Therefore any image with an error less than 16129 has some resemblance to its source.
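As a concrete illustration, the sketch below computes this error measure in MATLAB. It is a minimal sketch, assuming both inputs are same-size uint8 RGB images; the function name imageMSE is ours, not from the appendix code.

% Minimal sketch: mean squared error between two same-size RGB images.
% Assumes A and B are H-by-W-by-3 uint8 arrays; imageMSE is an illustrative name.
function e = imageMSE(A, B)
    D = double(A) - double(B);   % cast first so uint8 arithmetic does not saturate
    e = mean(D(:).^2);           % average over all pixels and color channels
end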

1.2 Edge Cases

Many methods we use involve considering the values of adjacent or nearby points, which may or may not exist. These cases come up most often very near the edges of an image. Though some algorithms take a different approach, values outside of the scope of an image are assumed to have a value of 0. This is sufficient for most methods because 0 is often used as an indicator that there is no meaningful return. When 0 represents the color black this simplification can cause some error, but as this only occurs near the edges of the image it is not of great concern.

1.3 Review of Definitions

There are many common definitions which we will refer to later in this paper; they will be briefly listedhere.


Definition 1.3.1. One-dimensional Continuous Derivative

f'(x) = \frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

The derivative of a function measures the rate of change of the function.

Definition 1.3.2. Partial Derivatives: A partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Partial derivatives are used in vector calculus and differential geometry. The partial derivative of a function f(x_1, x_2, \ldots) with respect to the variable x_i is denoted by

\frac{\partial f}{\partial x_i}(a_1, \ldots, a_n) = \lim_{h \to 0} \frac{f(a_1, \ldots, a_i + h, \ldots, a_n) - f(a_1, \ldots, a_i, \ldots, a_n)}{h}

Definition 1.3.3. Continuous Gradient: For a continuously valued function g(x, y), \partial g / \partial x and \partial g / \partial y are the rates of change of g in the directions of the x and y axes. The gradient of g is defined:

\nabla g = \left[ \frac{\partial g}{\partial x}, \frac{\partial g}{\partial y} \right]

Definition 1.3.4. Continuous Laplacian Operator: The Laplacian operator is a differential operator given by the divergence of the gradient of a function, usually denoted \nabla \cdot \nabla or \Delta. The Laplacian \Delta f(p) of a function f at a point p is the rate at which the average value of f over spheres centered at p deviates from f(p) as the radius of the sphere grows. The Laplacian formula is

\Delta f = \nabla \cdot \nabla f = \nabla \cdot \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] = \left[ \frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right] \cdot \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right] = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}

The magnitude of the gradient can be found by the L2 norm:

\|\nabla f\| = \sqrt{\left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2}

Also, the direction of greatest change in the gradient is given by:

\theta = \operatorname{atan2}\!\left( \frac{\partial f}{\partial y}, \frac{\partial f}{\partial x} \right)

where atan2 is defined:

\operatorname{atan2}(y, x) =
\begin{cases}
\arctan(y/x) & \text{if } x > 0, \\
\arctan(y/x) + \pi & \text{if } x < 0 \text{ and } y \ge 0, \\
\arctan(y/x) - \pi & \text{if } x < 0 \text{ and } y < 0, \\
+\pi/2 & \text{if } x = 0 \text{ and } y > 0, \\
-\pi/2 & \text{if } x = 0 \text{ and } y < 0, \\
\text{undefined} & \text{if } x = 0 \text{ and } y = 0
\end{cases}

Gradients corresponding to images can be seen in figures 1.1, 1.2, and 1.3.


Figure 1.1: \nabla f = [\partial f / \partial x, 0]. This is the gradient in the x direction. The red line points out the direction in which the intensity changes most: from left to right.

Figure 1.2: \nabla f = [0, \partial f / \partial y]. This is the gradient in the y direction. The direction in which the intensity changes most is from top to bottom, as pointed out by the red line.

Figure 1.3: \nabla f = [\partial f / \partial x, \partial f / \partial y]. This is the gradient in the x and y directions. The direction in which the intensity changes most is from the bottom left to the top right, which is also the direction the red line points out.


Chapter 2

Matrix Convolution

In several algorithms we discuss, finding a weighted sum of every submatrix of a particular size is necessary. To achieve this, the matrix convolution of a kernel representing a weighted sum and an image is used. Matrix convolution is based on the convolution of functions defined below.

Definition 2.0.1. Convolution of two functions f and g is given by

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

The convolution of two functions effectively maps the area under both functions as one function is translated over the other. Convolution can be thought of as a function that produces the sum of the products of values for each combination of the functions. Matrix convolution is a discrete, two-dimensional analogue of this process where the values of two matrices are used in place of the functions.

Definition 2.0.2. Matrix Convolution of an n×m kernel K and an n'×m' matrix G to produce an (n + n' - 1)×(m + m' - 1) matrix M is denoted as

M = K ‡ G

and is found by:

M_{i,j} = \sum_{k=1}^{n} \sum_{l=1}^{m} K_{k,l} H_{i-k+1, j-l+1}, \qquad
H_{a,b} = \begin{cases} 0 & \text{if } a < 1,\ n' < a,\ b < 1, \text{ or } m' < b \\ G_{a,b} & \text{otherwise} \end{cases} \quad (2.1)

[31]

For an example, the matrix convolution of two 3×3 matrices results in the 5×5 matrix M. The computations for the points (1, 2) and (3, 3) of M produced by such matrices are shown below.

M = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} ‡ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

M_{1,2} = (a \cdot 0) + (b \cdot 0) + (c \cdot 0) + (d \cdot 0) + (e \cdot 0) + (f \cdot 0) + (g \cdot 0) + (h \cdot 1) + (i \cdot 2)

M_{3,3} = (a \cdot 1) + (b \cdot 2) + (c \cdot 3) + (d \cdot 4) + (e \cdot 5) + (f \cdot 6) + (g \cdot 7) + (h \cdot 8) + (i \cdot 9)

The bounds of the output matrix are one less than the sum of the original bounds because the output matrix accounts for every possible overlap of the input matrices. If the bounds of the output matrix were larger it would simply contain extra rows or columns containing only 0. This definition of matrix convolution assumes that values outside of the matrix are 0, which can be problematic because those out of bounds values represent black, which may lead to unwanted darkening of resulting images. In some cases this can be resolved by instead using the value of the nearest edge or corner of the image, but this has a small enough effect on the algorithms we use that it is not implemented. The error caused by assuming values outside the matrix are 0 is illustrated in the equation below, where the kernel takes the average of the four adjacent values.

K ‡ G = \frac{1}{4} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} ‡ \begin{bmatrix} 223 & 245 & 189 \\ 124 & 137 & 104 \\ 40 & 12 & 2 \end{bmatrix} = \begin{bmatrix} 55.75 & 117 & 108.5 & 47.25 \\ 86.75 & 182.25 & 168.75 & 73.25 \\ 41 & 78.25 & 63.75 & 26.5 \\ 10 & 13 & 3.5 & 0.5 \end{bmatrix} = M
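This worked example can be checked in MATLAB, whose conv2 function performs the same full, zero-padded convolution as Definition 2.0.2:

% Sketch verifying the example above; conv2 defaults to 'full' convolution.
K = ones(2) / 4;
G = [223 245 189; 124 137 104; 40 12 2];
M = conv2(K, G)   % prints the 4-by-4 matrix shown above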

Despite the top row of G having the largest values, the top row of M has smaller values than the second row. This is because the top row of M is the sum of the products of the bottom row of K and the top row of G, while the second row of M is the sum of the products of all of K and the first and second rows of G. Essentially, the top row of M contains values produced by the partial overlap of K and G, so the resulting values are smaller.

Definition 2.0.3. Kernel: A kernel in the context of image processing is a small matrix used to apply effects like blurring, sharpening, outlining or embossing. Kernels are also used in machine learning for 'feature extraction', a technique for determining the most important portions of an image. To do this, the convolution of the kernel and some input image is computed.

2.1 Using Kernels as filters

When an image is acquired by a camera or other imaging system, the vision system for which it is intended is often unable to use it directly. The image may be corrupted by random variations in intensity, variations in illumination, or poor contrast that must be dealt with in the early stages of image processing. By using matrix convolution kernels, many of those problems can be corrected through simple convolution.

2.1.1 Examples of Kernels

In most cases, the common way to blur an image is through the use of a Gaussian kernel based on the two-dimensional Gaussian function:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (2.2)

G(x, y) is a function that depends on both x and y and has a standard deviation defined by σ. By taking evenly spaced discrete values from the Gaussian function, a Gaussian kernel can easily be created. Convolving such a kernel with an input image produces a blurred image. It is also important when creating the kernel that the values are rescaled so that they sum to 1; otherwise the brightness of the resulting image will differ.
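A minimal sketch of this construction is shown below, assuming a greyscale image matrix W; the normalization constant of Eq 2.2 can be dropped since the kernel is rescaled to sum to 1 anyway.

% Sketch: sample Eq 2.2 on a 5-by-5 grid and blur a greyscale image W.
sigma = 1.4;
[x, y] = meshgrid(-2:2, -2:2);           % evenly spaced offsets from the center
K = exp(-(x.^2 + y.^2) / (2*sigma^2));   % Gaussian samples (constant factor omitted)
K = K / sum(K(:));                       % rescale so the weights sum to 1
blurred = conv2(W, K, 'same');           % 'same' keeps the original image size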

Using different kernels creates different results:


Blurring: This is an example of a non-Gaussian blurring kernel:

K_B = \begin{bmatrix} -1 & 1 & -1 \\ 1 & 1 & 1 \\ -1 & 1 & -1 \end{bmatrix} \quad (2.3)

Figure 2.1: The left side is the original picture before blurring. The right side is the picture blurred through convolution with the kernel in Eq 2.3. The original image is taken from [15].

In figure 2.1 it can be seen that the image has been blurred. It is most noticeable in the nose area of the dog on the left. The image has been blurred using the MATLAB code in appendix 7.2.4. Because the kernel used is relatively small, the blurring is not very noticeable. Blurring is a widely used effect in image processing, typically to reduce image noise and reduce detail. This sort of blurring is often used to protect one's privacy by blurring a face or other personal information in an image.

Gaussian Sharpening: Using the following kernel, the image will be sharpened:

K_S = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix} \quad (2.4)

Comparing figure 2.2 and figure 2.3 it can easily be seen that the second image has much more defined features than the first. Sharpening an image in essence increases the contrast of that image, because the kernel listed in equation 2.4 highlights the central point of each submatrix. Sharpening an image can make some features easier to see, but it also increases the amount of noise in the image, so it is not always useful. Sometimes sharpening an image can help in the process of feature identification.


Figure 2.2: This is the original image, which is very obscure. You cannot even make out the shape of the face. The original image is from [23].

Figure 2.3: This is the image sharpened using the kernel in Eq 2.4. Compared with the figure on the left, this figure is much clearer. The image is taken from [23].

Edge detection: This kernel will convert an image into an edge-detection image where high values indicate areas of high contrast. The detection example is figure 2.4.

K_{ED} = \begin{bmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{bmatrix} \quad (2.5)

Figure 2.4: The dog image on the left comes from [15]. The greyscale image on the right is the image after applying the edge kernel in Eq 2.5. You can make out the edges of the car and the dog. This is produced by the MATLAB code in appendix 7.2.6.


2.1.2 Laplacian Kernel

The Laplacian of a function is usually used to express the combined magnitude of derivatives. In terms of an image, this is similar to an edge detector where places with high rates of change in value are found. An example of a kernel which can produce such a result is shown below:

\nabla^2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}

Using a Laplacian kernel like this will often produce many false positives due to random noise in the input image. To avoid this, it is common to apply a Gaussian blurring kernel first to remove that noise. As long as the blurring effect is not too strong, the important features can still be found. The result of applying both a Gaussian kernel and a Laplacian kernel is often called a Laplacian of Gaussian (LoG).
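A minimal LoG sketch, reusing the Gaussian kernel K built in the previous sketch and assuming a greyscale image matrix W:

% Sketch: Laplacian of Gaussian as two successive convolutions.
KL = [0 1 0; 1 -4 1; 0 1 0];            % the Laplacian kernel above
smoothed = conv2(W, K, 'same');         % suppress noise with the Gaussian kernel first
logMap = conv2(smoothed, KL, 'same');   % then apply the Laplacian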

2.2 Image Gradients

A partial derivative, and similarly a gradient, of an image is not a well defined idea, but there are many ways to produce an analogous operation yielding matrices that serve the same purposes as the partial differentiation of a function. For an n×m matrix A, the partial derivative and gradient at point A_{i,j} can be defined:

\frac{\partial A_{i,j}}{\partial x} = \frac{A_{i+1,j} - A_{i,j}}{1}, \qquad
\frac{\partial A_{i,j}}{\partial y} = \frac{A_{i,j+1} - A_{i,j}}{1}

\nabla A_{i,j} = \left[ \frac{A_{i+1,j} - A_{i,j}}{1}, \frac{A_{i,j+1} - A_{i,j}}{1} \right]

This definition is based off of the common method of defining a derivative:

\frac{df}{dx} = \lim_{h \to 0} \frac{f(x) - f(x-h)}{h}

In this discrete case, the smallest distance between values is 1, so we set h to 1 to approximate the derivative. Unfortunately, though it is a natural extension of the continuous definition of a derivative, this is a somewhat imprecise way to define the gradient, and it does not have matching bounds in the two dimensions of the gradient. This method can also be redefined with convolution matrices:

X = \frac{\partial A}{\partial x} = K_x ‡ A, \qquad Y = \frac{\partial A}{\partial y} = K_y ‡ A, \qquad \nabla A = [X, Y] \quad (2.6)

where K_x and K_y are kernels defined by:

K_x = \begin{bmatrix} -1 & 1 \end{bmatrix}, \qquad K_y = \begin{bmatrix} -1 \\ 1 \end{bmatrix}

There are a number of other kernels which are used instead and are often considered more effective.

Sobel Operator [30]:

K_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}; \qquad K_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \quad (2.7)


Prewitt Operator:

K_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}; \qquad K_y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix} \quad (2.8)

Roberts Cross:

K_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}; \qquad K_y = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \quad (2.9)

Going forward, we will usually use the Sobel operator because it is easy to implement in many systems and is arguably more accurate than the Prewitt operator [16]. A natural extension of the gradient matrices is to produce matrices M and A which map the magnitude and direction of the gradient:

M_{i,j} = \sqrt{X_{i,j}^2 + Y_{i,j}^2}, \qquad A_{i,j} = \operatorname{atan2}(Y_{i,j}, X_{i,j})
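The sketch below computes these maps with the Sobel kernels of Eq 2.7, assuming a greyscale image matrix A in double format; the authors' full version is the appendix code in 7.2.7.

% Sketch: Sobel gradient, magnitude, and direction maps for image matrix A.
Kx = [-1 0 1; -2 0 2; -1 0 1];
Ky = [ 1 2 1;  0 0 0; -1 -2 -1];
X = conv2(A, Kx, 'same');    % gradient in the x direction
Y = conv2(A, Ky, 'same');    % gradient in the y direction
M = sqrt(X.^2 + Y.^2);       % gradient magnitude map
D = atan2(Y, X);             % gradient direction map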

The images in figure 2.5 are examples of the images produced by an image gradient. By using convolution with matrices like those shown in equation 2.7 with the MATLAB code found in appendix 7.2.7, we created images corresponding to the gradients in the x and y directions and the combined magnitude of those gradients.

Figure 2.5: (a) Original image; (b) magnitude image M; (c) differential image X; (d) differential image Y. The zebra image (a) was photographed by Eric Lafforgue [23].


Chapter 3

Splines

Splines are piecewise polynomial functions that are smooth even across the connecting points of the function. They are often used to make smooth curves in vector graphics and engineering. Because splines are smooth, defined by only a small number of points or vectors, and easy to manipulate, they are ideal for representing vector graphics. Vector graphics are very useful because they are losslessly scaleable, but they are usually not practical for representing photographs because the range of colors and visual noise is too high.

3.1 Parametric Splines

Splines are very commonly defined parametrically to describe a variety of curves in computer graphics. Such splines are mainly used for vector graphics: a series of splines is used to create the outlines of objects in a vector image, which are often called paths. The most common of these kinds of splines is the cubic Bezier curve. Bezier curves are defined by a series of two or more control points.

Definition 3.1.1. Bezier curve B(t) is defined for 0 ≤ t ≤ 1 with n control points P_0, P_1, \cdots, P_n:

B(t) = \sum_{i=0}^{n} \binom{n}{i} (1-t)^{n-i} t^i P_i

A Bezier curve can also be recursively defined as the linear interpolation of simpler Bezier curves. Let B_{P_0, P_1, \cdots, P_n} be the Bezier curve defined by points P_0, P_1, \cdots, P_n. Then any Bezier curve can be written:

B_{P_0}(t) = P_0

B(t) = B_{P_0, P_1, \cdots, P_n}(t) = (1-t)\, B_{P_0, P_1, \cdots, P_{n-1}}(t) + t\, B_{P_1, P_2, \cdots, P_n}(t)

A Bezier curve is cubic when n = 3 and there are four control points. Cubic Bezier curves are useful because they can define a wide range of curves while only needing four control points. Furthermore, cubic Bezier curves can easily approximate more complex, higher order curves. There are many other ways splines can be defined parametrically, but Bezier curves are the most used because of their balance of precision and simplicity.
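As a small illustration, the sketch below evaluates a cubic Bezier curve directly from Definition 3.1.1; the control points are illustrative values, not taken from the paper.

% Sketch: evaluate and plot a cubic Bezier curve (n = 3).
P = [0 0; 1 2; 3 2; 4 0];        % four illustrative control points, one per row
t = linspace(0, 1, 100)';        % parameter values in [0, 1]
n = 3;
B = zeros(numel(t), 2);
for i = 0:n                      % sum the Bernstein basis terms
    B = B + nchoosek(n, i) * (1 - t).^(n - i) .* t.^i * P(i + 1, :);
end
plot(B(:,1), B(:,2), '-', P(:,1), P(:,2), 'o--')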

3.2 Thin Plate Splines

One kind of spline that has great potential for image processing but is not commonly used is the thin plate spline (TPS). TPSs were created to represent a flat surface, such as a sheet of metal, morphed to some desired shape. TPSs are not parametric; instead they are defined for higher dimensions, though usually only the two-dimensional case is considered. Thin plate splines are defined by control points like Bezier curves, but the resulting surface is best described by an energy function.

Definition 3.2.1. Two Dimensional Thin Plate Spline T produced by the function f(x, y) that fits to the set of n control points {(x_i, y_i, z_i) | 1 ≤ i ≤ n} is described by a weighted combination of the energy functions for the error and smoothness of f. The weighting constant p is valid for 0 ≤ p ≤ 1.

E_{error}(f) = \sum_{i=1}^{n} \| z_i - f(x_i, y_i) \|_2^2

E_{smooth}(f) = \iint \left[ \left( \frac{\partial^2 f}{\partial x^2} \right)^2 + 2 \left( \frac{\partial^2 f}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 f}{\partial y^2} \right)^2 \right] dx \, dy

E(f) = (1-p)\, E_{error}(f) + p\, E_{smooth}(f)

As p approaches a value of 1 the function approximates a plane fit to the least squares regression of the control points. This definition is not explicit, but it has been shown that TPSs can be easily defined by a weighted sum of radial basis functions and a single plane.

Definition 3.2.2. Two Dimensional Thin Plate Spline Radial Basis Function f(x, y) that fits to the set of n control points {(x_i, y_i, z_i) | 1 ≤ i ≤ n} is defined generally as:

f(x, y) = a_1 + a_2 x + a_3 y + \sum_{i=1}^{n} b_i \, \phi(\|(x_i, y_i) - (x, y)\|_2^2)

where φ is the radial basis function defined φ(r) = r^2 \log(r), a_1, a_2, and a_3 are coefficients for the underlying plane, and b_i is the weighting for the radial basis function corresponding to each control point (x_i, y_i, z_i).

Generally, when dealing with a TPS, the most computationally challenging part is obtaining the coefficients that define the spline's function. To approach this problem we can write the system as:

\begin{bmatrix} K & D \\ D^T & O \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} z \\ o \end{bmatrix}

where K_{i,j} = \phi(\|(x_i, y_i) - (x_j, y_j)\|_2^2), the ith row of D is (1, x_i, y_i), O is a 3×3 matrix of zeros, o is a 3×1 vector of zeros, a is the vector containing a_1, a_2, and a_3, and b and z are the vectors respectively containing all values of b_i and z_i. This also assumes that:

\sum_{i=1}^{n} b_i = \sum_{i=1}^{n} b_i x_i = \sum_{i=1}^{n} b_i y_i = 0
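A minimal sketch of assembling and solving this system is shown below, assuming column vectors x, y, z of control point data (and MATLAB R2016b+ for the implicit expansion in the distance computation). It uses the common convention of applying φ to the squared distance as r² log(r²), which differs from r² log(r) only by a constant factor that is absorbed into the coefficients.

% Sketch: build and solve the TPS block system for coefficients b and a.
n = numel(x);                        % x, y, z: n-by-1 control point vectors (assumed)
r2 = (x - x.').^2 + (y - y.').^2;    % pairwise squared distances
K = r2 .* log(r2 + eps);             % phi on squared distances; eps avoids log(0)
D = [ones(n, 1) x y];
L = [K D; D.' zeros(3)];
coeff = L \ [z; zeros(3, 1)];        % direct solve of the block system
b = coeff(1:n); a = coeff(n+1:end);  % radial weights and plane coefficients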

3.3 QR Decomposition

One way to find the coefficients a and b is by using QR decomposition. QR decomposition can be used in the process of solving the linear least squares problem:

\min \|Ax - b\|_2^2


This can be applied to the linear system in the previous section by renaming the matrices and vectors to the same format:

\begin{bmatrix} K & D \\ D^T & O \end{bmatrix} = L, \qquad \begin{bmatrix} b \\ a \end{bmatrix} = c, \qquad \begin{bmatrix} z \\ o \end{bmatrix} = \zeta

so that the desired value of c is found by:

\min \|Lc - \zeta\|_2^2

Definition 3.3.1. QR Decomposition of an n×n matrix A is the product of matrices Q, which is orthogonal, and R, which is upper triangular, such that:

A = QR

For the purpose of computational ease a permutation matrix P may be used to reorder the columns of A, as in:

AP = QR

From this it follows that:

Q^T A P = R

Note that because Q is orthogonal and P is a permutation matrix:

Q Q^T = P P^T = I

With the QR decomposition of the matrix A from the linear least squares problem, a method for solving the problem can be derived as follows:

\min \|Ax - b\|_2^2 = \min \|Q^T (Ax - b)\|_2^2
= \min \|Q^T (A P P^T x - b)\|_2^2
= \min \|(Q^T A P) P^T x - Q^T b\|_2^2
= \min \|R P^T x - Q^T b\|_2^2

From this the solution becomes:

x = P R^{-1} Q^T b

which is relatively easy to compute because R is upper triangular. The most computationally expensive step of this whole process is the creation of the QR decomposition itself. When using the popular Gram-Schmidt orthogonalization or Householder reflector methods on an n×m matrix it takes about 2mn^2 flops (floating point operations) [18]. All other operations in solving the least squares solution are significantly faster, so when solving for the coefficients of a TPS function with n control points, for sufficiently large n, it takes about 2(n+3)^3 flops.


Chapter 4

Canny Edge Detection

The Canny edge detector is a comprehensive edge detection algorithm first created in 1986 by John Canny. When given an image, it returns an image of the same size that has values of one wherever there are strong edges in the image and zero everywhere else. This method is mainly used for image segmentation and has become one of the most popular edge detection methods. Many improvements have been made to Canny's original system, and our implementation includes several of them [13].

Figure 4.1: (a) Input image. (b) Final output of our Canny edge detector. The Boston image was taken from [1].

Our implementation of the algorithm can be easily split into four distinct steps:

1. Apply Gaussian convolution kernel to the greyscale version of the image

2. Calculate the gradient map on the Gaussian blurred image

3. Apply nonmaximum suppression to thin lines

4. Use blob analysis to remove isolated weak lines

For simplicity our implementation operates on the greyscale image. A more nuanced version of this algorithm could be performed on each color channel to account for changes in color without changes in value. The outputs could be combined by taking their union, but two very close edges in separate channels that should be one edge in the combined output would be hard to detect. The complete MATLAB code for our implementation of the Canny edge detector can be found in appendix 7.2.9. An example of the input and output can be seen in figure 4.1.


4.1 Gaussian Blurring

The first step of the Canny edge detection algorithm is to turn the image into greyscale and apply a Gaussian convolution kernel to blur the image. This is done to remove basic noise before the gradient map is created in the next step. Without this step there will be a very large number of false positives: the gradient map will be too noisy and the maximum suppression step will not be very effective. So as not to lose too much detail from the original image, the standard deviation and size of the Gaussian kernel are kept relatively small. A 5×5 kernel like the one used in our implementation, shown below, is often used for this reason.

K_G = \frac{1}{159} \begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix}

5×5 Gaussian kernel with σ = 1.4

Canny found that, at least in the one-dimensional case, the optimal detector for sharp edges is a truncated step, but it is very sensitive and produces many unwanted minima and maxima. The derivative of a Gaussian kernel is a much more useful detector because it is easier to find extrema in the data it produces, thus a Gaussian kernel is used to filter the image [8]. With this kernel we can produce a blurred image W' from a greyscale input W:

W' = W ‡ K_G

4.2 Gradient Map

To generate a gradient map, any of the simple edge detectors discussed in chapter 2 can be used. In our implementation we use the Sobel operator. With this we can produce the matrix M containing the magnitude and A containing the angle of the gradient at any point in W.

X = W' ‡ K_x, \qquad Y = W' ‡ K_y

M_{i,j} = \sqrt{X_{i,j}^2 + Y_{i,j}^2}, \qquad A_{i,j} = \operatorname{atan2}(Y_{i,j}, X_{i,j})

The matrix M indicates the degree to which there is a change in value at any point in W. The matrix A indicates the direction of the change in value at any point in W.

4.3 Nonmaximum Suppression

The gradient map created in the previous step is at this point very fuzzy and has some thick lines, so nonmaximum suppression is performed on M to produce M', which has thin lines that leave only the strongest edges along any path. This is done by first simplifying the matrix A to A', indicating only orthogonal or diagonal angles, by rounding each angle to the nearest multiple of π/4. Next, for every point (i, j) in M, let (i', j') be the adjacent point in the direction indicated by the corresponding point (i, j) in A': if M_{i,j} < M_{i',j'} then M'_{i,j} := 0, else M'_{i,j} := M_{i,j}. An example of how this process is performed can be seen in figure 4.3, and the input and output of this process can be seen in figure 4.2.


Figure 4.2: (a) Image of M. (b) Image of M'. Only the parts of lines with the highest values remain, leaving only the essential parts. These are intermediate steps produced from manipulating the image in figure 4.1.

Figure 4.3: (a) Gradient map with the direction of the center cell indicated. (b) Map with rounded angle, indicating that 134 is to be compared with 214. (c) Possible resulting map after nonmaximum suppression.

4.4 Blob Analysis

In this last step a lower and an upper threshold are chosen. All "strong" points with values greater than the upper threshold are kept in the final image and all "bad" points with values less than the lower threshold are removed from the final image. The remaining "weak" points are only kept if they are adjacent to strong points or adjacent to weak points that connect to strong points, so that groups of isolated weak points are removed. Blob analysis is used to group all orthogonally and diagonally connected weak points and check whether those groups are adjacent to any strong points. To do this, the matrix M' is iterated over to produce a matrix E. At every point (i, j) in M', bad points are marked as 0 in E and strong points are marked as 1 in E. For any weak point (i, j) in M', if there is an adjacent point in E which has been assigned a non-zero value then E_{i,j} takes that value; otherwise a new, previously unused value is assigned to E_{i,j}. Once all points in E have been assigned a value, all points in E which share a value that is not 1 or 0 and are adjacent to any points with value 1 are changed to 1. This process is repeated until an iteration changes no points. At this point all remaining points with values other than 1 or 0 are changed to 0 because they are isolated groups of weak points [28].
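The same hysteresis behavior can be sketched without explicit blob labels by repeatedly growing the strong set into adjacent weak points; this is a simpler formulation than the labeling scheme described above. lo and hi are the chosen thresholds and Mp is the suppressed magnitude map.

% Sketch: double thresholding with weak-point propagation (hysteresis).
strong = Mp > hi;
weak = (Mp > lo) & ~strong;
E = strong;
changed = true;
while changed                        % grow strong edges into connected weak points
    grown = conv2(double(E), ones(3), 'same') > 0;   % 8-connected neighborhoods
    Enew = E | (weak & grown);
    changed = any(Enew(:) ~= E(:));
    E = Enew;
end
% weak points never reached this way form isolated groups and are dropped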


4.5 Hough Transform

The Hough transform is a technique used to find specific shapes in a given black and white image. This technique does not attempt to find a single best fit, but instead identifies how closely every variation of the shape fits. The Hough transform is often applied to the output of the Canny edge detection algorithm because that output contains the outlines of the image, which can help to reveal the elements one intends to identify with the Hough transform. The most typical usage of the Hough transform is to find the straight lines in an image by defining every line that lies in the image in terms of an angle θ and a radius from center ρ. This is implemented by iterating over a set of possible angles and radii and summing the number of points in the original image that the line passes through. In our implementation we tested the points on the original image closest to evenly spaced points on the test line.

Figure 4.4: (a) Black and white input image. (b) Image of the output from the Hough transform.

As can be seen in item (b) of figure 4.4, there are two bright points that correspond to the lines in item (a) of figure 4.4, at approximately θ = -50, ρ = 35 and θ = 70, ρ = 50. Figure 4.4 was produced by the MATLAB code in appendix 7.2.8.
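For reference, the sketch below uses the standard voting formulation of the line Hough transform, where each edge pixel votes for every (θ, ρ) pair passing through it; this differs from the line-sampling approach of the appendix code but produces the same kind of accumulator image as figure 4.4(b). E is assumed to be a binary edge image.

% Sketch: line Hough transform accumulator for a binary edge image E.
[h, w] = size(E);
thetas = -90:89;                          % candidate angles in degrees
rhoMax = ceil(hypot(h, w));               % largest possible radius
H = zeros(2*rhoMax + 1, numel(thetas));   % accumulator indexed by (rho, theta)
[ys, xs] = find(E);
for k = 1:numel(xs)                       % every edge pixel votes for its lines
    for t = 1:numel(thetas)
        rho = round(xs(k)*cosd(thetas(t)) + ys(k)*sind(thetas(t)));
        H(rho + rhoMax + 1, t) = H(rho + rhoMax + 1, t) + 1;
    end
end
imagesc(thetas, -rhoMax:rhoMax, H), xlabel('\theta'), ylabel('\rho')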


Chapter 5

Viola Jones object detection

The problem to be solved is the detection of faces in an image. A human can do this easily, but a computer needs precise instructions and constraints. The Viola-Jones algorithm solved this problem quickly and accurately, making it the first real-time face detection algorithm. It was proposed by Paul Viola and Michael Jones in 2001 [33].

The Viola-Jones Algorithm is mainly divided into four steps:

1. Set up the filters (We will use the Haar Features here)

2. Create Integral Images

3. Adaboost Training

4. Cascading Classifiers

To make the task more manageable, Viola-Jones requires full view frontal upright faces. Thus, in order to be detected, the entire face must point towards the camera and should not be tilted to either side. Because the detection step is most often followed by a recognition step, these limits on pose are quite acceptable in practice [33].

5.1 Filter - Haar Features

Haar-like features, also proposed by Paul Viola and Michael Jones in 2001, are digital image features used in object recognition. Compared with the pixel kernels discussed in Chapter 2, Haar-like features take a different approach to setting up image filters. The advantage of Haar features is that they can be used on a finite quantity of training data to improve accuracy [33]. Moreover, this is one of the earliest as well as the most successful recognition algorithms. Haar features can be used not only for face detection, but can also be extended to detect other objects. In the detection phase of the Viola-Jones object detection framework, a window of the target size is moved over the input image, and for each subsection of the image the Haar-like feature is calculated. This difference is compared to a learned threshold that separates non-target objects from target objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is slightly better than random guessing), a large number of Haar-like features are necessary to describe an object with sufficient accuracy. In the Viola-Jones object detection framework, the Haar-like features are therefore organized in something called a classifier cascade to form a strong learner or classifier [26].

Definition 5.1.1. Indicator Function Assume that Σ is the whole set, whose typical element is ω, and G ⊆ Σ. The indicator function defined as

1_G(\omega) = \begin{cases} 0 & \text{if } \omega \notin G \\ 1 & \text{if } \omega \in G \end{cases}

is bounded and measurable [29].

Definition 5.1.2. Rectangular Haar-like features A simple rectangular Haar-like feature can be defined as the difference of the sums of pixels inside rectangular areas, which can be at any position and scale within the original image. This modified feature set is called a 2-rectangle feature. The values indicate certain characteristics of a particular area of the image. Each feature type can indicate the existence (or absence) of certain characteristics in the image, such as edges or changes in texture. For example, a 2-rectangle feature can indicate where the border lies between a dark region and a light region. Moreover, let A denote an image and P a feature, both of the same size N×N. The difference in a feature which is associated with pattern P of image A is defined by

\text{Difference} = \sum_{i=1}^{N} \sum_{j=1}^{N} A_{i,j}\, 1_{P_{i,j}}(\text{white}) - \sum_{i=1}^{N} \sum_{j=1}^{N} A_{i,j}\, 1_{P_{i,j}}(\text{black})

To compensate for the effect of different lighting conditions, all the images should be mean and variance normalized beforehand.

Both '1_{P_{i,j}} white' and '1_{P_{i,j}} black' are indicator functions like the one in definition 5.1.1: the image A is the whole set Σ, and the white strip and black strip areas are subsets G. '1_{P_{i,j}} white' equals 1 if P_{i,j} is located in the white strip of the selected Haar feature, and '1_{P_{i,j}} black' equals 1 if P_{i,j} is located in the black strip. This is how we calculate the difference between the black strip and the white strip.

The rectangular parts of the image with both white and black strips are the Haar features. What the machine does is take the accumulation of the pixels in one strip minus the accumulation of the pixels in the other. This accumulation of pixels is sufficient since all human faces share some similar properties. These characteristics may be matched by using the Haar features in figure 5.1. Examples include:

1. The eye region is darker than the upper-cheeks

2. The nose bridge region is brighter than the eyes


Figure 5.1: An example when using Haar features. The original image comes from OpenCV [32]. The features in the second column are both 2-rectangular features and those in the third column are both triple-rectangular features. When we use the Haar features in these pictures, we subtract the accumulated value in the black area from the accumulated value in the white area.

In figure 5.1, the rectangle which covers the eye region gives the following result: the accumulation of the RGB pixels in the white strips is greater than the accumulation of the RGB pixels in the black region.

Figure 5.2 uses four different Haar-like patterns. The size and position of the patterns can vary provided the black and white rectangles have the same dimensions, border each other, and keep their relative positions. The number of features one can draw from an image is somewhat manageable: for instance, a 24×24 image has 43200 features for the upper patterns and 27600 and 20736 for the lower patterns, so the picture has 134736 features in total [35].

Figure 5.2: These are the four Haar features that we are going to use in the project. The first two are 2-rectangular features which will help us detect edges better. The third feature will help us detect the nose area better. The fourth feature will help us detect the mouth better.


5.2 Integral Image

When we apply Haar features to a real image, there are millions of calculations. The large computational time is due to the calculation of the difference between the white and black regions, which requires accumulating all of the pixel values. So it is necessary to use another way to save computation.

Definition 5.2.1. Integral Image In image processing, an integral image is a data structure and algorithm for quickly and efficiently generating the sum of values in a rectangular subset. It was introduced to computer graphics in 1984 by Frank Crow. In computer vision it was popularized by Lewis and then given the name 'integral image' and prominently used within the Viola-Jones object detection framework in 2001 [19]. Historically, this principle is very well known in the study of multi-dimensional probability distribution functions, namely in computing 2D probabilities (the area under the probability distribution) from the respective cumulative distribution functions.

Rectangle features can be computed very rapidly using an intermediate representation of the image called the integral image. The integral image at location II(i, j) is the sum of the pixels above and to the left of (i, j), inclusive:

Figure 5.3: The setup for calculating the integral image. A, B, C, D are four different blocks of an image, and 1, 2, 3, 4 are the accumulated values at those locations in the integral image.

II_{i,j} = \sum_{m=0}^{i} \sum_{n=0}^{j} P_{m,n}

where II_{i,j} is the integral image and P_{i,j} is the pixel value at location (i, j) in the original image. The integral image can be computed with the following pair of recurrences:

Row_{i,j} = Row_{i,j-1} + P_{i,j}

II_{i,j} = II_{i-1,j} + Row_{i,j}

where Row_{i,j} is the cumulative sum of the ith row up to column j; for instance,

Row_{i,2} = Row_{i,1} + P_{i,2}    (5.1)
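In MATLAB these recurrences amount to two nested cumulative sums, so a minimal sketch (ours, using an arbitrary built-in test image) is:

% Integral image via cumulative sums: running row sums followed by
% column sums, which is exactly the pair of recurrences above.
P  = double(imread('cameraman.tif'));
II = cumsum(cumsum(P, 2), 1);   % sum along each row, then down the columns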



Figure 5.4: An example of using the integral image on a real image. The original image is separated into four areas A, B, C, D, and the values at points 1, 2, 3, 4 are the accumulated sums over the areas A, A + B, A + C, and A + B + C + D respectively. The original image comes from a website [3].

In figure 5.4, the integral image's value at point 1 is the sum of all of the pixels in area A, and the value at point 4 is the sum of all of the pixels in areas A + B + C + D. So when we want the difference between area D and area A, we just compute (4 + 1 − 3 − 2) − 1, where each number denotes the integral image value at that point. This algorithm reduces the calculation to a total of four additions and subtractions instead of summing all of the pixels in the image. (A sketch of the four-lookup block sum follows.)
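The four-corner lookup generalizes to any rectangle. The helper below is a minimal sketch of our own (the function name is hypothetical) of a block sum that uses only the integral image.

function s = blockSum(II, r1, c1, r2, c2)
% Sum of the pixels in the rectangle with corners (r1,c1) and (r2,c2),
% inclusive, using four lookups in the integral image II.
% The guards handle rectangles touching the top or left edge.
s = II(r2, c2);
if r1 > 1,            s = s - II(r1-1, c2);   end
if c1 > 1,            s = s - II(r2, c1-1);   end
if r1 > 1 && c1 > 1,  s = s + II(r1-1, c1-1); end
end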

5.3 Adaptive Boost Training

Adaptive Boosting is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost (short for Adaptive Boosting) is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. Problems in machine learning often suffer from the curse of dimensionality: each sample may consist of a huge number of potential features (for instance, there can be 162,336 Haar features, as used by the Viola-Jones object detection framework, in a 24×24 pixel image window), and evaluating every feature can reduce the speed of classifier training and execution.

Training example: the orange circles in figure 5.5 are correct results and the blue circles are wrong results. The lines in the graphs are weak classifiers; each weak classifier divides all of the results into two sides. Some training examples are shown in figure 5.5.

Definition 5.3.1. Weak Classifiers: Weak classifiers (or weak learners) are classifiers which perform only slightly better than a random classifier. Thus, these are classifiers which have some clue about how to predict the right labels, but not as much as strong classifiers. One of the simplest weak classifiers is the decision stump, a one-level decision tree: it selects a threshold for one feature and splits the data on that threshold. AdaBoost then trains an army of these decision stumps, each of which focuses on one part of the characteristics of the data [37].
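A minimal decision stump can be trained as in the sketch below (ours, not the project's code; x is an n-by-1 vector of one feature's values, y holds labels in {-1, +1}, and w holds sample weights summing to 1).

function [theta, p, err] = trainStump(x, y, w)
% Pick the threshold theta and polarity p with the lowest weighted error.
err = inf; theta = x(1); p = 1;
for t = unique(x).'                       % candidate thresholds
    for polarity = [-1, 1]
        pred = polarity*sign(x - t);
        pred(pred == 0) = polarity;       % points on the threshold go to the positive side
        e = sum(w(pred ~= y));            % weighted misclassification rate
        if e < err
            err = e; theta = t; p = polarity;
        end
    end
end
end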

After the steps in figure 5.5, we give additional weight to the side which contained more correct results under this classifier, as in figure 5.6.


Figure 5.5: On the left are the original results before we start adaptive boosting. The straight line in the right picture is the first classifier, which divides the results into two sides.

Figure 5.6: On the left side, circles on the right portion of the image (the side we are going to weight) became bigger, which means those circles have higher weight. We then train the program a second time with a different image, like the image on the right.

When we train the program again on the right image of figure 5.6, the classifier again divides all of the results into two sides, as in figure 5.7. The weak classifiers on the left side carry only one weight when deciding whether an object is a human face, while the weak classifiers on the right side carry two weights. As we keep training the weak classifiers, we can see in figures 5.7 and 5.8 that the more correct results a classifier produces during training, the more weight that classifier will have. After training for millions of iterations with random pictures, we can trust that the classifier with the highest weight has the highest probability of making a correct decision.

In other words, the method used here is similar to a voting system: a voter with more weight has more influence on the result. (A compact sketch of this training loop follows.)
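Putting the pieces together, the weighting scheme described above corresponds to the standard AdaBoost update. Below is a compact sketch of ours, reusing the trainStump helper above and assuming the data vectors x and y are already loaded.

% AdaBoost loop: each round fits a stump to the weighted data, computes its
% vote alpha, and increases the weights of the samples it misclassified.
T = 20; n = numel(y); w = ones(n, 1)/n;
alpha = zeros(T, 1); stumps = zeros(T, 2);
for t = 1:T
    [theta, p, err] = trainStump(x, y, w);
    alpha(t) = 0.5*log((1 - err)/max(err, eps));   % weight of this weak classifier
    pred = p*sign(x - theta); pred(pred == 0) = p;
    w = w .* exp(-alpha(t)*y.*pred);               % boost the misclassified samples
    w = w / sum(w);                                % renormalize
    stumps(t, :) = [theta, p];
end
% the strong classifier is sign( sum over t of alpha(t) * h_t(x) )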


Figure 5.7: Sometimes the classifier will place all of the weak classifiers that reach the wrong answer on the other side. Then all of the classifiers which get the correct answer are weighted to a higher level.

Figure 5.8: After many iterations of the algorithm, the line separates all of the weak classifiers into two sides and keeps giving weight to the side which contains more correct weak classifiers.

5.4 Cascading Classifiers

Cascading theory is quite different from voting or stacking systems. It is based on the concatenation of several classifiers, using all of the information collected from the output of one classifier as the input for the next classifier in the cascade. Cascading classifiers are trained with thousands of 'positive' samples of a particular object and some 'negative' samples (images not containing the object). After the classifier is trained, it can be applied to a region of an image and detect the object much faster, as shown in figure 5.9. This approach was also proposed by Paul Viola and Michael Jones in 2001 [33].


[Flow chart for figure 5.9: all of the outputs, containing the correct answers, are put into Classifier 1; about 80% of the total outputs pass to Classifier 2, and so on; after filtering N − 2 more times, only the 20% of the original outputs that are correct answers remain after Classifier N.]

Figure 5.9: This flow chart describes how all of the results are first given to Classifier 1, after which a subset is given to Classifier 2. This process is repeated until the results have passed all of the classifiers.

In a cascading classifier, we can set up an arbitrary rule for passing the test. The following is an example of applying cascading. For this sample, we set up the rule: 'if exactly two features return a negative result, then the image belongs to class 1; otherwise, it belongs to class 2'. When we pass an image into the cascading system, the decision tree looks like:

feature 1 negative
    feature 2 negative
        feature 3 negative → class 2
        feature 3 positive → class 1
    feature 2 positive
        feature 3 negative → class 1
        feature 3 positive → class 2
feature 1 positive
    feature 2 negative
        feature 3 negative → class 1
        feature 3 positive → class 2
    feature 2 positive
        feature 3 negative → class 2
        feature 3 positive → class 2

When we apply cascading within the Viola-Jones algorithm, the steps are summarized in the flow chart in figure 5.10. When the number of 'true' results is larger than 0.6N, where N is the total number of classifiers, we classify the test image as a human face. However, since we used adaptive boosting in the Viola-Jones algorithm, the weights of the weak classifiers differ, so we need to plug those weights into this algorithm too. By applying the cascading algorithm, we can often reach an answer after the image has passed through hundreds of classifiers instead of evaluating millions of them. (A sketch of this weighted vote with early exits follows.)
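A minimal sketch of this weighted vote (ours; classifiers is assumed to be a cell array of function handles and weights their AdaBoost weights) could look like:

function isFace = cascadeVote(window, classifiers, weights)
% Run the weak classifiers in order, accumulating a weighted 'true' count.
% Accept as soon as the count exceeds 0.6 of the total weight, and reject
% as soon as the remaining classifiers can no longer reach that threshold.
threshold = 0.6*sum(weights);
total = 0; remaining = sum(weights);
for k = 1:numel(classifiers)
    remaining = remaining - weights(k);
    if classifiers{k}(window)
        total = total + weights(k);
    end
    if total > threshold                      % already passed: accept early
        isFace = true; return;
    elseif total + remaining <= threshold     % cannot pass anymore: reject early
        isFace = false; return;
    end
end
isFace = total > threshold;
end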


[Two flow charts for figure 5.10: an image is passed through Classifier 1, Classifier 2, ..., Classifier N, each returning True or False; if the number of True results is larger than 0.6N, the image is classified as a human face, otherwise it is not. The second chart exits early at some Classifier K once the outcome is decided.]

Figure 5.10: In these two charts, we suppose that all of the classifiers have the same weight and set the threshold for passing at 0.6N, where N is the total number of classifiers. Once the number of 'true' results is larger than 0.6N, the image passes the test.

5.5 Conclusion and Discussion

After testing on around 120 images [9] [21], we found that Viola-Jones could recognize human faces with a high accuracy rate (86.78%, table 5.1), regardless of whether an image has a single face or multiple faces. Moreover, for an image without a human face, the Viola-Jones algorithm returns nothing instead of a wrong answer. However, since Viola-Jones is one of the earliest face detection algorithms, it is unavoidable that it will sometimes incorrectly classify an object as a face. In appendix 7.2.5 we use solid red lines to mark the face regions detected by the algorithm. A successful example of identifying a single face is shown in figure 5.11, and successful detection of multiple faces in a single image is shown in figure 5.12. In addition, figure 5.13 is an image of some bricks; there are no human faces in this image, so the image generated by MATLAB on the right is the same as the original image on the left.

Table 5.1: The face detection result table mentioned in the previous paragraph. The detections on the single-face images are all correct, while the detection rate on multiple-face images is only 62.5%.

                 Correct Results   Number of Images   Rate
Single Face      88                88                 100%
No Face          12                15                 80%
Multiple Faces   5                 8                  62.5%
Total            105               121                86.78%

Figure 5.14 shows an image with many people on a city street. The Viola-Jones algorithm was clearly unable to correctly classify the faces in this image: there are three small red squares in the generated image on the right, but only one of them is a human face; the other two are a building roof and empty space. Furthermore, many human faces were not recognized at all. There are several likely reasons for this. The size of the image is 995×796, and there are many faces and other objects in the Shibuya crossing image, which introduces a large amount of noise into the algorithm. In addition, the algorithm is 15 years old; it was state of the art when it was created, but many improvements have been made since then to increase the efficiency and correctness of face detection. All in all, the Viola-Jones algorithm is a very basic face detection algorithm.

In this program, we implemented the image compression step (appendix 7.2.1), the Haar feature setup (appendix 7.2.3), and the integral image (appendix 7.2.2).


Figure 5.11: The image on the left is the original image from a downloaded database [17]. The figure on the right is the image after the face detection algorithm, where the red box denotes the region classified as a face.

Figure 5.12: The image on the left is the original image from a downloaded database [11]. The figure on the right is the image after the face detection algorithm, where the red boxes denote the regions classified as faces.

Finally, we used the Viola-Jones algorithm, available as a built-in function in MATLAB, to generate the detection images shown in appendix 7.2.5. Image processing is a vast field, and so is face detection. Face detection technology now uses 'genetic algorithms' [36] as well as the 'eigenface technique' [5]. These updated algorithms have higher accuracy rates and are more efficient than the Viola-Jones algorithm developed 15 years ago. We believe that in the next few years, with the development of computer science, many new detection algorithms will be proposed. There are still many improvements in face detection to be made.


Figure 5.13: The image on the left is the original image, which comes from online [7]. The figure on the right is the image after the face detection algorithm; no red box is drawn because no face was detected in the image.

Figure 5.14: The image on the left is the original image from a downloaded database [9]. The figure on the right is the image after the face detection algorithm, where the red boxes denote the regions classified as faces.


Chapter 6

Vector Image Approximation

In modern computer graphics systems, the vast majority of image and video data is stored in raster-based formats like .png, .jpg, and .mp4, which are designed around storing matrices of color values. Though there are some formats which are primarily spline based, like .swf and .svg, these formats, usually referred to as vector graphics, are generally less popular: there is no generally effective way to compress images of real objects while preserving photographic detail, and most digital screens output in a raster format. Because raster image formats are so much more common, the potential of vector image formats remains relatively unexplored.

There are several distinct advantages to vector image formats compared to raster image formats. One of the most notable is scalability. An image stored in a raster format has a fixed maximum render size: a computer can easily interpolate a raster image to produce a new raster of a larger size, but the result will inevitably show pixelation or blurring, because the system has no information about what a new point in the raster should look like other than the adjacent pixels. Vector images, on the other hand, are defined by continuous splines, so rendering a vector image at a larger scale does not produce any jagged or blurred edges; the detail of vector images is instead limited by the number of points used to define the image. Another great advantage of vector images is that they can be easily manipulated and animated in ways that can be much harder for raster images: in raster images, morphing shapes causes the same kinds of problems as scaling, while vector images maintain detail regardless of any transformations made to the shapes in the image.

As modern screens reach the point where human eyes cannot distinguish individual pixels, the small, potentially unnoticeable details in images may become less important for most uses, at which point vector-based images may become more popular for their scalability, their manipulability, or their ability to efficiently compress homogeneous parts of an image while keeping detail in other areas. At the moment, vector images are usually not very effective for image compression because most images contain large amounts of visual noise or complex shade gradients, neither of which compresses well in common vector image systems. Because colored gradients can be represented with simple continuous functions like splines, implementing such functionality could likely improve the compression of images significantly.

6.1 Prior Work

Though vector-based image formats are not popular or well developed as compression techniques, there have been a number of attempts to create such systems. It is important to note that in most vector


image formats, an image is described by a number of control points, the relationships among those control points, and color values associated with individual points or groups of points. In vector image formats that use many different colors, like one which associates a color with each control point, the colors take up a considerable portion of the data. Traditional vector image formats use shapes defined by splines that are shaded as solid colors; they are primarily limited by the number of control points, because each unique color has many control points associated with it.

Though it is basic, the raster-to-vector image conversion method created by Dr. Lakshman Prasad and Dr. Alexei Skourikhine shows a good baseline for how to approach converting raster images to vector images. Their technique is designed to produce .svg files, which do support Bezier splines and ways to produce gradients, but they opted to simply define the shapes in the image by straight lines. Possibly the most important step in their method is the first, where the Canny edge detection method is used to parse the sections of the image. With the edges identified, the algorithm splits the image into triangular cells which fit within the borders of the edges. The cells are assigned the average color of the area of the source image covered by each cell, and adjacent cells of similar color are grouped together and made into larger polygons of one color. A few of the steps of this method can be seen in figure 6.1. This method does not produce images with great detail, but it is efficient and it produces recognizable images. The use of the Canny edge detector described in chapter 4 is particularly important in this algorithm because it highlights the importance of identifying the edges with which vector images define the elements of an image [27].


Figure 6.1: (a) input image, (b) after Canny edge detection and triangulation, (c) final result. Images are taken from [27].

Though there are many ways to implement color gradients in vector image systems, the most popular is the mesh gradient. Mesh gradients are connected groups of patches, each defined by a closed shape enclosing a color gradient. The colors are associated with the control points defining the bounds of the shape, so mesh gradient systems usually force all patches to be either triangular, with three control points, or quadrilateral, with four control points, in order to keep computing the gradient simple. The edges of patches can also be defined by more complex curves like cubic Bezier curves, as seen in the popular Coons patch method shown in figure 6.2 (a). A series of these patches can be used together to efficiently produce large sections of images with shading and gradients. One reason that mesh gradients have been cemented as the most popular method for creating gradients in vector images is that the .svg file format supports them.


Unfortunately, the implementation found in .svg images is limited to quadrilateral patches which are normally connected in a grid. Methods for expressing images with mesh gradients generally start by segmenting the input image into different sections which are assigned separate meshes [6]. One method for converting an image into a mesh-gradient-based vector image, by Hu, Lai, and Martin, focuses on using grids of patches like those supported in the .svg format, but also includes the ability to leave an arbitrary number of holes in a grid, as seen in figure 6.2 (b). This method is reasonably effective, but it also shows how requiring the mesh to be a grid of quadrilaterals can leave many control points that add little to no detail in some parts of the mesh, simply because other nearby areas demanded many control points to be properly represented [14]. A similar method by Liao, Xia, and Yu instead uses triangular meshes, which are more flexible in their usage but also introduce many other complications [20].


Figure 6.2: (a) examples of mesh gradients implemented with the Coons patch method, taken from [6]; (b) input image, resulting mesh, and final result of the method created by Hu et al., taken from [14].

Another technique for creating vector-based color gradients is the use of diffusion curves, originally proposed by Alexandrina Orzan. Diffusion curves are splines, such as cubic Bezier curves, that have a color assigned to each side and a blur value. When the image is rendered, the colors are spread to the space around the curve on its respective side through an iterative process. With this technique, a solid object can easily be evenly shaded by making the colors on the inside of a shape the same, while still allowing colors to change across the object when necessary. The blur value softens the colors along a particular curve so that not all the curves in the image have sharp color differentials. Blurring the edges is a simplistic way to solve this problem, but it is very useful for creating shading within objects. Orzan and her associates found that the performance of diffusion curves is comparable to that of mesh gradients, and that in many ways they are easier to produce automatically [24].

In response to the lack of smooth edges in diffusion curves, which blurring solves only imperfectly, Finch, Hoppe, and Snyder created a system which uses thin plate splines to create optimally smooth color gradients. The thin plate splines produce gradients which guarantee smoothness and continuity of the first derivative. To produce more complex shapes, their system also implements tear curves, which break continuity; crease curves, which break continuity of the first derivative while maintaining continuity of the value across the curve; and contour curves, which constrain the magnitude of the derivative. With these tools, a much greater range of possible shadings can be created with the same number of control points as diffusion curves, but the difficulty of managing the more complex system means that an automated method for producing images in this system has not been created [10].


6.2 Automated TPS Creation

Presently, no one has created a method for automatically converting an image into a thin plate spline (TPS) based image system like the one created by Finch, Hoppe, and Snyder [10]. As a proof of concept, we have created a basic implementation of a TPS image system and a program to convert images into that format. Our TPS system does not implement any of the constraint curves and instead only has individual constraint points for the TPS. The program was implemented in MATLAB and is able to take in a raster image such as a .png or .jpg, compute the best-fitting coefficients for the given control points, and render an image defined by the generated TPS. Runtimes and average error per pixel between the original and recreated images were recorded. To remove much of the noise in the input image, computations are performed on a slightly blurred version of the image produced with a Gaussian convolution kernel like the one used in the Canny edge detector described in chapter 4. The image gradients of that blurred image are also computed for some subroutines in the program. Our program does not consider the red, green, and blue color channels simultaneously; instead it performs the same operation on each channel separately and combines them during rendering.
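A heavily simplified per-channel version of this pipeline is sketched below. It is our own sketch, not the full converter: it replaces the gradient-based point placement of section 6.2 with random sites, shrinks the image so that tpaps and fnval stay fast, and uses MATLAB's bundled peppers.png as a stand-in input.

% Per-channel thin plate spline fit using the Curve Fitting Toolbox.
A = im2double(imresize(imread('peppers.png'), 0.25));   % small RGB test image
[h, w, ~] = size(A);
n = 100;
sites = unique([randi(h, n, 1), randi(w, n, 1)], 'rows').';  % 2-by-m control points
[X, Y] = ndgrid(1:h, 1:w);
locs = [X(:).'; Y(:).'];                 % every pixel location, 2-by-(h*w)
out = zeros(h, w, 3);
for c = 1:3                              % fit each color channel separately
    ch   = A(:, :, c);
    vals = ch(sub2ind([h, w], sites(1, :), sites(2, :)));
    st   = tpaps(sites, vals);           % thin plate smoothing spline fit
    out(:, :, c) = reshape(fnval(st, locs), h, w);
end
imshowpair(A, out, 'montage');           % original vs. TPS reconstruction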


Figure 6.3: (a) input image of a cat taken from [14]; (b) output of our automatic TPS image converter produced with 100 control points.

The number of control points can be adjusted. The points are placed in the image randomly and then iteratively moved to adjacent points, up to 100 times, along the gradient of the original image toward a place that maximizes the magnitude of the slope. Other methods of adjusting the selected 100 points were tested and compared on the cat image in figure 6.3 (a). In all four methods, the points are first chosen completely at random and then potentially moved by comparing their positions with the gradient produced by the LoG as described in chapter 2. The four methods are:

• Randomized points are not moved

• Randomized points are moved to maximize magnitude of the slope

• Randomized points are moved to minimize the magnitude of the slope

• All points are randomized, half maximize the slope, the other half minimize

The program was run only five times with each method, but maximizing the magnitude of the slope was more consistent than the other methods and had about 10% less error on average, so that method was used in all other measurements. Detailed data on the accuracy of the different methods can be seen in table 6.1. (A sketch of the hill-climbing placement follows.)
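A minimal sketch of the 'maximize gradient magnitude' placement (ours; M stands for a gradient magnitude map such as the LoG output, and the function name is hypothetical):

function pts = placePoints(M, n, iters)
% Seed n points at random, then hill-climb each one over the gradient
% magnitude map M by repeatedly moving it to its best 8-neighbor.
[h, w] = size(M);
pts = [randi(h, n, 1), randi(w, n, 1)];          % random seeds (row, column)
for it = 1:iters
    for k = 1:n
        r = pts(k, 1); c = pts(k, 2);
        best = M(r, c); bestRC = [r, c];
        for dr = -1:1
            for dc = -1:1
                rr = min(max(r + dr, 1), h);     % clamp to the image bounds
                cc = min(max(c + dc, 1), w);
                if M(rr, cc) > best
                    best = M(rr, cc); bestRC = [rr, cc];
                end
            end
        end
        pts(k, :) = bestRC;
    end
end
end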


Trial     random    minimize gradient magnitude   maximize gradient magnitude   alternating maximizer
1         2,258.5   2,195.2                       1,966.3                       2,258.5
2         2,286.1   2,347.1                       2,017.3                       2,286.1
3         1,979.7   1,795.6                       2,121.9                       1,979.7
4         2,446.6   2,664.8                       2,043.6                       2,446.6
5         2,482.3   2,297.3                       2,070.3                       2,482.3
Average   2,290.6   2,260.0                       2,043.9                       2,290.6

Table 6.1: Average error produced by TPS compression of the cat image with 100 control points, using the different control point placement techniques.

6.3 Results and Discussion

The basic system we have implemented loses much of the detail of the original images, but produces files much reduced in size; a valuable potential use of this system would therefore be compression. For example, the cat image, originally a 22.4 KB file in .jpg format, produced a file that is 4.21 KB in size. This does not even include simple improvements that could be made to the compression, such as truncating the stored values to integers instead of floating point values or removing certain clear redundancies. Unfortunately, it is hard to say how effective this method would be at compressing the most common kinds of photographs. As can be seen in figure 6.4, it can still be difficult to recognize that the image is a city even when 1000 control points are used, and the images that used 100 control points or fewer are unrecognizable. On the other hand, images that rely on smooth gradients perform well, like the gradient image in figure 6.5.

Though the TPS compression system can easily render an image that has already been processed, it is not very efficient when compressing. This is because it involves finding the coefficients for each control point which minimize the error of the resulting TPS relative to the original image. Our program uses the MATLAB function tpaps to find the coefficients; that function finds the coefficients through QR decomposition in the same way outlined in sections 3.2 and 3.3. The inefficiency of this method is its greatest weakness, and it is difficult to determine whether it can be overcome. Since major improvements in the exact computation of least squares solutions are unlikely, most potential improvements would come from more efficient approximation algorithms; unfortunately, it is unclear how approximation methods would affect the accuracy of the resulting TPS-compressed image. From the test data in table 7.1 it is clear that the speed of the system is related to the size of the input image, but the accuracy is not. The consistency of these results can be seen in tables 7.2 and 7.3. The gradient image clearly performs much better in terms of accuracy than the other images because it is fundamentally a basic gradient.

The images produced by our TPS compression program are basic and do not implement many of the features developed by Finch, Hoppe, and Snyder, and the addition of those features could result in unreasonably slow processing speeds [10]. In testing our system we found that even an image smaller than a quarter of a standard HD monitor (1920×1080), like the Boston image (620×385), can take a considerable amount of time: with 1000 control points that image took about 39 seconds to process. Thankfully, the runtime appears to grow approximately linearly rather than cubically; though this is a conclusion from limited data, the amount of detail needed in an image is bounded, so the pattern may hold for as much as could practically matter. This can be seen in table 6.2 and table 7.4, where for larger numbers of control points the increase in time is close to linearly correlated with the number



Figure 6.4: Input image (a) of Boston and the TPS-compressed results (b) through (f) with 10, 50, 100, 500, and 1000 control points respectively. Image taken from [1].


Figure 6.5: (a) input image; (b), (c) TPS compression output with 10 and 100 control points respectively. Image taken from [2].

of control points. The discrepancy in the pattern for smaller numbers of control points is most likely caused by overhead in the program. Implementing more features could provide better ways to express the details of an image, but those new ways to add detail could considerably slow down a system that is already too slow for practical usage.

The accuracy of our basic system is not great, but it does show some consistent behavior. Unsurprisingly, the system is very accurate at recreating smooth color gradients; this can best be seen in figure 6.5 and in the sky of the image of Boston in figure 6.4. Unlike the speed of the program, the accuracy of the TPS compression method is difficult to project: for a fixed number of control points, the size of the image seems to have little effect. Increasing the number of control points generally improves the accuracy of the compression, as can be seen in table 6.3, but it may begin to hit asymptotic limits, as can be seen in table 7.5. In both cases there are considerable diminishing returns on increasing the number of control points, but that is to be expected.


Trial     10       50       100      500       1000
1         1.234    2.291    3.566    14.088    39.241
2         0.830    1.944    3.283    13.789    39.059
3         0.810    1.903    3.307    13.667    38.761
4         0.846    1.965    3.256    13.677    39.305
5         0.865    1.935    3.187    13.834    39.041
Average   0.917    2.0076   3.3198   13.811    39.0814

Table 6.2: Time in seconds to TPS compress the Boston image shown in figure 6.4.

Trial     10         50         100        500        1000
1         5,146.8    4,657.7    2,951.3    1,951.3    1,437.1
2         6,100      3,990      3,530      1,950      1,500
3         9,130      4,640      3,690      1,950      1,470
4         6,830      4,460      3,590      1,770      1,390
5         7,260      5,150      3,490      1,710      1,470
Average   6,891.82   4,579.86   3,450.54   1,867.56   1,452.92

Table 6.3: Average error from TPS compression of the Boston image shown in figure 6.4.

It is difficult to say whether thin plate splines would be an effective tool for improving vector image systems. They provide a level of accuracy and precision not found in mesh gradients and diffusion curves, but they may be too computationally intensive to have any practical advantages. To know for sure, a much more complex automated TPS image creation system implementing constraint curves would have to be created. Building a more advanced TPS system would take considerable effort and might create new problems that the added functionality may not make up for.


Chapter 7

Appendix

7.1 Tables

7.1.1 TPS data

          Cat        Boston     Gradient    Beijing
Height    592        620        1,920       4,226
Width     304        385        1,200       2,783
Pixels    179,968    238,700    2,304,000   11,760,958
Time      2.5858     3.3198     28.1676     145.4858
Error     2,043.88   3,450.54   23.3717     3,360.34

Table 7.1: Comparison of the dimensions of the test images, with time in seconds and average error from running TPS compression with 100 control points. Time and error are averaged over five iterations of each combination. Images from [4], [1], [2], [25].

Trial     Cat      Boston   Gradient   Beijing
1         2.947    3.566    28.423     143.843
2         2.518    3.283    28.243     145.179
3         2.495    3.307    28.138     145.582
4         2.432    3.256    28.106     146.573
5         2.537    3.187    27.928     146.252
Average   2.5858   3.3198   28.1676    145.4858

Table 7.2: Comparison of time in seconds to TPS compress different images with 100 control points.


Trial     Cat        Boston     Gradient   Beijing
1         1,966.30   2,951.30   27.30      3,352.40
2         2,017.30   3,532.10   20.89      3,120.00
3         2,121.90   3,688.60   26.63      3,088.70
4         2,043.60   3,594.20   24.21      4,055.70
5         2,070.30   3,486.50   17.83      3,184.90
Average   2,043.88   3,450.54   23.3717    3,360.34

Table 7.3: Comparison of average error from TPS compressing images with 100 control points.

Trial     10       50       100       500        1000
1         5.725    15.757   28.423    122.958    255.488
2         5.434    15.396   28.243    127.945    283.881
3         5.294    15.597   28.138    124.896    274.882
4         5.452    15.581   28.106    120.605    277.822
5         5.263    15.664   27.928    123.735    277.550
Average   5.4336   15.599   28.1676   124.0278   273.9246

Table 7.4: Time in seconds to TPS compress the gradient image from [2].

Trial     10           50         100       500       1000
1         950.9766     45.3143    27.2996   2.4181    2.0400
2         1,683.3      69.2392    20.8886   2.6686    1.9996
3         1,074.0      84.7913    26.6250   2.3008    1.7152
4         398.3286     33.6650    24.2111   2.6183    1.9130
5         916.5857     38.1979    17.8342   2.4558    1.8910
Average   1,004.63818  54.24154   23.3717   2.49232   1.91176

Table 7.5: Average error from TPS compression of the gradient image from [2].


7.2 MATLAB Code

7.2.1 Code to compress images into the same size:

% Resize all of the pictures into the same size
srcFiles = dir('R:\Academic\MQP\ImageProcessing\');
for i = 1:length(srcFiles)
    if srcFiles(i).isdir
        continue;   % skip '.' and '..' and any subfolders
    end
    filename = strcat('R:\Academic\MQP\ImageProcessing\', srcFiles(i).name);
    I = imread(filename);
    Inew = imresize(I, [128 128], 'nearest');   % 'nearest' must be quoted
    outname = strcat('R:\Academic\MQP\Compressed image', num2str(i), '.jpg');
    imwrite(Inew, outname, 'jpg');   % pass the variable, not the literal 'filename'
end

7.2.2 Integral Image code:

function II = integralimage(image)
% Compute the integral image: II(i,j) is the sum of all pixels above and
% to the left of (i,j), inclusive. The original summed image(1:l,1:w),
% i.e. the whole image, at every pixel; image(1:i,1:j) is what was meant.
[l, w] = size(image);
II = zeros(l, w);
for i = 1:l
    for j = 1:w
        II(i, j) = sum(sum(image(1:i, 1:j)));
    end
end
end

7.2.3 Haar Features

srcFiles = dir('R:\academic\violajones\violajones\2\*.jpg');
%dir = 'R:\academic\violajones\2';
%mkdir(dir);

% HaarFeatures is the project's helper (not shown); the four feature
% templates A, B, C, D of figure 5.2 used below are assumed to be set up by it.
HF = HaarFeatures(image, max);

[l1, w1, r1, c1] = size(A);
[l2, w2, r2, c2] = size(B);
[l3, w3, r3, c3] = size(C);
[l4, w4, r4, c4] = size(D);

% counters record how many times each candidate feature gets the correct answer
counterA = zeros(size(A));
counterB = zeros(size(B));
counterC = zeros(size(C));
counterD = zeros(size(D));

numbers = length(srcFiles);

Weakclassifier = [];   % features kept as weak classifiers

for f = 1:length(srcFiles)   % renamed from i so the inner loops do not clobber it
    filename = strcat('R:\academic\violajones\originalpictures\', srcFiles(f).name);
    I = imread(filename);
    II = integralimage(I);

    for i = 1:l1
        for j = 1:w1
            for k = 1:r1
                for l = 1:c1
                    difference = II(i+k, j+l) - 2*II(k, l);
                    if difference >= 0
                        counterA(i, j, k, l) = counterA(i, j, k, l) + 1;
                    end
                end
            end
        end
    end

    for i = 1:l2
        for j = 1:w2
            for k = 1:r2
                for l = 1:c2
                    difference = II(i+k, j+l) - 2*II(k, l);
                    if difference >= 0
                        counterB(i, j, k, l) = counterB(i, j, k, l) + 1;
                    end
                end
            end
        end
    end

    for i = 1:l3
        for j = 1:w3
            for k = 1:r3
                for l = 1:c3
                    difference = II(i+k, j+l) - 2*II(k, l);
                    if difference >= 0
                        counterC(i, j, k, l) = counterC(i, j, k, l) + 1;
                    end
                end
            end
        end
    end

    for i = 1:l4
        for j = 1:w4
            for k = 1:r4
                for l = 1:c4
                    difference = II(i+k, j+l) - 2*II(k, l);
                    if difference >= 0
                        counterD(i, j, k, l) = counterD(i, j, k, l) + 1;
                    end
                end
            end
        end
    end
end

% keep the features that were correct on at least half of the training images
% (the original compared against an undefined 'num'; 'numbers' is what was meant)
for i = 1:l1
    for j = 1:w1
        for k = 1:r1
            for l = 1:c1
                if counterA(i, j, k, l) >= numbers/2
                    Weakclassifier = [Weakclassifier; [i, j, k, l]];
                end
            end
        end
    end
end

totalnum = length(Weakclassifier);

7.2.4 Gaussian Blur code:

function im = image_blur(image, std)

RGB = imread(image);

R = RGB(:, :, 1);
G = RGB(:, :, 2);
B = RGB(:, :, 3);

s = size(R);

% blur each color channel separately
R_new = blur(R, std);
G_new = blur(G, std);
B_new = blur(B, std);

Output(:, :, 1) = R_new;
Output(:, :, 2) = G_new;
Output(:, :, 3) = B_new;

subplot(1, 2, 1);
imshow(RGB);
subplot(1, 2, 2);
imshow(Output);

end

% The Gaussian function
function [new] = blur(old, std)

% 5x5 distance pattern used to build the Gaussian kernel
x = [1, 2, 3, 2, 1; 2, 3, 4, 3, 2; 3, 4, 5, 4, 3; 2, 3, 4, 3, 2; 1, 2, 3, 2, 1];
y = x;
matrix_exp = exp(-(x.^2 + y.^2)/(2*std^2));
[a, b] = size(old);
Gau_kernel = (1/sqrt(2*pi*std*std))*matrix_exp;
Tem = double(old);
Output = double(old);
Gau_kernel = (1/sum(sum(Gau_kernel)))*Gau_kernel;   % normalize so the weights sum to 1

% pad the image by two pixels on every side, replicating the border rows/columns
Ext = zeros(a+4, b+4);
for m = 1:a
    for n = 1:b
        Ext(m+2, n+2) = Tem(m, n);
    end
end
for i = 1:2
    Ext(i, 3:b+2) = Tem(1, :);
    Ext(a+5-i, 3:b+2) = Tem(a, :);
end
for j = 1:2
    Ext(:, j) = Ext(:, 3);
    Ext(:, b+5-j) = Ext(:, b+2);
end
Ext(1, 1) = Tem(1, 1);
Ext(a+4, b+4) = Tem(a, b);

% convolve the padded image with the kernel
for i = 3:(a+2)
    for j = 3:(b+2)
        M = Ext(i-2:i+2, j-2:j+2).*Gau_kernel;
        Output(i-2, j-2) = sum(sum(M));
    end
end

new = uint8(Output);

end

7.2.5 Face Detection by Viola-Jones Algorithm code:

clear all
clc

FaceDetection = vision.CascadeObjectDetector;   % Viola-Jones detector object

I = imread('f3.jpg');
BB = step(FaceDetection, I);   % bounding boxes of detected faces

figure,
imshow(I); hold on
for i = 1:size(BB, 1)
    rectangle('Position', BB(i, :), 'LineWidth', 5, 'LineStyle', '-', 'EdgeColor', 'r');
end

title('f3');
hold off

7.2.6 Edge detection code:

function im = image_edge_detection(image, std)

RGB = imread(image);

R = RGB(:, :, 1);
G = RGB(:, :, 2);
B = RGB(:, :, 3);

s = size(R);

% run edge detection on each color channel separately;
% edge_detection is the project's per-channel helper (not shown)
R_new = edge_detection(R, std);
G_new = edge_detection(G, std);
B_new = edge_detection(B, std);

Output(:, :, 1) = R_new;
Output(:, :, 2) = G_new;
Output(:, :, 3) = B_new;

subplot(1, 2, 1);
imshow(RGB);
subplot(1, 2, 2);
imshow(Output);

end

7.2.7 Gradient Images

W = imread('Zebra.jpg');
I = rgb2gray(W);
[Gx, Gy] = imgradientxy(I, 'prewitt');
figure
imshowpair(Gx, Gy, 'montage');
%imshow(Gx)

7.2.8 Hough Transform matlab code:

% my Hough Transform algorithm

RGB = imread('testPic.png');
BW = im2bw(RGB, .5);

% create output matrix
[maxX, maxY] = size(BW);
maxR = round(sqrt(maxX^2 + maxY^2)./2);   % bounds have round-down error
R = -maxR:maxR;
T = -89:90;
H = zeros(size(R, 2), size(T, 2));

% iterate over matrix
for i = 1:size(R, 2)
    for j = 1:size(T, 2)
        out = false;   % is the line not inside the image?
        % count 1's on line
        % find length of test line
        Len = 0;               % length of line in box
        pInter = zeros(2);     % points where the line intersects the box
        if T(j) == 0           % line is vertical
            Len = maxY;
            pInter = [maxX./2 + R(i), 0; maxX./2 + R(i), maxY];
            if R(i) > maxX./2  % out of bounds check
                Len = 0;
            end
        elseif T(j) == 90      % line is horizontal
            Len = maxX;        % horizontal span (the original mistakenly used maxY)
            pInter = [0, maxY./2 + R(i); maxX, maxY./2 + R(i)];
            if R(i) > maxY./2  % out of bounds check
                Len = 0;
            end
        else
            m = tand(T(j) - 90);
            x0 = (R(i).*cosd(T(j))) + (maxX./2);
            y0 = (R(i).*sind(T(j))) + (maxY./2);
            b = y0 - m.*x0;

            % find 4 intersections of line and box: x=0, x=maxX, y=0, y=maxY
            inter = [0, b; maxX, b + m.*maxX; -b./m, 0; (maxY - b)./m, maxY];
            valid = ones(4, 1);   % check which intersections are in the box
            for k = 1:4
                if inter(k,1) < 0 || inter(k,1) > maxX || inter(k,2) < 0 || inter(k,2) > maxY
                    valid(k) = 0;
                end
            end
            % take only points in box
            if max(valid) == 1
                point1 = true;
                for k = 1:4
                    if valid(k) == 1 && point1
                        pInter(1,:) = inter(k,:);
                        point1 = false;
                    elseif valid(k) == 1
                        pInter(2,:) = inter(k,:);
                    end
                end
            end
            Len = sqrt((pInter(1,1) - pInter(2,1))^2 + (pInter(1,2) - pInter(2,2))^2);
        end

        % create series of test points on BW
        checks = zeros(round(Len) + 1, 2);   % array of points to check
        checkcount = size(checks, 1);
        for k = 1:checkcount
            checks(k,:) = [(k.*pInter(1,1) + (checkcount - k).*pInter(2,1))./checkcount, ...
                           (k.*pInter(1,2) + (checkcount - k).*pInter(2,2))./checkcount];
        end
        checks = round(checks);   % round to nearest pixel
        % sum test points on BW
        for k = 1:checkcount
            % check if point is in
            if ~(checks(k,1) <= 0 || checks(k,1) > maxX || checks(k,2) <= 0 || checks(k,2) > maxY)
                if BW(checks(k,1), checks(k,2))
                    H(i,j) = H(i,j) + 1;
                end
            end
        end
    end
end

imshow(imadjust(mat2gray(H)), 'XData', T, 'YData', R, ...
       'InitialMagnification', 'fit');
xlabel('\theta'), ylabel('\rho');
axis on, axis normal, hold on;
colormap(gca, hot);

7.2.9 Canny edge detection code:

RGB = imread('cityTest.jpg');
Grey = rgb2gray(RGB);
%BW = edge(Grey, 'canny');

% Gauss blur the image
B = imgaussfilt(Grey, 1.4);

% use Sobel operator to find the directional gradient
% create convolution kernels
ConvX = [-1 0 1; -2 0 2; -1 0 1];
ConvY = [-1 -2 -1; 0 0 0; 1 2 1];
% create magnitude and direction matrices
Gx = conv2(B, ConvX, 'valid');
Gy = conv2(B, ConvY, 'valid');
G = (Gx.^2 + Gy.^2).^0.5;
Gangle = atan2(Gy, Gx);

% apply non-maximum suppression
G2 = G;
% simplify Gangle
Gangle2 = Gangle;
for i = 1:numel(Gangle)
    if Gangle(i) < 0
        Gangle2(i) = Gangle(i) + pi;
    end
    Gangle2(i) = round(Gangle2(i).*4./pi);
end
for i = 2:(size(Gangle,1)-1)
    for j = 2:(size(Gangle,2)-1)
        % operate on 4 possible angles
        if Gangle2(i,j) == 0 || Gangle2(i,j) == 4   % gradient -
            if G(i,j) < G(i,j-1) || G(i,j) < G(i,j+1)
                G2(i,j) = 0;
            end
        end
        if Gangle2(i,j) == 1   % gradient /
            if G(i,j) < G(i-1,j-1) || G(i,j) < G(i+1,j+1)
                G2(i,j) = 0;
            end
        end
        if Gangle2(i,j) == 2   % gradient |
            if G(i,j) < G(i-1,j) || G(i,j) < G(i+1,j)
                G2(i,j) = 0;
            end
        end
        if Gangle2(i,j) == 3   % gradient \
            if G(i,j) < G(i-1,j+1) || G(i,j) < G(i+1,j-1)
                G2(i,j) = 0;
            end
        end
    end
end
G2 = G2./(max(max(G2)));   % make values < 1
G = G./(max(max(G)));

% apply double threshold
HighHold = .7;
LowHold = .3;
% remove bad weak edges with blob analysis
Bmap = zeros(size(G2));   % create blob map
Bn = 1;                   % counter for the blob number
Blist = zeros(2, 1);      % create empty blob list
for i = 1:size(G2,1)
    for j = 1:size(G2,2)
        if i == 1 || i == size(G2,1) || j == 1 || j == size(G2,2)
            G2(i,j) = 0;           % zero edges
        elseif G2(i,j) < LowHold
            G2(i,j) = 0;           % remove very weak edges
        elseif G2(i,j) > HighHold
            Bmap(i,j) = -1;
        else                       % assign blob number
            assigned = false;
            for ii = 1:3
                for jj = 1:3
                    % look for adjacent blob
                    if Bmap(i+ii-2, j+jj-2) > 0 && ...
                       (~assigned || Bmap(i+ii-2, j+jj-2) < Bmap(i,j))
                        Bmap(i,j) = Bmap(i+ii-2, j+jj-2);
                        assigned = true;
                    end
                end
            end
            if ~(assigned)
                Bmap(i,j) = Bn;
                Bn = Bn + 1;
            end
        end
    end
end

% add to adjacent blob list
for i = 1:size(G2,1)
    for j = 1:size(G2,2)
        for ii = 1:3
            for jj = 1:3
                if Bmap(i,j) > 0 && (Bmap(i+ii-2,j+jj-2) > 0 || Bmap(i+ii-2,j+jj-2) == -1) && ...
                   Bmap(i+ii-2,j+jj-2) ~= Bmap(i,j) && ...
                   ~logical(max(min(ismember(Blist, [Bmap(i,j); Bmap(i+ii-2,j+jj-2)]))))
                    Blist = [Blist [Bmap(i,j); Bmap(i+ii-2,j+jj-2)]];
                end
            end
        end
    end
end

% consolidate blob list
for i = 1:size(Blist,2)
    for j = 1:(size(Blist,2) - i)
        if Blist(1,i) == Blist(2,i+j)
            Blist(2,i+j) = Blist(2,i);
        end
    end
end

% merge blobs
for i = 1:size(G2,1)
    for j = 1:size(G2,2)
        set = false;
        if Bmap(i,j) > 0
            for k = 1:size(Blist,2)
                if ~set && Blist(1, size(Blist,2)-k+1) == Bmap(i,j)
                    Bmap(i,j) = Blist(2, size(Blist,2)-k+1);
                    set = true;
                end
            end
        end
    end
end

% take only strong blobs
G3 = zeros(size(G2));
for i = 1:size(G2,1)
    for j = 1:size(G2,2)
        if Bmap(i,j) == -1
            G3(i,j) = 1;
        end
    end
end

imshow(G3);


Bibliography

[1] Boston wallpaper. http://www.wallpapers13.com/boston-desktop-wallpaper-free-download/. [Online; accessed 23 March 2017].

[2] Gradient wallpaper. http://www.lifewallpapers.net/data/out/75/4170069-color-gradient-wallpaper-hd.png. [Online; accessed 23 March 2017].

[3] Penguins. www.adra.com/penguins-2, October 10, 2013. [Online; accessed 23 February 2017].

[4] Mom cat and her son. http://imgur.com/gallery/5wqdk, 2017. [Online; accessed 23 March 2017].

[5] P. Alshwar. Face recognition using multiple face eigen subspaces. International Journal of Computer Applications, 1(2), 2010.

[6] T. Bah. Advanced gradients for SVG. http://www.svgopen.org/2011/papers/18-Advanced_Gradients_for_SVG/, 2011.

[7] Apollo Brick. http://www.apollobrick.com/nfx-non-face-extra. [Online; accessed 23 February 2017].

[8] J. Canny. A computational approach to edge detection. http://www.csee.wvu.edu/~xinl/library/papers/comp/canny1986.pdf, 1986.

[9] RPI ECSE department. Face databases from other research groups. https://www.ecse.rpi.edu/~cvrl/database/other_Face_Databases.htm. [Online; accessed 23 February 2017].

[10] M. Finch, H. Hoppe, and J. Snyder. Freeform vector graphics with controlled thin-plate splines. In ACM Transactions on Graphics (Proceedings of the 2011 SIGGRAPH Asia Conference), volume 30, 2011.

[11] C. Fink. Z100's Jingle Ball 2011 presented by Aeropostale - press room. http://www.zimbio.com/photos/Cubbie+Fink/Z100+Jingle+Ball+2011+Presented+Aeropostale/MKmM7tVKKxx. [Online; accessed 23 February 2017].

[12] R.C. Gonzalez and R.E. Woods. Digital Image Processing. Pearson, third edition, Boston, MA, 2008.

[13] B. Green. Canny edge detection tutorial. https://web.archive.org/web/20160324173252/http://dasl.mem.drexel.edu/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html, 2002.

[14] S. M. Hu, Y. K. Lai, and R. R. Martin. Automatic and topology-preserving gradient mesh generation for image vectorization. In ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2009), volume 28, 2009.

[15] Imgur. Dog. http://imgur.com/y3YB, October 23, 2012. [Online; accessed 23 February 2017].

[16] R. Jain, R. Kasturi, and B. G. Schunck. Machine Vision, chapter 5. McGraw-Hill, 1995.

[17] We know your dreams. http://weknowyourdreams.com/face.html. [Online; accessed 23 February 2017].

[18] D. Q. Lee. Numerically efficient methods for solving least squares problems. http://math.uchicago.edu/~may/REU2012/REUPapers/Lee.pdf, 2012.

[19] J.P. Lewis. Fast template matching. Industrial Light and Magic, Vision Interface, 1995.

[20] B. Liao, T. Xia, and Y. Yu. Patch-based image vectorization with automatic curvilinear feature alignment. In ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH Asia 2009, volume 28, 2009.

[21] MIT. Face database. http://web.mit.edu/emeyers/www/face_databases.html. [Online; accessed 23 February 2017].

[22] J. M. F. Moura. What is signal processing? IEEE Signal Processing Magazine, 26:6-6, 2009.

[23] Victor Powell on Setosa. Image convolution kernel. https://www.flickr.com/photos/mytripsmypics/5181510940, August 21, 2010. [Online; accessed 23 February 2017].

[24] A. Orzan, A. Bousseau, H. Winnemoller, P. Barla, J. Thollot, and D. Salesin. Diffusion curves: A vector representation for smooth-shaded images. In ACM Transactions on Graphics (Proceedings of SIGGRAPH 2008), volume 27, 2008.

[25] J. F. Peron. Beijing wallpaper. http://kingofwallpapers.com/city-pictures/img-001.php?pic=/city-pictures/city-pictures-001.jpg, 2011. [Online; accessed 23 March 2017].

[26] C.P. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In Sixth International Conference on Computer Vision, 1998.

[27] L. Prasad and A. Skourikhine. Raster to vector conversion of images for efficient SVG representation. http://www.svgopen.org/2005/papers/Prasad_Abstract_R2V4SVG/.

[28] L. G. Shapiro and G. C. Stockman. Computer Vision, chapter 3. 2002.

[29] J. Shatah, P.D. Monsour, R. Goldsmith, and W. Klump. Probability Theory. American Mathematical Society, New York City, NY, 2001.

[30] I. Sobel. History and definition of the so-called "Sobel operator", more appropriately named the Sobel-Feldman operator. https://www.researchgate.net/publication/239398674_An_Isotropic_3_3_Image_Gradient_Operator, 2014.

[31] L. Vandevenne. Lode's computer graphics tutorial, image filtering. http://lodev.org/cgtutor/filtering.html, 2004. [Online; accessed 27 March 2017].

[32] OpenCV (Open Source Computer Vision). Face detection using Haar cascades. http://docs.opencv.org/trunk/d7/d8b/tutorial_py_face_detection.html, 2015. [Online; accessed 23 February 2017].

[33] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. http://wearables.cc.gatech.edu/paper_of_week/viola01rapid.pdf, 2001.

[34] L.M. Wang, J.B. Shi, G. Song, and I.F. Shen. Object detection combining recognition and segmentation. https://www.cis.upenn.edu/~jshi/papers/obj_det_liming_accv07.pdf, 2013. [Online; accessed 23 February 2017].

[35] Y.Q. Wang. An analysis of the Viola-Jones face detection algorithm. Image Processing On Line, 4, 2014.

[36] L.Q. Xiong. Alert aggregation algorithm based on genetic clustering algorithm. Journal of Computer Applications, 28(4), 2008.

[37] Z.H. Zhou. Ensemble Methods: Foundations and Algorithms. Chapman and Hall/CRC, United Kingdom, 2012.