Training and Testing Multi-class Logistic Classifier

Pi19404

March 31, 2014


The article contains details about Theano based Python source code for training and testing a multi-class logistic regression classifier, accompanied by C/C++ Eigen based code developed as part of the OpenVision repository: www.github.com/pi19404/OpenVision



Contents

Training and Testing Multi-class Logistic Classifier
  0.1 Introduction
    0.1.1 Loss Function
    0.1.2 Theano
    0.1.3 Example
  0.2 Theano Code
References

Training and Testing Multi-class Logistic Classifier

0.1 Introduction

In this article we will look at training and testing of a multi-class logistic classifier.

Logistic regression is a probabilistic, linear classifier. It is parametrized by a weight matrix W and a bias vector b. Classification is done by projecting data points onto a set of hyperplanes, the distance to which is used to determine a class membership probability.

Mathematically this can be expressed as

P(Y = i \mid x; W, b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}

Corresponding to each class y_i, the logistic classifier is parameterized by a set of parameters W_i, b_i. These parameters are used to compute the class probability.

Given an unknown vector x, the prediction is performed as

y_{pred} = \arg\max_i P(Y = i \mid x; W, b) = \arg\max_i \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}}

Given a set of labelled training data X_i, Y_i where i \in 1, \dots, N, we need to estimate these parameters.
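As a concrete illustration of the model and the prediction rule above, here is a minimal NumPy sketch with made-up parameters (not part of the original Theano code):

```python
import numpy as np

def softmax_probs(W, b, x):
    """P(Y = i | x; W, b) for every class i (one row of W per class)."""
    scores = W.dot(x) + b
    scores -= scores.max()          # subtract the max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

# toy example: 3 classes, 2-dimensional input (all values made up)
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])

p = softmax_probs(W, b, x)          # class membership probabilities
y_pred = np.argmax(p)               # argmax_i P(Y = i | x; W, b)
```

The probabilities sum to one by construction, and the predicted class is simply the one with the largest projection onto its hyperplane.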


    0.1.1 Loss Function

Ideally we would like to compute the parameters so that the 0-1 loss is minimized:

\ell_{0,1} = \sum_{i=1}^{|D|} \mathbf{1}\left[ f(x^{(i)}) \neq y^{(i)} \right], \qquad f(x) = \arg\max_k P(Y = y_k \mid x, \theta)

where P(Y = y_k \mid x, \theta) is modelled using the logistic function.
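The 0-1 loss is just a count of misclassified examples; a tiny illustration with hypothetical predictions and labels:

```python
import numpy as np

# hypothetical predictions f(x_i) and true labels y_i
f_x = np.array([0, 2, 1, 1, 0])
y   = np.array([0, 1, 1, 2, 0])

# 0-1 loss: number of examples where the predicted class differs from the label
zero_one_loss = np.sum(f_x != y)
```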

The 0-1 loss function is not differentiable, hence optimizing it for large models is computationally infeasible.

Instead we maximize the log-likelihood of the classifier given the training data D.

Maximum likelihood estimation is used to perform this operation: estimate the parameters so that the likelihood of the training data D is maximized under the model parameters.

It is assumed that the data samples are independent, so the probability of the set is the product of the probabilities of the individual examples.

The likelihood of the training data D under the parameters \theta = \{W, b\} is therefore

L(\theta = \{W, b\}, D) = \prod_{i=1}^{N} P(Y = y_i \mid X = x_i; W, b)

Taking logarithms gives the log-likelihood:

\ell(\theta, D) = \sum_{i=1}^{N} \log P(Y = y_i \mid X = x_i; W, b)

Maximizing the log-likelihood is equivalent to minimizing its negative:

\theta^{*} = \arg\min_{\theta} \left( -\sum_{i=1}^{N} \log P(Y = y_i \mid X = x_i; W, b) \right)
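The negative log-likelihood over a labelled set can be computed directly; a minimal NumPy sketch with toy parameters and data, independent of the Theano implementation:

```python
import numpy as np

def negative_log_likelihood(W, b, X, y):
    """-sum_i log P(Y = y_i | x_i; W, b) over a labelled set (rows of X are x_i)."""
    scores = X.dot(W.T) + b                       # shape (N, n_classes)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].sum()

# toy problem: 2 classes, 2 features, 2 examples (all values made up)
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
b = np.zeros(2)
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
y = np.array([0, 1])

nll = negative_log_likelihood(W, b, X, y)
```

Each correctly-scored example contributes log(1 + e^{-1}) here, so the total is about 0.627; driving this quantity down is exactly what training does.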

It should be noted that the likelihood of the correct class is not the same as the number of right predictions.

The log-likelihood function can be considered a differentiable version of the 0-1 loss function.

In the present application the negative log-likelihood is used as the loss function.

Optimal parameters are learned by minimizing the loss function.


In the present application gradient based methods are used for minimization. Specifically, stochastic gradient descent and conjugate gradient descent are used to minimize the loss function.
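A single stochastic gradient descent update is just a step against the gradient of the loss on a minibatch; a schematic sketch (the gradient values here are placeholders, not computed from real data):

```python
import numpy as np

def sgd_step(W, b, grad_W, grad_b, learning_rate):
    """One stochastic gradient descent update: move parameters against the gradient."""
    W = W - learning_rate * grad_W
    b = b - learning_rate * grad_b
    return W, b

# placeholder parameters and gradients, just to show the update rule
W = np.zeros((2, 2))
b = np.zeros(2)
gW = np.ones((2, 2))   # stand-in for dL/dW on one minibatch
gb = np.ones(2)        # stand-in for dL/db on one minibatch

W, b = sgd_step(W, b, gW, gb, learning_rate=0.1)
```

In practice this step is repeated over many minibatches and epochs until a validation criterion stops improving.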

The cost function is expressed as

L(\theta, D) = -\sum_{i=1}^{N} \log P(Y = y_i \mid X = x_i; W, b)

= -\sum_{i=1}^{N} \log \frac{e^{W_{y_i} x_i + b_{y_i}}}{\sum_j e^{W_j x_i + b_j}}

= -\sum_{i=1}^{N} \left[ \log e^{W_{y_i} x_i + b_{y_i}} - \log \sum_j e^{W_j x_i + b_j} \right]

= -\sum_{i=1}^{N} \left[ W_{y_i} x_i + b_{y_i} - \log \sum_j e^{W_j x_i + b_j} \right]

In each summand the first part, W_{y_i} x_i + b_{y_i}, is affine; the second is the log of a sum of exponentials, which is convex. Thus the loss function is convex.

Thus we can compute the parameters corresponding to the global minimum of the loss function using gradient descent methods.

Thus we compute the derivatives of the loss function L(\theta, D) with respect to \theta: \partial\ell/\partial W and \partial\ell/\partial b.
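Differentiating the negative log-likelihood term by term gives the closed-form gradients (a standard result, stated here for reference; it is what the gradient based methods consume):

```latex
\frac{\partial \ell}{\partial W_j}
  = \sum_{i=1}^{N} \left( P(Y = j \mid x_i; W, b) - \mathbf{1}[y_i = j] \right) x_i^{T},
\qquad
\frac{\partial \ell}{\partial b_j}
  = \sum_{i=1}^{N} \left( P(Y = j \mid x_i; W, b) - \mathbf{1}[y_i = j] \right)
```

In words: each class's gradient is the gap between the predicted probability and the one-hot indicator of the true label, accumulated over the training examples.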

    0.1.2 Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently [1].

It is an expression compiler, and can evaluate symbolic expressions when executed. Typically, programs which are implemented in C/C++ can be written concisely and efficiently in Theano.

Computing the gradients in most programming languages (C/C++, Matlab, Python) involves manually deriving the expressions for the gradient of the loss with respect to the parameters, \partial\ell/\partial W and \partial\ell/\partial b.

This approach not only involves manual coding, but the derivatives can get difficult to compute for complex models.


With Theano, this work is greatly simplified as it performs automatic differentiation.
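To see what automatic differentiation replaces, one can compare a hand-derived softmax gradient against numerical finite differences; a plain-NumPy illustration (this is not Theano code, and all values are made up):

```python
import numpy as np

def nll(W, x, y):
    """Negative log-likelihood of one example under a softmax over W.dot(x)."""
    s = W.dot(x)
    s -= s.max()
    return -(s[y] - np.log(np.exp(s).sum()))

def analytic_grad(W, x, y):
    """d nll / dW, derived by hand: (softmax probabilities - one-hot(y)) outer x."""
    s = W.dot(x)
    s -= s.max()
    p = np.exp(s) / np.exp(s).sum()
    p[y] -= 1.0
    return np.outer(p, x)

W = np.array([[0.5, -0.2],
              [0.1,  0.3]])
x = np.array([1.0, 2.0])
y = 0

# central finite differences: the tedious check we avoid doing by hand with autodiff
eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        num[i, j] = (nll(Wp, x, y) - nll(Wm, x, y)) / (2 * eps)

match = np.allclose(analytic_grad(W, x, y), num, atol=1e-5)
```

Theano's T.grad produces the analytic form symbolically from the loss expression, so neither the hand derivation nor the numerical check is needed in the Theano code.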

    0.1.3 Example

For demonstration we will use the MNIST dataset. The MNIST dataset consists of handwritten digit images; it is divided into 60,000 examples for the training set and 10,000 examples for testing. The official training set of 60,000 is divided into an actual training set of 50,000 examples and 10,000 validation examples. All digit images have been size-normalized and centered in a fixed-size image of 28 x 28 pixels. In the original dataset each pixel of the image is represented by a value between 0 and 255, where 0 is black, 255 is white and anything in between is a different shade of grey.

    The dataset can be found at http://deeplearning.net/data/mnist/mnist.pkl.gz.

The dataset is pickled and can be loaded using the Python pickle package.

The dataset consists of training, validation and test sets.

Each feature vector has length 28x28 = 784, and the number of classes is 10.
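A loader for this layout can be sketched as follows; the (train, validation, test) tuple structure matches mnist.pkl.gz, but the file written below is a tiny synthetic stand-in, not the real dataset:

```python
import gzip
import os
import pickle
import tempfile

def load_pickle_data(path):
    """Load a gzipped, pickled dataset with the mnist.pkl.gz layout:
    a (train, validation, test) tuple, each a (features, labels) pair.
    Note: for the original Python-2-era file under Python 3,
    pickle.load(f, encoding='latin1') may be needed."""
    with gzip.open(path, 'rb') as f:
        return pickle.load(f)

# synthetic stand-in for mnist.pkl.gz: one 784-dimensional example, label 7
fd, path = tempfile.mkstemp(suffix='.pkl.gz')
os.close(fd)
with gzip.open(path, 'wb') as f:
    pickle.dump((([[0.0] * 784], [7]), ([], []), ([], [])), f)

train, valid, test = load_pickle_data(path)
features, labels = train
```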

A class called LogisticRegression is defined which encapsulates the methods that are used to perform training and testing of the multi-class logistic regression classifier.


    0.2 Theano Code

The Python code for training and testing can be found in the git repository https://github.com/pi19404/OpenVision in the file ImgML/LogisticRegression.py.

The file ImgML/load_datasets.py contains methods to load datasets from pickle files or SVM format files.

    """ symbolic expressions defining input and output vectors"""

    x=T.matrix('x');

    y=T.ivector('y');

    """ The mnist dataset in pickel format"""

    model_name1="/media/LENOVO_/repo/mnist.pkl.gz"

    """ creating object of class Logistic regression"""

    """ input is 28*28 dimension feature vector ,and

    output lables are digits from 0-9 """

    classifier = LogisticRegression(x,y,28*28,10);

    """ loading the datasets"""

    [train,test,validate]=load_datasets.load_pickle_data(model_name1);

    """ setting the dataset"""

    classifier.set_datasets(train,test,validate);

    #

    #classifier.init_classifier(model_name1);n_out

    """ Training the classifiers"""

    classifier.train_classifier(0.13,1000,30);

    """ Saving the model """

    classifier.save('1');

    #x=classifier.train[0].get_value(borrow=True)[0];

    #classifier.predict(x);

    """ Loading the model"""

    classifier.load('1')

    x=train[0].get_value(borrow=True);

    y=train[1].eval();

    print 'True class:'+`y`

    xx,yy=classifier.predict(x);

    print 'Predicted class:' + `yy`

    classifier.testing();


C/C++ code has also been written using Eigen/OpenCV and incorporated in the OpenVision library. This can be found in the repository https://github.com/pi19404/OpenVision in the files ImgML/LogisticRegression.cpp and ImgML/LogisticRegression.hpp.


Bibliography

[1] James Bergstra et al. "Theano: a CPU and GPU Math Expression Compiler". In: Proceedings of the Python for Scientific Computing Conference (SciPy). Oral presentation. Austin, TX, June 2010.
