
• We have been discussing some simple ideas from statistical learning theory.
• The risk minimization framework that we discussed gives us a better perspective on understanding the unifying theme in different learning algorithms.
• We will now go back to studying pattern classification algorithms.
• We will first briefly review algorithms for learning linear classifiers and then start looking at methods to learn nonlinear classifiers.


Linear Models

• In the two-class case, the linear classifier is given by

  h(X) = sign(W^T X + w_0)

• We have seen that we can also think of h(X) as

  h(X) = sign(W^T Φ(X) + w_0),

  where Φ(X) = [φ_1(X), ..., φ_m(X)]^T, as long as the φ_i are fixed (possibly nonlinear) functions, as in the sketch below.
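As a concrete illustration, here is a minimal NumPy sketch of such a classifier; the quadratic feature map phi below is only an assumed example of fixed nonlinear functions φ_i, not one prescribed in the course.

```python
import numpy as np

def phi(x):
    # An assumed example of fixed (nonlinear) basis functions for a
    # 2-d input: Phi(X) = [x1, x2, x1^2, x2^2, x1*x2]^T.
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

def h(x, W, w0):
    # Linear classifier in the feature space: h(X) = sign(W^T Phi(X) + w0).
    return np.sign(W @ phi(x) + w0)
```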


• We discussed many algorithms for learning W (a sketch of the first follows).
• The Perceptron algorithm is a simple error-correcting method that is guaranteed to find a separating hyperplane if one exists.
• The perceptron convergence theorem shows that, given any training set of linearly separable patterns, the algorithm will find a separating hyperplane.
• Our discussion of statistical learning theory gives us an idea of how many iid examples we should have before we can be confident that a hyperplane that separates the examples will also do well on test data.
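A minimal sketch of the error-correcting update, assuming augmented feature vectors as rows of X and labels in {−1, +1}; the epoch cap is only a safeguard for non-separable data.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron: correct W on each misclassified example.
    X holds augmented feature vectors as rows; y holds labels in {-1, +1}."""
    W = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (W @ xi) <= 0:   # xi is misclassified (or on the boundary)
                W += yi * xi         # error-correcting step
                mistakes += 1
        if mistakes == 0:            # a separating hyperplane has been found
            return W
    return W
```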


• We have also seen the least-squares method, where we find W to minimize

  J(W) = (1/n) Σ_i (W^T X_i − y_i)^2

  where, for simplicity of notation, we have assumed augmented feature vectors.
• In our risk minimization framework, H is parametrized by W, we take h(X) = W^T X, and we minimize the empirical risk under the squared-error loss function.


• We have seen how to obtain the least-squares solution (see the sketch below):

  W* = (A^T A)^{-1} A^T Y

  where the rows of the matrix A are the feature vectors and the components of Y are the y_i.
• The least-squares method can also be used to learn linear regression models.
• The only difference is that in a regression model the y_i are real-valued.
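A minimal sketch of the closed-form solution; solving the normal equations A^T A W = A^T Y is equivalent to the pseudo-inverse formula above whenever A^T A is invertible.

```python
import numpy as np

def least_squares(A, Y):
    """W* = (A^T A)^{-1} A^T Y via the normal equations.
    Rows of A are (augmented) feature vectors; Y holds the targets y_i."""
    return np.linalg.solve(A.T @ A, A.T @ Y)
```

In practice, np.linalg.lstsq(A, Y) is the numerically safer choice when A^T A is ill-conditioned.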


• We have seen that we can also minimize the empirical risk J(W) using gradient descent.
• We can also run this gradient descent in an incremental fashion by considering one example at a time.
• That gives us another classical algorithm, called the LMS algorithm (sketched below).
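A minimal sketch of the LMS update, assuming a fixed step size alpha (a hyperparameter not specified in the slides):

```python
import numpy as np

def lms(X, y, alpha=0.01, epochs=50):
    """LMS: incremental gradient descent on the squared error,
    updating after each example: W <- W + alpha * (y_i - W^T X_i) * X_i."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            W += alpha * (yi - W @ xi) * xi
    return W
```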


• We have also seen that we can use the least-squares idea to learn a model g(W^T X) by redefining J as

  J(W) = (1/n) Σ_i (g(W^T X_i) − y_i)^2

• An important example is logistic regression, where we take g to be the sigmoid function.
• We minimize J by an incremental version of gradient descent (sketched below).
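A minimal sketch under the squared-error criterion above, assuming targets in {0, 1}; the per-example gradient uses the identity g′ = g(1 − g) for the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_least_squares(X, y, alpha=0.1, epochs=100):
    """Incremental gradient descent on J(W) = (1/n) sum_i (g(W^T X_i) - y_i)^2,
    with g the sigmoid. Targets y_i are assumed to lie in {0, 1}."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(W @ xi)
            # d/dW (g - y)^2 = 2 (g - y) g (1 - g) x, since g' = g (1 - g)
            W -= alpha * 2.0 * (g - yi) * g * (1.0 - g) * xi
    return W
```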


• Another important method for learning linear classifiers is the Fisher Linear Discriminant.
• Here, we look for a direction W such that the patterns of the two classes get ‘well-separated’ when projected onto this one-dimensional subspace.
• As we mentioned, the Fisher Linear Discriminant can be thought of as a special case of the least-squares method of learning a linear regression model with special target values.
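A minimal sketch of the standard closed form W ∝ S_W^{-1}(μ_1 − μ_2), where S_W is the within-class scatter matrix; this form is standard, though the slides do not spell it out here.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher Linear Discriminant direction W ~ S_W^{-1} (mu1 - mu2).
    X1 and X2 hold the patterns of the two classes as rows."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
    return np.linalg.solve(S_W, mu1 - mu2)
```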


Beyond linear models

• Learning linear models (classifiers) is generally efficient.
• However, linear models are not always sufficient.
• The best linear function may still be a poor fit.
• We have looked at three broad approaches to learning nonlinear classifiers.
• We now discuss neural network models.


Neural network models

• We need a ‘good’ parameterized class of nonlinear functions to learn nonlinear classifiers.
• Artificial neural networks are one such class.
• Nonlinear functions are built up through composition of summations and sigmoids.
• They are useful for both classification and regression.


• In this course we will study only multilayer feedforward networks.
• They are useful because they offer a good parameterized class of nonlinear functions, and there are some efficient algorithms to learn them.
• However, historically, the development of (artificial) neural network models was motivated by some ideas about the structure of the human brain.
• We briefly look at this perspective of neural networks as an approach to engineering intelligent systems.


What is an Artificial Neural Network?

"A parallel distributed information processor made up of simple processing units that has a propensity for acquiring problem-solving knowledge through experience"

• A large number of interconnected units
• Each unit implements a simple, nonlinear function
• The ‘knowledge’ resides in the interconnection strengths
• Problem-solving ability is often acquired through ‘learning’

An architecture inspired by the structure of the brain.


The Human Brain

• The neuron is the basic computing unit.
• The brain is a highly organized structure of networks of interconnected neurons.
• In the brain:
  Number of neurons ∼ 10^11 (100 billion)
  Average synapses per neuron ∼ 10,000 (1,000 to 100,000)
  Total synapses ∼ 10^15
  Neuron time constants ∼ milliseconds
  A single neuron can send 100 spikes per second


A rough estimate of processing power:

One arithmetic operation per synapse
→ 10^4 operations per neuron per spike
→ 10^6 operations per neuron per second
→ 10^17 operations per second!!
(A gigaflop is 10^9 operations; a teraflop is 10^12 operations!)

Massive parallelism can deliver massive computing power, if we know how to manage it.


Digital computers:
• Precise design, highly constrained, not very adaptive or fault tolerant, centralized control, deterministic, basic switching times ∼ 10^-9 sec

Natural neural networks:
• Massively parallel, highly adaptive and fault tolerant, self-configuring, self-repairing, noisy, stochastic, basic switching time ∼ 10^-3 sec

• Most capabilities of the brain are LEARNT.


Artificial Intelligence (AI)

• ‘Understanding’ intelligence in computational terms.
• Developing ‘machines’ that are ‘intelligent’.

There are at least two distinct approaches:

• Try to model intelligent behavior in terms of processing structured symbols. (The resulting methods, algorithms, etc. may not resemble the brain at the implementation level.)
• A second approach is based on mimicking the human brain at the architectural/implementation level.


The symbolic AI approach

• The brain is to be understood in computational terms only.
• The physical symbol system hypothesis.
• A digital computer is a universal symbol manipulator and can be programmed to be intelligent.
• Many useful engineering applications, e.g., expert systems.

An implicit faith: the architecture of the brain per se is irrelevant for engineering intelligent artifacts.


Artificial Neural Networks

• Can be viewed as one approach towards understanding the brain / building intelligent machines.
• Computational architectures inspired by the brain:
  computational methods for ‘learning’ dependencies in a data stream,
  e.g., pattern recognition, system identification.
• Characteristics: emergent properties, learning, self-adaptation.
• Modeling biology? Mathematically purified neurons!!


Artificial Neural Networks

Computing machines that try to mimic the brain's architecture:

• A large network of interconnected units
• Each unit has a simple input-output mapping
• Each interconnection has a numerical weight attached to it
• The output of a unit depends on the outputs and connection weights of the units connected to it
• ‘Knowledge’ resides in the weights
• Problem-solving ability is often acquired through learning


Single neuron model

• x_i are the inputs into the (artificial) neuron and w_i are the corresponding weights; y is the output of the neuron.
• Net input: η = Σ_j w_j x_j
• Output: y = f(η), where f(·) is called the activation function.
  (Perceptron and Adaline are such models; a sketch follows.)
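A minimal sketch of this unit; the hard-limiter used in the example is one of the activation functions discussed later.

```python
import numpy as np

def neuron(x, w, f):
    """Single neuron: net input eta = sum_j w_j x_j, output y = f(eta)."""
    eta = np.dot(w, x)
    return f(eta)

# Example: a Perceptron-like unit with a hard-limiter activation.
y = neuron(np.array([1.0, -2.0]), np.array([0.5, 0.3]),
           lambda eta: 1.0 if eta > 0 else 0.0)
```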


Networks of neurons

• We can connect a number of such units or neurons to form a network. Inputs to a neuron can be outputs of other neurons (and/or external inputs).
• Notation: y_j is the output of the jth neuron; w_ij is the weight of the connection from neuron i to neuron j.


• Each neuron computes the weighted sum of its inputs and passes it through its activation function to compute its output.
• For example, the output of neuron 5 is

  y_5 = f_5(w_35 y_3 + w_45 y_4)
      = f_5(w_35 f_3(w_13 y_1 + w_23 y_2) + w_45 f_4(w_14 y_1 + w_24 y_2))

• By convention, we take y_1 = x_1 and y_2 = x_2.
• Here, x_1, x_2 are the inputs and y_5, y_6 are the outputs (see the sketch below).
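A minimal sketch of this forward computation; for simplicity a single activation f is shared by all neurons (the slides allow a different f_j per neuron), and the weight values are assumed placeholders.

```python
def forward(x1, x2, w, f):
    """Forward pass for the example network: neurons 1, 2 are the inputs,
    3, 4 are hidden, and 5, 6 are the outputs; w[(i, j)] is the weight
    from neuron i to neuron j."""
    y1, y2 = x1, x2                          # by convention y1 = x1, y2 = x2
    y3 = f(w[(1, 3)] * y1 + w[(2, 3)] * y2)
    y4 = f(w[(1, 4)] * y1 + w[(2, 4)] * y2)
    y5 = f(w[(3, 5)] * y3 + w[(4, 5)] * y4)
    y6 = f(w[(3, 6)] * y3 + w[(4, 6)] * y4)
    return y5, y6
```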


• A single neuron ‘represents’ a class of functions from ℜ^m to ℜ.
• A specific set of weights realizes a specific function.
• By interconnecting many units/neurons, networks can represent more complicated functions from ℜ^m to ℜ^m′.
• The architecture constrains the function class that can be represented; the weights define a specific function in the class.
• To form meaningful networks, nonlinearity of the activation function is important.


Typical activation functions

1. Hard limiter:

   f(x) = 1 if x > τ
        = 0 otherwise

   We can keep τ at zero and add one more input line to the neuron. An example of a single neuron with this activation function is the Perceptron.

2. Sigmoid function:

   f(x) = a / (1 + exp(−bx)),  a, b > 0

3. tanh:

   f(x) = a tanh(bx),  a, b > 0

All three are written out in the sketch below.
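The three activation functions above, as a minimal NumPy sketch:

```python
import numpy as np

def hard_limiter(x, tau=0.0):
    # f(x) = 1 if x > tau, 0 otherwise
    return np.where(x > tau, 1.0, 0.0)

def sigmoid(x, a=1.0, b=1.0):
    # f(x) = a / (1 + exp(-b x)), with a, b > 0
    return a / (1.0 + np.exp(-b * x))

def tanh_activation(x, a=1.0, b=1.0):
    # f(x) = a tanh(b x), with a, b > 0
    return a * np.tanh(b * x)
```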


Why study such models?

• A belief that the architecture of the brain is critical to intelligent behavior.
• The models can implement highly nonlinear functions. They are adaptive and can be trained.
• They are useful in many applications:
  time series prediction,
  system identification and control,
  pattern recognition and regression.
• The models can also help us understand brain function (computational neuroscience).


Many different models are possible

• Evolution: discrete time / continuous time; synchronous / asynchronous; deterministic / stochastic
• Interconnections: feedforward / having feedback
• States or outputs of units: binary / finitely many / continuous


Recurrent networks

• The network we saw earlier has no feedback.
• Here is an example of a network with feedback.
• It can model a dynamical system (see the sketch below):

  y(k) = f(y(k − 1), x_1(k), x_2(k))
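A minimal sketch of iterating such a system; the particular f below is only an assumed placeholder built from a sigmoid unit with a feedback connection.

```python
import numpy as np

def simulate(f, y0, x1_seq, x2_seq):
    """Iterate the dynamical system y(k) = f(y(k-1), x1(k), x2(k))."""
    ys, y = [], y0
    for x1, x2 in zip(x1_seq, x2_seq):
        y = f(y, x1, x2)   # feedback: the output depends on its previous value
        ys.append(y)
    return ys

# Assumed example: a single sigmoid unit with a feedback weight of 0.5.
f = lambda y, x1, x2: 1.0 / (1.0 + np.exp(-(0.5 * y + 0.2 * x1 - 0.1 * x2)))
outputs = simulate(f, y0=0.0, x1_seq=[1, 0, 1], x2_seq=[0, 1, 1])
```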


• We will consider only feedforward networks, which provide a general class of nonlinear functions.
• These can always be organized as a layered network.
• This network represents a class of functions from ℜ^2 to ℜ^2.


• Each unit can also have a ‘bias’ input.
• This is shown for a single unit below.
• One can always think of the bias as an extra input:

  y = f( Σ_{i=1}^d w_i x_i + w_0 ) = f( Σ_{i=0}^d w_i x_i ),  with x_0 = +1
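A minimal sketch showing the equivalence of the two forms:

```python
import numpy as np

def neuron_with_bias(x, w, w0, f):
    # y = f( sum_{i=1}^d w_i x_i + w_0 )
    return f(np.dot(w, x) + w0)

def neuron_augmented(x, w_aug, f):
    # Equivalent form: absorb the bias by prepending x_0 = +1,
    # so y = f( sum_{i=0}^d w_i x_i ) with w_aug = [w_0, w_1, ..., w_d].
    return f(np.dot(w_aug, np.concatenate(([1.0], x))))
```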

Multilayer feedforward networks

• Here is a general multilayer feedforward network.
