
ECON 2101: Math Refresher (Tutorial W1)
Keiichi Kawai

The goal of this note is to refresh your memory on the constrained optimization problem, i.e., max_{x,y} u(x, y) s.t. g(x, y) = 0. You studied the Lagrangian method in your first year. We start with the intuition as to why the Lagrangian method works. This may look like unnecessary torture, but it helps you nurture “economic intuition,” which you will need to master in Microeconomics 2. After reviewing why the Lagrangian method works, we look at a “cookbook” procedure at the end and apply it to a few examples. This note is therefore not meant to be comprehensive, and it sacrifices “rigor” in some parts. In any case, if this note doesn’t refresh your memory, go back to the textbook/notes you used in your first year.

    Introduction

Almost all math problems in Economics boil down to a constrained optimization problem, i.e.,

    max_{x,y} f(x, y)  s.t.  g(x, y) = 0.

For example, in the case of “utility maximization,” f(x, y) = u(x, y) and g(x, y) = p_x x + p_y y − I.¹

¹ If you are interested in minimizing some γ(x, y), you can convert the problem into a maximization problem by setting f(x, y) = −γ(x, y).

Review of Single-Variable Optimization and First-Order Condition (FOC)

Let’s review how to deal with maximization problems when no “constraint” exists. That is, your goal is to find a maximizer x* such that f(x*) ≥ f(x) for all x in the domain. The set of maximizers is often denoted arg max f(x).²

² Note that there can be multiple maximizers.

Suppose that x* ∈ arg max f(x) and that f(x) is differentiable. Then you know that f′(x*) = 0. This is the so-called first-order (necessary) condition for maximizers.³

³ By the way, can you formally state what a function is? What is a utility “function,” a profit “function,” etc.?

To recall where we got this, you need to understand what the derivative f′(x), or df(x)/dx, is.⁴ If you remember, the formal definition is

    f′(x) = lim_{h→0} [f(x + h) − f(x)] / h.

⁴ Note that the derivative of a (differentiable) function is also a function.

For an arbitrary h ≠ 0, [f(x + h) − f(x)] / h measures the slope of the line that goes through (x, f(x)) and (x + h, f(x + h)). If you are not sure, draw some graphs on your own to check.
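To make the difference quotient concrete, here is a small Python sketch (not part of the original note): it approximates f′(1) for f(x) = x², whose exact derivative is f′(x) = 2x.

```python
def slope(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 2  # f'(x) = 2x, so f'(1) = 2

# As h shrinks, the difference quotient approaches the derivative.
for h in (0.1, 0.01, 0.001):
    print(h, slope(f, 1.0, h))  # approximately 2.1, then 2.01, then 2.001
```

Shrinking h further drives the quotient arbitrarily close to 2, which is exactly what the limit in the definition says.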


Now that we have refreshed our memory on the definition of the derivative, let’s see why (conditional on f(x) being a differentiable function) we have

    x* ∈ arg max f(x) ⇒ f′(x*) = 0.

Recall that x* such that f′(x*) = 0 is called a critical point of f. So the previous statement can be rewritten as follows:

    x* ∈ arg max f(x) ⇒ x* is a critical point of f(x).

Take an arbitrary x̃ in the domain of f(x).⁵

⁵ Do you remember the definition of the domain of a function?

If the “slope” f′(x̃) is positive, then it means you can increase the value of the function by increasing x slightly, i.e., for some ∆x > 0, f(x̃ + ∆x) > f(x̃). We can thus conclude that x̃ is not a maximizer, i.e., x̃ ∉ arg max f(x).

Similarly, if the “slope” f′(x̃) is negative, then it means you can increase the value of the function by decreasing x a little bit from x̃.

Combining those two observations, we have the conclusion that if the function f(x) is maximized at x*, then f′(x*) = 0.⁶

⁶ Notice, this is NOT equivalent to saying “If f′(x*) = 0, then the function is maximized at x*.” Indeed, if f′(x*) = 0, then so is −f′(x*). Thus, if that converse were true, every maximizer would have to be a minimizer too, i.e., the function would have to be constant.

Again, there may be many points such that f′(x*) = 0. To pin down which of those critical points are maximizers, you have to rely on other tools, e.g., the second-order condition, or comparing the values of the objective function at the critical points. But the first-order condition (FOC) drastically simplifies your search for the maximizers (and for most of the problems you see in this course, the FOC gives you the “solution” you need).

Example 1  If f(x) = −x(x − 2), then f′(x) = −2(x − 1). So arg max_x f(x) = {1}.⁷

⁷ When arg max f(x) is a singleton, i.e., arg max f(x) = {y} for some y, it is conventional to write arg max f(x) = y, even though this is a slight abuse of notation.

Example 2  If f(x) = ln x − px, then f′(x) = (1 − px)/x. So arg max_x f(x) = {1/p}.
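The FOCs in Examples 1 and 2 can be checked symbolically; here is a quick sketch with sympy (not part of the original note):

```python
import sympy as sp

x, p = sp.symbols("x p", positive=True)

# Example 1: f(x) = -x(x - 2); solving f'(x) = 0 gives the critical point x = 1.
f1 = -x * (x - 2)
crit1 = sp.solve(sp.diff(f1, x), x)
print(crit1)  # [1]

# Example 2: f(x) = ln x - p*x; solving f'(x) = 0 gives x = 1/p.
f2 = sp.log(x) - p * x
crit2 = sp.solve(sp.diff(f2, x), x)
print(crit2)  # [1/p]
```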

Even when the maximization problem involves two or more variables, the logic is the same. If f(x, y) is differentiable and (x*, y*) ∈ arg max_{x,y} f(x, y), then ∂f(x*, y*)/∂x = 0 and ∂f(x*, y*)/∂y = 0.⁸

⁸ If you are not sure what a partial derivative means, go back to the textbook and review the topic.

    Constrained Optimization

Now, let’s get onto the main topic. Suppose you are asked to maximize the function f(x, y) by choosing x and y. But you cannot freely choose x and y: you have to choose x and y so that the constraint g(x, y) = 0 is satisfied. This problem is often written as

    max_{x,y} f(x, y)  s.t.  g(x, y) = 0.

    ©Keiichi KAWAI (ver 2016.1)


How do you solve this type of question? Remember, many economic problems fall into this category. (Again, the utility maximization problem is a canonical example.) The issue here is that you cannot choose x and y freely. If you choose a certain x, you have to choose y so that g(x, y) = 0. In other words, the y you can choose is a function γ(x) of x.

Sometimes, finding this function γ(x) such that y = γ(x) is straightforward; more formally, g(x, y) = 0 defines y as an explicit function of x. But sometimes it is not; more formally, g(x, y) = 0 then defines y only as an implicit function of x.

Let’s start with the simple case, where we can explicitly define y as a function of x from g(x, y) = 0. For example, suppose g(x, y) = y − γ(x) = 0. That is, if you choose x, then you have to choose y so that y = γ(x).⁹

⁹ Indeed, for most of the problems that you see in Economics, you actually can do this.

Then this problem becomes an unconstrained single-variable optimization problem:

    max_x φ(x) = max_x f(x, γ(x)).

Therefore, if (x*, y*) is the solution to the original problem, then y* = γ(x*) and φ′(x*) = 0.

So the biggest challenge now is finding φ′(x). Recall that φ′(x) × ∆x measures the overall change in the value of f when you change x by ∆x.¹⁰

¹⁰ Recall that lim_{∆x→0} [φ(x + ∆x) − φ(x)]/∆x = φ′(x).

Notice that a change in x affects f(x, γ(x)) through two channels:

    φ′(x) = ∂f(x, y)/∂x  [term (i)]  +  ∂f(x, γ(x))/∂y  [term (ii)]  × γ′(x)  [term (iii)].

The first channel is the direct one. If you change x by ∆x, then it has a “direct” effect on the value of the objective function of [∂f(x, y)/∂x] × ∆x, as captured by term (i) above. The second channel is the indirect one that comes through the change in y. If you change x by ∆x, then y changes by ∆y = γ′(x) × ∆x. For such a change in y, f changes by [∂f(x, γ(x))/∂y] × ∆y, as represented by terms (ii) and (iii).

So the overall change in f is

    ∂f(x, y)/∂x + [∂f(x, γ(x))/∂y] × γ′(x).

This is the so-called chain rule you studied.¹¹

¹¹ If you do care about formality, and/or are aiming for the honours program, here is the formal statement: Let f : Rⁿ → R and let a : R → Rⁿ be C¹. Then the composite function g(t) = f(a(t)) is a C¹ function from R to R, and

    g′(t) = Σ_j [∂f(a(t))/∂x_j] × a_j′(t) = Df(a(t)) · a′(t).

So to sum up, if (x*, y*) is the solution to the following problem:

    max_{x,y} f(x, y)  s.t.  y = γ(x),



then

    ∂f(x*, γ(x*))/∂x + [∂f(x*, γ(x*))/∂y] × γ′(x*) = 0.   (1)

Example 3  Suppose f(x, y) = ln x + ln y and γ(x) = 1 − x. Then

    ∂f(x, y)/∂x + [∂f(x, γ(x))/∂y] × γ′(x) = 1/x + [1/(1 − x)] × (−1) = 0.

Therefore, x* = 1/2 and y* = 1/2.
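A sympy sketch (not in the original note) confirms Example 3: substituting the constraint and solving the single-variable FOC recovers x* = 1/2.

```python
import sympy as sp

x = sp.symbols("x", positive=True)

# Example 3: maximize f(x, y) = ln x + ln y subject to y = gamma(x) = 1 - x.
# Substituting the constraint gives phi(x) = ln x + ln(1 - x).
phi = sp.log(x) + sp.log(1 - x)
crit = sp.solve(sp.diff(phi, x), x)
print(crit)  # [1/2]
```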

Example 4  Consider a utility maximization problem, g(x, y) = p_x x + p_y y − I. Then γ(x) = (I − p_x x)/p_y, so γ′(x) = −p_x/p_y. Therefore, the corresponding FOC becomes

    ∂u(x, y)/∂x + ∂u(x, y)/∂y × (−p_x/p_y) = 0.

So what we can learn from this exercise is that y does not necessarily have to be an explicit function of x to convert a “constrained optimization problem” into an “unconstrained” one. All we need to know is the change in y that arises from a change in x, i.e., γ′(x). Notice that we can find γ′(x) even when the constraint only implicitly defines y as a function of x, as in g(x, γ(x)) = 0.

To see this, notice that

    d(g(x, γ(x)))/dx = 0.

By the definition of γ(x), you can only choose x so that g(x, γ(x)) = 0. This means that even if you change x, the value of g(x, γ(x)) cannot change, i.e., d(g(x, γ(x)))/dx = 0.¹²

¹² Notice that d(g(x, γ(x)))/dx ≠ ∂(g(x, γ(x)))/∂x: the total derivative also accounts for the change in y = γ(x).

Since¹³

    ∂g(x, y)/∂x + [∂g(x, y)/∂y] × γ′(x) = 0,

¹³ Again, we are using the chain rule here.

we obtain

    γ′(x) = − [∂g(x, y)/∂x] / [∂g(x, y)/∂y],

conditional on ∂g(x, y)/∂y ≠ 0.
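The formula γ′(x) = −g_x/g_y can be sanity-checked with sympy’s implicit-differentiation helper idiff; this sketch (not in the original note) uses the budget constraint from Example 4.

```python
import sympy as sp

x, y = sp.symbols("x y")
px, py, I = sp.symbols("p_x p_y I", positive=True)

# g(x, y) = p_x*x + p_y*y - I = 0 implicitly defines y as a function of x.
g = px * x + py * y - I

# gamma'(x) = -(dg/dx) / (dg/dy)
gamma_prime = -sp.diff(g, x) / sp.diff(g, y)
print(gamma_prime)  # -p_x/p_y

# sympy's idiff computes dy/dx from g = 0 directly, and the two agree.
print(sp.idiff(g, y, x))  # -p_x/p_y
```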

Therefore, the counterpart of (1) becomes

    ∂f(x*, y*)/∂x + [∂f(x*, y*)/∂y] × ( − [∂g(x*, y*)/∂x] / [∂g(x*, y*)/∂y] ) = 0.

Or equivalently, for λ = − [∂f(x*, y*)/∂y] / [∂g(x*, y*)/∂y],

    ∂f(x*, y*)/∂x + λ ∂g(x*, y*)/∂x = 0
    ∂f(x*, y*)/∂y + λ ∂g(x*, y*)/∂y = 0
    g(x*, y*) = 0



Since we have three unknowns (x*, y*, λ) and three equations, (in most cases) this system of equations has a solution.

So to sum up, the solution (x*, y*) of the following problem

    max_{x,y} f(x, y)  s.t.  g(x, y) = 0

has to satisfy

    ∂f(x*, y*)/∂x + λ ∂g(x*, y*)/∂x = 0
    ∂f(x*, y*)/∂y + λ ∂g(x*, y*)/∂y = 0
    g(x*, y*) = 0

Notice that this set of conditions is exactly the set of first-order conditions for the unconstrained problem

    max f(x, y) + λ g(x, y).

This is the so-called Lagrangian Theorem.¹⁴ In sum, (under some mild conditions that are usually satisfied for most economic problems), we can convert constrained optimization problems into unconstrained optimization ones.

¹⁴ Again, this is only for those who care about formality, and/or are aiming for the honours program. But here’s the formal statement:

Theorem 1  Let f : Rⁿ → R and g : Rⁿ → R^k be C¹ functions. Suppose x* is a local optimum of f on the set

    D = U ∩ {x | g(x) = 0},

where U ⊂ Rⁿ is open. Suppose dim(Dg(x*)) = k. Then there exists a vector λ* ∈ R^k such that

    Df(x*) + Σ_{i=1}^{k} λ*_i Dg_i(x*) = 0.

• The condition dim(Dg(x*)) = k is the (general version of the) constraint qualification.
• This condition enables us to use the (generalized version of the) implicit function theorem.

Example 5  Suppose you want to maximize the utility function u(x, y) = ln x + ln y, and you face the budget constraint x + y = 1. Then f(x, y) = ln x + ln y and g(x, y) = x + y − 1. Since the solution (x*, y*) is a critical point of the Lagrangian L,

    L = f(x, y) + λ g(x, y),

x* and y* have to satisfy the following conditions:

    ∂f(x*, y*)/∂x + λ ∂g(x*, y*)/∂x = 1/x* + λ = 0
    ∂f(x*, y*)/∂y + λ ∂g(x*, y*)/∂y = 1/y* + λ = 0
    x* + y* = 1

Solving, we get x* = y* = 1/2.

This is the basic logic behind the so-called Lagrangian Method. It can be generalized to the case where there are more than two variables.
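Example 5 can also be reproduced mechanically; this sympy sketch (not part of the original note) solves the three critical-point equations of the Lagrangian at once.

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)
lam = sp.symbols("lam")

# Example 5: maximize ln x + ln y subject to x + y = 1,
# via the critical points of L = f + lam * g.
L = sp.log(x) + sp.log(y) + lam * (x + y - 1)
foc = [sp.diff(L, v) for v in (x, y, lam)]
sol = sp.solve(foc, [x, y, lam], dict=True)
print(sol)  # x* = y* = 1/2 (with lam = -2)
```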



    “Cookbook” Procedure

Now let’s summarize what we have reviewed in the form of a “cookbook.” Suppose you are asked to solve the problem

    max_{x∈Rⁿ} f(x)  s.t.  g(x) = 0.

1. We set up a function L : Rⁿ × R → R, called the Lagrangian:

    L(x, λ) = f(x) + λ g(x).

The scalar λ is called the Lagrange multiplier.

2. We find the set of all critical points of L(x, λ). That is, all points (x, λ) such that ∂L(x, λ)/∂x_i = 0 for all i, and ∂L(x, λ)/∂λ = 0. Since x ∈ Rⁿ and λ ∈ R, this results in a system of (n + 1) equations in the (n + 1) unknowns:

    ∂L/∂x_j (x, λ) = 0,  j = 1, · · · , n
    ∂L/∂λ (x, λ) = 0.

3. Let M be the set of all solutions to these equations. We evaluate f at each point x in this set M. “Usually” the values of x that maximize f over this set are also the solutions of the constrained maximization problem we started with. In case M is a singleton, i.e., consists of only one point, check carefully whether it is a maximizer or a minimizer, e.g., by comparing the value of the objective function at x ∈ M with its value at some other feasible point y ≠ x.

Example 1: Simple Numerical Example

Consider the problem of maximizing and minimizing f(x, y) = x² − y² subject to g(x, y) = 1 − x² − y² = 0.

1. Set up the Lagrangian:

    L(x, y, λ) = x² − y² + λ(1 − x² − y²).

The critical points of L are the solutions (x, y, λ) ∈ R³ to

    2x − 2λx = 0
    −2y − 2λy = 0
    x² + y² = 1



2. From the first equation, 2x(1 − λ) = 0, and from the second equation, −2y(1 + λ) = 0. If λ ≠ ±1, then these can hold only when (x, y) = (0, 0), which violates the constraint x² + y² = 1. So λ = ±1. Hence, there are only four possibilities:

    (x, y, λ) = (1, 0, 1), (−1, 0, 1), (0, 1, −1), (0, −1, −1).

3. Evaluating f at those points, we see f(1, 0) = f(−1, 0) = 1, and f(0, 1) = f(0, −1) = −1.

4. Since the critical points of L contain the global maximizers and minimizers of f, the first two points are the maximizers we are after (and the last two are the minimizers).
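This sympy sketch (not part of the original note) recovers the same four critical points and the two values of f:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)

# Cookbook Example 1: L = x^2 - y^2 + lam*(1 - x^2 - y^2).
f = x**2 - y**2
L = f + lam * (1 - x**2 - y**2)
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(len(sols))  # 4 critical points

# Values of the objective at the critical points: the max is 1, the min is -1.
vals = sorted({f.subs(s) for s in sols})
print(vals)  # [-1, 1]
```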

Example 2: Utility Maximization

Consider the following utility maximization problem:

    max x₁x₂  s.t.  p₁x₁ + p₂x₂ = I

1. Set up the Lagrangian:

    L(x₁, x₂, λ) = x₁x₂ + λ(I − p₁x₁ − p₂x₂).

2. The critical points of L are the solutions (x₁*, x₂*, λ) ∈ R²₊₊ × R to:

    x₂ − λp₁ = 0
    x₁ − λp₂ = 0
    I − p₁x₁ − p₂x₂ = 0

Let’s check if there is a solution such that λ = 0. If λ = 0, then x₁ = x₂ = 0, which violates the third equation.

So, suppose λ ≠ 0. Then we have λ = x₂/p₁ = x₁/p₂. Thus, x₁ = p₂x₂/p₁. Using the third equation, we obtain

    (x₁*, x₂*, λ*) = (I/(2p₁), I/(2p₂), I/(2p₁p₂)).

3. Notice that (x₁, x₂) = (0, I/p₂) satisfies the constraint, and the resulting value of the objective function at this point is zero. Since the value of the objective function at (x₁*, x₂*, λ*) = (I/(2p₁), I/(2p₂), I/(2p₁p₂)) is positive, we can conclude that (I/(2p₁), I/(2p₂)) is the solution we are after.
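As a final check (not part of the original note), the same demand functions fall out of sympy:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", positive=True)
p1, p2, I = sp.symbols("p1 p2 I", positive=True)

# Cookbook Example 2: max x1*x2 s.t. p1*x1 + p2*x2 = I.
L = x1 * x2 + lam * (I - p1 * x1 - p2 * x2)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)
print(sol)  # x1* = I/(2*p1), x2* = I/(2*p2), lam* = I/(2*p1*p2)
```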
