MSc Quantitative Techniques Mathematics Weeks 1 and 2 · 2018-08-29

MSc Quantitative Techniques: Mathematics

Weeks 1 and 2

DEPARTMENT OF ECONOMICS, MATHEMATICS AND STATISTICS, LONDON WC1E 7HX

September 2018

For MSc Programmes in Economics and Finance

Who is this course for?

This part of the MSc Quantitative Techniques course reviews basic mathematical techniques for most of our MSc programmes in economics and finance.

The course is taught three days a week (Mondays, Tuesdays and Thursdays) over September.

Course Aims and Objectives

On completing the course successfully, you should be able to

• know the basics of sets and functions, including standard mathematical notation;

• understand the basics of linear algebra and the use of matrices;

• find the constrained optima of multivariate functions;

• compute definite and indefinite integrals;

• solve simple difference and differential equations;

• use these techniques to solve simple problems.

Assessment

Performance in this course is assessed through a sequence of in-class tests.

Textbooks

We do not recommend any particular text, but in the past students have found the following useful.

• Chiang, Alpha and Kevin Wainwright, Fundamental Methods of Mathematical Economics, McGraw Hill, 4th ed, 2005.

• Hoy, M., J. Livernois, C. McKenna, R. Rees and T. Stengos, Mathematics for Economics, 2nd edition, MIT Press, 2001.

• Simon, C. and L. Blume, Mathematics for Economists, WW Norton, 1994.


Contents

1 Preliminaries

1.1 Sets

1.2 The set of real numbers

1.3 Sequences

1.4 Series

1.5 Functions

1.6 Properties of functions

1.7 Probability

2 Matrix algebra

2.1 Vectors

2.2 Matrices

2.3 Determinants

2.4 Linear independence and rank

2.5 Systems of linear equations

2.6 Inverse matrix

2.7 Characteristic roots and vectors

2.8 Matrix representation of quadratic forms

Problems

Chapter 1

Preliminaries

1.1 Sets

• A set is any well-specified collection of elements.

• A set can be specified by listing its elements. For example

A = {London, Paris, Athens}

• Or consider

B = {1, 3, 5, 7, 9}

• Alternatively, we can define a rule that determines what is in the set and what is not:

B = {x is an odd number : 0 < x < 10}

• If x is an element of a set S, we say that x belongs to S, written as x ∈ S.

• If z does not belong to S, we write z ∉ S.

Cardinality of a set

The cardinality of a set refers to the number of elements it has.

For example, the set B = {1, 3, 5, 7, 9} has cardinality 5.

• A set may contain finitely many or infinitely many elements.

• A set with no elements is called the empty set (or the null set) and is denoted by the symbol ∅.

Subsets, unions and intersection

• Given sets S and T, we say S is a subset of T if every element of S is also an element of T. This is denoted as S ⊂ T.

• Given sets A and B, their union A ∪ B is the set of elements that are in A or in B (or in both).

A ∪ B = {x : x ∈ A or x ∈ B}

• Given sets A and B, their intersection A ∩ B is the set of elements that are in both A and B.

A ∩ B = {x : x ∈ A and x ∈ B}
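The set operations above can be tried out directly with Python's built-in set type. The following is a minimal sketch; the set C and all variable names are illustrative choices, not part of the notes.

```python
# Illustrative sets; C and the variable names are hypothetical.
A = {1, 3, 5, 7, 9}                            # specified by listing elements
B = {x for x in range(1, 10) if x % 2 == 1}    # specified by a rule: odd x with 0 < x < 10
C = {3, 4, 5}

same_set = (A == B)          # listing and rule describe the same set
cardinality = len(A)         # number of elements in A
union = A | C                # elements in A or in C (or both)
intersection = A & C         # elements in both A and C
is_subset = {1, 3} <= A      # every element of {1, 3} is in A
empty = A & {2, 4}           # no common elements: the empty set

print(same_set, cardinality, sorted(union), sorted(intersection), is_subset, empty)
```

Membership (x ∈ S) corresponds to the expression `x in S`.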

1.2 The set of real numbers

• Numbers such as 1, 2, 3, . . . are called natural numbers.

• Integers include zero and negative numbers too: . . . , −2, −1, 0, 1, 2, 3, . . . .

• Numbers that can be expressed as a ratio of two integers – that is, of the form a/b where a and b are integers, and b ≠ 0 – are said to be rational.

• Numbers such as √2, π, e cannot be expressed as a ratio of integers: they are said to be irrational.

• The set of real numbers includes both rational and irrational numbers. It is sometimes helpful to think of real numbers as points on a ‘number line’. The set of real numbers is usually denoted by R.

• It is common to use R+ to denote the set of non-negative real numbers, and R++ for strictly positive real numbers.

1.2.1 Inequalities

Given any two real numbers a and b, there are three mutually exclusive possibilities

• a > b (a is greater than b)

• a < b (a is less than b)

• a = b (a is equal to b)

The inequality in the first two cases above is said to be strict.

The case where ‘a is greater than b or a is equal to b’ is denoted as a ≥ b.


The case where ‘a is less than b or a is equal to b’ is denoted as a ≤ b.

In these cases, the inequalities are said to be weak.

Some simple but useful relations:

• If a > b and b > c then a > c.

• If a > b then a + c > b + c for any c.

• If a > b then ac > bc for any positive c.

• If a > b then ac < bc for any negative c.

Note that multiplying through with a negative number reverses the inequality.

1.2.2 Absolute value

The distance between any two real numbers a and b is given by

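$$|a - b| = \begin{cases} a - b & \text{if } a \ge b, \\ b - a & \text{if } a < b. \end{cases}$$

The expression |a|, also called the absolute value of a, denotes the distance of a from 0.

$$|a| = \begin{cases} a & \text{if } a \ge 0, \\ -a & \text{if } a < 0. \end{cases}$$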

1.2.3 Bounded sets

A set S of real numbers is bounded above if there exists a real number H that is greater than or equal to every element of the set. That is, for some H we have x ≤ H for all x ∈ S. Such a number H, if it exists, is called an upper bound of the set S.

A set of real numbers is bounded below if there exists a real number h that is less than or equal to every element of the set. That is, x ≥ h for all x ∈ S. Such a number h, if it exists, is called a lower bound of the set S.

A set that is bounded below and bounded above is called a bounded set.

1.2.4 Maximum and minimum

If a set S has a largest element M, we call M the maximum element of the set: M = max S.

If a set S has a smallest element m, we call m the minimum element of the set: m = min S.

It is easy to see that if a set has a maximum element, then it is bounded above.

The converse is not true. For instance, the set {x : x < 2} has an upper bound but no maximum.

1.2.5 Intervals

If a and b are two real numbers, the set of all numbers that lie in between a and b is called an interval.

An open interval does not contain its boundary points. The following is an open interval.

(a, b) = {x ∈ R : a < x < b}

A closed interval contains its boundary points. The following is a closed interval.

[a, b] = {x ∈ R : a ≤ x ≤ b}

We may have a half-open interval.

[a, b) = {x ∈ R : a ≤ x < b}

Bounded intervals

The intervals listed above, [a, b], (a, b) and [a, b) were all bounded.

The following intervals are unbounded.

(a, ∞) = {x ∈ R : a < x}

[a, ∞) = {x ∈ R : a ≤ x}

(−∞, b) = {x ∈ R : x < b}

(−∞, b] = {x ∈ R : x ≤ b}

1.2.6 Space and distance

We can generalize the idea of an interval to that of a space. The n-dimensional Euclidean space, R^n, is given by

R^n = {(x_1, x_2, . . . , x_n) : x_i ∈ R}

To work with any space we need some notion of distance. For example, the Euclidean distance between any two points in R^n, call them a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n), is given as

$$\|a - b\| = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}.$$

The ideas of closed and open sets, and bounded and unbounded sets, can be extended to any space with a well-defined notion of distance.
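As a quick numerical illustration of the Euclidean distance formula, here is a minimal sketch in Python. The function name `euclidean_distance` and the sample points are assumptions made for this example.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    # Square the coordinate-wise differences, add them up, take the square root.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d2 = euclidean_distance((0, 0), (3, 4))          # a point pair in R^2
d3 = euclidean_distance((0, 0, 0), (1, 2, 2))    # a point pair in R^3
print(d2, d3)
```

In one dimension the same function reduces to the absolute value |a − b|.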

1.3 Sequences

A sequence is an ordered list of numbers, a_1, a_2, a_3, .... The notation ⟨a_n⟩ is often used to denote a sequence whose n-th term is a_n.

Formally, a sequence is a function that assigns a number a_n to each natural number n = 1, 2, ....

The following are examples of sequences:

⟨2^n⟩ = 2, 4, 8, 16, . . .

⟨1⟩ = 1, 1, 1, 1, . . .

⟨(−1)^n⟩ = −1, 1, −1, 1, . . .

Convergent sequences. A sequence ⟨a_n⟩ is said to converge to a limit L if a_n is arbitrarily close to L for all n sufficiently large.

Formally, for any ε > 0, however small, there is some value N such that |a_n − L| < ε for all n > N. We denote this as a_n → L as n → ∞, or that

$$\lim_{n \to \infty} a_n = L.$$

The following sequence is convergent

⟨1/n⟩ = 1, 1/2, 1/3, . . .

but this one is not

⟨n⟩ = 1, 2, 3, . . .
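The ε–N definition can be explored numerically. The sketch below takes a_n = 1/n with limit L = 0, picks ε = 0.01 and the candidate threshold N = 100 (both illustrative choices), and checks a finite stretch of terms beyond N. This is a spot check, not a proof.

```python
def a(n):
    """n-th term of the sequence <1/n>."""
    return 1 / n

L = 0.0
eps = 0.01
N = 100   # candidate threshold: 1/n < 0.01 whenever n > 100

# Check the definition on a finite range of terms beyond N.
all_close = all(abs(a(n) - L) < eps for n in range(N + 1, N + 10_001))

# By contrast, the terms of <n> move away from any fixed candidate limit.
drifts_away = abs(10_000 - 5) > abs(10 - 5)

print(all_close, drifts_away)
```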


Subsequence. If we drop some terms from a sequence (while preserving the order of remaining terms) we obtain a subsequence. So given sequence

⟨a_n⟩ = a_1, a_2, a_3, a_4, a_5, . . .

we can construct subsequences, say by dropping every alternate term

⟨a_{2n}⟩ = a_2, a_4, a_6, a_8, . . .

or even arbitrarily

⟨a_i⟩ = a_2, a_4, a_5

The summation operator. For a sequence ⟨x_n⟩, the summation operator ∑ defines the sum of specified terms of that sequence.

For instance,

$$\sum_{i=1}^{m} x_i = x_1 + x_2 + \cdots + x_m$$

denotes the sum of the first m terms.

The summation operator allows compact expression of various constructs in economics and finance. Suppose an investor buys n different shares: x_1 units of the first share at price p_1, and x_2 units of the second share at price p_2, and so on, till x_n units of the n-th share at price p_n. Investment in share i is worth p_i x_i, and total investment across n shares is written as

$$\sum_{i=1}^{n} p_i x_i.$$

The following properties are useful

$$\sum_{i=1}^{n} (a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i$$

$$\sum_{i=1}^{n} c\,a_i = c \sum_{i=1}^{n} a_i$$

That is, the term c that does not vary with the index can be moved outside the summation operator without affecting the value.

Double summation. More complicated operations can be expressed as double summations.

$$\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \sum_{i=1}^{m} (x_{i1} + x_{i2} + \cdots + x_{in}) = (x_{11} + x_{12} + \cdots + x_{1n}) + (x_{21} + x_{22} + \cdots + x_{2n}) + \cdots + (x_{m1} + x_{m2} + \cdots + x_{mn}).$$
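The summation operator maps naturally onto Python's built-in `sum`. The sketch below uses made-up prices and holdings purely for illustration; it computes a total investment, a double summation over a small table, and checks that a constant factors out of a sum.

```python
# Hypothetical prices and holdings for n = 3 shares (illustrative numbers only).
prices = [10.0, 2.5, 4.0]
units = [3, 8, 5]

# Total investment: the sum over i of p_i * x_i.
total = sum(p * x for p, x in zip(prices, units))

# Double summation over an m x n table: add up the row sums.
table = [[1, 2, 3],
         [4, 5, 6]]
double_sum = sum(sum(row) for row in table)

# A constant factors out of a sum: sum(c * a_i) equals c * sum(a_i).
c = 3
factored = sum(c * x for x in units) == c * sum(units)

print(total, double_sum, factored)
```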


1.4 Series

A series refers to the sum of terms in a sequence. For an infinite sequence a_1, a_2, a_3, a_4, a_5, . . . , this refers to

$$\sum_{i=1}^{\infty} a_i = a_1 + a_2 + a_3 + \cdots$$

Consider the partial sum

$$s_N = a_1 + a_2 + \cdots + a_N = \sum_{i=1}^{N} a_i.$$

This is the sum of the first N terms of the original sequence. Note that s_1, s_2, . . . is itself a sequence of partial sums.

1.4.1 Convergent series

The series ∑_{i=1}^{∞} a_i converges to a limit s if and only if the associated sequence of partial sums s_N converges to s.

Arithmetic series. If successive terms in the original sequence differ by a constant, as in

a, a + c, a + 2c, . . .

the associated series is said to be an arithmetic series.

Geometric series. If successive terms in the sequence are related by a constant multiplicative factor c, that is, the sequence is a, ac, ac^2, ac^3, . . . , the associated series is said to be a geometric series.

Harmonic series have the form

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots$$

Evaluating a convergent geometric series

The following describes a standard technique for evaluating a convergent geometric series

$$s = \sum_{n=1}^{\infty} a c^{n-1}, \quad \text{assuming that } |c| < 1.$$

Note we can write the partial sum of the first N terms as

$$s_N = a + ac + ac^2 + ac^3 + \cdots + ac^{N-1}$$

Multiplying both sides by c, we get

$$c\,s_N = ac + ac^2 + \cdots + ac^{N-1} + ac^N$$

Subtracting the second equation from the first we get

$$(1 - c)\,s_N = a - ac^N$$

or that

$$s_N = \frac{a(1 - c^N)}{1 - c}$$

If |c| < 1, as N → ∞ we have c^N → 0, so that

$$s = \frac{a}{1 - c}.$$
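The derivation can be checked numerically by comparing brute-force partial sums with the closed form and with the limiting value. This is a minimal sketch; the values a = 2 and c = 0.5 and the function names are illustrative assumptions.

```python
def partial_sum(a, c, N):
    """Sum of the first N terms: a + a*c + ... + a*c**(N-1)."""
    return sum(a * c ** k for k in range(N))

def closed_form(a, c, N):
    """Closed form a * (1 - c**N) / (1 - c), valid for c != 1."""
    return a * (1 - c ** N) / (1 - c)

a, c = 2.0, 0.5            # illustrative values with |c| < 1
limit = a / (1 - c)        # limiting value of the series

for N in (1, 5, 10, 50):
    # The two columns agree, and both approach the limit as N grows.
    print(N, partial_sum(a, c, N), closed_form(a, c, N), limit)
```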

1.5 Functions

Loosely speaking, a function is a rule that associates a unique value with any element of a set. A function from a set A to set B defines a rule that assigns to each x ∈ A a unique element y ∈ B. The set of all values that it ‘maps’ from is called the domain. The set of values it maps into is called the range. The mapping can be denoted as f : A → B.

Functions of a single variable

In the simplest class of functions, both A and B are the set of real numbers. For instance, consider the rule that maps temperature measurements from the Centigrade (metric) scale to the Fahrenheit scale

y = 1.8x + 32,

where x is the Centigrade measurement and y the associated value in Fahrenheit. We could denote this as y = f(x).

Multivariate functions

Multivariate functions map from the n-dimensional space to the set of real numbers. For instance, if beer costs 2 euros a pint and chips cost 1 euro per packet, consider the rule that evaluates the total expenditure on any bundle (x_b, x_c) of beers and chips.

e = 2x_b + x_c.

In general, we can have a mapping f : R^n → R, which assigns a real number to n-dimensional variables (x_1, x_2, x_3, . . . , x_n).

Some standard classes of functions

Polynomials: These are functions of the sort

f(x) = a + bx + cx^2.

f(x_1, x_2, x_3) = a x_1 x_2 + b x_1 x_3 + c x_3^2.

Linear functions: A function f from A to B is said to be linear if

f(x + y) = f(x) + f(y) and f(rx) = r f(x)

for all x and y in A, and r ∈ R.

Quadratic functions: A quadratic function on R^n is a real-valued function of the form

$$Q(x_1, x_2, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j$$

Power function: f(x) = x^a

1.5.1 Inverse of a function

A function f, defined on domain A, is one-to-one if f never has the same value for two distinct points in A.

A one-to-one function has an inverse: for y = f(x), there is a function f^{−1}, defined on the set of values taken by f, such that x = f^{−1}(y).

Functions such as y = x^2 (defined on all of R) do not possess an inverse since there are two values of x associated with each y > 0.

1.5.2 Monotonicity of functions

A function is weakly increasing if f (x′) ≥ f (x′′) for x′ > x′′ in its domain.

A function is strictly increasing if f (x′) > f (x′′) for x′ > x′′ in its domain.

A function is weakly decreasing if f (x′) ≤ f (x′′) for x′ > x′′ in its domain.

A function is strictly decreasing if f (x′) < f (x′′) for x′ > x′′ in its domain.

Functions that are either increasing or decreasing in their domain are said to be monotonic.

Strictly monotonic functions are one-to-one, so have inverse functions.


1.5.3 Exponents and logarithms

Consider any positive real number a. We define the exponential function as follows:

Exponential function to base a: f(x) = a^x.

Since the exponential function is strictly monotonic (for a ≠ 1), it has an inverse.

Logarithmic function, to base a. If a^x = y then x = log_a y.

By construction, a^{log_a(z)} = z.

Sometimes a particular irrational number e is used as the base for the exponential function. Let

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$$

Exponential function to natural base e: f(x) = e^x

This function has an inverse too:

Natural logarithmic function. If e^x = y then x = ln y.

Properties of exponents

• a^0 = 1

• a^m a^n = a^{m+n}

• a^m / a^n = a^{m−n}

• a^{−m} = 1/a^m

• (a^m)^n = a^{mn}

Properties of logarithms

• log 1 = 0

• log(mn) = log m + log n

• log(m/n) = log m − log n

• log(1/m) = − log m

• log(m^n) = n log m
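The logarithm rules listed above can be spot-checked with Python's `math` module. The sketch below uses natural logarithms and the illustrative values m = 8 and n = 2; comparisons use `math.isclose` because floating-point arithmetic is only approximate.

```python
import math

m, n = 8.0, 2.0   # illustrative positive numbers

checks = {
    "log 1 = 0": math.isclose(math.log(1), 0.0, abs_tol=1e-12),
    "log(mn) = log m + log n": math.isclose(math.log(m * n), math.log(m) + math.log(n)),
    "log(m/n) = log m - log n": math.isclose(math.log(m / n), math.log(m) - math.log(n)),
    "log(1/m) = -log m": math.isclose(math.log(1 / m), -math.log(m)),
    "log(m^n) = n log m": math.isclose(math.log(m ** n), n * math.log(m)),
    # The logarithm to base a undoes the exponential to base a (here a = 2).
    "a^(log_a z) = z": math.isclose(2 ** math.log(10, 2), 10.0),
}

for rule, holds in checks.items():
    print(rule, holds)
```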


1.6 Properties of functions

1.6.1 Concavity and convexity of functions

Put simply, concavity and convexity of a function refer to the curvature of its graph. For instance, the function y = ln x is concave; the function y = x^2 is convex. We can describe concavity and convexity of functions more formally, by first defining convex combinations and convex sets.

Convex combination. Consider two distinct points x′ and x′′ in n-dimensional space. Their convex combination is given by

x = λx′ + (1 − λ)x′′ for λ ∈ (0, 1).

For example, let x′ = (1, 2) and x′′ = (3, 4): we find that (2, 3) is a convex combination of x′ and x′′ using λ = 0.5.

Convex sets. A set X is convex if

x′ ∈ X and x′′ ∈ X implies (λx′ + (1 − λ)x′′) ∈ X for all λ ∈ [0, 1].

That is, a set is convex if any convex combination of elements of the set lies in the set.

Convex function. A function of a single variable is convex if the line segment joining any two points on the graph of the function lies on or above the graph.

This idea can be generalized to multivariate functions.

Consider a function f(x) = f(x_1, x_2, . . . , x_n), defined on a convex set X. For any two points x′ and x′′ in X, consider their convex combinations. The function f is said to be convex if, for all λ ∈ [0, 1],

f[λx′ + (1 − λ)x′′] ≤ λ f(x′) + (1 − λ) f(x′′)

Concave function. A function of a single variable is concave if the line segment joining any two points on the graph of the function lies on or below the graph.

Consider a function f(x) = f(x_1, x_2, . . . , x_n), defined on a convex set X. For any two points x′ and x′′ in X, consider their convex combinations. The function f is said to be concave if, for all λ ∈ [0, 1],

f[λx′ + (1 − λ)x′′] ≥ λ f(x′) + (1 − λ) f(x′′)

Note that if a function f is convex, the negative of that function, − f, is concave.
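The defining inequality for convexity can be spot-checked on a finite grid of points and weights. The sketch below is only a numerical check on sample values (it cannot prove convexity); the helper name `satisfies_convexity`, the grid, and the tolerance are assumptions made for illustration.

```python
import math

def satisfies_convexity(f, points, lambdas, tol=1e-12):
    """Check f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) on a finite grid."""
    return all(
        f(l * x1 + (1 - l) * x2) <= l * f(x1) + (1 - l) * f(x2) + tol
        for x1 in points
        for x2 in points
        for l in lambdas
    )

points = [0.5, 1.0, 2.0, 3.5]            # positive sample points, so ln x is defined
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

square_is_convex = satisfies_convexity(lambda x: x ** 2, points, lambdas)
log_is_convex = satisfies_convexity(math.log, points, lambdas)
neg_log_is_convex = satisfies_convexity(lambda x: -math.log(x), points, lambdas)

print(square_is_convex, log_is_convex, neg_log_is_convex)
```

On this grid x^2 passes the convexity check, ln x fails it (it is concave), and −ln x passes, in line with the remark that the negative of a concave function is convex.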


Homogeneous functions

A function is said to be homogeneous of degree λ if for y = f(x_1, x_2, ..., x_n)

f(kx_1, kx_2, ..., kx_n) = k^λ y.

For instance, y = x_1 x_2 is homogeneous of degree 2, while y = x_1^{0.25} x_2^{0.75} is homogeneous of degree 1.
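The two examples of homogeneous functions can be verified numerically at a sample point. In this sketch the scale factor k = 3 and the point (2, 5) are arbitrary illustrative choices.

```python
import math

def f(x1, x2):
    """y = x1 * x2, claimed homogeneous of degree 2."""
    return x1 * x2

def g(x1, x2):
    """y = x1**0.25 * x2**0.75, claimed homogeneous of degree 1."""
    return x1 ** 0.25 * x2 ** 0.75

k, x1, x2 = 3.0, 2.0, 5.0   # arbitrary positive test values

# Scaling every argument by k should scale the value by k**degree.
deg2 = math.isclose(f(k * x1, k * x2), k ** 2 * f(x1, x2))
deg1 = math.isclose(g(k * x1, k * x2), k ** 1 * g(x1, x2))

print(deg2, deg1)
```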

1.6.2 Continuity of functions

Graphically, the notion of continuity is easy to understand. All polynomial functions are continuous.

Definition 1 (Continuous functions) Let f be a function from R^k to R. Let x_0 be a vector in R^k and let y = f(x_0) be its image. The function f is continuous at x_0 if whenever {x_n}_{n=1}^{∞} is a sequence in R^k which converges to x_0, then the sequence {f(x_n)}_{n=1}^{∞} converges to f(x_0). The function f is said to be continuous if it is continuous at each point in its domain.

As an example of a function that is not continuous consider

$$f(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$

Continuity of composite functions: If both g and f are continuous functions, then g(f(x)) is continuous.

1.7 Probability

Consider a process whose outcome cannot be predicted.

• The set Ω of all possible outcomes is called the sample space. For instance, the sample space for a flip of a coin is {heads, tails}. The sample space for the roll of a die is {1, 2, 3, 4, 5, 6}.

• An event A is a subset of Ω. For the latter example above, consider the event that the roll of the die produces a number 3 or lower, that is, A = {1, 2, 3}.

• The set of all events is given by the set of all subsets of Ω.

• For any event A, we can define its complement A^c as the set of elements of Ω that are not in A.

• For any two events, A_i and A_j, their union, denoted as A_i ∪ A_j, defines the event that either A_i or A_j or both happen.

• For any two events, A_i and A_j, their intersection, denoted as A_i ∩ A_j, defines the event that both A_i and A_j happen.

• If the intersection of two events is the empty set, that is if A_i ∩ A_j = ∅, they are said to be disjoint or mutually exclusive.

Definition 2 (Probability function) A probability function P(.) assigns, to every event A, a value P(A) ∈ [0, 1], so that the following axioms hold.

1. P(A) ≥ 0

2. P(Ω) = 1

3. For any sequence of pairwise disjoint events A_1, A_2, . . . , we have P(A_1 ∪ A_2 ∪ . . .) = ∑_{i=1}^{∞} P(A_i).

The following results follow from the axioms:

• If P(A) is the probability that event A occurs, the probability of event A not happening is 1 − P(A).

• We know, from the third axiom above, that if A and B are disjoint events, we have P(A ∪ B) = P(A) + P(B). When A and B are not disjoint,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

1.7.1 Independent events

Two events A and B are said to be independent if the probability of both happening is theproduct of their individual probabilities.

P(A ∩ B) = P(A)× P(B). (1.1)

1.7.2 Conditional probability

The probability of event A conditional on event B is defined (when P(B) > 0) as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{1.2}$$

Note that if A and B are independent we have P(A | B) = P(A).

1.7.3 Bayes’ rule

Similarly, the probability of B happening given that A happens is:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}. \tag{1.3}$$

Multiply both sides of equation (1.2) by P(B) and both sides of (1.3) by P(A). Rearranging, we get

P(A ∩ B) = P(A | B)P(B) = P(B | A)P(A)

The joint probability is the product of the conditional probability and the marginal probability. The last equality above leads to Bayes’ rule:

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}.$$

Bayes’ rule can be re-written in different forms. Note that we can write P(B) = P(B | A)P(A) + P(B | A^c)P(A^c), so that

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}.$$

1.7.4 Example

A patient goes to see a doctor. The doctor performs a test with 95 percent reliability – that is, 95 percent of people who are sick test positive and 95 percent of the healthy people test negative. The doctor knows that only 1 percent of the people in the country are sick. Now the question is: if the patient tests positive, what are the chances the patient is sick?

We reason as follows

• Let A be the event that the patient is sick, and A^c be the event that the patient is healthy

• Let B be the event that the patient’s test is positive.

• We must compute P(A | B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | A^c)P(A^c)]

• 95 percent of sick people test positive: so P(B | A) = 0.95

• 5 percent of healthy people test positive too: so P(B | A^c) = 0.05

• 1 percent of the population is sick and 99 percent is healthy, so P(A) = 0.01 and P(A^c) = 0.99

• Using these values in the above expression, we get

$$P(A \mid B) = \frac{(0.95)(0.01)}{(0.95)(0.01) + (0.05)(0.99)} \approx 16.1\%$$

An alternative way to reason through. In a population of 1 million, we would expect 10,000 sick people and 990,000 healthy people. If all the sick people were tested, 95% of them – 9,500 in total – would test positive. Of the 990,000 healthy people 5% – a total of 49,500 – would test positive. Among the people who test positive – which is 9,500 + 49,500 = 59,000 – only 9,500 are actually sick, so the probability that someone who tests positive is sick is 9,500/59,000, which equals about 16%.
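Both lines of reasoning in this example are easy to reproduce in code. The sketch below is one possible way to set it up; the function name `posterior` and its parameter names are assumptions introduced for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A | B) by Bayes' rule, with A = 'sick' and B = 'tests positive'."""
    numerator = sensitivity * prior                              # P(B | A) P(A)
    evidence = numerator + false_positive_rate * (1 - prior)     # P(B)
    return numerator / evidence

# Values from the example: P(A) = 0.01, P(B | A) = 0.95, P(B | A^c) = 0.05.
p_sick_given_positive = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)

# The population-count argument: 9,500 true positives among 59,000 positives.
count_version = 9500 / (9500 + 49500)

print(round(p_sick_given_positive, 4), round(count_version, 4))
```

Varying `prior` shows how strongly the answer depends on how rare the illness is.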

1.7.5 Random variables

Random variable. A random variable X is a function whose domain is the sample space and whose range is the set of real numbers.

Thus, a random variable assigns a real value (i.e., a number) to every outcome in the sample space. The particular values are called realizations and are denoted as x. If the realizations are countable, x_1, x_2, ..., the random variable is said to be discrete. In contrast, if there are infinitely many, uncountable realizations, the random variable is said to be continuous.

We can specify a probability function for a discrete random variable as follows

$$P(X = x_i) = p(x_i) \ge 0, \qquad \sum_i p(x_i) = 1;$$

with the associated cumulative distribution P(X ≤ x_j) = ∑_{i=1}^{j} p(x_i).

For a continuous random variable, the cumulative distribution function is

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt,$$

where

$$f(x) = \frac{dF(x)}{dx}$$

denotes the probability density function and ∫_{−∞}^{∞} f(x) dx = 1. Although f(x) is defined at a point, P(X = x) = 0 for a continuous random variable.

The support of a distribution refers to the range over which f(x) ≠ 0 (i.e., there is a positive probability of the underlying event occurring).

1.7.6 Expected value

The expected value (or mean) of a random variable is

$$E(x) = \begin{cases} \sum_{i=1}^{n} p(x_i)\,x_i, & \text{if discrete,} \\ \int_{-\infty}^{\infty} x f(x)\,dx, & \text{if continuous.} \end{cases}$$

For example, suppose we have a sample of n equally-likely observations x_1, x_2, ..., x_n. Using the fact that p(x_i) = 1/n, we have

$$E(x) = \sum_{i=1}^{n} \frac{1}{n} x_i.$$

Expected Value of functions of random variables

Consider a random variable X with density function f(x). For any function g(X), we have

$$E(g(x)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

Note that, in general, E(g(x)) ≠ g(E(x)).

Jensen’s Inequality

If g(x) is a concave function, we have

E(g(x)) ≤ g(E(x)).

If g(x) is a linear function, we have E(g(x)) = g(E(x)). Specifically, let g(x) = a + bx. Here E(a + bx) = a + bE(x).
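For a discrete random variable these results can be checked by direct computation. The sketch below uses a fair six-sided die and the concave function g(x) = √x; both are illustrative choices rather than examples from the notes.

```python
import math

values = [1, 2, 3, 4, 5, 6]     # outcomes of a fair die (illustrative)
probs = [1 / 6] * 6             # equally likely outcomes

def expectation(g, values, probs):
    """E[g(X)] for a discrete random variable: sum of p(x_i) * g(x_i)."""
    return sum(p * g(x) for x, p in zip(values, probs))

mean = expectation(lambda x: x, values, probs)                # E(x)
e_sqrt = expectation(math.sqrt, values, probs)                # E(g(x)) for concave g
e_linear = expectation(lambda x: 2 + 3 * x, values, probs)    # E(a + bx) with a = 2, b = 3

# Jensen: E(sqrt(x)) does not exceed sqrt(E(x)); linear g gives equality.
print(mean, e_sqrt, math.sqrt(mean), e_linear, 2 + 3 * mean)
```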

Joint distributions

P(X = x_1, Y = y_1) = p(x_1, y_1) for discrete random variables,

P(X ≤ x, Y ≤ y) = F(x, y) for continuous random variables,

with joint pdf f(x, y) and ∫∫ f(x, y) dx dy = 1.

Marginal distribution

$$p(x_i) = \sum_j p(x_i, y_j)$$

$$f(x) = \int f(x, y)\,dy$$

Conditional distribution

$$p(x_i \mid y_j) = \frac{p(x_i, y_j)}{p(y_j)}$$

$$f(x \mid y) = \frac{f(x, y)}{f(y)}$$

Notice this implies f(x, y) = f(x | y) f(y) = f(y | x) f(x): the joint distribution can be written as the product of the conditional and marginal distributions.

We can define expectations for marginal and conditional distributions. For instance, the conditional expectation (regression) function gives the expected value of y as a function of x:

$$E(y \mid x_i) = \sum_j y_j\, p(y_j \mid x_i)$$

$$E(y \mid x) = \int y\, f(y \mid x)\,dy.$$

1.7.7 Some common probability distributions

• A uniform distribution defined over interval [a, b] has density function

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & \text{if } a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

• A normal distribution with mean µ and variance σ^2, written x ∼ N(µ, σ^2), has density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right].$$

• Linear functions of normal variates are also normal. For instance, if x is normally distributed with mean µ and variance σ^2, then

$$z = \frac{x - \mu}{\sigma}$$

is normally distributed with mean 0 and variance 1. This standard normal distribution is usually denoted as N(0, 1).

Problems

For the following questions, identify the best answer.

1. If |3x− 4| = 1 then x equals

(a) x = 1

(b) x = 4/3

(c) x = 1 or x = 5/3

(d) x = −1 or x = −5/3

2. For what values of x is |x− 2| ≤ 1?

(a) x lies between 0 and 1

(b) x lies between -1 and 1

(c) x lies between 1 and 3

(d) x lies between -3 and 3

3. The Euclidean distance between a = (1, 2) and b = (3, 3) is

(a) (2, 1)

(b) √3

(c) √5

(d) 1

4. The function f(x, y) = 2x − 3y is

(a) strictly increasing in x

(b) strictly decreasing in x

(c) increasing in x and y

(d) increasing in y

5. The function f(x) = x^2 is

(a) strictly increasing in x

(b) strictly decreasing in x

(c) neither increasing nor decreasing in x

(d) weakly increasing in x

6. The function f(x, y) = 2x − 3y is

(a) concave in x

(b) convex in x

(c) neither concave nor convex in x

(d) both concave and convex in x

7. The function f(x) = −x^2 is

(a) concave in x

(b) convex in x

8. The function f(x) = x^3 is

(a) concave in x

(b) convex in x

9. Consider the geometric series s = 1 + 0.25 + 0.25^2 + . . . .

(a) s = 2

(b) s = 4

(c) s = 4/3

(d) s = 3/4

10. The value of an asset that pays dividends D_t is given by

$$V_0 = \sum_{t=1}^{\infty} \frac{D_t}{(1 + r)^t}$$

where r is the interest rate and D_t = D(1 + g)^t, where g is the growth rate of dividends. Under what conditions does this series converge?

(a) g < r

(b) r < g

(c) r = g

(d) t > 1


Solve the following

11. If x and y are positive, prove that x < y if and only if x^2 < y^2.

12. Evaluate the series: x + 2x^2 + 3x^3 + . . . for 0 < x < 1.

13. Evaluate the series: x − x^2 + x^3 − x^4 + . . . for 0 < x < 1.

14. Consider a random variable X that can take one of values 1, 2, 3 or 4 with equal probability. Evaluate E(x), the expected value of X. Then, evaluate E(y), the expected value of Y = −2X^2. Use your findings to verify Jensen’s inequality.

15. Simplify the following:

(a) x^{1−b} x^{a−1}

(b) [x^{(1−a−b)}] [x^{(a−b−0.5)}]^{−2}

(c) $\dfrac{xy\sqrt{6 - 12x}}{\sqrt{6x^2y^2 - 24x^4y^2}}$

Chapter 2

Matrix algebra

2.1 Vectors

A vector is an ordered set of numbers. These could be expressed as a row

[ 3 2 . . . 0 ],

or as a column

$$\begin{bmatrix} 3 \\ 2 \\ \vdots \\ 0 \end{bmatrix}.$$

Vectors are a commonly used construct in economics and finance. For instance, a consumption vector can represent the consumption level of distinct commodities; a portfolio may be represented as a vector of asset holdings.

The number of elements in a vector is referred to as its dimension. An n-dimensional vector can be represented as a row vector

a = [ a_1 a_2 . . . a_n ],

or as a column vector

$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.$$

2.1.1 Operations on Vectors

In what follows let a, b, and c be n-dimensional vectors.


Equality of vectors

Two n-dimensional vectors a and b are equal if and only if the corresponding components are equal: that is, if a_i = b_i for all i.

Addition of vectors

The sum of two n-dimensional vectors a = [a_i] and b = [b_i] is defined as an n-dimensional vector c whose typical component is c_i = a_i + b_i.

For instance,

[ 3 2 0 ] + [ 1 1 1 ] = [ 4 3 1 ]

Properties of vector addition. Addition of vectors is

• commutative: a + b = b + a, and

• associative: (a + b) + c = a + (b + c).

Scalar multiplication

Multiplying a vector by a scalar involves multiplying each element by that scalar. For any vector a = [a_i] and real number α, we have

αa = [αa_i]

For instance,

3 [ 1 3 2 ] = [ 3 9 6 ]

Difference between vectors

The difference a − b can be written as a + (−1)b.

For instance,

[ 3 2 0 ] − [ 1 1 1 ] = [ 2 1 −1 ]

Two special vectors

A null vector is a vector whose elements are all zero:

0 = [ 0 0 . . . 0 ]

The difference between any vector and itself yields the null vector.

A unit vector is a vector whose elements are all 1:

i = [ 1 1 . . . 1 ]


Linear combination of vectors

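Given two n-vectors a and b and scalars γ and δ, the vector (γa + δb) is said to be a linear combination of a and b.

Specifically, for column vectors we have

$$\gamma a + \delta b = \gamma \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} + \delta \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} \gamma a_1 + \delta b_1 \\ \gamma a_2 + \delta b_2 \\ \vdots \\ \gamma a_n + \delta b_n \end{bmatrix}$$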

Inner product of two vectors

Given two n-vectors

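$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

their inner product (sometimes called the dot product) is given by

$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i.$$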

Note that a · b = b · a, so that the operation is commutative.

Inner product of vector and a unit vector

Let i be an n-dimensional unit vector, that is, a vector whose elements are all 1. The inner product of any n-dimensional vector a with the n-dimensional unit vector equals the sum of the components of a. That is, using j as the summation index to avoid a clash with the vector i,

$$i \cdot a = \sum_{j=1}^{n} a_j.$$

Orthogonality of vectors

Two vectors are said to be orthogonal if their inner product is zero.

The following are examples of orthogonal pairs:

Example 1: [1 1] and [1 −1],

as [1 1] · [1 −1] = 1 − 1 = 0

Example 2: [1 2 −1] and [3 1 5],

as [1 2 −1] · [3 1 5] = 3 + 2 − 5 = 0
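The vector operations of this section can be sketched in a few lines of plain Python, representing a vector as a list. The helper names `add`, `scale` and `dot` are assumptions made for this illustration; the numbers are the examples used in the text.

```python
def add(a, b):
    """Component-wise sum of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [ai + bi for ai, bi in zip(a, b)]

def scale(alpha, a):
    """Scalar multiplication: multiply every component by alpha."""
    return [alpha * ai for ai in a]

def dot(a, b):
    """Inner product: sum of products of corresponding components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = [3, 2, 0], [1, 1, 1]
total = add(a, b)                         # vector addition
difference = add(a, scale(-1, b))         # a - b written as a + (-1)b
tripled = scale(3, [1, 3, 2])             # scalar multiplication
component_sum = dot([1] * len(a), a)      # inner product with a vector of ones
orthogonal = dot([1, 2, -1], [3, 1, 5]) == 0

print(total, difference, tripled, component_sum, orthogonal)
```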


2.2 Matrices

A matrix is a rectangular array of numbers

$$A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The notational subscripts in the typical element a_{ij} refer to its row and column location in the array: specifically, a_{ij} is the element in the i-th row and the j-th column.

This matrix has m rows and n columns, so is said to be of dimension m × n (commonly referred to as ‘m by n’).

A matrix can be viewed as a set of column vectors, or alternatively as a set of row vectors. Conversely, a vector can be viewed as a matrix with only one row or column.

Some special matrices

A matrix with the same number of rows as columns is said to be a square matrix.

Matrices that are not square are said to be rectangular matrices.

A null matrix is composed of all 0s and can be of any dimension. Consider the following null matrix

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Identity matrix

An identity matrix is a square matrix with 1s on the main diagonal, and all other elements equal to 0. Formally, we have a_{ii} = 1 for all i and a_{ij} = 0 for all i ≠ j.

Identity matrices are often denoted by the symbol I (or sometimes as I_n where n denotes the dimension).

The two-dimensional identity matrix is

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

More generally, the identity matrix of dimension n is

$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

Symmetric matrix

A square matrix A = [a_{ij}] is said to be symmetric if a_{ij} = a_{ji} for all i and j. For example

$$\begin{bmatrix} 1 & 2 & 5 \\ 2 & 1 & 0 \\ 5 & 0 & 3 \end{bmatrix}$$

Diagonal matrix

A diagonal matrix is a square matrix A = [a_{ij}] whose non-diagonal entries are all zero. That is, a_{ij} = 0 for i ≠ j. For example

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

Upper-triangular matrix

An upper-triangular matrix is a matrix (usually a square matrix) in which all entries below the diagonal are 0. For A = [a_{ij}] we have a_{ij} = 0 for i > j. For example

$$\begin{bmatrix} 1 & 4 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

Lower-triangular matrix

A lower-triangular matrix is a matrix (usually a square matrix) in which all entries above the diagonal are 0. For A = [a_{ij}] we have a_{ij} = 0 for i < j. For example

$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 3 & 0 \\ 1 & 2 & 2 \end{bmatrix}$$

2.2.1 Matrix operations

An algebra is a system of sets and operations on these sets where the sets satisfy certain conditions and the operations satisfy some rules.

Equality of matrices

Matrices A and B are equal if and only if they have the same dimensions and each element of A equals the corresponding element of B. That is, if

a_{ij} = b_{ij} for all i and j.

Transpose of a matrix

For any matrix $A$, the transpose, denoted by $A'$ (or sometimes $A^{\top}$), is obtained by interchanging rows and columns.

If $A = [a_{ij}]$, then $A' = [a_{ji}]$.

That is, the i-th row of the original matrix forms the i-th column of the transpose matrix. For example, if

\[ A = \begin{bmatrix} 2 & 3 & 1 \\ 4 & 1 & 2 \end{bmatrix} \]

then

\[ A' = \begin{bmatrix} 2 & 4 \\ 3 & 1 \\ 1 & 2 \end{bmatrix} \]

Note that if $A$ is of dimension m × n, its transpose is of dimension n × m.

Transpose of a symmetric matrix. If A is symmetric, A′ = A.

Transpose of a transpose. The transpose of a transpose of a matrix yields the original matrix. We have

\[ (A')' = A. \]
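The transpose rules can be checked on the 2 × 3 example above. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; the helper name `transpose` is our own choice for illustration.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) groups the j-th entries of every row, i.e. the j-th column.
    return [list(column) for column in zip(*matrix)]


A = [[2, 3, 1],
     [4, 1, 2]]

A_t = transpose(A)
print(A_t)                   # the 3 x 2 transpose of the 2 x 3 example
print(transpose(A_t) == A)   # checks (A')' = A
```

Storing a matrix as a list of rows keeps the sketch free of external libraries; a numerical library would normally be used for larger problems.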

Matrix addition

We can add two matrices as long as they are of the same dimension. Consider $A = [a_{ij}]$ and $B = [b_{ij}]$, both of dimension m × n. Their sum is defined as an m × n matrix,

\[ C = A + B = [a_{ij} + b_{ij}]. \]

For instance,

\[ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix} \]

Properties of matrix addition. Matrix addition is commutative,

\[ A + B = B + A, \]

and associative,

\[ (A + B) + C = A + (B + C). \]

Addition of null matrix. For any matrix, the addition of the null matrix leaves the original matrix unchanged. For any B,

B + 0 = B

where 0 is the null matrix with the same dimension as B.

Transpose of sum. The transpose of a sum of matrices is the sum of the transpose matrices

(A + B)′ = A′ + B′.

Scalar multiplication

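Multiplying a matrix by a scalar involves multiplying each element by that scalar. If $A = [a_{ij}]$, then for any real number $\lambda$ we have

\[ \lambda A = [\lambda a_{ij}]. \]

For instance,

\[ 3 \begin{bmatrix} 1 & 3 & 2 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 9 & 6 \\ 3 & 0 & 3 \end{bmatrix} \]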

Matrix multiplication

Matrix multiplication is an operation on pairs of matrices that satisfy certain restrictions. The restriction is that the first matrix must have the same number of columns as the number of rows in the second matrix. When this condition holds, the matrices are said to be conformable under multiplication.

Let $A = [a_{ij}]$ be an m × n matrix and $B = [b_{ij}]$ be an n × p matrix. As the number of columns in the first matrix and the number of rows in the second both equal n, the matrices are conformable.

It is important to distinguish between pre-multiplication and post-multiplication. In the product $AB$, the matrix $A$ is post-multiplied by $B$ (or, equivalently, $B$ is pre-multiplied by $A$).

The product matrix

\[ C = AB \]

is an m × p matrix whose ij-th element equals the inner product of the i-th row vector of matrix $A$ and the j-th column vector of matrix $B$. Formally,

\[ c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \]

For instance,

\[ \begin{bmatrix} 2 & 1 & 2 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2(1) + 1(2) + 2(1) & 2(3) + 1(1) + 2(0) \\ 1(1) + 2(2) + 3(1) & 1(3) + 2(1) + 3(0) \end{bmatrix} = \begin{bmatrix} 6 & 7 \\ 8 & 5 \end{bmatrix} \]
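The definition of the product, including the conformability check, can be sketched in a few lines. This is an illustrative implementation under the assumption that matrices are lists of rows; the function name `matmul` is ours, not part of the course material.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows, checking conformability."""
    n = len(B)  # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    p = len(B[0])  # number of columns of B
    # c_ij is the inner product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[2, 1, 2],
     [1, 2, 3]]
B = [[1, 3],
     [2, 1],
     [1, 0]]

C = matmul(A, B)
print(C)  # the 2 x 2 product from the numerical example
```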

Properties of matrix multiplication

Matrix multiplication is not commutative. Here is why.

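• Even when matrices $A$ and $B$ are conformable so that $AB$ exists, $BA$ may not exist. For instance, if $A$ is 3 × 2 and $B$ is 2 × 2, $AB$ exists but $BA$ is not defined.

• Even when both product matrices exist, they may not have the same dimensions. For instance, if $A$ is 2 × 3 and $B$ is 3 × 2, $AB$ is of order 2 × 2 while $BA$ is of order 3 × 3. Consider the numerical example above, for instance.

• Even when both product matrices are of the same dimension, they may not be equal. For instance,

\[ \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix} \]

while

\[ \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 6 & 2 \end{bmatrix} \]

Further, $AB = 0$ does not imply either $A = 0$ or $B = 0$. Consider

\[ \begin{bmatrix} 1 & -1 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]

Also, $AB = AC$ and $A \neq 0$ do not imply $B = C$. Consider

\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 4 & 7 \end{bmatrix} \]

and

\[ \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 4 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 4 & 7 \end{bmatrix} \]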
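These three counterexamples are easy to verify numerically. The sketch below assumes matrices are lists of rows and uses a compact product helper of our own (`matmul`); the variable names are illustrative only.

```python
def matmul(A, B):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# AB and BA have the same dimension but differ.
A = [[1, 2], [3, 1]]
B = [[1, 0], [0, 2]]
AB = matmul(A, B)
BA = matmul(B, A)

# A product can be the null matrix although neither factor is null.
zero_product = matmul([[1, -1], [-2, 2]], [[1, 0], [1, 0]])

# AB = AC does not force B = C.
ones = [[1, 1], [1, 1]]
left = matmul(ones, [[1, 2], [3, 5]])
right = matmul(ones, [[2, 4], [2, 3]])

print(AB != BA, zero_product, left == right)
```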

However, matrix multiplication is associative,

\[ (AB)C = A(BC), \]

and is distributive across sums of matrices:

\[ A(B + C) = AB + AC, \qquad (B + C)A = BA + CA. \]

Transpose of a product. The transpose of a product is the product of the transposes, taken in reverse order:

\[ (AB)' = B'A'. \]

Multiplication with the identity matrix. Pre-multiplying or post-multiplying any matrix $A$ by the identity matrix (of conformable dimension) yields the original matrix:

\[ IA = AI = A. \]

Multiplication with the null matrix. Multiplication of a matrix by a conformable null matrix produces a null matrix.

Multiplying a matrix by itself

Notation: For a square matrix $A$, we write

• $A^2 = AA$ and

• $A^n = \underbrace{AA \cdots A}_{n \text{ times}}$

Idempotent matrix

A square matrix $A$ is said to be idempotent if $AA = A$.

Consider

\[ \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix} \]
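As a check, write $A$ for the example matrix and multiply it by itself; each entry of the product reproduces the corresponding entry of $A$, so the matrix is indeed idempotent:

\[ AA = \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix} = \begin{bmatrix} 3(3) + 6(-1) & 3(6) + 6(-2) \\ (-1)(3) + (-2)(-1) & (-1)(6) + (-2)(-2) \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ -1 & -2 \end{bmatrix} = A. \]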

Elementary operations on a matrix

The following operations on a matrix are described as elementary row operations.

1. interchange of two rows

2. changing a row by adding to it the multiple of another row

3. multiplying each element of a row by the same non-zero number

Row-echelon form. A matrix has the row-echelon form if each row has more leading zeros than the one preceding it.

Consider

\[ \begin{bmatrix} 3 & 6 & 6 \\ 0 & 0 & -2 \\ 0 & 0 & 0 \end{bmatrix} \]

2.3 Determinants

The determinant is an operation defined on square matrices. It maps the set of square matrices to the set of real numbers.

2.3.1 Determinants of order 2

The determinant of a 2 × 2 matrix $A$, usually denoted as $|A|$, is defined as

\[ |A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}. \]

2.3.2 Determinants of order 3

For the 3 × 3 matrix, the determinant is defined as

\[ |A| = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}). \]

2.3.3 Higher-order determinants

These operations can be represented more conveniently using the notion of minors.

Minors. For any square matrix $A$, consider the sub-matrix $A_{(ij)}$ formed by deleting the i-th row and j-th column of $A$. The determinant of the sub-matrix $A_{(ij)}$ is called the (i, j)-th minor of the matrix (or sometimes the minor of element $a_{ij}$). We denote this as $M_{ij}$.

For instance, the minors associated with the first row of a 3 × 3 matrix are

\[ M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \quad M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \]

Recalling how we specified determinants of order 2 and 3, we see that

\[ |A| = a_{11}M_{11} - a_{12}M_{12} + a_{13}M_{13}. \]

Note the alternating positive and negative signs. To express this, we sometimes use cofactors rather than minors.

Cofactors. A cofactor associated with element $a_{ij}$, denoted by $C_{ij}$, is the minor with a prescribed algebraic sign $(-1)^{i+j}$. Put simply, the sign is positive for elements whose row and column indices add up to an even number, and negative otherwise. Thus

\[ C_{ij} \equiv (-1)^{i+j} M_{ij}, \]

so that

\[ C_{11} = (-1)^{2}M_{11}, \quad C_{12} = (-1)^{3}M_{12}, \quad C_{13} = (-1)^{4}M_{13}. \]

In terms of cofactors, $|A|$ can be written as

\[ |A| = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}. \]

These recursive operations can be used to define the determinant of any order, and are generally referred to as the Laplace expansion.
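The recursion behind the Laplace expansion can be written down directly. The sketch below expands along the first row and assumes matrices are lists of rows; the helper names `minor_matrix` and `det` are our own. The number of operations grows very quickly with the order of the matrix, so this is meant for illustration on small examples rather than for practical computation.

```python
def minor_matrix(A, i, j):
    """Sub-matrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the determinant is defined only for square matrices")
    if n == 1:
        return A[0][0]
    # With 0-based j, the sign (-1)**j equals (-1)^(1+j) for 1-based indices.
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(n))


print(det([[1, 2], [3, 4]]))
print(det([[1, 4, 1], [0, 3, 0], [0, 0, 2]]))  # a triangular matrix
```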

2.3.4 Properties of determinants

1. The transpose operation (interchanging rows with columns) does not affect the value of the determinant. Hence, $|A| = |A'|$.

2. The interchange of two rows or two columns will change the sign of the determinant but not the numerical value.

3. The multiplication of one row or one column in $A$ by a scalar $k$ will change the value of the determinant to $k|A|$.

4. The addition (subtraction) of a multiple of any row to (from) another row will leave the value of the determinant unchanged. The same applies to columns.

5. If one row (column) is a multiple of another row (column), the value of the determinant is zero.

6. If $A$ is a triangular matrix (or a diagonal matrix) then $|A| = a_{11}a_{22} \cdots a_{nn}$.

7. $|AB| = |A||B|$ for square matrices $A$ and $B$ of the same dimension.

2.4 Linear independence and rank

2.4.1 Linear independence

A set of vectors is linearly dependent if at least one of the vectors in the set can be written as a linear combination of the others.

Consider the vectors $[-a \;\; -b]$ and $[2a \;\; 2b]$. Here the second vector is a multiple of the first, so that the vectors are linearly dependent. Another way to express this: a linear combination of the vectors (in particular, two times the first vector added to the second vector) equals the null vector $[0 \;\; 0]$.

This suggests a definition of linear dependence.

Linear Dependence. Vectors $v_1, v_2, \ldots, v_n$ are linearly dependent if and only if there exist scalars $k_1, k_2, \ldots, k_n$, not all zero, such that

\[ k_1v_1 + k_2v_2 + \cdots + k_nv_n = 0. \]

Alternatively, we can have an equivalent definition of linear independence.

Linear Independence. Vectors v1, v2, ..., vn are linearly independent if and only if

k1v1 + k2v2 + ... + knvn = 0

for scalars k1, k2, ..., kn implies k1 = k2 = ... = kn = 0.

Example. We check whether the row vectors of the matrix below are linearly dependent.

\[ A = \begin{bmatrix} 3 & 4 & 5 \\ 0 & 1 & 2 \\ 6 & 8 & 10 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \]

where $v_i$ denote the row vectors. Note $v_3 = 2v_1$. If we take $k_1 = 2$, $k_2 = 0$, and $k_3 = -1$, we get

\[ 2v_1 + 0v_2 - v_3 = 0. \]

As we have found coefficients, not all zero, such that the linear combination is the null vector, the vectors are linearly dependent.

Determinant and linear dependence of rows of a square matrix

Consider a 2 × 2 matrix whose rows are linearly dependent:

\[ \begin{vmatrix} a & b \\ ka & kb \end{vmatrix} = kab - kab = 0. \]

Linear dependence of the rows turns out to be equivalent to the determinant of the matrix being equal to zero. This holds more generally, for any square matrix.
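To illustrate the 3 × 3 case, take the matrix with rows $(3, 4, 5)$, $(0, 1, 2)$ and $(6, 8, 10)$ from the earlier example, whose third row is twice the first. Expanding along the first row gives a determinant of zero, as expected:

\[ \begin{vmatrix} 3 & 4 & 5 \\ 0 & 1 & 2 \\ 6 & 8 & 10 \end{vmatrix} = 3(1 \cdot 10 - 2 \cdot 8) - 4(0 \cdot 10 - 2 \cdot 6) + 5(0 \cdot 8 - 1 \cdot 6) = -18 + 48 - 30 = 0. \]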

2.4.2 Rank of a matrix

Rank is defined as the order of the largest non-zero determinant that can be obtained from the elements of a matrix (deleting rows and columns where necessary). This definition applies to both square and rectangular matrices.

Thus a non-zero matrix $A$ has rank $r$ if at least one of its minors of order $r$ is different from zero, while every minor of order $r + 1$ or larger, if any, is equal to zero.

The rank of the matrix $A$ can be found by starting with the largest determinants, of order $m$ (the smaller of the number of rows and the number of columns), and evaluating them to ascertain if one of them is non-zero. If so, rank(A) = m. If all the determinants of order $m$ are equal to zero, we start evaluating determinants of order $m - 1$. Continuing in this fashion, we eventually find the rank $r$ of the matrix, being the order of the largest non-zero determinant.

Example 1. Find rank(A) where

\[ A = \begin{bmatrix} 6 & 2 \\ 3 & 1 \end{bmatrix}. \]

Note $|A| = 6(1) - 2(3) = 0$. Then rank(A) = 1, since the largest non-zero minor of $A$ is of order 1. (In this example there are four non-zero minors of order 1.)

Example 2. Find rank(A) where

\[ A = \begin{bmatrix} 6 & 2 & 3 \\ 3 & 1 & 3 \end{bmatrix}. \]

Consider the minor obtained by deleting the second column. We have

\[ \begin{vmatrix} 6 & 3 \\ 3 & 3 \end{vmatrix} = 18 - 9 = 9 \neq 0, \]

so that rank(A) = 2 in this case.

Clearly, if $A$ is n × m and $n \neq m$, then rank(A) ≤ min(n, m).
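The search procedure described above (try the largest square sub-matrices first, then work downwards) can be sketched as follows. This assumes matrices are lists of rows with exact (integer) entries; the function names `det` and `rank` are our own, and the approach is only practical for small matrices.

```python
from itertools import combinations


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def rank(A):
    """Order of the largest non-zero minor of A."""
    m, n = len(A), len(A[0])
    for r in range(min(m, n), 0, -1):          # largest order first
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) != 0:
                    return r
    return 0  # only the null matrix has rank 0


print(rank([[6, 2], [3, 1]]))        # Example 1
print(rank([[6, 2, 3], [3, 1, 3]]))  # Example 2
```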

2.5 Systems of linear equations

Matrices provide a compact way to represent a system of linear equations. Consider

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= d_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= d_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= d_m \end{aligned} \]

This can be written more compactly as $Ax = d$ where

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad d = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \end{bmatrix}. \]

A solution to this system refers to a set of values $x_1^*, x_2^*, \ldots, x_n^*$ that satisfies all equations simultaneously. In general, the existence of a solution cannot be guaranteed: a system of equations may have no solution. On the other hand, there may be multiple solutions (including, possibly, an infinite number of solutions).

The system of equations is said to be homogeneous if $d = 0$. Otherwise it is said to be non-homogeneous.

The issue of existence can be discussed using simple examples. We will confine attention to the case where the number of equations is the same as the number of variables.

Example 1: Consider, first,

\[ \begin{aligned} 2x_1 + 4x_2 &= 0 \\ 3x_1 + x_2 &= 0 \end{aligned} \]

This system has a unique solution, $x_1^* = 0$; $x_2^* = 0$.

Example 2: Next, consider

\[ \begin{aligned} 2x_1 + 4x_2 &= 0 \\ x_1 + 2x_2 &= 0 \end{aligned} \]

This system has an infinite number of solutions.

Example 3: Consider, next,

\[ \begin{aligned} 2x_1 + 4x_2 &= 8 \\ 3x_1 + x_2 &= 7 \end{aligned} \]

This system has a unique solution, $x_1^* = 2$; $x_2^* = 1$.

Example 4: Consider, next,

\[ \begin{aligned} 2x_1 + 4x_2 &= 8 \\ x_1 + 2x_2 &= 7 \end{aligned} \]

This system has no solution: the equations are inconsistent.

Example 5: Last, consider

\[ \begin{aligned} 2x_1 + 4x_2 &= 8 \\ x_1 + 2x_2 &= 4 \end{aligned} \]

This system has an infinite number of solutions.

2.5.1 The coefficient matrix and existence of solutions

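Analysis of the coefficient matrix helps us to discover general principles about the existence of solutions. Consider a system with two linear equations in two unknowns,

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 &= d_1 \\ a_{21}x_1 + a_{22}x_2 &= d_2 \end{aligned} \]

which can be written as $Ax = d$ where

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad d = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}. \]

We can write vector $d$ as a linear combination of the columns of matrix $A$:

\[ x_1 \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix}. \]

If $d = 0$, then linear independence of the column vectors implies that the only solution is the trivial one: $x_1 = x_2 = 0$.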

2.5.2 Existence of a solution

When does a solution exist? We distinguish between two cases.

• A homogeneous system, where $d = 0$. For this case $x = 0$ is an obvious (or trivial) solution: see Example 1 above. A non-trivial solution exists only if $A$ has less than full rank, that is, if $|A| = 0$, which leads to an infinite number of solutions, as in Example 2.

• A non-homogeneous system, where $d \neq 0$. This has a unique non-trivial solution only if $A$ has full rank, that is, if $|A| \neq 0$: see Example 3. If $A$ has less than full rank, we may have no solutions (Example 4) or an infinite number of solutions (Example 5).
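The determinants of the coefficient matrices in the five examples confirm this classification. Examples 1 and 3 share one coefficient matrix, and Examples 2, 4 and 5 share another:

\[ \begin{vmatrix} 2 & 4 \\ 3 & 1 \end{vmatrix} = 2(1) - 4(3) = -10 \neq 0, \qquad \begin{vmatrix} 2 & 4 \\ 1 & 2 \end{vmatrix} = 2(2) - 4(1) = 0. \]

The first matrix has full rank, so Examples 1 and 3 each have a unique solution. The second has rank 1, so Examples 2 and 5 have infinitely many solutions, while Example 4 has none.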

2.6 Inverse matrix

For a square matrix $A$, there may exist a matrix $B$ such that

\[ AB = BA = I. \]

Such a matrix $B$ is called the inverse of $A$. An inverse, if it exists, is usually denoted as $A^{-1}$, so that the above definition can be written as

\[ AA^{-1} = A^{-1}A = I. \]

If an inverse does not exist for a matrix, the matrix is said to be singular. If an inverse exists, the matrix is said to be non-singular.

Singularity, rank, and determinant. Singularity of the matrix is closely tied to the value of the determinant. We can show that the following statements are equivalent:

• $|A| \neq 0$;

• all rows (equivalently, all columns) of $A$ are linearly independent;

• $A$ is non-singular;

• there exists a unique inverse $A^{-1}$.

Properties of inverse matrices. As long as the defined inverses exist,

1. $(A^{-1})^{-1} = A$. The inverse of an inverse recovers the original matrix.

2. $(AB)^{-1} = B^{-1}A^{-1}$. The inverse of a product is the product of inverses with order switched.

3. $(A')^{-1} = (A^{-1})'$. The inverse of a transpose is the transpose of the inverse.

4. If $A$ is a diagonal matrix, then $A^{-1}$ is also diagonal, with diagonal elements $1/a_{ii}$.

2.6.1 Using the inverse matrix to solve a system of equations

Consider a system of n linear equations in n unknowns, which can be written as

\[ Ax = d, \]

where $A$ is a square matrix with dimension n × n, and $x$ and $d$ are n × 1 vectors.

If an inverse exists for the square matrix $A$, then pre-multiplying the previous expression by $A^{-1}$ we get

\[ A^{-1}Ax = A^{-1}d \]

or

\[ x = A^{-1}d. \]

2.6.2 Computing the inverse matrix

We now describe a method for finding the inverse matrix. We begin by setting up some further constructs.

The cofactor matrix

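For any element $a_{ij}$ of a square matrix $A$, the cofactor is given by

\[ C_{ij} = (-1)^{i+j} M_{ij}, \]

where $M_{ij}$ is the (i, j)-th minor.

The cofactor matrix $C$ is obtained by replacing each element of the matrix $A$ by its corresponding cofactor, $C_{ij}$.

Example: Find the cofactor matrix for

\[ A = \begin{bmatrix} 3 & 2 \\ 4 & -1 \end{bmatrix}. \]

The cofactors are

\[ \begin{aligned} C_{11} &= (-1)^{1+1}M_{11} = -1 \\ C_{12} &= (-1)^{1+2}M_{12} = -4 \\ C_{21} &= (-1)^{2+1}M_{21} = -2 \\ C_{22} &= (-1)^{2+2}M_{22} = 3 \end{aligned} \]

so that

\[ C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} -1 & -4 \\ -2 & 3 \end{bmatrix}. \]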

The adjoint matrix

For any square matrix $A$, the adjoint of $A$ is given by the transpose of the cofactor matrix. Denoting the associated cofactor matrix as $C$, we have

\[ \operatorname{adj} A = C'. \]

In the previous example, the adjoint is given by

\[ \operatorname{adj} A = \begin{bmatrix} C_{11} & C_{21} \\ C_{12} & C_{22} \end{bmatrix} = \begin{bmatrix} -1 & -2 \\ -4 & 3 \end{bmatrix}. \]

The inverse

For any square matrix $A$, the inverse $A^{-1}$ is given by

\[ A^{-1} = \frac{1}{|A|} \operatorname{adj} A, \]

which is defined as long as $|A| \neq 0$.
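The three steps (cofactor matrix, adjoint, division by the determinant) can be sketched as follows. The code assumes matrices are lists of rows with integer entries, and uses exact fractions from the standard library so that the result is not affected by rounding; all function names are our own.

```python
from fractions import Fraction


def det(A):
    """Determinant by cofactor expansion; the empty matrix has determinant 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cofactor_matrix(A):
    """Matrix of cofactors C_ij = (-1)^(i+j) M_ij."""
    n = len(A)

    def minor(i, j):
        return det([row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i])

    return [[(-1) ** (i + j) * minor(i, j) for j in range(n)] for i in range(n)]


def adjoint(A):
    """Transpose of the cofactor matrix."""
    return [list(col) for col in zip(*cofactor_matrix(A))]


def inverse(A):
    """Inverse as adj(A) / |A|; raises if A is singular."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular, so no inverse exists")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(A)]


A = [[3, 2], [4, -1]]
print(cofactor_matrix(A), adjoint(A), inverse(A))
```

For the example matrix, $|A| = 3(-1) - 2(4) = -11$, so every entry of the adjoint is divided by $-11$.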


Why do these steps work? (only if you really want to know!)

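To see why, we multiply an arbitrary 2 × 2 matrix $A$ by its adjoint matrix, adj $A$. Let the product matrix be given by $B$. Thus

\[ \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} C_{11} & C_{21} \\ C_{12} & C_{22} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \]

where

\[ \begin{aligned} b_{11} &= a_{11}C_{11} + a_{12}C_{12} \\ b_{12} &= a_{11}C_{21} + a_{12}C_{22} \\ b_{21} &= a_{21}C_{11} + a_{22}C_{12} \\ b_{22} &= a_{21}C_{21} + a_{22}C_{22} \end{aligned} \]

But

\[ \operatorname{adj} A = C' = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}, \]

so that $C_{11} = a_{22}$, $C_{12} = -a_{21}$, $C_{21} = -a_{12}$ and $C_{22} = a_{11}$. Then

\[ \begin{aligned} b_{11} &= a_{11}a_{22} - a_{12}a_{21} = |A| \\ b_{12} &= a_{11}(-a_{12}) + a_{12}a_{11} = 0 \\ b_{21} &= a_{21}a_{22} + a_{22}(-a_{21}) = 0 \\ b_{22} &= a_{21}(-a_{12}) + a_{22}a_{11} = |A| \end{aligned} \]

Hence

\[ A \operatorname{adj} A = \begin{bmatrix} |A| & 0 \\ 0 & |A| \end{bmatrix} = |A| \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = |A| I. \]

For the general n × n matrix, if the elements of a row are multiplied by the cofactors of a different row and the products are summed, the result is zero. This ensures all the off-diagonal elements of the product of $A$ and adj $A$ are zero. Also, the elements on the principal diagonal are equal to $|A|$. Thus for an n × n matrix $A$ we have

\[ A \operatorname{adj} A = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix} = |A| I. \]

Then, as long as $|A| \neq 0$,

\[ A \, \frac{\operatorname{adj} A}{|A|} = I. \]

This yields:

\[ \frac{\operatorname{adj} A}{|A|} = A^{-1}. \]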

2.6.3 Cramer’s rule

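This method of matrix inversion enables us to describe a convenient procedure for solving a system of linear equations.

Consider a system of n linear equations in n unknowns,

\[ Ax = d, \]

where $A$ is an n × n matrix, and $x$ and $d$ are n × 1 vectors. As long as an inverse exists (that is, as long as $A$ is non-singular),

\[ x = A^{-1}d = \frac{\operatorname{adj} A}{|A|} \, d. \]

This can be written as

\[ \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{|A|} \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix} \]

where $C_{ij} = (-1)^{i+j}M_{ij}$. Rewrite this as

\[ \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \frac{1}{|A|} \begin{bmatrix} C_{11}d_1 + C_{21}d_2 + \cdots + C_{n1}d_n \\ C_{12}d_1 + C_{22}d_2 + \cdots + C_{n2}d_n \\ \vdots \\ C_{1n}d_1 + C_{2n}d_2 + \cdots + C_{nn}d_n \end{bmatrix} \]

Compare the i-th element of this expression,

\[ x_i = \frac{1}{|A|} (C_{1i}d_1 + C_{2i}d_2 + \cdots + C_{ni}d_n), \]

with the Laplace expansion for the evaluation of $|A|$ down the i-th column:

\[ |A| = C_{1i}a_{1i} + C_{2i}a_{2i} + \cdots + C_{ni}a_{ni}. \]

We can see that, compared with the Laplace expansion, the elements $a_{1i}, a_{2i}, \ldots, a_{ni}$ have been replaced by $d_1, d_2, \ldots, d_n$. So $(C_{1i}d_1 + C_{2i}d_2 + \cdots + C_{ni}d_n)$ is the determinant, expanded down the i-th column, of the following matrix, which we will call $D_i$:

\[ D_i = \begin{bmatrix} a_{11} & a_{12} & \cdots & d_1 & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & d_2 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & d_n & \cdots & a_{nn} \end{bmatrix} \]

(note: the i-th column of $A$ has been replaced by $d$). To summarize, we can write

\[ x_i = \frac{|D_i|}{|A|}. \]

This procedure is referred to as Cramer’s rule for solving the system of equations $Ax = d$.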

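Example: Find $x_1$, using Cramer’s rule, where

\[ \begin{bmatrix} 6 & -3 \\ -2 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 50 \\ 35 \end{bmatrix}. \]

Now, $|A| = 36 - 6 = 30 \neq 0$. Then

\[ x_1 = \frac{|D_1|}{|A|} = \frac{\begin{vmatrix} 50 & -3 \\ 35 & 6 \end{vmatrix}}{30} = \frac{50(6) - (-3)35}{30} = 13.5. \]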
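Cramer’s rule translates directly into code: for each unknown, replace the corresponding column of the coefficient matrix by the right-hand side and take the ratio of determinants. The sketch below assumes integer entries and list-of-rows matrices; the names `det` and `cramer` are our own.

```python
from fractions import Fraction


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cramer(A, d):
    """Solve Ax = d by Cramer's rule, returning exact fractions."""
    det_A = det(A)
    if det_A == 0:
        raise ValueError("Cramer's rule requires a non-singular matrix")
    solution = []
    for i in range(len(A)):
        # D_i is A with its i-th column replaced by d.
        D_i = [row[:i] + [d[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(D_i), det_A))
    return solution


A = [[6, -3], [-2, 6]]
d = [50, 35]
x = cramer(A, d)
print([float(value) for value in x])
```

The same call also returns $x_2$, which the worked example does not compute; substituting both values back into the two equations is a quick way to check the result.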

2.7 Characteristic roots and vectors

Characteristic roots (sometimes called latent roots or eigenvalues) and characteristic vectors (latent vectors or eigenvectors) are used for stability analysis of dynamic economic models and in econometrics.

Definition: If $A$ is an n × n square matrix, and if a scalar $\lambda$ and an n × 1 vector $x \neq 0$ satisfy

\[ Ax = \lambda x, \]

then $\lambda$ is a characteristic root of $A$ and $x$ is the associated characteristic vector.

If $x = 0$, then any $\lambda$ would give $Ax = \lambda x$ and the problem is trivial. Hence we exclude $x = 0$.

Note also that characteristic vectors are not unique: if $x$ is a characteristic vector, then $\mu(Ax) = \mu(\lambda x)$ for any $\mu \neq 0$. Thus $A(\mu x) = \lambda(\mu x)$, so that $\mu x$ is also a characteristic vector. For this reason the characteristic vector is said to be only determined up to a scalar multiple.

2.7.1 Finding the characteristic roots

Rewrite the equation

\[ Ax = \lambda x \]

as

\[ Ax - \lambda x = 0 \]

or

\[ [A - \lambda I]x = 0, \]

which is a homogeneous system of equations. If there is a non-trivial solution to a homogeneous system, then the matrix must be singular, i.e.

\[ |A - \lambda I| = 0. \]

Thus we must choose values for $\lambda$ such that the determinant

\[ |A - \lambda I| = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0. \]

If $[A - \lambda I]$ is a 2 × 2 matrix, the value of the determinant

\[ |A - \lambda I| = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}a_{21} \]

is a polynomial of degree 2,

\[ |A - \lambda I| = (a_{11}a_{22} - a_{12}a_{21}) - (a_{11} + a_{22})\lambda + \lambda^2. \]

The characteristic equation sets the value of this polynomial to zero, and the characteristic roots are the solutions to this equation: that is, values of $\lambda$ that, substituted into the last expression, yield 0.

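Example: Find the characteristic roots of the 2 × 2 matrix

\[ G = \begin{bmatrix} 2 & 2 \\ 2 & -1 \end{bmatrix}. \]

The characteristic polynomial is

\[ |G - \lambda I| = \begin{vmatrix} 2 - \lambda & 2 \\ 2 & -1 - \lambda \end{vmatrix} = -(1 + \lambda)(2 - \lambda) - 4 = \lambda^2 - \lambda - 6, \]

so that the characteristic equation is

\[ \lambda^2 - \lambda - 6 = 0 \]

with characteristic roots $\lambda_1 = 3$ and $\lambda_2 = -2$.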
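For a 2 × 2 matrix the characteristic equation is a quadratic in $\lambda$ with coefficients given by the sum of the diagonal elements and the determinant, so the roots follow from the quadratic formula. The sketch below is an illustration under that assumption; the function name is our own, and `cmath.sqrt` is used so that a negative discriminant yields complex roots rather than an error.

```python
import cmath


def characteristic_roots_2x2(A):
    """Roots of lambda^2 - (a11 + a22) lambda + |A| = 0 for a 2 x 2 matrix."""
    (a11, a12), (a21, a22) = A
    diagonal_sum = a11 + a22
    determinant = a11 * a22 - a12 * a21
    # cmath.sqrt returns a complex number, also for negative arguments.
    root_of_discriminant = cmath.sqrt(diagonal_sum ** 2 - 4 * determinant)
    return ((diagonal_sum + root_of_discriminant) / 2,
            (diagonal_sum - root_of_discriminant) / 2)


G = [[2, 2], [2, -1]]
roots = characteristic_roots_2x2(G)
print(roots)  # complex numbers with zero imaginary part for this matrix
```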

More generally, for an n × n matrix, the determinant will be a polynomial of degree n,

\[ b_0 + b_1\lambda + b_2\lambda^2 + \cdots + b_{n-1}\lambda^{n-1} + b_n\lambda^n. \]

A polynomial equation of degree n has up to n different solutions. However, two or more roots may coincide, so that we get fewer than n distinct values; some roots may involve square roots of negative numbers, giving complex roots.

2.7.2 Calculation of characteristic vectors

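For $\lambda_i$, we have

\[ [A - \lambda_i I]x_i = 0 \]

where $x_i$ is the characteristic vector corresponding to $\lambda_i$.

In the previous example, with matrix $G$, consider first the case where $\lambda_1 = 3$. To find the characteristic vector associated with $\lambda_1 = 3$, we solve

\[ [G - \lambda_1 I]x_1 = 0, \]

that is,

\[ \begin{bmatrix} 2 - 3 & 2 \\ 2 & -1 - 3 \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 2 & -4 \end{bmatrix} \begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]

Noting that rank$(G - \lambda_1 I)$ is 1, we find that $x_{11} = 2x_{12}$. Choosing $x_{12}$ arbitrarily as 1, we have $x_{11} = 2$; thus we obtain the characteristic vector $x_1 = [\,2 \;\; 1\,]'$.

To find the characteristic vector associated with $\lambda_2 = -2$, we solve $[G - \lambda_2 I]x_2 = 0$; that is:

\[ \begin{bmatrix} 2 - \lambda_2 & 2 \\ 2 & -1 - \lambda_2 \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \]

Noting that rank$(G - \lambda_2 I)$ is 1, we find that $x_{21} = -\tfrac{1}{2}x_{22}$. Choosing $x_{22}$ arbitrarily as 2 to eliminate the fraction, we find $x_{21} = -1$; the associated characteristic vector is $x_2 = [\,-1 \;\; 2\,]'$.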
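Both vectors can be checked against the defining relation. Writing $G$ for the example matrix with rows $(2, 2)$ and $(2, -1)$, we have

\[ G \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2(2) + 2(1) \\ 2(2) - 1(1) \end{bmatrix} = \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 3 \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad G \begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2(-1) + 2(2) \\ 2(-1) - 1(2) \end{bmatrix} = \begin{bmatrix} 2 \\ -4 \end{bmatrix} = -2 \begin{bmatrix} -1 \\ 2 \end{bmatrix}. \]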

2.7.3 Trace of a matrix

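The trace of a square matrix is the sum of its diagonal elements:

\[ \operatorname{tr}(A) = \sum_i a_{ii}. \]

Example: The trace of the matrix

\[ G = \begin{bmatrix} 2 & 2 \\ 2 & -1 \end{bmatrix} \]

equals $2 - 1 = 1$.

The following results hold:

\[ \operatorname{tr}(cA) = c \operatorname{tr}(A) \]

\[ \operatorname{tr}(A) = \operatorname{tr}(A') \]

\[ \operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B) \]

If both matrix products $AB$ and $BA$ are defined,

\[ \operatorname{tr}(AB) = \operatorname{tr}(BA). \]

We state the following without proof:

Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the characteristic roots of a square matrix $A$. Then

• $\lambda_1 + \lambda_2 + \cdots + \lambda_n = \operatorname{tr}(A)$

• $\lambda_1 \cdot \lambda_2 \cdots \lambda_n = |A|$.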
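For the matrix $G$ with rows $(2, 2)$ and $(2, -1)$, whose characteristic roots were found to be 3 and $-2$, both results can be confirmed directly:

\[ \lambda_1 + \lambda_2 = 3 + (-2) = 1 = \operatorname{tr}(G), \qquad \lambda_1 \lambda_2 = 3(-2) = -6 = 2(-1) - 2(2) = |G|. \]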

2.8 Matrix representation of quadratic forms

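A quadratic form on $\mathbb{R}^n$ is a real-valued function of the form

\[ Q(x_1, x_2, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij}x_ix_j. \]

A quadratic form has a matrix representation. Consider a quadratic form on $\mathbb{R}^2$. We can write

\[ a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ 0 & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \]

or even as

\[ a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & \tfrac{1}{2}a_{12} \\ \tfrac{1}{2}a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \]

The latter representation offers a way to express any quadratic form $Q(x)$ as $x'Ax$, where $A$ is a symmetric matrix. The quadratic form

\[ Q(x_1, x_2, \ldots, x_n) = \sum_{i \le j} a_{ij}x_ix_j \]

can be written as

\[ \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & \tfrac{1}{2}a_{12} & \cdots & \tfrac{1}{2}a_{1n} \\ \tfrac{1}{2}a_{12} & a_{22} & \cdots & \tfrac{1}{2}a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \tfrac{1}{2}a_{1n} & \tfrac{1}{2}a_{2n} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \]

which is of the form $x'Ax$.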
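The construction of the symmetric matrix can be sketched as follows: diagonal coefficients are kept as they are, and each cross-product coefficient is split equally between the two off-diagonal positions. The function names and the example form $2x_1^2 + 3x_1x_2 + 4x_2^2$ are our own illustrative choices, with 0-based indices in the code.

```python
def symmetric_matrix(coeffs, n):
    """Symmetric A such that x'Ax equals the sum of a_ij x_i x_j over i <= j.

    coeffs maps index pairs (i, j) with i <= j (0-based) to a_ij.
    """
    A = [[0.0] * n for _ in range(n)]
    for (i, j), a in coeffs.items():
        if i == j:
            A[i][i] = a
        else:
            # Split the cross-product coefficient between (i, j) and (j, i).
            A[i][j] = A[j][i] = a / 2
    return A


def quadratic_form(A, x):
    """Evaluate x'Ax for a square matrix A and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical example: Q(x1, x2) = 2 x1^2 + 3 x1 x2 + 4 x2^2.
A = symmetric_matrix({(0, 0): 2, (0, 1): 3, (1, 1): 4}, 2)
print(A, quadratic_form(A, [1, 2]))
```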

Orthogonal matrix

A matrix $P$ which satisfies the condition that $P^{-1} = P'$ (or equivalently that $P'P = I$) is said to be orthogonal.

2.8.1 Sign definiteness of quadratic forms

Given an n × n symmetric matrix $A$, the quadratic form $x^{\top}Ax$ is said to be

• positive definite if $x^{\top}Ax > 0$ for all $x \neq 0$ in $\mathbb{R}^n$;

• positive semi-definite if $x^{\top}Ax \geq 0$ for all $x \neq 0$ in $\mathbb{R}^n$;

• negative definite if $x^{\top}Ax < 0$ for all $x \neq 0$ in $\mathbb{R}^n$;

• negative semi-definite if $x^{\top}Ax \leq 0$ for all $x \neq 0$ in $\mathbb{R}^n$;

• indefinite if $x^{\top}Ax > 0$ for some $x$ in $\mathbb{R}^n$, and $x^{\top}Ax < 0$ for some other $x$ in $\mathbb{R}^n$.

There are two ways of establishing the sign definiteness of a quadratic form: the deter-minantal and the eigenvalue test.

Determinantal test for sign definiteness

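Consider the quadratic form $x^{\top}Ax$. The determinant of the n × n matrix $A$, namely $|A|$, is called the discriminant of the quadratic form. The sign definiteness of the quadratic form depends on the principal minors of its discriminant. Consider a symmetric n × n matrix $A$,

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}, \quad \text{where } a_{ij} = a_{ji}. \]

The principal minors used in the test (more precisely, the leading principal minors) of this matrix are given as

\[ |A_1| = a_{11}, \quad |A_2| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \ldots, \quad |A_n| = |A|. \]

The quadratic form is

• positive definite if and only if all of its n leading principal minors are positive;

• negative definite if and only if its n leading principal minors alternate in sign as follows:

\[ |A_1| < 0, \quad |A_2| > 0, \quad |A_3| < 0, \quad \text{etc.} \]

(The k-th order leading principal minor should have the sign $(-1)^k$.)

If the above inequalities are weak, we get conditions that are necessary for semi-definiteness. They are not sufficient when only the leading principal minors are checked: for semi-definiteness, the weak inequalities must hold for every principal minor, that is, for every determinant obtained by deleting the same set of rows and columns from $A$.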
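The strict version of the determinantal test can be sketched as below. The code assumes a symmetric matrix stored as a list of rows and only distinguishes the two definite cases; it deliberately does not attempt to classify semi-definite or indefinite forms. The function names and the returned labels are our own.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def leading_principal_minors(A):
    """Determinants of the top-left k x k sub-matrices, k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]


def classify(A):
    """Apply the strict determinantal test to a symmetric matrix A."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    # The k-th minor must have the sign of (-1)^k.
    if all((-1) ** k * m > 0 for k, m in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


print(classify([[2, 1], [1, 2]]))
print(classify([[-2, 1], [1, -2]]))
print(classify([[1, 2], [2, 1]]))
```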


The eigenvalue test for sign-definiteness

The quadratic form $x^{\top}Ax$ is:

• positive (semi) definite if and only if all the eigenvalues of $A$ are strictly positive (non-negative);

• negative (semi) definite if and only if all the eigenvalues of $A$ are strictly negative (non-positive).

2.8.2 Decomposing matrices

The numerical solution of a system of linear equations $Ax = b$ can sometimes be simplified if the matrix of coefficients, $A$, can be written as a product of a lower-triangular matrix $L$ and an upper-triangular matrix $U$.

If $A$ is a symmetric and positive definite matrix, we have what is called the Cholesky decomposition, $A = LL'$, in which the upper-triangular factor is the transpose of the lower-triangular one.
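These notes do not give an algorithm for the Cholesky decomposition. As an illustration only, the sketch below uses the standard row-by-row recursion to build a lower-triangular $L$ with $LL' = A$; the function name and the example matrix are our own, and the code assumes $A$ is symmetric.

```python
import math


def cholesky(A):
    """Lower-triangular L with L L' = A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Part of a_ij already explained by earlier columns of L.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - s
                if pivot <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


# Hypothetical example matrix, chosen so that the factor has integer entries.
A = [[4, 12, -16],
     [12, 37, -43],
     [-16, -43, 98]]
print(cholesky(A))
```

A non-positive pivot signals that the matrix is not positive definite, so the routine can also serve as a practical check of positive definiteness.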


Problems

1. Let the matrices $A$ and $B$ be given by

\[ A = \begin{bmatrix} 4 & 9 \\ 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 4 \\ 1 & 7 \end{bmatrix}. \]

(a) Calculate $AB$ and demonstrate that the commutative law of multiplication, $AB = BA$, does not hold under matrix multiplication.

(b) Calculate the determinant of $A$.

(c) Find the inverse of matrix $A$.

2. Given

\[ D = \begin{bmatrix} 7 & 2 \\ 5 & 4 \end{bmatrix}, \quad E = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \quad F = \begin{bmatrix} 3 & 7 \end{bmatrix}. \]

(a) Determine whether each of the products $DE$, $EF$ and $DF$ is defined. If so, indicate the dimensions of the product matrices.

(b) Find $EF$ and $FE$.

(c) Calculate $(DE)F$ and $D(EF)$.

3. Consider the equation system

\[ \begin{aligned} 4x_1 + 6x_2 + 8x_3 &= 2 \\ x_1 + x_2 + x_3 &= 1 \\ 4x_1 + 3x_2 + 2x_3 &= 1 \end{aligned} \]

(a) Write the system in matrix notation, as $Ax = b$.

(b) How many solutions does this system have? Find them.

4. Let

\[ A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \]

(a) Compute $A'$ (the transpose of $A$).

(b) Compute $AA'$.

(c) Find the inverse $A^{-1}$.

5. Given

\[ B = \begin{bmatrix} -2 & 1 & 3 \\ -1 & 1 & \beta \\ 1 & 2 & 1 \end{bmatrix}. \]

For what value(s) of $\beta$ is $B$ not invertible?

6. A model is specified by the following relations.

IS curve

Y = C + I
C = 100 + 0.7Y
I = 180 − 125r

LM curve

MD = MS = 255
MD = 220 + 0.2Y − 175r

Find values Y∗ and r∗ that satisfy these relations, using Cramer’s rule.

7. Let

\[ A = \begin{bmatrix} 3 & 8 & 1 \\ 0 & 4 & 3 \\ 0 & 3 & 4 \end{bmatrix}. \]

(a) Find the characteristic roots and characteristic vectors of $A$.

(b) Verify that the trace of the matrix equals the sum of the characteristic roots.

(c) Verify that the determinant of the matrix equals the product of the characteristic roots.

8. Show that if $\lambda$ is a characteristic root of a square matrix $A$, then $\lambda^2$ is a characteristic root of $A^2$.

9. Given a 2 × 4 matrix $X'$, define $P = X(X'X)^{-1}X'$ and $M = I - P$.

(a) Show that $M$ and $P$ are idempotent.

(b) Show that $MP = 0$.
