
Math (P)refresher for Political Scientists∗

Harvard University

2012

∗ The documents in this booklet are the product of generations of Math (P)refresher Instructors: Curt Signorino 1996-1997; Ken Scheve 1997-1998; Eric Dickson 1998-2000; Orit Kedar 1999; James Fowler 2000-2001; Kosuke Imai 2001-2002; Jacob Kline 2002; Dan Epstein 2003; Ben Ansell 2003-2004; Ryan Moore 2004-2005; Mike Kellermann 2005-2006; Ellie Powell 2006-2007; Jen Katkin 2007-2008; Patrick Lam 2008-2009; Viridiana Rios 2009-2010; Jennifer Pan 2010-2011; Konstantin Kashin 2011-2012.


Contents

1 Math (P)refresher Schedule

2 Syllabus

3 Lecture Notes

3.1 Functions

3.2 Calculus I

3.3 Calculus II: An Integral Topic

3.4 Probability I: Probability Theory

3.5 Probability II: Random Variables

3.6 Linear Algebra I

3.7 Linear Algebra II

3.8 Unconstrained Optimization

3.9 Constrained Optimization

4 Computing Handouts

4.1 Starting/Connecting to a VNC Session

4.2 Configuring Your Computer


1 Math (P)refresher Schedule


Wednesday, 22 August
9:00 - 9:30, Room K354¹, Breakfast.
9:30 - 11:30, Room K354, Intro. and Lecture (Notation and Functions).
11:30 - 12:00, Room K354, Setting Up Computing Software. Bring laptops if you’d like help setting up software.
12:00 - 1:00, Room K354, Lunch with Prof. Gary King.
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.
4:00 - 5:00, CGIS Knafel Cafe, Happy Hour!

Thursday, 23 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Calculus I).
12:00 - 1:00, Room K354, Lunch with Prof. Claudine Gay (Director of Graduate Studies).
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.

Friday, 24 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Calculus II).
12:00 - 1:00, Lunch with Prof. Jorge Dominguez.
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.

Saturday, 25 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Probability I).
12:00 - 1:00, Room K354, Lunch on your own.
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.

Monday, 27 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Probability II).
12:00 - 1:00, Room K354, Lunch with Prof. Beth Simmons.
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.

¹ This location refers to Room K354 in CGIS Knafel, 1737 Cambridge St.


Tuesday, 28 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Linear Algebra I).
12:00 - 1:00, Room K354, Lunch with other graduate students.
1:00 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:30.

Wednesday, 29 August
9:00 - 10:30, Room K354, Early Lecture (Linear Algebra II).
10:30 - 1:00, GSAS Orientation and Lunch.
1:00 - 4:00, DudleyFest.
1:30 - 4:00, Room K354 or CGIS Computer Lab, Computing/Problem Set Session. Students will switch at 2:45.

Thursday, 30 August
9:00 - 9:30, Room K354, Breakfast.
9:30 - 12:00, Room K354, Lecture (Optimization I, optional).
12:00 - 1:00, Room K354, Lunch on your own.
1:00 - 2:30, Room K354, Computing Wrap-up.
2:30 - 4:00, Room K354, Lecture (Optimization II, optional).


2 Syllabus


Math (P)refresher for Political Scientists

Wednesday, August 22 - Thursday, August 30, 2012

Breakfast 9am - 9:30am
Lecture 9:30am - 12:00pm
Section 1:00pm - 4:00pm

(No Class on Sunday, August 26)

Konstantin Kashin (Instructor): [email protected]
Andy Hall (Teaching Fellow): [email protected]
Gary King (Faculty Adviser): [email protected]
Mailing list: [email protected]

PURPOSE: Not only do the quantitative and formal modeling courses at Harvard require mathematics and computer programming; it is also becoming increasingly difficult to take courses in political economy, American politics, comparative politics, or international relations without encountering game-theoretic models or statistical analyses. One need only flip through the latest issues of the top political science journals to see that mathematics has entered the mainstream of political science. Even political philosophy has been influenced by mathematical thinking. Unfortunately, most undergraduate political science programs have not kept up with this trend, and first-year graduate students often find themselves lacking in basic technical skills. This course is not intended to be an introduction to game theory or quantitative methods. Rather, it introduces basic mathematics and computer skills needed for quantitative and formal modeling courses offered at Harvard.

PREREQUISITES: None. Students for whom the topics in this syllabus are completely foreign should not be scared off. They have the perfect background for this course: they are the ones in most need of a “prefresh”ing before they take further courses with technical content. Students who have previously had some of this material, but have not used it in a while, should take this course to “refresh” their knowledge of the topics.

STRUCTURE & REQUIREMENTS: The class will meet twice a day, 9:00am - 12:00pm and 1:00pm - 4:00pm. This course is not for credit and has no exams. No one but the student will know how well he or she did. However, it still requires a significant commitment from students. Students are expected to do the reading assignments before the classes. Lectures will focus on major mathematical topics that are used in statistical and formal modeling in political science. Sections will be divided into two parts. During problem-solving sections, students are given exercises to work on (or as homework if not finished then), which should be handed in the following day. Students are encouraged to work on the exercises in groups of two or three. You learn more quickly when everyone else is working on the same problems! The exercises will be checked for errors to give students an indication of how well they are assimilating the material. During computing sections, we will introduce you to the computing environment and software packages that are used in the departmental methods sequence. Math isn’t a spectator sport: you have to do it to learn it.

COMPUTING: All of the methods courses in the department, and increasingly courses in formal theory as well, make extensive use of the computational resources available at Harvard. Students will be introduced to LaTeX (a typesetting language useful for producing documents with mathematical content) and R (the statistical computing language/environment used in the department’s methods courses). These resources are very powerful, but have something of a steep learning curve; one of the goals of the prefresher is to give students a head start on these programs.


TEXTBOOKS: The required text for this course is the textbook by Jeff Gill. The course will use the second or third printing of this text. Please bring $35 on the first day of the prefresher to receive a copy of the textbook.

1. Gill, Jeff. 2006. Essential Mathematics for Political and Social Research. Cambridge, England: Cambridge University Press.

There are several optional/recommended texts that you may wish to consult during the course. These texts will be available on reserve, but you may want to purchase some of them for your own future reference. In particular, Simon and Blume is useful for those who will be taking formal modeling courses in the Government or Economics departments.

2. Simon, Carl P. and Lawrence Blume. 1994. Mathematics for Economists. New York: Norton.

3. Wackerly, Dennis, William Mendenhall, and Richard Scheaffer. 1996. Mathematical Statistics with Applications, 5th edition.

4. Hahn, Harley. 1996. Harley Hahn’s Student Guide to Unix, 2nd edition.


Lecture Schedule:

• Lecture 1: Notation and Functions (Wednesday, August 22; CGIS Knafel K354)

Topics: $\mathbf{R}^1$ and $\mathbf{R}^n$. Interval Notation for $\mathbf{R}^1$. Neighborhoods. Open/Closed/Compact Sets. Introduction to Functions. Domain and Range. Some General Types of Functions. Log, Ln, and e. Solving for Variables. Finding Roots. Limits of Functions. Continuity.

Required Reading: Gill Ch. 1, 5.2
Further Reading: SB 2.1-2, 12.3-5, 10.1, 13.1-2, 5.1-4

• Lecture 2: Calculus I (Thursday, August 23; CGIS Knafel K354)

Topics: Sequences. Limit of a Sequence. Derivatives. Higher-Order Derivatives. Maxima and Minima. Composite Functions. The Chain Rule. Derivatives of Exp and Ln. L’Hospital’s Rule.

Required Reading: Gill Ch. 5.3-4, 6.4
Further Reading: SB 12.1-2, 2.3-6, 3.1-2, 3.5, 4.1-2, 5.5

• Lecture 3: Calculus II (Friday, August 24; CGIS Knafel K354)

Topics: Partial Derivatives. The Indefinite Integral: The Antiderivative. The Definite Integral: The Area under the Curve. Integration by Substitution. Integration by Parts.

Required Reading: Gill Ch. 6.2-3, 5.5-6
Further Reading: SB 14.1, 14.3-4, Appendix 4.1-3

• Lecture 4: Probability I (Saturday, August 25; CGIS Knafel K354)

Topics: Counting. Sets. Probability. Conditional Probability and Bayes’ Rule. Independence.

Required Reading: Gill Ch. 7
Further Reading: WMS 2.1-11


• Lecture 5: Probability II (Monday, August 27; CGIS Knafel K354)

Topics: Levels of Measurement. Discrete Distributions. Continuous Distributions. Joint Distributions. Expectation. Special Discrete Distributions. Special Continuous Distributions. Summarizing Observed Data.

Required Reading: Gill Ch. 8
Further Reading: WMS 3.1-4, 3.8, 4.1-5, 4.8

• Lecture 6: Linear Algebra I (Tuesday, August 28; CGIS Knafel K354)

Topics: Working with Vectors. Linear Independence. Matrix Algebra. Systems of Linear Equations. Method of Substitution. Gaussian Elimination. Gauss-Jordan Elimination.

Required Reading: Gill Ch. 3
Further Reading: SB 10.1-4, 11.1, 8.1-3, 6.1, 7.1

• Lecture 7: Linear Algebra II (Wednesday, August 29; CGIS Knafel K354)

Topics: Matrix Methods. Rank. Existence of Solutions. Inverse of a Matrix. Linear Systems and Inverses. Determinants. Determinant Formula for an Inverse. Cramer’s Rule.

Required Reading: Gill Ch. 4
Further Reading: SB 7.2-4, 8.4, 9.1-2, WG Appendix A

• Lectures 8 and 9: Optimization (Thursday, August 30; CGIS Knafel K354)

Optimization 1
Topics: Quadratic Forms. Definiteness of Quadratic Forms. Maxima and Minima in $\mathbf{R}^n$. First Order Conditions. Second Order Conditions. Global Maxima and Minima.

Required Reading: Gill Ch. 4.9, 6.7
Further Reading: SB 16.1-2, 17.1-4

Optimization 2
Topics: Constrained Optimization. Equality Constraints. Matrix Representation. Inequality Constraints: Kuhn-Tucker Conditions. Methods of Proof. Direct Proof. Proof by Contradiction. Proof by Induction.

Required Reading: Gill Ch. 6.8
Further Reading: SB 18.1-6, A1.3


3 Lecture Notes


3.1 Functions

Today’s Topics²: • $\mathbf{R}^1$ and $\mathbf{R}^n$ • Interval Notation for $\mathbf{R}^1$ • Neighborhoods: Intervals, Disks, and Balls • Open/Closed/Compact Sets • Introduction to Functions • Domain and Range/Image • Some General Types of Functions • Log, Ln, and e • Graphing Functions • Solving for Variables • Finding Roots • Summation and Product Notation • Limit of a Function • Continuity

3.1.1 $\mathbf{R}^1$ and $\mathbf{R}^n$

• $\mathbf{R}^1$ is the set of all real numbers extending from −∞ to +∞, i.e., the real number line.

• $\mathbf{R}^n$ is an n-dimensional space (often referred to as Euclidean space), where each of the n axes extends from −∞ to +∞.

• Examples:

1. $\mathbf{R}^1$ is a line.

2. $\mathbf{R}^2$ is a plane.

3. $\mathbf{R}^3$ is a 3-D space.

4. $\mathbf{R}^4$ could be 3-D plus time.

• Points in $\mathbf{R}^n$ are ordered n-tuples, where each element of the n-tuple represents the coordinate along that dimension.

3.1.2 Interval Notation for $\mathbf{R}^1$

• Open interval: $(a, b) \equiv \{x \in \mathbf{R}^1 : a < x < b\}$

• Closed interval: $[a, b] \equiv \{x \in \mathbf{R}^1 : a \le x \le b\}$

• Half open, half closed: $(a, b] \equiv \{x \in \mathbf{R}^1 : a < x \le b\}$

3.1.3 Neighborhoods: Intervals, Disks, and Balls

• In many areas of math, we need a formal construct for what it means to be “near” a point c in $\mathbf{R}^n$. This is generally called the neighborhood of c and is represented by an open interval, disk, or ball, depending on whether $\mathbf{R}^n$ is of one, two, or more dimensions, respectively. Given the point c, these are defined as

1. ε-interval in $\mathbf{R}^1$: $\{x : |x - c| < \varepsilon\}$
The open interval $(c - \varepsilon, c + \varepsilon)$.

2. ε-disk in $\mathbf{R}^2$: $\{x : \|x - c\| < \varepsilon\}$
The open interior of the circle centered at c with radius ε.

3. ε-ball in $\mathbf{R}^n$: $\{x : \|x - c\| < \varepsilon\}$
The open interior of the sphere centered at c with radius ε.

² Much of the material and examples for this lecture are taken from Simon & Blume (1994) Mathematics for Economists, Boyce & Diprima (1988) Calculus, and Protter & Morrey (1991) A First Course in Real Analysis.


3.1.4 Sets, Sets, and More Sets

• Interior Point: The point x is an interior point of the set S if x is in S and if there is some ε-ball around x that contains only points in S. The interior of S is the collection of all interior points in S. The interior can also be defined as the union of all open sets in S.
Example: The interior of the set $\{(x, y) : x^2 + y^2 \le 4\}$ is $\{(x, y) : x^2 + y^2 < 4\}$.

• Boundary Point: The point x is a boundary point of the set S if every ε-ball around x contains both points that are in S and points that are outside S. The boundary is the collection of all boundary points.
Example: The boundary of $\{(x, y) : x^2 + y^2 \le 4\}$ is $\{(x, y) : x^2 + y^2 = 4\}$.

• Open: A set S is open if for each point x in S, there exists an open ε-ball around x completely contained in S.
Example: $\{(x, y) : x^2 + y^2 < 4\}$

• Closed: A set S is closed if it contains all of its boundary points.
Example: $\{(x, y) : x^2 + y^2 \le 4\}$

• Note: a set may be neither open nor closed.
Example: $\{(x, y) : 2 < x^2 + y^2 \le 4\}$

• Complement: The complement of set S is everything outside of S.
Example: The complement of $\{(x, y) : x^2 + y^2 \le 4\}$ is $\{(x, y) : x^2 + y^2 > 4\}$.

• Closure: The closure of set S is the smallest closed set that contains S.
Example: The closure of $\{(x, y) : x^2 + y^2 < 4\}$ is $\{(x, y) : x^2 + y^2 \le 4\}$.

• Bounded: A set S is bounded if it can be contained within an ε-ball.
Examples: Bounded: any interval that doesn’t have ∞ or −∞ as endpoints; any disk in a plane with finite radius. Unbounded: the set of integers in $\mathbf{R}^1$; any ray.

• Compact: A set is compact if and only if it is both closed and bounded.

3.1.5 Introduction to Functions

• A function (in $\mathbf{R}^1$) is a rule or relationship or mapping or transformation that assigns one and only one number in $\mathbf{R}^1$ to each number in $\mathbf{R}^1$.

• Mapping notation examples

1. Function of one variable: $f : \mathbf{R}^1 \to \mathbf{R}^1$

2. Function of two variables: $f : \mathbf{R}^2 \to \mathbf{R}^1$

• Examples:

1. $f(x) = x + 1$
For each x in $\mathbf{R}^1$, f(x) assigns the number x + 1.

2. $f(x, y) = x^2 + y^2$
For each ordered pair (x, y) in $\mathbf{R}^2$, f(x, y) assigns the number $x^2 + y^2$.

• Often use one variable x as input and another y as output.
Example: y = x + 1

• Input variable also called independent variable. Output variable also called dependent variable.


3.1.6 Domain and Range/Image

• Some functions are defined only on proper subsets of $\mathbf{R}^n$.

• Domain: the set of numbers in X at which f(x) is defined.

• Range: elements of Y assigned by f(x) to elements of X, or

$f(X) = \{y : y = f(x), x \in X\}$

Most often used when talking about a function $f : \mathbf{R}^1 \to \mathbf{R}^1$.

• Image: same as range, but more often used when talking about a function $f : \mathbf{R}^n \to \mathbf{R}^1$.

• Examples:

1. $f(x) = \frac{3}{1 + x^2}$
Domain X =
Range f(X) =

2. $f(x) = \begin{cases} x + 1, & 1 \le x \le 2 \\ 0, & x = 0 \\ 1 - x, & -2 \le x \le -1 \end{cases}$

3. $f(x) = 1/x$

4. $f(x, y) = x^2 + y^2$
Domain X, Y =
Image f(X, Y) =

3.1.7 Some General Types of Functions

• Monomials: $f(x) = a x^k$
a is the coefficient. k is the degree.
Examples: $y = x^2$, $y = -\frac{1}{2} x^3$

• Polynomials: sum of monomials.
Examples: $y = -\frac{1}{2} x^3 + x^2$, $y = 3x + 5$
The degree of a polynomial is the highest degree of its monomial terms. Also, it’s often a good idea to write polynomials with terms in decreasing degree.

• Rational Functions: ratio of two polynomials.
Examples: $y = \frac{x}{2}$, $y = \frac{x^2 + 1}{x^2 - 2x + 1}$

• Exponential Functions: Example: $y = 2^x$

• Trigonometric Functions: Examples: $y = \cos(x)$, $y = 3 \sin(4x)$

• Linear: polynomial of degree 1.
Example: $y = mx + b$, where m is the slope and b is the y-intercept.

• Nonlinear: anything that isn’t constant or polynomial of degree 1.
Examples: $y = x^2 + 2x + 1$, $y = \sin(x)$, $y = \ln(x)$, $y = e^x$

3.1.8 Log, Ln, and e

• Relationship of logarithmic and exponential functions:

$y = \log_a(x) \iff a^y = x$

The log function can be thought of as an inverse for exponential functions. a is referred to as the “base” of the logarithm.

• The two most common logarithms are base 10 and base e.

1. Base 10: $y = \log_{10}(x) \iff 10^y = x$
The base 10 logarithm is often simply written as “log(x)” with no base denoted.

2. Base e: $y = \log_e(x) \iff e^y = x$
The base e logarithm is referred to as the “natural” logarithm and is written as “ln(x)”.

• $\log_a(a^x) = x$ and $a^{\log_a(x)} = x$

• Examples:

1. $\log(\sqrt{10}) =$
2. log(1) =

3. log(10) =

4. log(100) =

5. ln(1) =

6. ln(e) =

• Properties of exponential functions:

1. $a^x a^y = a^{x+y}$

2. $a^{-x} = 1/a^x$

3. $a^x / a^y = a^{x-y}$

4. $(a^x)^y = a^{xy}$

5. $a^0 = 1$

• Properties of logarithmic functions (any base):

1. log(xy) = log(x) + log(y)

2. log(1/x) = − log(x)

3. log(x/y) = log(x)− log(y)

4. $\log(x^y) = y \log(x)$

5. log(1) = 0

• Use the change of base formula to switch bases as necessary: $\log_b(x) = \log_a(x) / \log_a(b)$
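The logarithm properties and the change of base formula above can be checked numerically. The following is a minimal sketch, assuming Python's standard `math` module; the helper name `log_base` and the test values are illustrative choices, not part of the original notes.

```python
import math

def log_base(x: float, b: float) -> float:
    """Return log_b(x) using the change of base formula with natural logs."""
    return math.log(x) / math.log(b)

# Arbitrary positive test values (hypothetical).
x, y = 8.0, 32.0

# log(xy) = log(x) + log(y)
product_rule = math.isclose(math.log(x * y), math.log(x) + math.log(y))
# log(x/y) = log(x) - log(y)
quotient_rule = math.isclose(math.log(x / y), math.log(x) - math.log(y))
# log(x^y) = y log(x)
power_rule = math.isclose(math.log(x ** y), y * math.log(x))
# log_2(8) computed through natural logs
base2_of_8 = log_base(8.0, 2.0)

print(product_rule, quotient_rule, power_rule, base2_of_8)
```

Floating-point arithmetic is approximate, so the comparisons use `math.isclose` rather than exact equality.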

3.1.9 Graphing Functions

• Know your function. How? Graph your function.

• A picture is worth a thousand words.

1. Is the function increasing or decreasing? Over what part of the domain?

2. How “fast” does it increase or decrease?

3. Are there global or local maxima and minima? Where?

4. Are there inflection points?

5. Is the function continuous?

6. Is the function differentiable?

7. Does the function tend to some limit?

8. Other questions related to the substance of the problem at hand.


3.1.10 Solving for Variables

• Sometimes we’re given a function y = f(x) and we want to find how x varies as a function of y.

• If f is a one-to-one mapping, then it has an inverse.

• Use algebra and relationships identified above to move x to the LHS of the equation and so that the RHS is only a function of y.

• Examples: (we want to solve for x)

1. $y = 3x + 2 \implies y - 2 = 3x \implies x = \frac{1}{3}(y - 2)$

2. $y = 3x - 4z + 2 \implies y + 4z - 2 = 3x \implies x = \frac{1}{3}(y + 4z - 2)$

3. $y = e^x + 4 \implies y - 4 = e^x \implies \ln(y - 4) = \ln(e^x) \implies x = \ln(y - 4)$

• Sometimes (often?) the inverse does not exist.

• Example: We’re given the function $y = x^2$ (a parabola). Solving for x, we get $x = \sqrt{y}$ and $x = -\sqrt{y}$: for each positive value of y, there are two values of x.
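A short numerical sketch of the two situations above, assuming Python's standard `math` module: the inverse found in example 3 recovers x from y, while squaring maps two different inputs to the same output, so no inverse exists. The function names and the test value 1.25 are hypothetical.

```python
import math

def f(x: float) -> float:
    """y = e^x + 4 (example 3 above)."""
    return math.exp(x) + 4

def f_inverse(y: float) -> float:
    """x = ln(y - 4), the inverse derived in example 3."""
    return math.log(y - 4)

# Applying f and then its inverse should return the starting value.
round_trip = f_inverse(f(1.25))

def square(x: float) -> float:
    """y = x^2, which is not one-to-one."""
    return x * x

# Two different inputs give the same output, so x cannot be recovered from y.
collision = square(2.0) == square(-2.0)

print(round_trip, collision)
```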

3.1.11 Finding Roots

• Solving for variables is especially important when we want to find the roots of an equation: those values of variables that cause an equation to equal zero.

• Especially important in finding equilibria and in doing maximum likelihood estimation.

• Procedure: Given y = f(x), set y = 0. Solve for x.

• There may be multiple roots.

• For quadratic equations $ax^2 + bx + c = 0$, use $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.

• Examples:

1. $f(x) = 3x + 2$

2. $f(x) = e^{-x} - 10$

3. $f(x) = x^2 + 3x - 4 = 0$
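The quadratic formula can be written as a small function. This is a sketch under stated assumptions: it uses Python's `cmath` so that a negative discriminant yields complex roots, and the polynomial $x^2 - 5x + 6$ is a hypothetical example chosen so as not to give away the exercises above.

```python
import cmath

def quadratic_roots(a: float, b: float, c: float):
    """Return both roots of a*x^2 + b*x + c = 0 using the quadratic formula."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    # cmath.sqrt handles a negative discriminant by returning a complex number.
    sqrt_disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + sqrt_disc) / (2 * a), (-b - sqrt_disc) / (2 * a))

def evaluate(a: float, b: float, c: float, x: complex) -> complex:
    """Evaluate the quadratic at x, to confirm that x is a root."""
    return a * x * x + b * x + c

# Hypothetical example: x^2 - 5x + 6 = (x - 3)(x - 2)
roots = quadratic_roots(1, -5, 6)
residuals = [evaluate(1, -5, 6, r) for r in roots]

print(roots, residuals)
```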

3.1.12 Summation and Product Notation

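• Summation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$

1. $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$

2. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

3. $\sum_{i=1}^{n} c = nc$

• Product: $\prod_{i=1}^{n} x_i = x_1 x_2 x_3 \cdots x_n$

1. $\prod_{i=1}^{n} c x_i = c^n \prod_{i=1}^{n} x_i$

2. $\prod_{i=1}^{n} (x_i + y_i) =$ a mess

3. $\prod_{i=1}^{n} c = c^n$

• Use logs to go between sum, product notation:

$\log\left(\prod_{i=1}^{n} c x_i\right) = \sum_{i=1}^{n} \log(c x_i) = n \log(c) + \sum_{i=1}^{n} \log(x_i)$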
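The identity linking the log of a product to a sum of logs can be verified on a small data set. A minimal sketch, assuming Python 3.8 or later (for `math.prod`); the values in `xs` and the constant `c` are made up for illustration.

```python
import math

xs = [1.5, 2.0, 4.0, 0.5]  # hypothetical positive data
c = 3.0                    # hypothetical positive constant
n = len(xs)

# Left-hand side: log of the product of c * x_i
log_of_product = math.log(math.prod(c * x for x in xs))

# Middle expression: sum of log(c * x_i)
sum_of_logs = sum(math.log(c * x) for x in xs)

# Right-hand side: n * log(c) + sum of log(x_i)
split_form = n * math.log(c) + sum(math.log(x) for x in xs)

print(log_of_product, sum_of_logs, split_form)
```

All three printed values should agree up to floating-point rounding.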

3.1.13 The Limit of a Function

• We’re often interested in determining if a function f approaches some number L as its independent variable x moves to some number c (usually 0 or ±∞). If it does, we say that f(x) approaches L as x approaches c, or $\lim_{x \to c} f(x) = L$.

• Limit of a function. Let f be defined at each point in some open interval containing the point c, although possibly not defined at c itself. Then $\lim_{x \to c} f(x) = L$ if for any (small positive) number ε, there exists a corresponding number δ > 0 such that if 0 < |x − c| < δ, then |f(x) − L| < ε.

• Examples:

1. $\lim_{x \to c} k =$

2. $\lim_{x \to c} x =$

3. $\lim_{x \to 0} |x| =$

4. $\lim_{x \to 0} \left(1 + \frac{1}{x^2}\right) =$

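• Uniqueness: $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} f(x) = M \implies L = M$

• Properties: Let f and g be functions with $\lim_{x \to c} f(x) = A$ and $\lim_{x \to c} g(x) = B$.

1. $\lim_{x \to c} [f(x) + g(x)] = \lim_{x \to c} f(x) + \lim_{x \to c} g(x) = A + B$

2. $\lim_{x \to c} \alpha f(x) = \alpha \lim_{x \to c} f(x) = \alpha A$

3. $\lim_{x \to c} f(x) g(x) = [\lim_{x \to c} f(x)][\lim_{x \to c} g(x)] = AB$

4. $\lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)} = \frac{A}{B}$, provided $B \neq 0$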

• Examples:

1. $\lim_{x \to 2} (2x - 3) =$

2. $\lim_{x \to c} x^n =$

• Other types of limits:

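1. Right-hand limit: $\lim_{x \to c^+} f(x) = L$, if $c < x < c + \delta \implies |f(x) - L| < \varepsilon$
Example: $\lim_{x \to 0^+} \sqrt{x} = 0$

2. Left-hand limit: $\lim_{x \to c^-} f(x) = L$, if $c - \delta < x < c \implies |f(x) - L| < \varepsilon$

3. Infinity: $\lim_{x \to \infty} f(x) = L$, if $x > N \implies |f(x) - L| < \varepsilon$

4. −Infinity: $\lim_{x \to -\infty} f(x) = L$, if $x < -N \implies |f(x) - L| < \varepsilon$
Example: $\lim_{x \to \infty} 1/x = \lim_{x \to -\infty} 1/x = 0$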

3.1.14 Continuity

• Continuity: Suppose that the domain of the function f includes an open interval containing the point c. Then f is continuous at c if $\lim_{x \to c} f(x)$ exists and if $\lim_{x \to c} f(x) = f(c)$. Further, f is continuous on an open interval (a, b) if it is continuous at each point in the interval.

• Examples: Continuous functions.

$f(x) = \sqrt{x}$, $f(x) = e^x$

• Examples: Discontinuous functions.

$f(x) = \mathrm{floor}(x)$, $f(x) = 1 + \frac{1}{x^2}$

• Properties:

1. If f and g are continuous at point c, then f + g, f − g, fg, |f|, and αf are continuous at c. f/g is continuous at c, provided $g(c) \neq 0$.

2. Boundedness: If f is continuous on the closed bounded interval [a, b], then there is a number K such that |f(x)| ≤ K for each x in [a, b].

3. Max/Min: If f is continuous on the closed bounded interval [a, b], then f has a maximum and a minimum on [a, b], possibly at the end points.

4. The image of a closed bounded interval [a, b] under a continuous function f is also a closed bounded interval [m, M].


3.2 Calculus I

Today’s Topics³: • Sequences • Limit of a Sequence • Derivatives • Higher-Order Derivatives • Maxima and Minima • Composite Functions • The Chain Rule • Derivatives of Exp and Ln • L’Hospital’s Rule

3.2.1 Sequences

• A sequence $\{y_n\} = \{y_1, y_2, y_3, \ldots, y_n\}$ is an ordered set of real numbers, where $y_1$ is the first term in the sequence and $y_n$ is the nth term. Generally, a sequence is infinite, that is, it extends to n = ∞. We can also write the sequence as $\{y_n\}_{n=1}^{\infty}$.

• Examples:

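1. $\{y_n\} = \left\{2 - \frac{1}{n^2}\right\} = \left\{1, \frac{7}{4}, \frac{17}{9}, \frac{31}{16}, \ldots\right\}$

2. $\{y_n\} = \left\{\frac{n^2 + 1}{n}\right\} = \left\{2, \frac{5}{2}, \frac{10}{3}, \ldots\right\}$

3. $\{y_n\} = \left\{(-1)^n \left(1 - \frac{1}{n}\right)\right\} = \left\{0, \frac{1}{2}, -\frac{2}{3}, \frac{3}{4}, \ldots\right\}$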

• Think of sequences like functions. Before, we had y = f(x) with x specified over some domain. Now we have $\{y_n\} = \{f(n)\}$ with n = 1, 2, 3, . . ..

• Three kinds of sequences:

1. Sequences like 1 above that converge to a limit.

2. Sequences like 2 above that increase without bound.

3. Sequences like 3 above that neither converge nor increase without bound, alternating over the number line.

• Boundedness and monotonicity:

1. Bounded: if $|y_n| \le K$ for all n

2. Monotone Increasing: $y_{n+1} > y_n$ for all n

3. Monotone Decreasing: $y_{n+1} < y_n$ for all n

• Subsequence: choose an infinite collection of entries from $\{y_n\}$, retaining their order.
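The three example sequences can be generated exactly to see the behaviors described above. This is an illustrative sketch using Python's `fractions.Fraction` so the terms print as exact fractions; the function names `seq1`, `seq2`, `seq3` and the helper `is_monotone_increasing` are hypothetical.

```python
from fractions import Fraction

def seq1(n: int) -> Fraction:
    """y_n = 2 - 1/n^2 (converges)."""
    return 2 - Fraction(1, n * n)

def seq2(n: int) -> Fraction:
    """y_n = (n^2 + 1)/n (increases without bound)."""
    return Fraction(n * n + 1, n)

def seq3(n: int) -> Fraction:
    """y_n = (-1)^n (1 - 1/n) (alternates in sign)."""
    return (-1) ** n * (1 - Fraction(1, n))

def is_monotone_increasing(terms) -> bool:
    """Check y_{n+1} > y_n for every consecutive pair in a finite list of terms."""
    return all(later > earlier for earlier, later in zip(terms, terms[1:]))

first_terms_1 = [seq1(n) for n in range(1, 5)]
first_terms_2 = [seq2(n) for n in range(1, 4)]
first_terms_3 = [seq3(n) for n in range(1, 5)]

# Checks over the first 50 terms only; a finite check is suggestive, not a proof.
increasing_1 = is_monotone_increasing([seq1(n) for n in range(1, 51)])
bounded_3 = all(abs(seq3(n)) <= 1 for n in range(1, 51))

print(first_terms_1, first_terms_2, first_terms_3, increasing_1, bounded_3)
```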

³ Much of the material and examples for this lecture are taken from Simon & Blume (1994) Mathematics for Economists and from Boyce & Diprima (1988) Calculus.


3.2.2 The Limit of a Sequence

• We’re often interested in whether a sequence converges to a limit. Limits of sequences are conceptually similar to the limits of functions addressed in the previous lecture.

• Definition: (Limit of a sequence). The sequence $\{y_n\}$ has the limit L, that is $\lim_{n \to \infty} y_n = L$, if for any ε > 0 there is an integer N (which depends on ε) with the property that $|y_n - L| < \varepsilon$ for each n > N. $\{y_n\}$ is said to converge to L. If the above does not hold, then $\{y_n\}$ diverges.

• Examples:

1. $\lim_{n \to \infty} \left\{2 - \frac{1}{n^2}\right\} = 2$

2. $\lim_{n \to \infty} \left\{\frac{4^n}{n!}\right\} = 0$
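The role of N in the definition can be made concrete for the sequence $y_n = 2 - 1/n^2$ from example 1. The sketch below searches for the smallest N that works for a given ε; it relies on the fact that $|y_n - 2| = 1/n^2$ is decreasing in n, so once one term is within ε, all later terms are too. The function names are hypothetical, and exact fractions are used to avoid rounding issues at the boundary.

```python
from fractions import Fraction

def y(n: int) -> Fraction:
    """y_n = 2 - 1/n^2."""
    return 2 - Fraction(1, n * n)

def find_N(epsilon: Fraction, limit: int = 2) -> int:
    """Smallest N with |y_n - limit| < epsilon for n = N + 1.

    Because |y_n - 2| = 1/n^2 shrinks as n grows, the inequality then
    holds for every n > N as well (an argument specific to this sequence).
    """
    N = 0
    while abs(y(N + 1) - limit) >= epsilon:
        N += 1
    return N

N_for_hundredth = find_N(Fraction(1, 100))
N_for_ten_thousandth = find_N(Fraction(1, 10000))

print(N_for_hundredth, N_for_ten_thousandth)
```

A smaller ε requires a larger N, which is what "N depends on ε" means in the definition.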

• Uniqueness: If $\{y_n\}$ converges, then the limit L is unique.

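• Properties: Let $\lim_{n \to \infty} y_n = A$ and $\lim_{n \to \infty} z_n = B$. Then

1. $\lim_{n \to \infty} [\alpha y_n + \beta z_n] = \alpha A + \beta B$

2. $\lim_{n \to \infty} y_n z_n = AB$

3. $\lim_{n \to \infty} \frac{y_n}{z_n} = \frac{A}{B}$, provided $B \neq 0$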

• Finding the limit of a sequence in $\mathbf{R}^n$ is similar to that in $\mathbf{R}^1$.

• Limit of a sequence of vectors. The sequence of vectors $\{y_n\}$ has the limit L, that is $\lim_{n \to \infty} y_n = L$, if for any ε there is an integer N where $\|y_n - L\| < \varepsilon$ for each n > N. The sequence of vectors $\{y_n\}$ is said to converge to the vector L, and the distances between $y_n$ and L converge to zero.

• Think of each coordinate of the vector $y_n$ as being part of its own sequence over n. Then a sequence of vectors in $\mathbf{R}^n$ converges if and only if all n sequences of its components converge.
Examples:

1. The sequence $\{y_n\}$ where $y_n = \left(\frac{1}{n}, 2 - \frac{1}{n^2}\right)$ converges to (0, 2).

2. The sequence $\{y_n\}$ where $y_n = \left(\frac{1}{n}, (-1)^n\right)$ does not converge, since $\{(-1)^n\}$ does not converge.

• Bolzano-Weierstrass Theorem: Any sequence contained in a compact (i.e., closed and bounded) subset of $\mathbf{R}^n$ contains a convergent subsequence.

Example: Take the sequence $\{y_n\} = \{(-1)^n\}$, which has two accumulating points, but no limit. However, its terms lie in a set that is both closed and bounded.

The subsequence of $\{y_n\}$ defined by taking n = 1, 3, 5, . . . does have a limit: −1.

As does the subsequence defined by taking n = 2, 4, 6, . . ., whose limit is 1.
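The two convergent subsequences in the example can be extracted directly. A small illustrative sketch in Python; only the first 20 terms are generated, and the variable names are hypothetical.

```python
# First 20 terms of y_n = (-1)^n for n = 1, 2, ..., 20.
sequence = [(-1) ** n for n in range(1, 21)]

# Subsequence with odd indices n = 1, 3, 5, ... (list positions 0, 2, 4, ...).
odd_subsequence = sequence[0::2]

# Subsequence with even indices n = 2, 4, 6, ... (list positions 1, 3, 5, ...).
even_subsequence = sequence[1::2]

print(odd_subsequence)
print(even_subsequence)
```

Each subsequence is constant, so it converges trivially, even though the full sequence keeps alternating.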

3.2.3 Series

• The sum of the terms of a sequence is a series. As there are both finite and infinite sequences, there are finite and infinite series.

• The series associated with the sequence $\{y_n\} = \{y_1, y_2, y_3, \ldots, y_n\} = \{y_n\}_{n=1}^{\infty}$ is $\sum_{n=1}^{\infty} y_n$. The nth partial sum $S_n$ is defined as $S_n = \sum_{k=1}^{n} y_k$, the sum of the first n terms of the sequence.

• A series $\sum y_n$ converges if the sequence of partial sums $\{S_1, S_2, S_3, \ldots\}$ converges, that is, has a finite limit.

• A geometric series is a series that can be written as $\sum_{n=0}^{\infty} r^n$, where r is called the ratio. A geometric series converges to $\frac{1}{1 - r}$ if |r| < 1 and diverges otherwise. For example, $\sum_{n=0}^{\infty} \frac{1}{2^n} = 2$.

• Examples of other series:

1. $\sum_{n=0}^{\infty} \frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots = e$

2. $\sum_{n=1}^{\infty} \frac{1}{n} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots = \infty$ (harmonic series)
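Partial sums give a hands-on way to see which of these series settle down. The sketch below, assuming Python's standard `math` and `itertools` modules, accumulates partial sums for the three series mentioned above; the truncation points (60, 20, and 10,000 terms) are arbitrary choices, and a finite computation can only suggest, not prove, convergence or divergence.

```python
import math
from itertools import accumulate

def partial_sums(terms):
    """Return the list of partial sums S_1, S_2, ... of the given terms."""
    return list(accumulate(terms))

# Geometric series with ratio r = 1/2; the notes give the limit 1 / (1 - r) = 2.
geometric = partial_sums(0.5 ** n for n in range(0, 60))

# Series of 1/n!, whose limit is e.
factorial_series = partial_sums(1 / math.factorial(n) for n in range(0, 20))

# Harmonic series: partial sums keep growing, though slowly.
harmonic = partial_sums(1 / n for n in range(1, 10001))

print(geometric[-1], factorial_series[-1])
print(harmonic[9], harmonic[99], harmonic[9999])
```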

3.2.4 Derivatives

• The derivative of f at x is its rate of change at x, i.e., how much f(x) changes with a change in x.
– For a line, the derivative is the slope.
– For a curve, the derivative is the slope of the tangent line at x.

• Derivative: Let f be a function whose domain includes an open interval containing the point x. The derivative of f at x is given by

$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{(x + h) - x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
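The limit in the definition can be approximated by evaluating the difference quotient for shrinking values of h. This is a minimal numerical sketch, not a substitute for the analytic limit; the function `square`, the point `x0 = 3.0`, and the step sizes are hypothetical choices.

```python
def difference_quotient(f, x: float, h: float) -> float:
    """Return (f(x + h) - f(x)) / h, the slope of the secant line through x and x + h."""
    return (f(x + h) - f(x)) / h

def square(x: float) -> float:
    """f(x) = x^2, whose derivative is 2x."""
    return x * x

x0 = 3.0
step_sizes = (1.0, 0.1, 0.01, 0.001)

# Pairs (h, secant slope); the slopes should approach f'(3) = 6 as h shrinks.
estimates = [(h, difference_quotient(square, x0, h)) for h in step_sizes]
errors = [abs(slope - 6.0) for _, slope in estimates]

for h, slope in estimates:
    print(h, slope)
```

For this particular function the quotient equals 2x + h exactly (up to rounding), so the error shrinks in step with h.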


• If f′(x) exists at a point x, then f is said to be differentiable at x. Similarly, if f′(x) exists for every point along an interval, then f is differentiable along that interval. For f to be differentiable at x, f must be both continuous and “smooth” at x. The process of calculating f′(x) is called differentiation.

• Notation for derivatives:

1. y′, f ′(x) (Prime or Lagrange Notation)

2. Dy, Df(x) (Operator Notation)

3. $\frac{dy}{dx}$, $\frac{df}{dx}(x)$ (Leibniz’s Notation)

• Examples:

1. f(x) = c

2. f(x) = x

3. $f(x) = x^2$

4. $f(x) = x^3$

• Properties of derivatives: Suppose that f and g are differentiable at x and that α is a constant. Then the functions f ± g, αf, fg, and f/g (provided $g(x) \neq 0$) are also differentiable at x. Additionally,

Power rule: $[x^k]' = k x^{k-1}$
Sum rule: $[f(x) \pm g(x)]' = f'(x) \pm g'(x)$
Constant rule: $[\alpha f(x)]' = \alpha f'(x)$
Product rule: $[f(x) g(x)]' = f'(x) g(x) + f(x) g'(x)$
Quotient rule: $[f(x)/g(x)]' = \frac{f'(x) g(x) - f(x) g'(x)}{[g(x)]^2}$, $g(x) \neq 0$

• Examples:

1. $f(x) = 3x^2 + 2x^{1/3}$

2. $f(x) = (x^3)(2x^4)$

3. $f(x) = \frac{x^2 + 1}{x^2 - 1}$

3.2.5 Higher-Order Derivatives

• We can keep applying the differentiation process to functions that are themselves derivatives. The derivative of f′(x) with respect to x would then be

$f''(x) = \lim_{h \to 0} \frac{f'(x + h) - f'(x)}{h}$

and so on. Similarly, the derivative of f′′(x) would be denoted f′′′(x).

• First derivative: $f'(x)$, $y'$, $\frac{df(x)}{dx}$, $\frac{dy}{dx}$
Second derivative: $f''(x)$, $y''$, $\frac{d^2 f(x)}{dx^2}$, $\frac{d^2 y}{dx^2}$
nth derivative: $\frac{d^n f(x)}{dx^n}$, $\frac{d^n y}{dx^n}$

• Example: $f(x) = x^3$, $f'(x) = 3x^2$, $f''(x) = 6x$, $f'''(x) = 6$, $f''''(x) = 0$

3.2.6 Applications of the Derivative: Maxima and Minima

• The first derivative f ′(x) identifies whether the function f(x) at the point x is

1. Increasing: f ′(x) > 0

2. Decreasing: f ′(x) < 0

3. Extremum/Saddle: f ′(x) = 0

• Examples:

1. $f(x) = x^2 + 2$, $f'(x) = 2x$

2. $f(x) = x^3 + 2$, $f'(x) = 3x^2$

• The second derivative f ′′(x) identifies whether the function f(x) at the point x is

1. Concave down: f ′′(x) < 0

2. Concave up: f ′′(x) > 0

• Maximum (Minimum): $x_0$ is a local maximum (minimum) if $f(x_0) > f(x)$ ($f(x_0) < f(x)$) for all $x \neq x_0$ within some open interval containing $x_0$. $x_0$ is a global maximum (minimum) if $f(x_0) > f(x)$ ($f(x_0) < f(x)$) for all $x \neq x_0$ in the domain of f.


• Critical points: Given the function f defined over domain D, all of the following are critical points:

1. Any interior point of D where f ′(x) = 0.

2. Any interior point of D where f ′(x) does not exist.

3. Any endpoint that is in D.

The maxima and minima will be a subset of the critical points.

• Combined, the first and second derivatives can tell us whether a point is a maximum or minimum of f(x).

Local Maximum: f′(x) = 0 and f′′(x) < 0
Local Minimum: f′(x) = 0 and f′′(x) > 0
Need more info: f′(x) = 0 and f′′(x) = 0
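The three cases above translate directly into a small classification routine. This is a sketch under stated assumptions: the derivatives are supplied by hand, the tolerance `tol` is an arbitrary choice for treating a floating-point value as zero, and the functions $x^3 - 3x$ and $x^4$ are hypothetical examples not taken from the notes.

```python
def second_derivative_test(f_prime_value: float, f_second_value: float,
                           tol: float = 1e-12) -> str:
    """Classify a point from the values of f'(x) and f''(x) at that point."""
    if abs(f_prime_value) > tol:
        return "not a stationary point"
    if f_second_value < 0:
        return "local maximum"
    if f_second_value > 0:
        return "local minimum"
    return "need more info"

# Hypothetical example: f(x) = x^3 - 3x, so f'(x) = 3x^2 - 3 and f''(x) = 6x.
def f_prime(x: float) -> float:
    return 3 * x ** 2 - 3

def f_second(x: float) -> float:
    return 6 * x

results = {x: second_derivative_test(f_prime(x), f_second(x))
           for x in (-1.0, 1.0, 0.0)}

# Second hypothetical example: g(x) = x^4 has g'(0) = 0 and g''(0) = 0.
flat_case = second_derivative_test(0.0, 0.0)

print(results, flat_case)
```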

• Global Maxima and Minima. Sometimes no global max or min exists, e.g., when f(x) is not bounded above or below. However, there are three situations where we can fairly easily identify a global max or min.

1. Functions with only one critical point. If $x_0$ is a local maximum of f and it is the only critical point, then it is a global maximum.

2. Globally concave up or concave down functions. If f′′ is never zero, then there is at most one critical point, which is a global maximum if f′′ < 0 and a global minimum if f′′ > 0.

3. Continuous functions over closed and bounded intervals must have both a global maximum and a global minimum.

• Examples:

1. $f(x) = x^2 + 2$
$f'(x) = 2x$
$f''(x) = 2$

2. $f(x) = x^3 + 2$
$f'(x) = 3x^2$
$f''(x) = 6x$

3. $f(x) = |x^2 - 1|$, $x \in [-2, 2]$

$f'(x) = \begin{cases} 2x & -2 < x < -1,\ 1 < x < 2 \\ -2x & -1 < x < 1 \end{cases}$

$f''(x) = \begin{cases} 2 & -2 < x < -1,\ 1 < x < 2 \\ -2 & -1 < x < 1 \end{cases}$


3.2.7 Composite Functions and the Chain Rule

• Composite functions are formed by substituting one function into another and are denoted by

$(f \circ g)(x) = f[g(x)]$

To form f[g(x)], the range of g must be contained (at least in part) within the domain of f. The domain of f ◦ g consists of all the points in the domain of g for which g(x) is in the domain of f.

• Examples:

1. $f(x) = \ln x$, $g(x) = x^2$
$(f \circ g)(x) = \ln x^2$, $(g \circ f)(x) = [\ln x]^2$
Notice that f ◦ g and g ◦ f are not the same functions.

2. $f(x) = 4 + \sin x$, $g(x) = \sqrt{1 - x^2}$
$(f \circ g)(x) = 4 + \sin \sqrt{1 - x^2}$
(g ◦ f)(x) does not exist, since the range of f, [3, 5], has no points in common with the domain of g.

• Chain Rule: Let y = f(z) and z = g(x). Then, y = (f ◦ g)(x) = f[g(x)] and the derivative of y with respect to x is

$\frac{d}{dx}\{f[g(x)]\} = f'[g(x)] \, g'(x)$

which can also be written as

$\frac{dy}{dx} = \frac{dy}{dz} \frac{dz}{dx}$

(Note: the above does not imply that the dz’s cancel out, as in fractions. They are part of the derivative notation and have no separate existence.) The chain rule can be thought of as the derivative of the “outside” times the derivative of the “inside,” remembering that the derivative of the outside function is evaluated at the value of the inside function.

• Generalized Power Rule: If $y = [g(x)]^k$, then $dy/dx = k[g(x)]^{k-1} g'(x)$.

• Examples:

1. Find $dy/dx$ for $y = (3x^2 + 5x - 7)^6$. Let $f(z) = z^6$ and $z = g(x) = 3x^2 + 5x - 7$. Then $y = f[g(x)]$ and

$\frac{dy}{dx} =$

$=$

$=$

2. Find $dy/dx$ for $y = \sin(x^3 + 4x)$. (Note: the derivative of $\sin x$ is $\cos x$.) Let $f(z) = \sin z$ and $z = g(x) = x^3 + 4x$. Then $y = f[g(x)]$ and

$\frac{dy}{dx} =$

$=$

$=$
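A chain-rule answer can be sanity-checked numerically. The sketch below is only an illustration: the composite $y = \sqrt{x^2 + 1}$, the evaluation point, and the helper names are assumptions chosen for the check, not examples from the lecture. It compares the chain-rule formula against a central-difference approximation of the derivative.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical composite y = f(g(x)) with f(z) = sqrt(z) and g(x) = x**2 + 1.
def g(x):
    return x ** 2 + 1


def g_prime(x):
    return 2 * x


def f(z):
    return math.sqrt(z)


def f_prime(z):
    return 1 / (2 * math.sqrt(z))


def composite(x):
    return f(g(x))


def chain_rule_derivative(x):
    # Derivative of the "outside" evaluated at the "inside",
    # times the derivative of the "inside".
    return f_prime(g(x)) * g_prime(x)


x0 = 0.7  # arbitrary evaluation point
approx = numerical_derivative(composite, x0)
exact = chain_rule_derivative(x0)
print(approx, exact)
```

The two printed values should agree to several decimal places; a large gap usually means the outside derivative was evaluated at $x$ rather than at $g(x)$.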


3.2.8 Derivatives of Exp and Ln

• Derivatives of Exp:

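1. $\frac{d}{dx}\,\alpha e^{x} = \alpha e^{x}$

2. $\frac{d^n}{dx^n}\,\alpha e^{x} = \alpha e^{x}$

3. $\frac{d}{dx}\,e^{u(x)} = e^{u(x)}u'(x)$

• Examples: Find $dy/dx$ for

1. $y = e^{-3x}$

2. $y = e^{x^2}$

3. $y = e^{\sin 2x}$

• Derivatives of Ln:

1. $\frac{d}{dx}\ln x = \frac{1}{x}$

2. $\frac{d}{dx}\ln x^k = \frac{d}{dx}\,k\ln x = \frac{k}{x}$

3. $\frac{d}{dx}\ln u(x) = \frac{u'(x)}{u(x)}$ (by the chain rule)

• Examples: Find $dy/dx$ for

1. $y = \ln(x^2 + 9)$

2. $y = \ln(\ln x)$

3. $y = (\ln x)^2$

4. $y = \ln e^x$

• For any positive base $b$, $\frac{d}{dx}\,b^{x} = (\ln b)(b^{x})$.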

3.2.9 L’Hospital’s Rule

• In studying limits, we saw that $\lim_{x\to c} f(x)/g(x) = \left(\lim_{x\to c} f(x)\right)/\left(\lim_{x\to c} g(x)\right)$, provided that $\lim_{x\to c} g(x) \neq 0$; a zero limit in the denominator can cause the quotient to be unbounded.

• If both $\lim_{x\to c} f(x) = 0$ and $\lim_{x\to c} g(x) = 0$, then we get an indeterminate form of the type $0/0$ as $x \to c$. However, we can still analyze such limits using L'Hospital's rule.

• L'Hospital's Rule: Suppose $f$ and $g$ are differentiable on $a < x < b$ and that either

1. $\lim_{x\to a^+} f(x) = 0$ and $\lim_{x\to a^+} g(x) = 0$, or

2. $\lim_{x\to a^+} f(x) = \pm\infty$ and $\lim_{x\to a^+} g(x) = \pm\infty$

Suppose further that $g'(x)$ is never zero on $a < x < b$ and that

$$\lim_{x\to a^+} \frac{f'(x)}{g'(x)} = L$$

then

$$\lim_{x\to a^+} \frac{f(x)}{g(x)} = L$$


• Examples: Use L’Hospital’s rule to find the following limits:

1. $\lim_{x\to 0^+} \dfrac{\ln(1 + x^2)}{x^3}$

2. $\lim_{x\to 0^+} \dfrac{e^{1/x}}{1/x}$

3. $\lim_{x\to 2} \dfrac{x - 2}{(x + 6)^{1/3} - 2}$

Lecture 3: Calculus II

3.3 Calculus II: An Integral Topic

Today's Topics$^4$: • Partial Derivatives • The Indefinite Integral: The Antiderivative • The Definite Integral: The Area under the Curve • Integration by Substitutions • Integration by Parts

3.3.1 Differentiation in Several Variables

• Suppose we have a function $f$ of two (or more) variables and we want to determine the rate of change relative to one of the variables. To do so, we would find its partial derivative, which is defined similarly to the derivative of a function of one variable.

• Partial Derivative: Let $f$ be a function of the variables $(x_1, \ldots, x_n)$. The partial derivative of $f$ with respect to $x_i$ is

$$\frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = \lim_{h\to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}$$

Only the $i$th variable changes; the others are treated as constants.

• We can take higher-order partial derivatives, as we did with functions of a single variable, except that now the higher-order partials can be with respect to multiple variables.

• Examples:

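1. $f(x, y) = x^2 + y^2$

$\frac{\partial f}{\partial x}(x, y) =$
$\frac{\partial f}{\partial y}(x, y) =$
$\frac{\partial^2 f}{\partial x^2}(x, y) =$
$\frac{\partial^2 f}{\partial x \partial y}(x, y) =$

2. $f(x, y) = x^3y^4 + e^x - \ln y$

$\frac{\partial f}{\partial x}(x, y) =$
$\frac{\partial f}{\partial y}(x, y) =$
$\frac{\partial^2 f}{\partial x^2}(x, y) =$
$\frac{\partial^2 f}{\partial x \partial y}(x, y) =$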

3.3.2 The Indefinite Integral: The Antiderivative

• So far, we've been interested in finding the derivative $g = f'$ of a function $f$. However, sometimes we're interested in exactly the reverse: finding the function $f$ for which $g$ is its derivative. We refer to $f$ as the antiderivative of $g$.

• Let $DF$ be the derivative of $F$, and let $DF(x)$ be the derivative of $F$ evaluated at $x$. Then the antiderivative is denoted by $D^{-1}$ (i.e., the inverse derivative). If $DF = f$, then $F = D^{-1}f$.

• Indefinite Integral: Equivalently, if $F$ is the antiderivative of $f$, then $F$ is also called the indefinite integral of $f$ and written $F(x) = \int f(x)\,dx$.

• Examples:

1. $\int \frac{1}{x^2}\,dx =$

2. $\int 3e^{3x}\,dx =$

3. $\int (x^2 - 4)\,dx =$

$^4$ Much of the material and examples for this lecture are taken from Simon & Blume (1994) Mathematics for Economists and from Boyce & Diprima (1988) Calculus.

• Notice from these examples that while there is only a single derivative for any function, there are multiple antiderivatives: one for any arbitrary constant $c$. The constant $c$ just shifts the curve up or down on the y-axis. If more information is available about the antiderivative (e.g., that it passes through a particular point), then we can solve for a specific value of $c$.

• Common rules of integration:

1. $\int a f(x)\,dx = a\int f(x)\,dx$

2. $\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$

3. $\int x^n\,dx = \frac{1}{n+1}x^{n+1} + c$ (for $n \neq -1$)

4. $\int e^x\,dx = e^x + c$

5. $\int \frac{1}{x}\,dx = \ln x + c$ (for $x > 0$)

6. $\int e^{f(x)}f'(x)\,dx = e^{f(x)} + c$

7. $\int [f(x)]^n f'(x)\,dx = \frac{1}{n+1}[f(x)]^{n+1} + c$ (for $n \neq -1$)

8. $\int \frac{f'(x)}{f(x)}\,dx = \ln f(x) + c$ (for $f(x) > 0$)

• Examples:

1. $\int 3x^2\,dx = 3\int x^2\,dx =$

2. $\int (2x + 1)\,dx =$

3. $\int e^x e^{e^x}\,dx =$

3.3.3 The Definite Integral: The Area under the Curve

• Riemann Sum: Suppose we want to determine the area $A(R)$ of a region $R$ defined by a curve $f(x)$ and some interval $a \le x \le b$. One way to calculate the area would be to divide the interval $a \le x \le b$ into $n$ subintervals of length $\Delta x$ and then approximate the region with a series of rectangles, where the base of each rectangle is $\Delta x$ and the height is $f(x)$ at the midpoint of that interval. $A(R)$ would then be approximated by the area of the union of the rectangles, which is given by

$$S(f, \Delta x) = \sum_{i=1}^{n} f(x_i)\,\Delta x$$

and is called a Riemann sum.

• As we decrease the size of the subintervals $\Delta x$, making the rectangles "thinner," we would expect our approximation of the area of the region to become closer to the true area. This gives the limiting process

$$A(R) = \lim_{\Delta x\to 0} \sum_{i=1}^{n} f(x_i)\,\Delta x$$
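The limiting process can be seen numerically. The following is a minimal sketch of the midpoint Riemann sum described above; the function name `midpoint_riemann_sum` and the test integrand $x^2$ on $[0, 1]$ (true area $1/3$) are assumptions made for illustration.

```python
def midpoint_riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] using n rectangles,
    each of width dx and height f evaluated at the subinterval midpoint."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def square(x):
    return x * x


coarse = midpoint_riemann_sum(square, 0.0, 1.0, 4)     # 4 wide rectangles
fine = midpoint_riemann_sum(square, 0.0, 1.0, 1000)    # 1000 thin rectangles
print(coarse, fine)
```

With only four rectangles the sum is already close to $1/3$, and thinner rectangles move it closer still.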


• Riemann Integral: If for a given function $f$ the Riemann sum approaches a limit as $\Delta x \to 0$, then that limit is called the Riemann integral of $f$ from $a$ to $b$. Formally,

$$\int_a^b f(x)\,dx = \lim_{\Delta x\to 0} \sum_{i=1}^{n} f(x_i)\,\Delta x$$

• Definite Integral: We use the notation $\int_a^b f(x)\,dx$ to denote the definite integral of $f$ from $a$ to $b$. In words, the definite integral $\int_a^b f(x)\,dx$ is the area under the "curve" $f(x)$ from $x = a$ to $x = b$.

• First Fundamental Theorem of Calculus: Let the function $f$ be bounded on $[a, b]$ and continuous on $(a, b)$. Then the function

$$F(x) = \int_a^x f(s)\,ds, \quad a \le x \le b$$

has a derivative at each point in $(a, b)$ and

$$F'(x) = f(x), \quad a < x < b$$

This last point shows that differentiation is the inverse of integration.

• Second Fundamental Theorem of Calculus: Let the function $f$ be bounded on $[a, b]$ and continuous on $(a, b)$. Let $F$ be any function that is continuous on $[a, b]$ such that $F'(x) = f(x)$ on $(a, b)$. Then

$$\int_a^b f(x)\,dx = F(b) - F(a)$$

• Procedure to calculate a "simple" definite integral $\int_a^b f(x)\,dx$:

1. Find the indefinite integral $F(x)$.

2. Evaluate $F(b) - F(a)$.

• Examples:

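1. $\int_1^3 3x^2\,dx =$

2. $\int_{-2}^{2} e^x e^{e^x}\,dx =$

• Properties of Definite Integrals:

1. $\int_a^a f(x)\,dx = 0$. There is no area below a point.

2. $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$. Reversing the limits changes the sign of the integral.

3. $\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha\int_a^b f(x)\,dx + \beta\int_a^b g(x)\,dx$

4. $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$

• Examples:

1. $\int_1^1 3x^2\,dx =$

2. $\int_0^4 (2x + 1)\,dx =$

3. $\int_{-2}^{0} e^x e^{e^x}\,dx + \int_0^2 e^x e^{e^x}\,dx =$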

3.3.4 Integration by Substitutions

• Sometimes the integrand doesn't appear integrable using common rules and antiderivatives. A method one might try is integration by substitution, which is related to the Chain Rule.

• Suppose we want to find the indefinite integral $\int g(x)\,dx$ and assume we can identify a function $u(x)$ such that $g(x) = f[u(x)]\,u'(x)$. Let's refer to the antiderivative of $f$ as $F$. Then the chain rule tells us that $\frac{d}{dx}F[u(x)] = f[u(x)]\,u'(x)$. So, $F[u(x)]$ is the antiderivative of $g$. We can then write

$$\int g(x)\,dx = \int f[u(x)]\,u'(x)\,dx = \int \frac{d}{dx}F[u(x)]\,dx = F[u(x)] + c$$

• Procedure to determine the indefinite integral $\int g(x)\,dx$ by the method of substitution:

1. Identify some part of $g(x)$ that might be simplified by substituting in a single variable $u$ (which will then be a function of $x$).

2. Determine if $g(x)\,dx$ can be reformulated in terms of $u$ and $du$.

3. Solve the indefinite integral.

4. Substitute back in for $x$.

• Substitution can also be used to calculate a definite integral. Using the same procedure as above,

$$\int_a^b g(x)\,dx = \int_c^d f(u)\,du = F(d) - F(c)$$

where $c = u(a)$ and $d = u(b)$.

• Examples:

1. $\int x^2\sqrt{x + 1}\,dx$

The problem here is the $\sqrt{x + 1}$ term. However, if the integrand had $\sqrt{x}$ times some polynomial, then we'd be in business. Let's try $u = x + 1$. Then $x = u - 1$ and $dx = du$. Substituting these into the above equation, we get

$$\int x^2\sqrt{x + 1}\,dx = \int (u - 1)^2\sqrt{u}\,du = \int (u^2 - 2u + 1)\,u^{1/2}\,du = \int (u^{5/2} - 2u^{3/2} + u^{1/2})\,du$$

We can easily integrate this, since it's just a sum of powers of $u$. Doing so and substituting $u = x + 1$ back in, we get

$$\int x^2\sqrt{x + 1}\,dx = 2(x + 1)^{3/2}\left[\frac{1}{7}(x + 1)^2 - \frac{2}{5}(x + 1) + \frac{1}{3}\right] + c$$

2. For the above problem, we could have also used the substitution $u = \sqrt{x + 1}$. Then $x = u^2 - 1$ and $dx = 2u\,du$. Substituting these in, we get

$$\int x^2\sqrt{x + 1}\,dx = \int (u^2 - 1)^2\,u\,2u\,du$$

which when expanded is again a polynomial and gives the same result as above.

3. $\int_0^1 \dfrac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx$

When an expression is raised to a power, it's often helpful to use this expression as the basis for a substitution. So, let $u = 1 + e^{2x}$. Then $du = 2e^{2x}\,dx$ and we can set $5e^{2x}\,dx = 5\,du/2$. Additionally, $u = 2$ when $x = 0$ and $u = 1 + e^2$ when $x = 1$. Substituting all of this in, we get

$$\int_0^1 \frac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx = \frac{5}{2}\int_2^{1+e^2} \frac{du}{u^{1/3}} = \frac{5}{2}\int_2^{1+e^2} u^{-1/3}\,du = \frac{15}{4}u^{2/3}\,\Big|_2^{1+e^2} = 9.53$$
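A result obtained by substitution can be checked against a direct numerical approximation of the original integral. The sketch below does this for the last example; the midpoint-rule helper and the choice of 10,000 subintervals are assumptions made for illustration, not part of the lecture.

```python
import math


def integrand(x):
    """Original integrand 5 e^{2x} / (1 + e^{2x})^{1/3}."""
    return 5 * math.exp(2 * x) / (1 + math.exp(2 * x)) ** (1 / 3)


def midpoint_rule(f, a, b, n):
    """Midpoint Riemann sum with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


# Closed form from the substitution u = 1 + e^{2x}: (15/4) u^{2/3} from 2 to 1 + e^2.
closed_form = 15 / 4 * ((1 + math.exp(2)) ** (2 / 3) - 2 ** (2 / 3))
numeric = midpoint_rule(integrand, 0.0, 1.0, 10_000)
print(round(closed_form, 2), round(numeric, 2))
```

Agreement between the two numbers suggests that both the substitution and the change of limits were carried out consistently.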

3.3.5 Integration by Parts

• Another useful integration technique is integration by parts, which is related to the Product Rule of differentiation. The product rule states that

$$\frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx}$$

Integrating this and rearranging, we get

$$\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx$$

or

$$\int u(x)v'(x)\,dx = u(x)v(x) - \int v(x)u'(x)\,dx$$

More frequently remembered as

$$\int u\,dv = uv - \int v\,du$$

where $du = u'(x)\,dx$ and $dv = v'(x)\,dx$.

• For definite integrals: $\int_a^b u\frac{dv}{dx}\,dx = uv\,\big|_a^b - \int_a^b v\frac{du}{dx}\,dx$

• Our goal here is to find expressions for $u$ and $dv$ that, when substituted into the above equation, yield an expression that's more easily evaluated.

• Examples:

1. $\int xe^{ax}\,dx$

Let $u = x$ and $dv = e^{ax}\,dx$. Then $du = dx$ and $v = (1/a)e^{ax}$. Substituting this into the integration by parts formula, we obtain

$$\int xe^{ax}\,dx = uv - \int v\,du = x\left(\frac{1}{a}e^{ax}\right) - \int \frac{1}{a}e^{ax}\,dx = \frac{1}{a}xe^{ax} - \frac{1}{a^2}e^{ax} + c$$

2. $\int x^n e^{ax}\,dx$

3. $\int x^3 e^{-x^2}\,dx$

Lecture 4: Probability I

3.4 Probability I: Probability Theory

Today's Topics$^5$: • Counting rules • Sets • Probability • Conditional Probability and Bayes' Rule • Independence

3.4.1 Counting rules

• Fundamental Theorem of Counting: If there are $k$ characteristics, the $i$th of which has $n_i$ alternatives, there are $\prod_{i=1}^{k} n_i$ possible outcomes.

• We often need to count the number of ways to choose a subset from some set of possibilities. The number of outcomes depends on two characteristics of the process: does the order matter, and is replacement allowed?

• If there are $n$ objects and we select $k < n$ of them, how many different outcomes are possible?

1. Ordered, with replacement: $n^k$

2. Ordered, without replacement: $\frac{n!}{(n-k)!}$

3. Unordered, with replacement: $\frac{(n+k-1)!}{(n-1)!\,k!} = \binom{n+k-1}{k}$

4. Unordered, without replacement: $\frac{n!}{(n-k)!\,k!} = \binom{n}{k}$
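The four counting formulas can be checked by brute-force enumeration for small cases. The sketch below is illustrative only: the function names and the values $n = 5$, $k = 2$ are assumptions, and the enumeration relies on Python's standard `itertools` and `math` modules (`math.perm` and `math.comb` require Python 3.8 or later).

```python
import itertools
import math


def ordered_with_replacement(n, k):
    return n ** k


def ordered_without_replacement(n, k):
    return math.perm(n, k)  # n! / (n - k)!


def unordered_with_replacement(n, k):
    return math.comb(n + k - 1, k)


def unordered_without_replacement(n, k):
    return math.comb(n, k)  # n! / ((n - k)! k!)


n, k = 5, 2  # hypothetical example: choose 2 of 5 objects
items = range(n)

# Enumerate every outcome explicitly and count them.
brute_force = {
    "ordered_with": len(list(itertools.product(items, repeat=k))),
    "ordered_without": len(list(itertools.permutations(items, k))),
    "unordered_with": len(list(itertools.combinations_with_replacement(items, k))),
    "unordered_without": len(list(itertools.combinations(items, k))),
}
print(brute_force)
```

Enumeration becomes infeasible quickly as $n$ and $k$ grow, which is why the closed-form counts are the practical tool.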

3.4.2 Sets

• Set: A set is any well defined collection of elements. If x is an element of S, x ∈ S.

• Types of sets:

1. Countably finite: a set with a finite number of elements, which can be mapped onto positive integers.
$S = \{1, 2, 3, 4, 5, 6\}$

2. Countably infinite: a set with an infinite number of elements, which can still be mapped onto positive integers.
$S = \{1, \frac{1}{2}, \frac{1}{3}, \ldots\}$

3. Uncountably infinite: a set with an infinite number of elements, which cannot be mapped onto positive integers.
$S = \{x : x \in [0, 1]\}$

4. Empty: a set with no elements.
$S = \emptyset$

• Set operations:

1. Union: The union of two sets $A$ and $B$, $A \cup B$, is the set containing all of the elements in $A$ or $B$.

2. Intersection: The intersection of sets $A$ and $B$, $A \cap B$, is the set containing all of the elements in both $A$ and $B$.

3. Complement: If set $A$ is a subset of $S$, then the complement of $A$, denoted $A^C$, is the set containing all of the elements in $S$ that are not in $A$.

$^5$ Much of the material and examples for this lecture are taken from Gill (2006) Essential Mathematics for Political and Social Research, Wackerly, Mendenhall, & Scheaffer (1996) Mathematical Statistics with Applications, Degroot (1985) Probability and Statistics, Morrow (1994) Game Theory for Political Scientists, King (1989) Unifying Political Methodology, and Ross (1987) Introduction to Probability and Statistics for Scientists and Engineers.

• Properties of set operations:

1. Commutative: $A \cup B = B \cup A$, $A \cap B = B \cap A$

2. Associative: $A \cup (B \cup C) = (A \cup B) \cup C$, $A \cap (B \cap C) = (A \cap B) \cap C$

3. Distributive: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

4. de Morgan's laws: $(A \cup B)^C = A^C \cap B^C$, $(A \cap B)^C = A^C \cup B^C$
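Finite sets and their operations map directly onto Python's built-in `set` type, which makes it easy to check identities such as de Morgan's laws on a concrete case. The sets below and the `complement` helper are assumptions chosen for illustration.

```python
# Hypothetical universe and two subsets of it.
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {2, 4, 6}


def complement(event, universe=S):
    """Elements of the universe that are not in the event."""
    return universe - event


union = A | B          # elements in A or B
intersection = A & B   # elements in both A and B

# de Morgan's laws, checked on this one example (not a proof).
de_morgan_union = complement(A | B) == (complement(A) & complement(B))
de_morgan_intersection = complement(A & B) == (complement(A) | complement(B))
print(union, intersection, de_morgan_union, de_morgan_intersection)
```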

• Disjointness: Sets are disjoint when they do not intersect, such that $A \cap B = \emptyset$. A collection of sets is pairwise disjoint if, for all $i \neq j$, $A_i \cap A_j = \emptyset$. A collection of sets forms a partition of set $S$ if they are pairwise disjoint and they cover set $S$, such that $\bigcup_{i=1}^{k} A_i = S$.

3.4.3 Probability

• Probability: Many events or outcomes are random. In everyday speech, we say that we are uncertain about the outcome of random events. Probability is a formal model of uncertainty which provides a measure of uncertainty governed by a particular set of rules. A different model of uncertainty would, of course, have a different set of rules and measures. Our focus on probability is justified because it has proven to be a particularly useful model of uncertainty.

• Sample Space: A set or collection of all possible outcomes from some process. Outcomes in the set can be discrete elements (countable) or points along a continuous interval (uncountable).

• Examples:

1. Discrete: the numbers on a die, the number of possible wars that could occur each year, whether a vote cast is Republican or Democrat.

2. Continuous: GNP, arms spending, age.

• Probability Distribution: A probability function on a sample space $S$ is a mapping $\Pr(A)$ from events in $S$ to the real numbers that satisfies the following three axioms (due to Kolmogorov).

• Axioms of Probability: Define the number $\Pr(A)$ corresponding to each event $A$ in the sample space $S$ such that

1. Axiom: For any event $A$, $\Pr(A) \ge 0$.

2. Axiom: $\Pr(S) = 1$

3. Axiom: For any sequence of disjoint events $A_1, A_2, \ldots$ (of which there may be infinitely many, in which case $k = \infty$),

$$\Pr\left(\bigcup_{i=1}^{k} A_i\right) = \sum_{i=1}^{k} \Pr(A_i)$$

• Basic Theorems of Probability: Using these three axioms, we can derive all of the common theorems of probability.

1. $\Pr(\emptyset) = 0$

2. $\Pr(A^C) = 1 - \Pr(A)$

3. For any event $A$, $0 \le \Pr(A) \le 1$.

4. If $A \subset B$, then $\Pr(A) \le \Pr(B)$.

5. For any two events $A$ and $B$, $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$

6. For any sequence of $n$ events (which need not be disjoint) $A_1, A_2, \ldots, A_n$,

$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \Pr(A_i)$$
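When all outcomes in a finite sample space are equally likely, the probability of an event is just the share of outcomes it contains, and the theorems above can be checked with exact fractions. The sketch below assumes a fair six-sided die; the events `A` (even roll) and `B` (roll above 3) and the helper `pr` are hypothetical choices for illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space of a fair six-sided die (assumed)


def pr(event):
    """Probability of an event under equally likely outcomes.
    Outcomes outside the sample space contribute nothing."""
    return Fraction(len(frozenset(event) & S), len(S))


A = {2, 4, 6}  # hypothetical event: the roll is even
B = {4, 5, 6}  # hypothetical event: the roll is greater than 3

union_direct = pr(A | B)
union_by_theorem = pr(A) + pr(B) - pr(A & B)  # theorem 5
print(union_direct, union_by_theorem)
```

Using `Fraction` rather than floating-point numbers keeps every comparison exact.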

• Examples: Let's assume we have an evenly-balanced, six-sided die. Then,

1. Sample space $S =$

2. $\Pr(1) = \cdots = \Pr(6) =$

3. $\Pr(\emptyset) = \Pr(7) =$

4. $\Pr(\{1, 3, 5\}) =$

5. $\Pr(\{1, 2\}^C) = \Pr(\{3, 4, 5, 6\}) =$

6. Let $B = S$ and $A = \{1, 2, 3, 4, 5\} \subset B$. Then $\Pr(A) =$ ___ $< \Pr(B) =$ ___.

7. Let $A = \{1, 2, 3\}$ and $B = \{2, 4, 6\}$. Then $A \cup B = \{1, 2, 3, 4, 6\}$, $A \cap B = \{2\}$, and

$\Pr(A \cup B) =$

$=$

$=$

3.4.4 Conditional Probability and Bayes Law

• Conditional Probability: The conditional probability $\Pr(A|B)$ of an event $A$ is the probability of $A$, given that another event $B$ has occurred. It is calculated as

$$\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$$

• Example: Assume $A$ and $B$ occur with the following frequencies (a bar denotes the complement):

            $A$                $\bar{A}$
$B$         $n_{ab}$           $n_{\bar{a}b}$
$\bar{B}$   $n_{a\bar{b}}$     $n_{\bar{a}\bar{b}}$

and let $n_{ab} + n_{\bar{a}b} + n_{a\bar{b}} + n_{\bar{a}\bar{b}} = N$. Then

1. $\Pr(A) \approx$

2. $\Pr(B) \approx$

3. $\Pr(A \cap B) \approx$

4. $\Pr(A|B) \approx$

5. $\Pr(B|A) \approx$

• Example: A six-sided die is rolled. What is the probability of a 1, given the outcome is an odd number?


• Multiplicative Law of Probability: The probability of the intersection of two events $A$ and $B$ is

$$\Pr(A \cap B) = \Pr(A)\Pr(B|A) = \Pr(B)\Pr(A|B)$$

which follows directly from the definition of conditional probability.

• Calculating the Probability of an Event Using the Event-Composition Method: The event-composition method for calculating the probability of an event $A$ involves expressing $A$ as a composition involving the unions and/or intersections of other events. Then use the laws of probability to find $\Pr(A)$. The steps used in the event-composition method are:

1. Define the experiment.

2. Identify the general nature of the sample points.

3. Write an equation expressing the event of interest $A$ as a composition of two or more events, using unions, intersections, and/or complements.

4. Apply the additive and multiplicative laws of probability to the compositions obtained in step 3 to find $\Pr(A)$.

• Law of Total Probability: Let $S$ be the sample space of some experiment and let the $k$ disjoint events $B_1, \ldots, B_k$ partition $S$. If $A$ is some other event in $S$, then the events $AB_1, AB_2, \ldots, AB_k$ (where $AB_i$ denotes $A \cap B_i$) will form a partition of $A$ and we can write $A$ as

$$A = (AB_1) \cup \cdots \cup (AB_k)$$

Since the $k$ events are disjoint,

$$\Pr(A) = \sum_{i=1}^{k} \Pr(AB_i) = \sum_{i=1}^{k} \Pr(B_i)\Pr(A|B_i)$$

Sometimes it is easier to calculate the conditional probabilities and sum them than it is to calculate $\Pr(A)$ directly.

• Bayes Rule: Assume that events $B_1, \ldots, B_k$ form a partition of the space $S$. Then

$$\Pr(B_j|A) = \frac{\Pr(AB_j)}{\Pr(A)} = \frac{\Pr(B_j)\Pr(A|B_j)}{\sum_{i=1}^{k} \Pr(B_i)\Pr(A|B_i)}$$

If there are only two states of $B$, then this is just

$$\Pr(B_1|A) = \frac{\Pr(B_1)\Pr(A|B_1)}{\Pr(B_1)\Pr(A|B_1) + \Pr(B_2)\Pr(A|B_2)}$$
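The formula translates almost line for line into code: multiply each prior by its likelihood, sum the products to get $\Pr(A)$ (the law of total probability), and divide. The sketch below is a minimal illustration; the function name `bayes_posterior` and the two-state priors and likelihoods are made-up values, not taken from the lecture's examples.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return Pr(B_j | A) for every state j, given
    priors[j] = Pr(B_j) and likelihoods[j] = Pr(A | B_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # Pr(A and B_j)
    total = sum(joint)                                    # Pr(A), law of total probability
    return [j / total for j in joint]


# Hypothetical two-state example: Pr(B1) = 1/5, Pr(B2) = 4/5,
# Pr(A | B1) = 1/2, Pr(A | B2) = 1/4.
priors = [Fraction(1, 5), Fraction(4, 5)]
likelihoods = [Fraction(1, 2), Fraction(1, 4)]
posteriors = bayes_posterior(priors, likelihoods)
print(posteriors)
```

Here the evidence $A$ is twice as likely under $B_1$ as under $B_2$, so the probability of $B_1$ rises from the prior $1/5$ to a posterior of $1/3$.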

• Bayes rule determines the posterior probability of a state or type $\Pr(B_j|A)$ by calculating the probability $\Pr(AB_j)$ that both the event $A$ and the state $B_j$ will occur and dividing it by the probability that the event will occur regardless of the state (by summing across all $B_i$).

• Often Bayes' rule is used when one wants to calculate a posterior probability about the "state" or type of an object, given that some event has occurred. The states could be something like Normal/Defective, Normal/Diseased, Democrat/Republican, etc. The event on which one conditions could be something like a sampling from a batch of components, a test for a disease, or a question about a policy position.

• Prior and Posterior Probabilities: In the above, $\Pr(B_1)$ is often called the prior probability, since it's the probability of $B_1$ before anything else is known. $\Pr(B_1|A)$ is called the posterior probability, since it's the probability after other information is taken into account.

• Examples:

1. A test for cancer correctly detects it 90% of the time, but incorrectly identifies a person as having cancer 10% of the time. If 10% of all people have cancer at any given time, what is the probability that a person who tests positive actually has cancer?

2. In Boston, 30% of the people are conservatives, 50% are liberals, and 20% are independents. In the last election, 65% of conservatives, 82% of liberals, and 50% of independents voted. If a person in Boston is selected at random and we learn that s/he did not vote last election, what is the probability s/he is a liberal?

3.4.5 Independence

• Independence: If the occurrence or nonoccurrence of either of the events $A$ and $B$ has no effect on the occurrence or nonoccurrence of the other, then $A$ and $B$ are independent. If $A$ and $B$ are independent, then

1. $\Pr(A|B) = \Pr(A)$

2. $\Pr(B|A) = \Pr(B)$

3. $\Pr(A \cap B) = \Pr(A)\Pr(B)$

• Pairwise independence: A set of more than two events $A_1, A_2, \ldots, A_k$ is pairwise independent if $\Pr(A_i \cap A_j) = \Pr(A_i)\Pr(A_j)$, $\forall i \neq j$. Note that this does not necessarily imply that $\Pr\left(\bigcap_{i=1}^{k} A_i\right) = \prod_{i=1}^{k} \Pr(A_i)$.
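A standard counterexample shows why pairwise independence is weaker than the product rule for the full intersection. The sketch below assumes two fair coin flips and three events that are not from the lecture: `A` (first flip is heads), `B` (second flip is heads), and `C` (the two flips match).

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: ('H','H'), ('H','T'), ('T','H'), ('T','T').
S = set(product("HT", repeat=2))


def pr(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(S))


A = {s for s in S if s[0] == "H"}   # first flip is heads
B = {s for s in S if s[1] == "H"}   # second flip is heads
C = {s for s in S if s[0] == s[1]}  # both flips show the same face

pairwise = all(
    pr(X & Y) == pr(X) * pr(Y) for X, Y in [(A, B), (A, C), (B, C)]
)
full_product_rule = pr(A & B & C) == pr(A) * pr(B) * pr(C)
print(pairwise, full_product_rule)
```

Each pair of events is independent, yet knowing any two of them determines the third, so the triple intersection has probability $1/4$ rather than $1/8$.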

• Conditional independence: If the occurrence of $A$ or $B$ conveys no information about the occurrence of the other, once you know the occurrence of a third event $C$, then $A$ and $B$ are conditionally independent (conditional on $C$):

1. Pr(A|B ∩ C) = Pr(A|C)

2. Pr(B|A ∩ C) = Pr(B|C)

3. Pr(A ∩B|C) = Pr(A|C) Pr(B|C)

Lecture 5: Probability II

3.5 Probability II: Random Variables

Today's Topics$^6$: • Levels of Measurement • Discrete Distributions • Continuous Distributions • Joint Distributions • Expectation • Special Discrete Distributions • Special Continuous Distributions • Summarizing Observed Data

3.5.1 Levels of Measurement

• In empirical research, data can be classified along several dimensions. We have already distinguished between discrete (countable) and continuous (uncountable) data. We can also look at the precision with which the underlying quantities are measured.

• Nominal: Discrete data are nominal if there is no way to put the categories represented by the data into a meaningful order. Typically, this kind of data represents names (hence 'nominal') or attributes, like Republican or Democrat.

• Ordinal: Discrete data are ordinal if there is a logical order to the categories represented by the data, but there is no common scale for differences between adjacent categories. Party identification is often measured as ordinal data.

• Interval: Discrete or continuous data are interval if there is an order to the values and there is a common scale, so that differences between two values have substantive meanings. Dates are an example of interval data.

• Ratio: Discrete or continuous data are ratio if the data have the characteristics of interval data and zero is a meaningful quantity. This allows us to consider the ratio of two values as well as the difference between them. Quantities measured in dollars, such as per capita GDP, are ratio data.

3.5.2 Discrete Distributions

• Random Variable: A random variable is a real-valued function defined on the sample space $S$; it assigns a real number to every outcome $s \in S$.

• Discrete Random Variable: $Y$ is a discrete random variable if it can assume only a finite or countably infinite number of distinct values.

• Examples: number of wars per year, heads or tails, voting Republican or Democrat, number on a rolled die.

• Probability Mass Function: For a discrete random variable $Y$, the probability mass function (pmf)$^7$ $p(y) = \Pr(Y = y)$ assigns probabilities to a countable number of distinct $y$ values such that

1. $0 \le p(y) \le 1$

2. $\sum_{y} p(y) = 1$

$^6$ Much of the material and examples for this lecture are taken from Gill (2006) Essential Mathematics for Political and Social Research, Wackerly, Mendenhall, & Scheaffer (1996) Mathematical Statistics with Applications, Degroot (1985) Probability and Statistics, Morrow (1994) Game Theory for Political Scientists, and Ross (1987) Introduction to Probability and Statistics for Scientists and Engineers.

$^7$ Also referred to simply as the "probability distribution."


• Example: For a fair six-sided die, there is an equal probability of rolling any number. Since there are six sides, the probability mass function is then $p(y) = 1/6$ for $y = 1, \ldots, 6$. Each $p(y)$ is between 0 and 1, and the sum of the $p(y)$'s is 1.

[Figure: plot of the pmf $p(y)$ for $y = 1, \ldots, 6$]

• Cumulative Distribution: The cumulative distribution $F(y)$ or $\Pr(Y \le y)$ is the probability that $Y$ is less than or equal to some value $y$, or

$$\Pr(Y \le y) = \sum_{i \le y} p(i)$$

The CDF must satisfy these properties:

1. $F(y)$ is non-decreasing in $y$.

2. $\lim_{y\to -\infty} F(y) = 0$ and $\lim_{y\to \infty} F(y) = 1$

3. $F(y)$ is right-continuous.

• Example: For a fair die, $\Pr(Y \le 1) =$ ___, $\Pr(Y \le 3) =$ ___, and $\Pr(Y \le 6) =$ ___.

[Figure: plot of the CDF $F(y)$ for $y = 1, \ldots, 6$]

3.5.3 Continuous Distributions

• Continuous Random Variable: $Y$ is a continuous random variable if there exists a nonnegative function $f(y)$ defined for all real $y \in (-\infty, \infty)$, such that for any interval $A$,

$$\Pr(Y \in A) = \int_A f(y)\,dy$$

• Examples: age, income, GNP, temperature

• Probability Density Function: The function $f$ above is called the probability density function (pdf) of $Y$ and must satisfy

1. $f(y) \ge 0$

2. $\int_{-\infty}^{\infty} f(y)\,dy = 1$

Note also that $\Pr(Y = y) = 0$, i.e., the probability of any single point $y$ is zero.

• Example: $f(y) = 1$, $0 \le y \le 1$

[Figure: plot of the density $f(y) = 1$ on $0 \le y \le 1$]


• Cumulative Distribution: Because the probability that a continuous random variable will assume any particular value is zero, we can only make statements about the probability of a continuous random variable being within an interval. The cumulative distribution gives the probability that $Y$ lies on the interval $(-\infty, y]$ and is defined as

$$F(y) = \Pr(Y \le y) = \int_{-\infty}^{y} f(s)\,ds$$

Note that $F(y)$ has similar properties with continuous distributions as it does with discrete ones: it is non-decreasing, continuous (not just right-continuous), and $\lim_{y\to -\infty} F(y) = 0$ and $\lim_{y\to \infty} F(y) = 1$.

Similarly, we can also make probability statements about $Y$ falling in an interval $a \le y \le b$.

$$\Pr(a \le Y \le b) = \int_a^b f(y)\,dy$$

• Example: $f(y) = 1$, $0 < y < 1$. Find $F(y)$ and $\Pr(.5 < Y < .75)$.

$F(y) =$

$\Pr(.5 < Y < .75) =$

[Figure: plot over $0 \le y \le 1$]

• $F'(y) = \frac{dF(y)}{dy} = f(y)$

3.5.4 Joint Distributions

• Often, we are interested in two or more random variables defined on the same sample space. The distribution of these variables is called a joint distribution. Joint distributions can be made up of any combination of discrete and continuous random variables.

• Example: Suppose we are interested in the outcomes of flipping a coin and rolling a 6-sided die at the same time. The sample space for this process contains 12 elements:

$$\{h1, h2, h3, h4, h5, h6, t1, t2, t3, t4, t5, t6\}$$

We can define two random variables $X$ and $Y$ such that $X = 1$ if heads and $X = 0$ if tails, while $Y$ equals the number on the die. We can then make statements about the joint distribution of $X$ and $Y$.

• Joint discrete random variables: If both $X$ and $Y$ are discrete, their joint probability mass function assigns probabilities to each pair of outcomes

$$p(x, y) = \Pr(X = x, Y = y)$$

Again, $p(x, y) \in [0, 1]$ and $\sum_x \sum_y p(x, y) = 1$.

If we are interested in the marginal probability of one of the two variables (ignoring information about the other variable), we can obtain the marginal pmf by summing across the variable that we don't care about:

$$p_X(x) = \sum_i p(x, y_i)$$

We can also calculate the conditional pmf for one variable, holding the other variable fixed. Recalling from the previous lecture that $\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}$, we can write the conditional pmf as

$$p_{Y|X}(y|x) = \frac{p(x, y)}{p_X(x)}, \quad p_X(x) > 0$$
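The marginal and conditional pmfs can be computed directly from a table of joint probabilities. The sketch below uses the coin-and-die setup above and assumes, for illustration only, that the coin and die are fair and independent, so each of the 12 pairs has probability $1/12$; the function names are hypothetical.

```python
from fractions import Fraction

# Joint pmf p(x, y): X = 1 for heads, 0 for tails; Y = number on the die.
# Assumption: fair coin, fair die, independent, so every pair has mass 1/12.
joint = {(x, y): Fraction(1, 12) for x in (0, 1) for y in range(1, 7)}


def marginal_x(x):
    """p_X(x): sum the joint pmf over every value of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def conditional_y_given_x(y, x):
    """p_{Y|X}(y | x) = p(x, y) / p_X(x), defined when p_X(x) > 0."""
    return joint[(x, y)] / marginal_x(x)


print(marginal_x(1), conditional_y_given_x(3, 1))
```

Under these assumptions the conditional pmf of $Y$ given $X$ equals the marginal pmf of $Y$, which is what independence of the coin and die implies.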

• Joint continuous random variables: If both $X$ and $Y$ are continuous, their joint probability density function defines their distribution:

$$\Pr((X, Y) \in A) = \iint_A f(x, y)\,dx\,dy$$

Likewise, $f(x, y) \ge 0$ and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$.

Instead of summing, we obtain the marginal probability density function by integrating out one of the variables:

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$

Finally, we can write the conditional pdf as

$$f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}, \quad f_X(x) > 0$$

3.5.5 Expectation

• We often want to summarize some characteristics of the distribution of a random variable. The most important summary is the expectation (or expected value, or mean), in which the possible values of a random variable are weighted by their probabilities.

• Expectation of Discrete Random Variable: The expected value of a discrete random variable $Y$ is

$$E(Y) = \sum_{y} y\,p(y)$$

In words, it is the weighted average of the possible values $y$ can take on, weighted by the probability that $y$ occurs. It is not necessarily the number we would expect $Y$ to take on, but the average value of $Y$ after a large number of repetitions of an experiment.

• Example: For a fair die,

$E(Y) =$

• Expectation of a Continuous Random Variable: The expected value of a continuous random variable is similar in concept to that of the discrete random variable, except that instead of summing using probabilities as weights, we integrate using the density to weight. Hence, the expected value of the continuous variable $Y$ is defined by

$$E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy$$

• Example: Find $E(Y)$ for $f(y) = \frac{1}{1.5}$, $0 < y < 1.5$.

$E(Y) =$

• Expected Value of a Function:

1. Discrete: $E[g(Y)] = \sum_{y} g(y)p(y)$

2. Continuous: $E[g(Y)] = \int_{-\infty}^{\infty} g(y)f(y)\,dy$

• Other Properties of Expected Values:

1. $E(c) = c$

2. $E[E[Y]] = E[Y]$ (because the expected value of a random variable is a constant)

3. $E[c\,g(Y)] = c\,E[g(Y)]$

4. $E[g(Y_1) + \cdots + g(Y_n)] = E[g(Y_1)] + \cdots + E[g(Y_n)]$

• Variance: We can also look at other summaries of the distribution, which build on the idea of taking expectations. Variance tells us about the "spread" of the distribution; it is the expected value of the squared deviations from the mean of the distribution. The standard deviation is simply the square root of the variance.

1. Variance: $\sigma^2 = \mathrm{Var}(Y) = E[(Y - E(Y))^2] = E(Y^2) - [E(Y)]^2$

2. Standard Deviation: $\sigma = \sqrt{\mathrm{Var}(Y)}$
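For a discrete random variable, the expectation of any function $g(Y)$ is a probability-weighted sum, and the two variance formulas above can be checked against each other. The sketch below uses a made-up pmf on $\{0, 1, 2\}$; the pmf and the function name `expectation` are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical pmf: Pr(Y=0) = 1/4, Pr(Y=1) = 1/2, Pr(Y=2) = 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def expectation(pmf, g=lambda y: y):
    """E[g(Y)] = sum over y of g(y) * p(y); g defaults to the identity."""
    return sum(g(y) * p for y, p in pmf.items())


mean = expectation(pmf)
# Definition: expected squared deviation from the mean.
var_definition = expectation(pmf, lambda y: (y - mean) ** 2)
# Shortcut: E(Y^2) - [E(Y)]^2.
var_shortcut = expectation(pmf, lambda y: y ** 2) - mean ** 2
print(mean, var_definition, var_shortcut)
```

Both variance calculations give the same value, as the identity $E[(Y - E(Y))^2] = E(Y^2) - [E(Y)]^2$ requires.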

• Covariance and Correlation: The covariance measures the degree to which two random variables vary together; if the covariance is positive, $X$ tends to be larger than its mean when $Y$ is larger than its mean. The covariance of a variable with itself is the variance of that variable.

$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$$

The correlation coefficient is the covariance divided by the standard deviations of $X$ and $Y$. It is a unitless measure and always takes on values in the interval $[-1, 1]$.

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}$$

• Conditional Expectation: With joint distributions, we are often interested in the expected value of a variable $Y$ if we could hold the other variable $X$ fixed. This is the conditional expectation of $Y$ given $X = x$:

1. $Y$ discrete: $E(Y|X = x) = \sum_{y} y\,p_{Y|X}(y|x)$

2. $Y$ continuous: $E(Y|X = x) = \int_{y} y\,f_{Y|X}(y|x)\,dy$

The conditional expectation is often used for prediction when one knows the value of $X$ but not $Y$; the realized value of $X$ contains information about the unknown $Y$ so long as $E(Y|X = x) \neq E(Y)\ \forall x$.

3.5.6 Special Discrete Distributions

• Binomial Distribution: $Y$ is distributed binomial if it represents the number of "successes" observed in $n$ independent, identical "trials," where the probability of success in any trial is $p$ and the probability of failure is $q = 1 - p$.

For any particular sequence of $y$ successes and $n - y$ failures, the probability of obtaining that sequence is $p^y q^{n-y}$ (by the multiplicative law and independence). However, there are $\binom{n}{y} = \frac{n!}{(n-y)!\,y!}$ ways of obtaining a sequence with $y$ successes and $n - y$ failures. So the binomial distribution is given by

$$p(y) = \binom{n}{y} p^y q^{n-y}, \quad y = 0, 1, 2, \ldots, n$$

with mean $\mu = E(Y) = np$ and variance $\sigma^2 = V(Y) = npq$.
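The binomial pmf is short enough to compute from scratch, and doing so lets us confirm the stated mean and variance on a small case. In the sketch below, the function name `binomial_pmf` and the parameter values $n = 4$, $p = 0.5$ are assumptions chosen for illustration.

```python
import math


def binomial_pmf(y, n, p):
    """Probability of exactly y successes in n independent trials,
    each with success probability p."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)


n, p = 4, 0.5  # hypothetical parameters
probs = [binomial_pmf(y, n, p) for y in range(n + 1)]

# Mean and variance computed directly from the pmf.
mean = sum(y * q for y, q in enumerate(probs))
variance = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
print(probs, mean, variance)
```

The computed mean and variance match $np$ and $npq$ for these parameters.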

• Example: Republicans vote for Democrat-sponsored bills 2% of the time. What is the probability that out of 10 Republicans questioned, half voted for a particular Democrat-sponsored bill? What is the mean number of Republicans voting for Democrat-sponsored bills? The variance?

1. $p(5) =$

2. $E(Y) =$

3. $V(Y) =$

[Figure: plot of the binomial pmf for $y = 0, \ldots, 10$]

• Poisson Distribution: A random variable $Y$ has a Poisson distribution if

$$p(y) = \frac{\lambda^y}{y!}e^{-\lambda}, \quad y = 0, 1, 2, \ldots, \quad \lambda > 0$$

The Poisson has the unusual feature that its expectation equals its variance: $E(Y) = V(Y) = \lambda$. The Poisson distribution is often used to model event counts: counts of the number of events that occur during some unit of time. $\lambda$ is often called the "arrival rate."
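The claim that the Poisson mean and variance are both $\lambda$ can be checked numerically by summing over enough of the support that the remaining tail is negligible. In the sketch below, the function name `poisson_pmf`, the rate $\lambda = 3$, and the truncation at 60 terms are all assumptions made for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Probability of exactly y events when the arrival rate is lam."""
    return lam ** y * math.exp(-lam) / math.factorial(y)


lam = 3.0            # hypothetical arrival rate
support = range(60)  # truncation; the tail beyond 59 is negligible for lam = 3

total = sum(poisson_pmf(y, lam) for y in support)
mean = sum(y * poisson_pmf(y, lam) for y in support)
variance = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
print(total, mean, variance)
```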

• Example: Border disputes occur between two countries at a rate of 2 per month. What is the probability of 0, 2, and less than 5 disputes occurring in a month?

[Figure: plot of the Poisson pmf for $y = 0, \ldots, 10$]

1. $p(0) =$

2. $p(2) =$

3. $\Pr(Y < 5) =$

3.5.7 Special Continuous Distributions

• Uniform Distribution: A random variable $Y$ has a continuous uniform distribution on the interval $(\alpha, \beta)$ if its density is given by

$$f(y) = \frac{1}{\beta - \alpha}, \quad \alpha \le y \le \beta$$

The mean and variance of $Y$ are $E(Y) = \frac{\alpha + \beta}{2}$ and $V(Y) = \frac{(\beta - \alpha)^2}{12}$.

• Example: $Y$ uniformly distributed over $(1, 3)$.

[Figure: plot of the uniform density on $1 \le y \le 3$]

• Normal Distribution: A random variable $Y$ is normally distributed with mean $E(Y) = \mu$ and variance $V(Y) = \sigma^2$ if its density is

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(y - \mu)^2}{2\sigma^2}}$$

• Example: $Y$ normally distributed with mean $\mu = 0$ and variance $\sigma^2 = .1$

[Figure: plot of the normal density on $-2 \le y \le 2$]

3.5.8 Summarizing Observed Data

• So far, we’ve talked about distributions in a theoretical sense, looking at different properties of random variables. We don’t observe random variables; we observe realizations of the random variable.

• Central tendency: The central tendency describes the location of the “middle” of the observed data along some scale. There are several measures of central tendency.

1. Sample mean: This is the most common measure of central tendency, calculated by summing across the observations and dividing by the number of observations.

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

The sample mean is an estimate of the expected value of a distribution.


2. Sample median: The median is the value of the “middle” observation. It is obtained by ordering the n data points from smallest to largest and taking the value of the $\frac{n+1}{2}$th observation (if n is odd) or the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$th observations (if n is even).

3. Sample mode: The mode is the most frequently observed value in the data:

$$m_x = X_i : n(X_i) > n(X_j) \quad \forall j \neq i$$

where $n(X_i)$ is the number of times the value $X_i$ is observed. When the data are realizations of a continuous random variable, it often makes sense to group the data into bins, either by rounding or some other process, in order to get a reasonable estimate of the mode.

4. Exercise: Calculate the sample mean, median, and mode for the following two variables, X and Y.

X 6 3 7 5 5 5 6 4 7 2

Y 1 2 1 2 2 1 2 0 2 0
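One way to check the exercise is with Python's standard `statistics` module. This is only a sketch; `multimode` (available in Python 3.8+) returns a list so that ties for the mode are visible.

```python
from statistics import mean, median, multimode

X = [6, 3, 7, 5, 5, 5, 6, 4, 7, 2]
Y = [1, 2, 1, 2, 2, 1, 2, 0, 2, 0]

for name, data in (("X", X), ("Y", Y)):
    # mean: sum divided by n; median: middle of the sorted data;
    # multimode: list of the most frequently observed value(s)
    print(name, mean(data), median(data), multimode(data))
```

Because both variables have an even number of observations, the median is the average of the two middle values of the sorted data.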

• Dispersion: We also typically want to know how spread out the data are relative to the center of the observed distribution. Again, there are several ways to measure dispersion.

1. Sample variance: The sample variance is the sum of the squared deviations from the sample mean, divided by the number of observations minus 1.

$$\mathrm{Var}(X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Again, this is an estimate of the variance of a random variable; we divide by n − 1 instead of n in order to get an unbiased estimate.

2. Standard deviation: The sample standard deviation is the square root of the sample variance.

$$\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

3. Median absolute deviation (MAD): The MAD is a different measure of dispersion, based on deviations from the median rather than deviations from the mean.

$$\mathrm{MAD}(X) = \mathrm{median}\left(|x_i - \mathrm{median}(x)|\right)$$

4. Exercise: Calculate the sample variance, standard deviation, and MAD for the following two variables, X and Y.

X 6 3 7 5 5 5 6 4 7 2

Y 1 2 1 2 2 1 2 0 2 0
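The dispersion measures can be checked the same way. In this sketch, `variance` and `stdev` from the standard library use the n − 1 divisor described above; `mad` is a hypothetical helper written for this example, since the unscaled MAD defined here is not a standard-library function.

```python
from statistics import median, stdev, variance

def mad(data):
    """Median absolute deviation: median of |x_i - median(x)|."""
    m = median(data)
    return median([abs(x - m) for x in data])

X = [6, 3, 7, 5, 5, 5, 6, 4, 7, 2]
Y = [1, 2, 1, 2, 2, 1, 2, 0, 2, 0]

for name, data in (("X", X), ("Y", Y)):
    print(name, variance(data), stdev(data), mad(data))
```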

• Covariance and Correlation: Both of these quantities measure the degree to which two variables vary together, and are estimates of the covariance and correlation of two random variables as defined above.

1. Sample covariance: $\mathrm{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

2. Sample correlation: $r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$

3. Exercise: Calculate the sample covariance and correlation coefficient for the following two variables, X and Y.

X 6 3 7 5 5 5 6 4 7 2

Y 1 2 1 2 2 1 2 0 2 0
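The two formulas above translate almost line for line into code. The helper names `sample_cov` and `sample_corr` are illustrative, and the sketch uses the fact that Var(X) = Cov(X, X).

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: Cov(X, Y) / sqrt(Var(X) Var(Y))."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

X = [6, 3, 7, 5, 5, 5, 6, 4, 7, 2]
Y = [1, 2, 1, 2, 2, 1, 2, 0, 2, 0]

print(sample_cov(X, Y), sample_corr(X, Y))
```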

Lecture 6: Linear Algebra I

3.6 Linear Algebra I

Today’s Topics [8]: • Working with Vectors • Linear Independence • Matrix Algebra • Square Matrices • Systems of Linear Equations • Method of Substitution • Gaussian Elimination • Gauss-Jordan Elimination

3.6.1 Working with Vectors

• Vector: A vector in n-space is an ordered list of n numbers. These numbers can be represented as either a row vector or a column vector:

$$\mathbf{v} = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$

We can also think of a vector as defining a point in n-dimensional space, usually $\mathbb{R}^n$; each element of the vector defines the coordinate of the point in a particular direction.

• Vector Addition: Vector addition is defined for two vectors u and v iff they have the same number of elements:

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{pmatrix}$$

• Scalar Multiplication: The product of a scalar c and vector v is:

$$c\mathbf{v} = \begin{pmatrix} cv_1 & cv_2 & \cdots & cv_n \end{pmatrix}$$

• Vector Inner Product: The inner product (also called the dot product or scalar product) of two vectors u and v is again defined iff they have the same number of elements:

$$\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i$$

If $\mathbf{u} \cdot \mathbf{v} = 0$, the two vectors are orthogonal (or perpendicular).

• Vector Norm: The norm of a vector is a measure of its length. There are many different norms, the most common of which is the Euclidean norm (which corresponds to our usual conception of distance in three-dimensional space):

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{v_1 v_1 + v_2 v_2 + \cdots + v_n v_n}$$
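The vector operations above can be sketched in a few lines of plain Python. The function names and the example vectors u and v are assumptions made for illustration; they do not come from the notes.

```python
from math import sqrt

def add(u, v):
    """Elementwise sum; defined only when u and v have the same length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Scalar multiplication c * v."""
    return [c * a for a in v]

def dot(u, v):
    """Inner (dot) product u . v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean norm sqrt(v . v)."""
    return sqrt(dot(v, v))

u = [1, 2, 2]    # illustrative vectors
v = [2, -1, 0]
print(add(u, v), scale(2, u), dot(u, v), norm(u))
```

Here u · v = 0, so these two example vectors are orthogonal.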

3.6.2 Linear Dependence

• Linear combinations: The vector u is a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_k$ if

$$\mathbf{u} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k$$

[8] Much of the material and examples for this lecture are taken from Gill (2006) Essential Mathematics for Political and Social Scientists, Simon & Blume (1994) Mathematics for Economists and Kolman (1993) Introductory Linear Algebra with Applications.


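• Linear independence: A set of vectors $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_k$ is linearly independent if the only solution to the equation

$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$$

is $c_1 = c_2 = \cdots = c_k = 0$. If another solution exists, the set of vectors is linearly dependent.

• A set S of vectors is linearly dependent iff at least one of the vectors in S can be written as a linear combination of the other vectors in S.

• Linear independence is only defined for sets of vectors with the same number of elements; any linearly independent set of vectors in n-space contains at most n vectors.

• Exercises: Are the following sets of vectors linearly independent?

1. $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$

2. $\mathbf{v}_1 = \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} -2 \\ 2 \\ 4 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$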
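One mechanical way to test linear independence is to stack the vectors as rows and count how many nonzero rows survive elimination (the rank, discussed in the next lecture). The sketch below assumes exact rational arithmetic via `fractions.Fraction`; the names `rank` and `linearly_independent` are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows after forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # find a row at or below r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def linearly_independent(vectors):
    """A set of k vectors is independent iff the stacked matrix has rank k."""
    return rank(vectors) == len(vectors)

set1 = [[1, 0, 0], [1, 0, 1], [1, 1, 1]]
set2 = [[3, 2, -1], [-2, 2, 4], [2, 3, 1]]
print(linearly_independent(set1), linearly_independent(set2))
```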

3.6.3 Matrix Algebra

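• Matrix: A matrix is an array of mn real numbers arranged in m rows by n columns.

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

Note that you can think of vectors as special cases of matrices; a column vector of length k is a k × 1 matrix, while a row vector of the same length is a 1 × k matrix. You can also think of larger matrices as being made up of a collection of row or column vectors. For example, writing $\mathbf{a}_j$ for the jth column of A,

$$\mathbf{A} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{pmatrix}$$

• Matrix Addition: Let A and B be two m × n matrices. Then

$$\mathbf{A} + \mathbf{B} = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix}$$

Note that matrices A and B must be the same size, in which case they are conformable for addition.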


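• Example:

$$\mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}$$

A + B =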

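• Scalar Multiplication: Given the scalar s, the scalar multiplication of sA is

$$s\mathbf{A} = s \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} sa_{11} & sa_{12} & \cdots & sa_{1n} \\ sa_{21} & sa_{22} & \cdots & sa_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ sa_{m1} & sa_{m2} & \cdots & sa_{mn} \end{pmatrix}$$

• Example:

$$s = 2, \qquad \mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

sA =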

• Matrix Multiplication: If A is an m × k matrix and B is a k × n matrix, then their product C = AB is the m × n matrix where

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{ik} b_{kj}$$

• Examples:

1. $\begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} =$

2. $\begin{pmatrix} 1 & 2 & -1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} -2 & 5 \\ 4 & -3 \\ 2 & 1 \end{pmatrix} =$

Note that the number of columns of the first matrix must equal the number of rows of the second matrix, in which case they are conformable for multiplication. The sizes of the matrices (including the resulting product) must be

$$(m \times k)(k \times n) = (m \times n)$$

• Laws of Matrix Algebra:

1. Associative: (A + B) + C = A + (B + C) and (AB)C = A(BC)

2. Commutative: A + B = B + A

3. Distributive: A(B + C) = AB + AC and (A + B)C = AC + BC

• Commutative law for multiplication does not hold; the order of multiplication matters. In general,

$$\mathbf{AB} \neq \mathbf{BA}$$


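• Example:

$$\mathbf{A} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$$

$$\mathbf{AB} = \begin{pmatrix} 2 & 3 \\ -2 & 2 \end{pmatrix}, \qquad \mathbf{BA} = \begin{pmatrix} 1 & 7 \\ -1 & 3 \end{pmatrix}$$

• Transpose: The transpose of the m × n matrix A is the n × m matrix $\mathbf{A}^T$ (sometimes written $\mathbf{A}'$) obtained by interchanging the rows and columns of A.

• Examples:

1. $\mathbf{A} = \begin{pmatrix} 4 & -2 & 3 \\ 0 & 5 & -1 \end{pmatrix}, \qquad \mathbf{A}^T = \begin{pmatrix} 4 & 0 \\ -2 & 5 \\ 3 & -1 \end{pmatrix}$

2. $\mathbf{B} = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}, \qquad \mathbf{B}^T = \begin{pmatrix} 2 & -1 & 3 \end{pmatrix}$

• The following rules apply for transposed matrices:

1. $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$

2. $(\mathbf{A}^T)^T = \mathbf{A}$

3. $(s\mathbf{A})^T = s\mathbf{A}^T$

4. $(\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T$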

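• Example of $(\mathbf{AB})^T = \mathbf{B}^T \mathbf{A}^T$:

$$\mathbf{A} = \begin{pmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{pmatrix}$$

$$(\mathbf{AB})^T = \left[ \begin{pmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{pmatrix} \right]^T = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$

$$\mathbf{B}^T \mathbf{A}^T = \begin{pmatrix} 0 & 2 & 3 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & -1 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$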
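The multiplication rule $c_{ij} = \sum_k a_{ik} b_{kj}$ and the transpose rule can be verified on the examples above. This is a minimal sketch using nested lists; `matmul` and `transpose` are illustrative helper names, and C and D below hold the matrices A and B from the transpose example.

```python
def matmul(A, B):
    """Product of an m x k matrix A and a k x n matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("matrices are not conformable for multiplication")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]

# Order matters: AB and BA differ for the 2 x 2 example.
A = [[1, 2], [-1, 3]]
B = [[2, 1], [0, 1]]
print(matmul(A, B), matmul(B, A))

# (AB)^T equals B^T A^T for the 2 x 3 and 3 x 2 example.
C = [[1, 3, 2], [2, -1, 3]]
D = [[0, 1], [2, 2], [3, -1]]
print(transpose(matmul(C, D)), matmul(transpose(D), transpose(C)))
```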

3.6.4 Square Matrices

• Square matrices have the same number of rows and columns; a k × k square matrix is referred to as a matrix of order k.

• The diagonal of a square matrix is the vector of matrix elements that have the same subscripts. If A is a square matrix of order k, then its diagonal is $[a_{11}, a_{22}, \ldots, a_{kk}]'$.

• Trace: The trace of a square matrix A is the sum of the diagonal elements:

$$\mathrm{tr}(\mathbf{A}) = a_{11} + a_{22} + \cdots + a_{kk}$$

Properties of the trace operator: If A and B are square matrices of order k, then

1. $\mathrm{tr}(\mathbf{A} + \mathbf{B}) = \mathrm{tr}(\mathbf{A}) + \mathrm{tr}(\mathbf{B})$

2. $\mathrm{tr}(\mathbf{A}^T) = \mathrm{tr}(\mathbf{A})$

3. $\mathrm{tr}(s\mathbf{A}) = s\,\mathrm{tr}(\mathbf{A})$

4. $\mathrm{tr}(\mathbf{AB}) = \mathrm{tr}(\mathbf{BA})$

• There are several important types of square matrix:

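1. Symmetric Matrix: A matrix A is symmetric if $\mathbf{A} = \mathbf{A}'$; this implies that $a_{ij} = a_{ji}$ for all i and j.

Examples:

$$\mathbf{A} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} = \mathbf{A}', \qquad \mathbf{B} = \begin{pmatrix} 4 & 2 & -1 \\ 2 & 1 & 3 \\ -1 & 3 & 1 \end{pmatrix} = \mathbf{B}'$$

2. Diagonal Matrix: A matrix A is diagonal if all of its non-diagonal entries are zero; formally, if $a_{ij} = 0$ for all $i \neq j$.

Examples:

$$\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

3. Triangular Matrix: A matrix is triangular in one of two cases. If all entries below the diagonal are zero ($a_{ij} = 0$ for all $i > j$), it is upper triangular. Conversely, if all entries above the diagonal are zero ($a_{ij} = 0$ for all $i < j$), it is lower triangular.

Examples:

$$\mathbf{A}_{LT} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 2 & 0 \\ -3 & 2 & 5 \end{pmatrix}, \qquad \mathbf{A}_{UT} = \begin{pmatrix} 1 & 7 & -4 \\ 0 & 3 & 9 \\ 0 & 0 & -3 \end{pmatrix}$$

4. Identity Matrix: The n × n identity matrix $\mathbf{I}_n$ is the matrix whose diagonal elements are 1 and all off-diagonal elements are 0. Examples:

$$\mathbf{I}_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \mathbf{I}_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$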

3.6.5 Linear Equations

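• Linear Equation: $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$. The $a_i$ are parameters or coefficients; the $x_i$ are variables or unknowns.

• Linear because there is only one variable per term and its degree is at most 1.

1. $\mathbb{R}^2$: line $x_2 = \frac{b}{a_2} - \frac{a_1}{a_2} x_1$

2. $\mathbb{R}^3$: plane $x_3 = \frac{b}{a_3} - \frac{a_1}{a_3} x_1 - \frac{a_2}{a_3} x_2$

3. $\mathbb{R}^n$: hyperplane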


3.6.6 Systems of Linear Equations

• Often interested in solving linear systems like

$$\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$$

• More generally, we might have a system of m equations in n unknowns

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned}$$

• A solution to a linear system of m equations in n unknowns is a set of n numbers $x_1, x_2, \cdots, x_n$ that satisfy each of the m equations.

1. $\mathbb{R}^2$: intersection of the lines.

2. $\mathbb{R}^3$: intersection of the planes.

3. $\mathbb{R}^n$: intersection of the hyperplanes.

• Example: x = 3 and y = 2 is the solution to the above 2 × 2 linear system. Notice from the graph that the two lines intersect at (3, 2).

• Does a linear system have one, no, or multiple solutions? For a system of 2 equations in 2 unknowns (i.e., two lines):

1. One solution: The lines intersect at exactly one point.

2. No solution: The lines are parallel.

3. Infinite solutions: The lines coincide.

• Methods to solve linear systems:

1. Substitution

2. Elimination of variables

3. Matrix methods

3.6.7 Method of Substitution

• Procedure:

1. Solve one equation for one variable, say $x_1$, in terms of the other variables in the equation.

2. Substitute the expression for $x_1$ into the other m − 1 equations, resulting in a new system of m − 1 equations in n − 1 unknowns.

3. Repeat steps 1 and 2 until one equation in one unknown, say $x_n$, remains. We now have a value for $x_n$.

4. Backward substitution: Substitute $x_n$ into the previous equation (which should express $x_{n-1}$ as a function of only $x_n$). Repeat this, using the successive expressions of each variable in terms of the other variables, to find the values of all $x_i$’s.

• Exercises:

1. Using substitution, solve

$$\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$$

2. Using substitution, solve

$$\begin{aligned} x + 2y + 3z &= 6 \\ 2x - 3y + 2z &= 14 \\ 3x + y - z &= -2 \end{aligned}$$

3.6.8 Elementary Equation Operations

• Elementary equation operations are used to transform the equations of a linear system while maintaining an equivalent linear system, equivalent in the sense that the same values of $x_j$ solve both the original and transformed systems. These operations are

1. Interchanging two equations,

2. Multiplying both sides of an equation by a nonzero constant, and

3. Adding equations to each other.

• Interchanging Equations: Given the linear system

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 &= b_1 \\ a_{21} x_1 + a_{22} x_2 &= b_2 \end{aligned}$$

we can interchange its equations, resulting in the equivalent linear system

$$\begin{aligned} a_{21} x_1 + a_{22} x_2 &= b_2 \\ a_{11} x_1 + a_{12} x_2 &= b_1 \end{aligned}$$

• Multiplying by a Constant: Suppose we had the following equation:

$$2 = 2$$

If we multiply each side of the equation by some number, say 4, we still have an equality:

$$2(4) = 2(4) \implies 8 = 8$$

More generally, we can multiply both sides of any equation by a nonzero constant and maintain an equivalent equation. For example, the following two equations are equivalent (for $c \neq 0$):

$$a_{11} x_1 + a_{12} x_2 = b_1$$

$$c a_{11} x_1 + c a_{12} x_2 = c b_1$$

• Adding Equations: Suppose we had the following two very simple equations:

$$3 = 3$$

$$7 = 7$$

If we add these two equations to each other, we get

$$7 + 3 = 7 + 3 \implies 10 = 10$$

Suppose we now have

$$a = b$$

$$c = d$$

Then

$$a + c = b + d$$

Extending this, suppose we had the linear system

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 &= b_1 \\ a_{21} x_1 + a_{22} x_2 &= b_2 \end{aligned}$$

Then

$$(a_{11} + a_{21}) x_1 + (a_{12} + a_{22}) x_2 = b_1 + b_2$$

3.6.9 Method of Gaussian Elimination

• Gaussian Elimination is a method by which we start with some linear system of m equations in n unknowns and use the elementary equation operations to eliminate variables, until we arrive at an equivalent system of the form

$$\begin{aligned} \boxed{a'_{11}} x_1 + a'_{12} x_2 + a'_{13} x_3 + \cdots + a'_{1n} x_n &= b'_1 \\ \boxed{a'_{22}} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n &= b'_2 \\ \boxed{a'_{33}} x_3 + \cdots + a'_{3n} x_n &= b'_3 \\ &\;\;\vdots \\ \boxed{a'_{mn}} x_n &= b'_m \end{aligned}$$

where $a'_{ij}$ denotes the coefficient of the jth unknown in the ith equation after the above transformation. Note that at each stage of the elimination process, we want to change some coefficient of our system to 0 by adding a multiple of an earlier equation to the given equation.

The coefficients $a'_{11}$, $a'_{22}$, etc. in boxes are referred to as pivots, since they are the terms used to eliminate the variables in the rows below them in their respective columns.[9] Once the linear system is in the above reduced form, we then use back substitution to find the values of the $x_j$’s.

• Exercises:

1. Using Gaussian elimination, solve

$$\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$$

2. Using Gaussian elimination, solve

$$\begin{aligned} x + 2y + 3z &= 6 \\ 2x - 3y + 2z &= 14 \\ 3x + y - z &= -2 \end{aligned}$$
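The elimination-then-back-substitution procedure can be sketched as follows. This is one possible implementation, not the notes' own: it assumes a square system with a unique solution, uses exact `Fraction` arithmetic, and swaps rows only when a pivot candidate is zero. The name `gaussian_solve` is illustrative.

```python
from fractions import Fraction

def gaussian_solve(A, b):
    """Solve a square system Ax = b that has a unique solution."""
    n = len(A)
    # Work on the equations as rows [a_i1, ..., a_in, b_i].
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        # Interchange equations if needed so the pivot is nonzero.
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            raise ValueError("system does not have a unique solution")
        M[c], M[pivot] = M[pivot], M[c]
        # Subtract a multiple of the pivot equation from each equation below it.
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            M[i] = [a - factor * p for a, p in zip(M[i], M[c])]
    # Back substitution, from the last equation upward.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

print(gaussian_solve([[1, -3], [2, 1]], [-3, 8]))
print(gaussian_solve([[1, 2, 3], [2, -3, 2], [3, 1, -1]], [6, 14, -2]))
```

Substituting the returned values back into the original equations is a quick way to confirm a hand-worked answer.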

[9] As we’ll see, pivots don’t need to be on the ij, i = j diagonal. Additionally, sometimes when we pivot, we will eliminate variables in rows above a pivot.


3.6.10 Method of Gauss-Jordan Elimination

• The method of Gauss-Jordan elimination takes the Gaussian elimination method one step further. Once the linear system is in the reduced form shown in the preceding section, elementary row operations and Gaussian elimination are used to

1. Change the coefficient of the pivot term in each equation to 1 and

2. Eliminate all terms above each pivot in its column,

resulting in a reduced, equivalent system. For a system of m equations in m unknowns, a typical reduced system would be

$$\begin{aligned} x_1 &= b^*_1 \\ x_2 &= b^*_2 \\ x_3 &= b^*_3 \\ &\;\;\vdots \\ x_m &= b^*_m \end{aligned}$$

which needs no further work to solve for the $x_j$’s.

• Exercises:

1. Using Gauss-Jordan elimination, solve

$$\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$$

2. Using Gauss-Jordan elimination, solve

$$\begin{aligned} x + 2y + 3z &= 6 \\ 2x - 3y + 2z &= 14 \\ 3x + y - z &= -2 \end{aligned}$$

Lecture 7: Linear Algebra II

3.7 Linear Algebra II

Today’s Topics [10]:

• Matrix Methods for Linear Systems • Rank • Existence of Solutions • Inverse of a Matrix • Linear Systems and Inverses • Determinants • The Determinant Formula for an Inverse • Cramer’s Rule

3.7.1 Matrices, Row Operations, & (Reduced) Row Echelon Form

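• Matrices provide an easy and efficient way to represent linear systems such as

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned}$$

as

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

where

1. The m × n coefficient matrix A is an array of mn real numbers arranged in m rows by n columns:

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

2. The unknown quantities are represented by the vector $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$.

3. The RHS of the linear system is represented by the vector $\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$.

• Augmented Matrix: When we append b to the coefficient matrix A, we get the augmented matrix $\hat{\mathbf{A}} = [\mathbf{A} | \mathbf{b}]$

$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & & \ddots & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right)$$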

• Elementary Row Operations: Just as we conducted elementary equation operations, we can conduct elementary row operations to transform some augmented matrix representation of a linear system into another augmented matrix that represents an equivalent linear system. Since we’re really operating on equations when we operate on the rows of the matrix, these row operations correspond exactly to the equation operations:

1. Interchanging two rows. ⟹ Interchanging two equations.

2. Multiplying a row by a constant. ⟹ Multiplying both sides of an equation by a constant.

3. Adding two rows to each other. ⟹ Adding two equations to each other.

[10] Much of the material and examples for this lecture are taken from Simon & Blume (1994) Mathematics for Economists and Kolman (1993) Introductory Linear Algebra with Applications.

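• Interchanging Rows: Suppose we have the augmented matrix

$$\hat{\mathbf{A}} = \left(\begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \end{array}\right)$$

If we interchange the two rows, we get the augmented matrix

$$\left(\begin{array}{cc|c} a_{21} & a_{22} & b_2 \\ a_{11} & a_{12} & b_1 \end{array}\right)$$

which represents a linear system equivalent to that represented by matrix $\hat{\mathbf{A}}$.

• Multiplying by a Constant: If we multiply the second row of matrix $\hat{\mathbf{A}}$ by a constant c, we get the augmented matrix

$$\left(\begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ c a_{21} & c a_{22} & c b_2 \end{array}\right)$$

• Adding Rows: If we add the first row of matrix $\hat{\mathbf{A}}$ to the second, we obtain the augmented matrix

$$\left(\begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ a_{11} + a_{21} & a_{12} + a_{22} & b_1 + b_2 \end{array}\right)$$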

• Row Echelon Form: We use the row operations to change coefficients in the augmented matrix to 0 (i.e., pivot to eliminate variables) and to put it in a matrix form representing the final linear system of Gaussian elimination. An augmented matrix of the form

$$\left(\begin{array}{ccccc|c} a'_{11} & a'_{12} & a'_{13} & \cdots & a'_{1n} & b'_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & 0 & a'_{33} & \cdots & a'_{3n} & b'_3 \\ 0 & 0 & 0 & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & a'_{mn} & b'_m \end{array}\right)$$

is said to be in row echelon form: each row has more leading zeros than the row preceding it.

• Reduced Row Echelon Form: Reduced row echelon form is the matrix representation of a linear system after Gauss-Jordan elimination. For a system of m equations in m unknowns, with no all-zero rows, the reduced row echelon form would be

$$\left(\begin{array}{ccccc|c} 1 & 0 & 0 & 0 & 0 & b^*_1 \\ 0 & 1 & 0 & 0 & 0 & b^*_2 \\ 0 & 0 & 1 & 0 & 0 & b^*_3 \\ 0 & 0 & 0 & \ddots & 0 & \vdots \\ 0 & 0 & 0 & 0 & 1 & b^*_m \end{array}\right)$$


• Exercises:

Using matrix methods, solve the following linear systems by Gaussian elimination and then Gauss-Jordan elimination:

1. $\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$

2. $\begin{aligned} x + 2y + 3z &= 6 \\ 2x - 3y + 2z &= 14 \\ 3x + y - z &= -2 \end{aligned}$
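Row reduction of an augmented matrix can be sketched directly from the three elementary row operations. The function below is an illustrative implementation (the name `rref` and the use of `Fraction` are choices made here, not taken from the notes); it scales each pivot to 1 and clears the entries above and below it.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of a matrix given as a list of rows."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]            # interchange rows
        p = M[r][c]
        M[r] = [x / p for x in M[r]]                # scale the pivot to 1
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                # subtract a multiple of the pivot row to zero out column c
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

# Augmented matrices [A | b] for the two exercises.
print(rref([[1, -3, -3], [2, 1, 8]]))
print(rref([[1, 2, 3, 6], [2, -3, 2, 14], [3, 1, -1, -2]]))
```

For a square system with a unique solution, the last column of the result holds the values $b^*_1, \ldots, b^*_m$.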

3.7.2 Rank — and Whether a System Has One, Infinite, or No Solutions

• We previously noted that a 2 × 2 system had one, infinite, or no solutions if the two lines intersected, were the same, or were parallel, respectively. More generally, to determine whether one, infinite, or no solutions exist, we can use information about (1) the number of equations m, (2) the number of unknowns n, and (3) the rank of the matrix representing the linear system.

• Rank: The rank of a matrix is the number of nonzero rows in its row echelon form. The rank corresponds to the maximum number of linearly independent row or column vectors in the matrix.

• Examples:

1. $\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix}$ Rank =

2. $\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 0 \end{pmatrix}$ Rank =

3. $\left(\begin{array}{ccc|c} 1 & 2 & 3 & b_1 \\ 0 & 4 & 5 & b_2 \\ 0 & 0 & 0 & b_3 \end{array}\right)$, $b_i \neq 0$ Rank =

• Let A be the coefficient matrix and $\hat{\mathbf{A}} = [\mathbf{A} | \mathbf{b}]$ be the augmented matrix. Then

1. rank A ≤ rank $\hat{\mathbf{A}}$: Augmenting A with b can never result in more zero rows than originally in A itself. Suppose row i in A is all zeros and that $b_i$ is non-zero. Augmenting A with b will yield a non-zero row i in $\hat{\mathbf{A}}$.

2. rank A ≤ rows A: By definition of “rank.”

3. rank A ≤ cols A: Suppose there are more rows than columns (otherwise the previous rule applies). Each column can contain at most one pivot. By pivoting, all other entries in a column below the pivot are zeroed. Hence, there will only be as many non-zero rows as pivots, which will be at most the number of columns.


• Existence of Solutions:

1. Exactly one solution: rank A = rank $\hat{\mathbf{A}}$ = rows A = cols A

Necessary condition for a system to have a unique solution: that there be exactly as many equations as unknowns.

2. Infinite solutions: rank A = rank $\hat{\mathbf{A}}$ and cols A > rank A

If a system has a solution and has more unknowns than equations, then it has infinitely many solutions.

3. No solution: rank A < rank $\hat{\mathbf{A}}$

Then there is a zero row i in A’s reduced echelon form that corresponds to a non-zero row i in $\hat{\mathbf{A}}$’s reduced echelon form. Row i of $\hat{\mathbf{A}}$ translates to the equation

$$0 x_1 + 0 x_2 + \cdots + 0 x_n = b'_i$$

where $b'_i \neq 0$. Hence the system has no solution.

• Exercises:

1. $\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$

2. $\begin{aligned} x + 2y + 3z &= 6 \\ 2x - 3y + 2z &= 14 \\ 3x + y - z &= -2 \end{aligned}$

3. $\begin{aligned} x + 2y - 3z &= -4 \\ 2x + y - 3z &= 4 \end{aligned}$

4. $\begin{aligned} x_1 + 2x_2 - 3x_4 + x_5 &= 2 \\ x_1 + 2x_2 + x_3 - 3x_4 + x_5 + 2x_6 &= 3 \\ x_1 + 2x_2 - 3x_4 + 2x_5 + x_6 &= 4 \\ 3x_1 + 6x_2 + x_3 - 9x_4 + 4x_5 + 3x_6 &= 9 \end{aligned}$

5. $\begin{aligned} x + 2y + 3z + 4w &= 5 \\ x + 3y + 5z + 7w &= 11 \\ x - z - 2w &= -6 \end{aligned}$
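The rank comparison can be automated. The sketch below is an assumption-laden illustration: `rank` and `classify` are hypothetical names, and "exactly one solution" is tested with the general condition rank A = rank [A|b] = number of unknowns, which covers the square case stated above.

```python
from fractions import Fraction

def rank(rows):
    """Number of nonzero rows in a row echelon form (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(A, b):
    """Compare rank A with the rank of the augmented matrix [A | b]."""
    augmented = [row + [rhs] for row, rhs in zip(A, b)]
    r_a, r_aug, unknowns = rank(A), rank(augmented), len(A[0])
    if r_a < r_aug:
        return "no solution"
    if r_a == unknowns:
        return "exactly one solution"
    return "infinitely many solutions"

print(classify([[1, -3], [2, 1]], [-3, 8]))                                       # exercise 1
print(classify([[1, 2, -3], [2, 1, -3]], [-4, 4]))                                # exercise 3
print(classify([[1, 2, 3, 4], [1, 3, 5, 7], [1, 0, -1, -2]], [5, 11, -6]))        # exercise 5
```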


3.7.3 The Inverse of a Matrix

• Inverse Matrix: An n × n matrix A is nonsingular or invertible if there exists an n × n matrix $\mathbf{A}^{-1}$ such that

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}_n$$

$\mathbf{A}^{-1}$ is the inverse of A. If there is no such $\mathbf{A}^{-1}$, then A is singular or noninvertible.

• Example: Let

$$\mathbf{A} = \begin{pmatrix} 2 & 3 \\ 2 & 2 \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} -1 & \frac{3}{2} \\ 1 & -1 \end{pmatrix}$$

Since

$$\mathbf{AB} = \mathbf{BA} = \mathbf{I}_n$$

we conclude that B is the inverse, $\mathbf{A}^{-1}$, of A and that A is nonsingular.

• Properties of the Inverse:

1. If the inverse exists, it is unique.

2. A nonsingular ⟹ $\mathbf{A}^{-1}$ nonsingular, and $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$

3. A and B nonsingular ⟹ AB nonsingular, and $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$

4. A nonsingular ⟹ $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$

• Procedure to Find $\mathbf{A}^{-1}$: We know that if B is the inverse of A, then

$$\mathbf{AB} = \mathbf{BA} = \mathbf{I}_n$$

Looking only at the first and last parts of this

$$\mathbf{AB} = \mathbf{I}_n$$

Solving for B is equivalent to solving for n linear systems, where each column of B is solved for the corresponding column in $\mathbf{I}_n$. In performing Gauss-Jordan elimination for each individual system, the same row operations will be performed on A regardless of the column of B and $\mathbf{I}_n$. Hence, we can solve the systems simultaneously by augmenting A with $\mathbf{I}_n$ and performing Gauss-Jordan elimination on the augmented matrix. Note that for the square matrix A, Gauss-Jordan elimination should result in A becoming row equivalent to $\mathbf{I}_n$. Therefore, if Gauss-Jordan elimination on $[\mathbf{A} | \mathbf{I}_n]$ results in $[\mathbf{I}_n | \mathbf{B}]$, then B is the inverse of A. Otherwise, A is singular.

To summarize: To calculate the inverse of A

1. Form the augmented matrix $[\mathbf{A} | \mathbf{I}_n]$

2. Using elementary row operations, transform the augmented matrix to reduced row echelon form.

3. The result of step 2 is an augmented matrix $[\mathbf{C} | \mathbf{B}]$.

(a) If $\mathbf{C} = \mathbf{I}_n$, then $\mathbf{B} = \mathbf{A}^{-1}$.

(b) If $\mathbf{C} \neq \mathbf{I}_n$, then C has a row of zeros. A is singular and $\mathbf{A}^{-1}$ does not exist.

• Exercise: Find the inverse of $\mathbf{A} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$
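The three-step summary above can be sketched as code. This is an illustrative implementation, assuming exact `Fraction` arithmetic; the function name `inverse` and the choice to return `None` for a singular matrix are conventions made up for this example.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan elimination on [A | I]; returns None if A is singular."""
    n = len(A)
    # Step 1: form the augmented matrix [A | I_n].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    # Step 2: reduce the left block to the identity.
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None          # no pivot in this column: A is singular
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    # Step 3: the right block is now the inverse.
    return [row[n:] for row in M]

A = [[1, 1, 1], [0, 2, 3], [5, 5, 1]]
for row in inverse(A):
    print(row)
print(inverse([[1, 2], [2, 4]]))   # a singular example chosen for illustration
```

Multiplying the result by A and comparing with the identity matrix is a simple way to confirm the answer to the exercise.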


3.7.4 Linear Systems and Inverses

• Let’s return to the matrix representation of a linear system

Ax = b

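If A is an n × n matrix, then Ax = b is a system of n equations in n unknowns. Suppose A is nonsingular, so that $\mathbf{A}^{-1}$ exists. To solve this system, we can premultiply each side by $\mathbf{A}^{-1}$ and reduce it as follows:

$$\begin{aligned} \mathbf{A}^{-1}(\mathbf{A}\mathbf{x}) &= \mathbf{A}^{-1}\mathbf{b} \\ (\mathbf{A}^{-1}\mathbf{A})\mathbf{x} &= \mathbf{A}^{-1}\mathbf{b} \\ \mathbf{I}_n \mathbf{x} &= \mathbf{A}^{-1}\mathbf{b} \\ \mathbf{x} &= \mathbf{A}^{-1}\mathbf{b} \end{aligned}$$

Hence, given A and b and given that A is nonsingular, $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$ is the unique solution to this system.

• Notice also that the requirements for A to be nonsingular correspond to the requirements for a linear system to have a unique solution: rank A = rows A = cols A.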

3.7.5 Determinants

• Singularity: Determinants can be used to determine whether a square matrix is nonsingular.

– A square matrix is nonsingular iff its determinant is not zero.

• Determinants defined inductively:

1. Let A = a (a 1 × 1 matrix). We want the determinant to equal zero when the inverse does not exist. Since the inverse of a, 1/a, does not exist when a = 0, we let the determinant of a be

$$|a| = a$$

2. For a 2 × 2 matrix $\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, A is nonsingular only if $a_{11} a_{22} - a_{12} a_{21} \neq 0$ (check by doing Gauss-Jordan to find the inverse of a 2 × 2 matrix). We then define the determinant of a 2 × 2 matrix A as

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21} = a_{11} |a_{22}| - a_{12} |a_{21}|$$

3. Extending this to a 3 × 3 matrix, we get

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$

4. Let’s extend this now to any n × n matrix. Let $\mathbf{A}_{ij}$ be the (n − 1) × (n − 1) submatrix of A obtained by deleting row i and column j. Let the (i, j)th minor of A be

$$M_{ij} = |\mathbf{A}_{ij}|$$

Then for any n × n matrix A

$$|\mathbf{A}| = a_{11} M_{11} - a_{12} M_{12} + \cdots + (-1)^{n+1} a_{1n} M_{1n}$$


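• Example: Does the following matrix have an inverse?

$$\mathbf{A} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$$

1. Calculate its determinant.

$$\begin{aligned} |\mathbf{A}| &= 1 \begin{vmatrix} 2 & 3 \\ 5 & 1 \end{vmatrix} - 1 \begin{vmatrix} 0 & 3 \\ 5 & 1 \end{vmatrix} + 1 \begin{vmatrix} 0 & 2 \\ 5 & 5 \end{vmatrix} \\ &= 1(2 - 15) - 1(0 - 15) + 1(0 - 10) \\ &= -13 + 15 - 10 \\ &= -8 \end{aligned}$$

2. Since $|\mathbf{A}| \neq 0$, we conclude that A has an inverse.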
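The inductive definition maps naturally onto a recursive function. The sketch below expands along the first row, as in the formula above; `submatrix` and `det` are illustrative names, and the approach is meant for small matrices only, since cofactor expansion becomes very slow as n grows.

```python
def submatrix(A, i, j):
    """A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    # With 0-based j, the sign (-1)**j matches (-1)^(1 + column) in the notes.
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(len(A)))

A = [[1, 1, 1], [0, 2, 3], [5, 5, 1]]
print(det(A))   # nonzero, so A is nonsingular
```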

• Triangular or Diagonal Matrices: For any upper-triangular, lower-triangular, or diagonal matrix, the determinant is just the product of the diagonal terms.

• Example: Suppose we have the following square matrix in row echelon form (i.e., upper triangular)

$$\mathbf{R} = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{pmatrix}$$

Then

$$|\mathbf{R}| = r_{11} \begin{vmatrix} r_{22} & r_{23} \\ 0 & r_{33} \end{vmatrix} = r_{11} r_{22} r_{33}$$

• Properties of Determinants:

1. $|\mathbf{A}| = |\mathbf{A}^T|$

2. If B results from A by interchanging two rows, then $|\mathbf{B}| = -|\mathbf{A}|$.

3. If two rows of A are equal, then $|\mathbf{A}| = 0$. (Notice that in this case rank A ≠ rows A, which was one of the conditions for the existence of a unique solution.)

4. If a row of A consists of all zeros, then $|\mathbf{A}| = 0$. (Same as 3.)

5. If B is obtained by multiplying a row of A by a scalar s, then $|\mathbf{B}| = s|\mathbf{A}|$.

6. If B is obtained from A by adding to the ith row of A the jth row (i ≠ j) multiplied by a scalar s, then $|\mathbf{B}| = |\mathbf{A}|$. (i.e., if the row isn’t simply multiplied by a scalar and left, then the determinant remains the same.)

7. If no row interchanges and no scalar multiplications of a single row are used to compute the row echelon form R from the n × n coefficient matrix A, then $|\mathbf{A}| = |\mathbf{R}|$. (Implied by the previous properties.)

8. A square matrix is nonsingular iff its determinant ≠ 0. (Implied by the previous properties.)

9. $|\mathbf{AB}| = |\mathbf{A}||\mathbf{B}|$

10. If A is nonsingular, then $|\mathbf{A}| \neq 0$ and $|\mathbf{A}^{-1}| = \frac{1}{|\mathbf{A}|}$.

3.7.6 Determinants: Formulas for Inverses and Solutions

• Thus far, we have a number of algorithms to

1. Find the solution of a linear system,

2. Find the inverse of a matrix

but these remain just that: algorithms. At this point, we have no way of telling how the solutions $x_j$ change as the parameters $a_{ij}$ and $b_i$ change, except by changing the values and “rerunning” the algorithms.

• With determinants, we can

1. Provide an explicit formula for the inverse, and

2. Provide an explicit formula for the solution of an n × n linear system.

Hence, we can examine how changes in the parameters $a_{ij}$ and $b_i$ affect the solutions $x_j$.

• The Determinant Formula for the Inverse:

– Define the (i, j)th cofactor $C_{ij}$ of A as $(-1)^{i+j} M_{ij}$. Notice that it’s just the signed (i, j)th minor.

– Define the adjoint of A as the n × n matrix whose (i, j)th entry is $C_{ji}$ (notice the switch in indices!). We’ll refer to the adjoint of A as adj A.

Then the inverse of A is given by the formula

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|} \operatorname{adj} \mathbf{A} = \begin{pmatrix} \frac{C_{11}}{|\mathbf{A}|} & \frac{C_{21}}{|\mathbf{A}|} & \cdots & \frac{C_{n1}}{|\mathbf{A}|} \\ \frac{C_{12}}{|\mathbf{A}|} & \frac{C_{22}}{|\mathbf{A}|} & \cdots & \frac{C_{n2}}{|\mathbf{A}|} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{C_{1n}}{|\mathbf{A}|} & \frac{C_{2n}}{|\mathbf{A}|} & \cdots & \frac{C_{nn}}{|\mathbf{A}|} \end{pmatrix}$$

• Exercise: Find the inverse of $\mathbf{A} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$

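• Cramer’s Rule: The Determinant Formula for the Solution of a Linear System:

– Let $\mathbf{A}_j$ be the matrix obtained from A by replacing the jth column of A by b.

Example:

$$\mathbf{A}_1 = \begin{pmatrix} b_1 & a_{12} & \cdots & a_{1n} \\ b_2 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ b_n & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

Then the unique solution $\mathbf{x} = (x_1, \cdots, x_n)$ to the n × n system Ax = b is

$$x_j = \frac{|\mathbf{A}_j|}{|\mathbf{A}|}$$

• Exercise: Find the solution of the following system:

$$\begin{aligned} -2x_1 + 3x_2 - x_3 &= 1 \\ x_1 + 2x_2 - x_3 &= 4 \\ -2x_1 - x_2 + x_3 &= -3 \end{aligned}$$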
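Cramer's rule is short to implement once a determinant function is available. The following sketch assumes integer coefficients and a nonsingular A; `det` uses first-row cofactor expansion, and `cramer` is an illustrative name.

```python
from fractions import Fraction

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b via x_j = |A_j| / |A|, where A_j has column j replaced by b."""
    d = det(A)
    if d == 0:
        raise ValueError("A is singular; Cramer's rule does not apply")
    solution = []
    for j in range(len(A)):
        A_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(A_j), d))
    return solution

A = [[-2, 3, -1], [1, 2, -1], [-2, -1, 1]]
b = [1, 4, -3]
print(cramer(A, b))
```

Because each $x_j$ is an explicit ratio of determinants, one can see directly how a change in a single $b_i$ or $a_{ij}$ feeds into the solution.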

Lecture 8: Unconstrained Optimization

3.8 Unconstrained Optimization

Today’s Topics [11]: • Taylor Series Approximation • Quadratic Forms • Definiteness of Quadratic Forms • Maxima and Minima in Rn • First Order Conditions • Second Order Conditions • Global Maxima and Minima

3.8.1 Taylor Series Approximation

• Taylor series are used commonly to represent functions as infinite series of the function’s derivatives at some point a. One can thus approximate functions by using lower-order, finite series known as Taylor polynomials. If a = 0, the series is called a Maclaurin series.

• Specifically, a Taylor series of a real or complex function f(x) that is infinitely differentiable in the neighborhood of point a is:

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots$$

• We can often approximate the curvature of a function f(x) at point a using a 2nd order Taylor polynomial around point a:

$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + R_2$$

$R_2$ is the Lagrange remainder and often treated as negligible, giving us:

$$f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2$$
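To see how good the second-order approximation is near the expansion point, here is a small sketch. The choice of $f(x) = e^x$ expanded at a = 0 is an assumption made for illustration (there $f(0) = f'(0) = f''(0) = 1$); `taylor2` is a hypothetical helper name.

```python
from math import exp

def taylor2(f_a, df_a, d2f_a, a):
    """Second-order Taylor polynomial built from f(a), f'(a), and f''(a)."""
    return lambda x: f_a + df_a * (x - a) + d2f_a / 2 * (x - a) ** 2

# Illustrative choice: f(x) = e^x around a = 0.
approx = taylor2(1.0, 1.0, 1.0, 0.0)

for x in (0.1, 0.5, 1.0):
    # The gap between exp(x) and the polynomial is the remainder R_2.
    print(x, exp(x), approx(x), exp(x) - approx(x))
```

The remainder grows as x moves away from a, which is why the approximation is treated as local.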

• Taylor series expansion is easily generalized to multiple dimensions.

3.8.2 Quadratic Forms

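• Quadratic forms are important because they

1. Approximate local curvature around a point (e.g., they are used to identify max vs. min vs. saddle point).

2. Are simple, so easy to deal with.

3. Have a matrix representation.

• Quadratic Form: A polynomial where each term is a monomial of degree 2:

$$Q(x_1, \cdots, x_n) = \sum_{i \le j} a_{ij} x_i x_j$$

which can be written in matrix terms

$$Q(\mathbf{x}) = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} a_{11} & \frac{1}{2} a_{12} & \cdots & \frac{1}{2} a_{1n} \\ \frac{1}{2} a_{12} & a_{22} & \cdots & \frac{1}{2} a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{2} a_{1n} & \frac{1}{2} a_{2n} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$

or

$$Q(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x}$$

• Examples:

1. Quadratic on $\mathbb{R}^2$:

$$Q(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} a_{11} & \frac{1}{2} a_{12} \\ \frac{1}{2} a_{12} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2$$

2. Quadratic on $\mathbb{R}^3$:

$$Q(x_1, x_2, x_3) = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \begin{pmatrix} a_{11} & \frac{1}{2} a_{12} & \frac{1}{2} a_{13} \\ \frac{1}{2} a_{12} & a_{22} & \frac{1}{2} a_{23} \\ \frac{1}{2} a_{13} & \frac{1}{2} a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3$$

[11] Much of the material and examples for this lecture are taken from Simon & Blume (1994) Mathematics for Economists and Ecker & Kupferschmid (1988) Introduction to Operations Research.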

3.8.3 Definiteness of Quadratic Forms

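• Definiteness helps identify the curvature of Q(x) at x.

• Definiteness: By definition, Q(x) = 0 at x = 0. The definiteness of the matrix A is determined by whether the quadratic form $Q(\mathbf{x}) = \mathbf{x}^T \mathbf{A} \mathbf{x}$ is greater than zero, less than zero, or sometimes both over all $\mathbf{x} \neq \mathbf{0}$.

1. Positive Definite: $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0, \ \forall \mathbf{x} \neq \mathbf{0}$ (Min)

2. Positive Semidefinite: $\mathbf{x}^T \mathbf{A} \mathbf{x} \ge 0, \ \forall \mathbf{x} \neq \mathbf{0}$

3. Negative Definite: $\mathbf{x}^T \mathbf{A} \mathbf{x} < 0, \ \forall \mathbf{x} \neq \mathbf{0}$ (Max)

4. Negative Semidefinite: $\mathbf{x}^T \mathbf{A} \mathbf{x} \le 0, \ \forall \mathbf{x} \neq \mathbf{0}$

5. Indefinite: $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$ for some $\mathbf{x} \neq \mathbf{0}$ and $\mathbf{x}^T \mathbf{A} \mathbf{x} < 0$ for other $\mathbf{x} \neq \mathbf{0}$ (Neither)

• Examples:

1. Positive Definite:

$$Q(\mathbf{x}) = \mathbf{x}^T \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \mathbf{x} = x_1^2 + x_2^2$$

2. Positive Semidefinite:

$$Q(\mathbf{x}) = \mathbf{x}^T \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \mathbf{x} = (x_1 - x_2)^2$$

3. Indefinite:

$$Q(\mathbf{x}) = \mathbf{x}^T \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \mathbf{x} = x_1^2 - x_2^2$$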

3.8.4 Test for Definiteness using Principal Minors

• Given an n × n matrix A, kth order principal minors are the determinants of the k × k submatrices along the diagonal obtained by deleting n − k columns and the same n − k rows from A.

• Example: For a 3 × 3 matrix A,

1. First order principal minors: $|a_{11}|, |a_{22}|, |a_{33}|$

2. Second order principal minors: $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \ \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}, \ \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}$

3. Third order principal minor: $|\mathbf{A}|$

• Define the kth leading principal minor $M_k$ as the determinant of the k × k submatrix obtained by deleting the last n − k rows and columns from A.

• Example: For a 3 × 3 matrix A, the three leading principal minors are

$$M_1 = |a_{11}|, \qquad M_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \qquad M_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

• Algorithm: If A is an n × n symmetric matrix, then

1. $M_k > 0$, k = 1, . . . , n ⟹ Positive Definite

2. $M_k < 0$ for odd k and $M_k > 0$ for even k ⟹ Negative Definite

3. $M_k \neq 0$, k = 1, . . . , n, but does not fit the pattern of 1 or 2 ⟹ Indefinite

• If some leading principal minor is zero, but all others fit the pattern of the preceding conditions 1 or 2, then

1. Every principal minor ≥ 0 ⟹ Positive Semidefinite

2. Every principal minor of odd order ≤ 0 and every principal minor of even order ≥ 0 ⟹ Negative Semidefinite
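The leading-principal-minor test can be sketched as follows. This illustration covers only the three cases of the algorithm in which every $M_k$ is nonzero; when some leading minor is zero it reports that all principal minors must be checked rather than guessing. The function names are hypothetical.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def leading_principal_minors(A):
    """M_k = determinant of the upper-left k x k block, for k = 1, ..., n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def classify_definiteness(A):
    """Apply the leading-minor test to a symmetric matrix A."""
    minors = leading_principal_minors(A)
    if all(m > 0 for m in minors):
        return "positive definite"
    if all((m < 0) if k % 2 == 1 else (m > 0)
           for k, m in enumerate(minors, start=1)):
        return "negative definite"
    if all(m != 0 for m in minors):
        return "indefinite"
    return "inconclusive: check all principal minors"

print(classify_definiteness([[1, 0], [0, 1]]))
print(classify_definiteness([[-1, 0], [0, -1]]))
print(classify_definiteness([[1, 0], [0, -1]]))
print(classify_definiteness([[1, -1], [-1, 1]]))
```

The last matrix is the positive semidefinite example from the previous subsection; its second leading minor is zero, so the leading-minor test alone cannot settle it.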

3.8.5 Maxima and Minima in Rn

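• Conditions for Extrema: The conditions for extrema are similar to those for functions on $\mathbb{R}^1$. Let f(x) be a function of n variables. Let $B(\mathbf{x}, \varepsilon)$ be the ε-ball about the point x. Then, for some ε > 0,

1. $f(\mathbf{x}^*) > f(\mathbf{x}), \ \forall \mathbf{x} \in B(\mathbf{x}^*, \varepsilon), \ \mathbf{x} \neq \mathbf{x}^*$ ⟹ Strict Local Max

2. $f(\mathbf{x}^*) \ge f(\mathbf{x}), \ \forall \mathbf{x} \in B(\mathbf{x}^*, \varepsilon)$ ⟹ Local Max

3. $f(\mathbf{x}^*) < f(\mathbf{x}), \ \forall \mathbf{x} \in B(\mathbf{x}^*, \varepsilon), \ \mathbf{x} \neq \mathbf{x}^*$ ⟹ Strict Local Min

4. $f(\mathbf{x}^*) \le f(\mathbf{x}), \ \forall \mathbf{x} \in B(\mathbf{x}^*, \varepsilon)$ ⟹ Local Min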

3.8.6 First Order Conditions

• When we examined functions of one variable x, we found critical points by taking the firstderivative, setting it to zero, and solving for x. For functions of n variables, the critical pointsare found in much the same way, except now we set the partial derivatives equal to zero.12

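• Given a function f(x) in n variables, the gradient ∇f(x) is a column vector, where the ith element is the partial derivative of f(x) with respect to $x_i$:

$$\nabla f(\mathbf{x}) = \begin{pmatrix} \dfrac{\partial f(\mathbf{x})}{\partial x_1} \\[2mm] \dfrac{\partial f(\mathbf{x})}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(\mathbf{x})}{\partial x_n} \end{pmatrix}$$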

• x∗ is a critical point iff ∇f(x∗) = 0.

• Example: Find the critical points of $f(\mathbf{x}) = (x_1 - 1)^2 + x_2^2 + 1$

1. The partial derivatives of f(x) are

$$\frac{\partial f(\mathbf{x})}{\partial x_1} = 2(x_1 - 1), \qquad \frac{\partial f(\mathbf{x})}{\partial x_2} = 2x_2$$

2. Setting each partial equal to zero and solving for $x_1$ and $x_2$, we find that there's a critical point at x∗ = (1, 0).
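• Numerical sketch: The first order condition for this example can be double-checked numerically. The snippet below is an illustration only, assuming NumPy is available; the helper `numerical_gradient` and the step size `h` are hypothetical choices, not part of these notes. It approximates each partial derivative by a central difference and evaluates the gradient at x∗ = (1, 0) and at an arbitrary comparison point.

```python
import numpy as np


def f(x):
    """The example function f(x) = (x1 - 1)^2 + x2^2 + 1."""
    x1, x2 = x
    return (x1 - 1) ** 2 + x2 ** 2 + 1


def numerical_gradient(func, x, h=1e-6):
    """Approximate the gradient of func at x with central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad


grad_at_star = numerical_gradient(f, [1.0, 0.0])    # should be close to (0, 0)
grad_elsewhere = numerical_gradient(f, [2.0, 3.0])  # analytic value: (2, 6)
print(grad_at_star, grad_elsewhere)
```

A finite-difference gradient only confirms that a candidate point is (approximately) critical; finding the candidates still requires solving ∇f(x) = 0 as in the example.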

¹² We will only consider critical points on the interior of a function's domain.


3.8.7 Second Order Conditions

• When we found a critical point for a function of one variable, we used the second derivative as an indicator of the curvature at the point in order to determine whether the point was a min, max, or saddle. For functions of n variables, we use second order partial derivatives as an indicator of curvature.

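• Given a function f(x) of n variables, the Hessian H(x) is an n × n matrix, where the (i, j)th element is the second order partial derivative of f(x) with respect to $x_i$ and $x_j$:

$$H(\mathbf{x}) = \begin{pmatrix}
\dfrac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_n} \\[3mm]
\dfrac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(\mathbf{x})}{\partial x_n^2}
\end{pmatrix}$$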

• Curvature and the Taylor Polynomial as a Quadratic Form: The Hessian is used in a Taylor polynomial approximation to f(x) and provides information about the curvature of f(x) at x, which tells us, for example, whether a critical point x∗ is a min, max, or saddle point.

1. The second order Taylor polynomial about the critical point x∗ is

$$f(\mathbf{x}^* + \mathbf{h}) = f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)^T \mathbf{h} + \frac{1}{2}\mathbf{h}^T H(\mathbf{x}^*)\mathbf{h} + R(\mathbf{h})$$

2. Since we're looking at a critical point, ∇f(x∗) = 0; and for small h, R(h) is negligible. Rearranging, we get

$$f(\mathbf{x}^* + \mathbf{h}) - f(\mathbf{x}^*) \approx \frac{1}{2}\mathbf{h}^T H(\mathbf{x}^*)\mathbf{h}$$

3. The RHS is a quadratic form and we can determine the definiteness of H(x∗).

(a) If H(x∗) is positive definite, then the RHS is positive for all small h:

f(x∗ + h)− f(x∗) > 0 =⇒ f(x∗ + h) > f(x∗)

i.e., f(x∗) < f(x), ∀x ∈ B(x∗, ε), so x∗ is a strict local min.

(b) Conversely, if H(x∗) is negative definite, then the RHS is negative for all small h:

f(x∗ + h)− f(x∗) < 0 =⇒ f(x∗ + h) < f(x∗)

i.e., f(x∗) > f(x), ∀x ∈ B(x∗, ε), so x∗ is a strict local max.
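• Numerical sketch: The quadratic approximation f(x∗ + h) − f(x∗) ≈ ½hᵀH(x∗)h can be checked on a small example. The function g below is a hypothetical illustration (it does not appear in these notes), chosen so that x∗ = (0, 0) is a critical point whose Hessian, computed by hand, is 2I; the step h is an arbitrary small vector. Assuming NumPy is available, the sketch compares the actual change in g with the quadratic form.

```python
import numpy as np


def g(x):
    """Hypothetical example: g(x) = x1^2 + x2^2 + x1^3, with a critical point at (0, 0)."""
    x1, x2 = x
    return x1 ** 2 + x2 ** 2 + x1 ** 3


x_star = np.array([0.0, 0.0])
# Hessian of g at x_star, worked out by hand: [[2 + 6*x1, 0], [0, 2]] at x1 = 0.
H_star = np.array([[2.0, 0.0],
                   [0.0, 2.0]])


def quadratic_term(H, h):
    """Return (1/2) h^T H h."""
    return 0.5 * h @ H @ h


h = np.array([0.01, -0.02])                      # an arbitrary small step
actual_change = g(x_star + h) - g(x_star)        # g(x* + h) - g(x*)
approx_change = quadratic_term(H_star, h)        # (1/2) h^T H(x*) h
remainder = actual_change - approx_change        # plays the role of R(h)
print(actual_change, approx_change, remainder)
```

Here H(x∗) is positive definite, so the quadratic term is positive for every nonzero h, and the remainder (of order ‖h‖³ for this g) is too small to change the sign of the difference when h is small.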

• Summary of Second Order Conditions:

Given a function f(x) and a point x∗ such that ∇f(x∗) = 0,

1. H(x∗) Positive Definite =⇒ Strict Local Min

2. H(x) Positive Semidefinite ∀x ∈ B(x∗, ε) =⇒ Local Min

3. H(x∗) Negative Definite =⇒ Strict Local Max

4. H(x) Negative Semidefinite ∀x ∈ B(x∗, ε) =⇒ Local Max

5. H(x∗) Indefinite =⇒ Saddle Point

• Example: We found that the only critical point of $f(\mathbf{x}) = (x_1 - 1)^2 + x_2^2 + 1$ is at x∗ = (1, 0). Is it a min, max, or saddle point?

1. Recall that the gradient of f(x) is

$$\nabla f(\mathbf{x}) = \begin{pmatrix} 2(x_1 - 1) \\ 2x_2 \end{pmatrix}$$

Then the Hessian is

$$H(\mathbf{x}) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

2. To check the definiteness of H(x∗), we could use either of two methods:

(a) Determine whether $\mathbf{x}^T H(\mathbf{x}^*)\mathbf{x}$ is greater or less than zero for all x ≠ 0:

$$\mathbf{x}^T H(\mathbf{x}^*)\mathbf{x} = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2x_1^2 + 2x_2^2$$

For any x ≠ 0, $2(x_1^2 + x_2^2) > 0$, so the Hessian is positive definite and x∗ is a strict local minimum.

(b) Using the method of leading principal minors, we see that $M_1 = 2$ and $M_2 = 4$. Since both are positive, the Hessian is positive definite and x∗ is a strict local minimum.
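• Numerical sketch: Both methods can be mirrored in a few lines of Python, assuming NumPy is available. The random sampling in method (a) is only a spot check on 1000 arbitrary vectors (the seed and sample size are illustrative assumptions) and is not a proof of definiteness; the eigenvalue computation at the end is an additional cross-check that the notes do not use.

```python
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, 2.0]])  # Hessian of the example at x* = (1, 0)

# Method (a), spot check: evaluate x^T H x at many random nonzero vectors x.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
quad_values = np.einsum("ij,jk,ik->i", samples, H, samples)
all_positive = bool(np.all(quad_values > 0))

# Method (b): leading principal minors M1 and M2.
M1 = H[0, 0]
M2 = np.linalg.det(H)

# Cross-check (not in the notes): a symmetric matrix is positive definite
# exactly when all of its eigenvalues are positive.
eigenvalues = np.linalg.eigvalsh(H)

print(all_positive, M1, M2, eigenvalues)
```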

3.8.8 Global Maxima and Minima

• To determine whether a critical point is a global min or max, we can check the concavity of the function over its entire domain. Here again we use the definiteness of the Hessian to determine whether a function is globally concave or convex:

1. H(x) Positive Semidefinite ∀x =⇒ Globally Convex

2. H(x) Negative Semidefinite ∀x =⇒ Globally Concave

Notice that the definiteness conditions must be satisfied over the entire domain.

• Given a function f(x) and a point x∗ such that ∇f(x∗) = 0,

1. f(x) Globally Convex =⇒ Global Min

2. f(x) Globally Concave =⇒ Global Max

• Note that showing that H(x∗) is negative semidefinite is not enough to guarantee x∗ is a local max. However, showing that H(x) is negative semidefinite for all x guarantees that x∗ is a global max. (The same goes for positive semidefinite and minima.)

• Example: Take $f_1(x) = x^4$ and $f_2(x) = -x^4$. Both have x = 0 as a critical point. Unfortunately, $f_1''(0) = 0$ and $f_2''(0) = 0$, so we can't tell whether x = 0 is a min or max for either. However, $f_1''(x) = 12x^2$ and $f_2''(x) = -12x^2$. For all x, $f_1''(x) \geq 0$ and $f_2''(x) \leq 0$; i.e., $f_1(x)$ is globally convex and $f_2(x)$ is globally concave. So x = 0 is a global min of $f_1(x)$ and a global max of $f_2(x)$.
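• Numerical sketch: The sign conditions in this example can be spot-checked on a grid, assuming NumPy is available. The grid range [−10, 10] and its resolution are arbitrary illustrative choices; a grid check cannot replace the algebraic argument that 12x² ≥ 0 for every x.

```python
import numpy as np


def f1_second_derivative(x):
    """Second derivative of f1(x) = x^4."""
    return 12 * x ** 2


def f2_second_derivative(x):
    """Second derivative of f2(x) = -x^4."""
    return -12 * x ** 2


grid = np.linspace(-10, 10, 2001)  # assumed evaluation grid

f1_convex_on_grid = bool(np.all(f1_second_derivative(grid) >= 0))
f2_concave_on_grid = bool(np.all(f2_second_derivative(grid) <= 0))

# Location of the smallest value of f1 on the grid.
grid_minimizer = grid[np.argmin(grid ** 4)]
print(f1_convex_on_grid, f2_concave_on_grid, grid_minimizer)
```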


3.8.9 One More Example

• Given $f(\mathbf{x}) = x_1^3 - x_2^3 + 9x_1x_2$, find any maxima or minima.

1. First order conditions. Set the gradient equal to zero and solve for $x_1$ and $x_2$.

$$\frac{\partial f}{\partial x_1} = 3x_1^2 + 9x_2 = 0$$

$$\frac{\partial f}{\partial x_2} = -3x_2^2 + 9x_1 = 0$$

We have two equations in two unknowns. Solving for $x_1$ and $x_2$, we get two critical points: $\mathbf{x}_1^* = (0, 0)$ and $\mathbf{x}_2^* = (3, -3)$.

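2. Second order conditions. Determine whether the Hessian is positive or negative definite. The Hessian is

$$H(\mathbf{x}) = \begin{pmatrix} 6x_1 & 9 \\ 9 & -6x_2 \end{pmatrix}$$

Evaluated at $\mathbf{x}_1^*$,

$$H(\mathbf{x}_1^*) = \begin{pmatrix} 0 & 9 \\ 9 & 0 \end{pmatrix}$$

The two leading principal minors are $M_1 = 0$ and $M_2 = -81$. Because the determinant $M_2$ is negative, the two eigenvalues of $H(\mathbf{x}_1^*)$ have opposite signs, so $H(\mathbf{x}_1^*)$ is indefinite and $\mathbf{x}_1^* = (0, 0)$ is a saddle point.

Evaluated at $\mathbf{x}_2^*$,

$$H(\mathbf{x}_2^*) = \begin{pmatrix} 18 & 9 \\ 9 & 18 \end{pmatrix}$$

The two leading principal minors are $M_1 = 18$ and $M_2 = 243$. Since both are positive, $H(\mathbf{x}_2^*)$ is positive definite and $\mathbf{x}_2^* = (3, -3)$ is a strict local min.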

3. Global concavity/convexity. In evaluating the Hessians for $\mathbf{x}_1^*$ and $\mathbf{x}_2^*$ we saw that the Hessian is not everywhere positive semidefinite. Hence, we can't infer that $\mathbf{x}_2^* = (3, -3)$ is a global minimum. In fact, if we set $x_1 = 0$, then $f(\mathbf{x}) = -x_2^3$, which will go to −∞ as $x_2 \to \infty$.
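• Numerical sketch: The classification of the two critical points can be reproduced in Python, assuming NumPy is available. The gradient and Hessian are typed in from the formulas of this example; the helper `classify` and its tolerance are illustrative assumptions, and it uses the signs of the eigenvalues rather than the leading principal minors (for a symmetric matrix the two approaches agree on definiteness).

```python
import numpy as np


def gradient(x):
    """Gradient of f(x) = x1^3 - x2^3 + 9*x1*x2."""
    x1, x2 = x
    return np.array([3 * x1 ** 2 + 9 * x2, -3 * x2 ** 2 + 9 * x1])


def hessian(x):
    """Hessian of f(x) = x1^3 - x2^3 + 9*x1*x2."""
    x1, x2 = x
    return np.array([[6 * x1, 9.0],
                     [9.0, -6 * x2]])


def classify(H, tol=1e-9):
    """Classify a critical point from the eigenvalue signs of its Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "strict local min"
    if np.all(eig < -tol):
        return "strict local max"
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"
    return "inconclusive"


critical_points = [(0.0, 0.0), (3.0, -3.0)]
results = {point: classify(hessian(point)) for point in critical_points}
for point in critical_points:
    print(point, gradient(point), results[point])
```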

Lecture 9: Constrained Optimization

3.9 Constrained Optimization

Today's Topics¹³:

• Constrained Optimization • Equality Constraints • Inequality Constraints • Kuhn-Tucker Conditions

3.9.1 Constrained Optimization

• We have already looked at optimizing a function in one or more dimensions over the whole domain of the function. Often, however, we want to find the maximum or minimum of a function over some restricted part of its domain.

• In any constrained optimization problem, the constrained maximum will always be less than or equal to the unconstrained maximum. If the constrained maximum is less than the unconstrained maximum, then the constraint is binding.

• For a function f(x1, . . . , xn), there are two types of constraints that can be imposed:

1. Equality constraints: constraints of the form $c_k(x_1, \ldots, x_n) = r_k$. Budget constraints are the classic example of equality constraints in social science.

2. Inequality constraints: constraints of the form $g_m(x_1, \ldots, x_n) \leq b_m$. These might arise from non-negativity constraints or other threshold effects.

• When working with constrained optimization problems, always make sure that the set of constraints is not pathological; it must be possible for all of the constraints to be satisfied simultaneously.

• Example: Maximize $f(x_1, x_2) = -(x_1^2 + 2x_2^2)$ subject to the constraint that $x_1 + x_2 = 4$. It is easy to see that the unconstrained maximum occurs at $(x_1, x_2) = (0, 0)$, but that does not satisfy the constraint. How should we proceed?

3.9.2 Equality Constraints

• Equality constraints are the easiest to deal with because we know that the maximum or minimum has to lie on the (intersection of the) constraint(s).

• The trick is to change the problem from a constrained optimization problem in n variables to an unconstrained optimization problem in n + k variables, adding one variable for each equality constraint.

• Lagrangian function: We define the Lagrangian function $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k)$ as follows:

$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k) = f(x_1, \ldots, x_n) - \sum_{i=1}^{k} \lambda_i \left(c_i(x_1, \ldots, x_n) - r_i\right)$$

Occasionally, you may see the following form of the Lagrangian, which is equivalent:

$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k) = f(x_1, \ldots, x_n) + \sum_{i=1}^{k} \lambda_i \left(r_i - c_i(x_1, \ldots, x_n)\right)$$

The λ terms are known as Lagrange multipliers.

¹³ Much of the material and examples for this lecture are taken from Gill (2006) Essential Mathematics for Political and Social Research and Simon & Blume (1994) Mathematics for Economists.


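• To find the critical points, we take the partial derivatives of $L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_k)$ with respect to each of its variables. At a critical point, each of these partial derivatives must be equal to zero, so we obtain a system of n + k equations in n + k unknowns:

\begin{align}
\frac{\partial L}{\partial x_1} &= \frac{\partial f}{\partial x_1} - \sum_{i=1}^{k} \lambda_i \frac{\partial c_i}{\partial x_1} = 0 \tag{1} \\
&\;\;\vdots \tag{2} \\
\frac{\partial L}{\partial x_n} &= \frac{\partial f}{\partial x_n} - \sum_{i=1}^{k} \lambda_i \frac{\partial c_i}{\partial x_n} = 0 \tag{3} \\
\frac{\partial L}{\partial \lambda_1} &= -\left(c_1(x_1, \ldots, x_n) - r_1\right) = 0 \tag{4} \\
&\;\;\vdots \tag{5} \\
\frac{\partial L}{\partial \lambda_k} &= -\left(c_k(x_1, \ldots, x_n) - r_k\right) = 0 \tag{6}
\end{align}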

• Some caveats apply. There may be more than one critical point. Analogs to second-order conditions for unconstrained optimization exist, or it may suffice to check the critical points individually. There are also conditions on the behavior of the constraints at critical points; these are typically satisfied with non-pathological linear constraints.

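• Example: Maximize

\begin{align}
f(\mathbf{x}) &= -(x_1^2 + 2x_2^2) \tag{7} \\
\text{s.t.} \quad x_1 + x_2 &= 4 \tag{8}
\end{align}

1. Begin by writing the Lagrangian:

$$L(x_1, x_2, \lambda) = -(x_1^2 + 2x_2^2) - \lambda(x_1 + x_2 - 4)$$

2. Take the partial derivatives and set equal to zero:

\begin{align}
\frac{\partial L}{\partial x_1} &= -2x_1 - \lambda = 0 \tag{9} \\
\frac{\partial L}{\partial x_2} &= -4x_2 - \lambda = 0 \tag{10} \\
\frac{\partial L}{\partial \lambda} &= -(x_1 + x_2 - 4) = 0 \tag{11}
\end{align}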

3. The only solution to this system of linear equations occurs at $(x_1, x_2, \lambda) = \left(\frac{8}{3}, \frac{4}{3}, -\frac{16}{3}\right)$. Therefore, the only critical point occurs when $x_1 = \frac{8}{3}$ and $x_2 = \frac{4}{3}$. This gives $f\left(\frac{8}{3}, \frac{4}{3}\right) = -\frac{96}{9}$, which is less than the unconstrained optimum $f(0, 0) = 0$.
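• Numerical sketch: Because the first order conditions (9)-(11) are linear in $(x_1, x_2, \lambda)$, they can be written as a matrix equation and solved directly. The following is a minimal illustration, assuming NumPy is available; the matrix `A` and vector `b` are simply the three conditions rearranged with the unknowns on the left-hand side.

```python
import numpy as np

# Unknowns ordered as (x1, x2, lambda). Each row is one first order condition:
#   -2*x1          - lambda = 0
#          -4*x2   - lambda = 0
#     x1  +   x2            = 4
A = np.array([[-2.0, 0.0, -1.0],
              [0.0, -4.0, -1.0],
              [1.0, 1.0, 0.0]])
b = np.array([0.0, 0.0, 4.0])

x1, x2, lam = np.linalg.solve(A, b)
f_value = -(x1 ** 2 + 2 * x2 ** 2)  # objective at the critical point
print(x1, x2, lam, f_value)
```

This shortcut only works when both the gradient of f and the constraints are linear in the unknowns; in general the first order conditions form a non-linear system.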

3.9.3 Inequality Constraints

• Inequality constraints are more challenging because we do not know ahead of time which constraints will be binding and which will not. Inequality constraints define the boundary of a region over which we seek to optimize the function. The maximum/minimum could lie along one of the constraints, or it could be in the interior of the region.


• Again, one way to deal with this problem is by introducing more variables in order to turn the problem into an unconstrained optimization.

• Slack: For each inequality constraint $g_i(x_1, \ldots, x_n) \leq b_i$, we define a slack variable $s_i^2$ for which the expression $g_i(x_1, \ldots, x_n) = b_i - s_i^2$ holds with equality. These slack variables capture how close the constraint comes to binding. We use $s^2$ rather than $s$ to ensure that the slack is non-negative.

• The Lagrangian function in this case is written as

$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m, s_1, \ldots, s_m) = f(x_1, \ldots, x_n) - \sum_{i=1}^{m} \lambda_i \left(g_i(x_1, \ldots, x_n) + s_i^2 - b_i\right)$$

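• To find the critical points, we now need to take the partials with respect to each x, λ, and s. This will give us n + 2m equations in n + 2m unknowns:

\begin{align}
\frac{\partial L}{\partial x_1} &= \frac{\partial f}{\partial x_1} - \sum_{i=1}^{m} \lambda_i \frac{\partial g_i}{\partial x_1} = 0 \tag{13} \\
&\;\;\vdots \tag{14} \\
\frac{\partial L}{\partial x_n} &= \frac{\partial f}{\partial x_n} - \sum_{i=1}^{m} \lambda_i \frac{\partial g_i}{\partial x_n} = 0 \tag{15} \\
\frac{\partial L}{\partial \lambda_1} &= -\left(g_1(x_1, \ldots, x_n) + s_1^2 - b_1\right) = 0 \tag{16} \\
&\;\;\vdots \tag{17} \\
\frac{\partial L}{\partial \lambda_m} &= -\left(g_m(x_1, \ldots, x_n) + s_m^2 - b_m\right) = 0 \tag{18} \\
\frac{\partial L}{\partial s_1} &= -2s_1\lambda_1 = 0 \tag{19} \\
&\;\;\vdots \tag{20} \\
\frac{\partial L}{\partial s_m} &= -2s_m\lambda_m = 0 \tag{21}
\end{align}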

• Complementary slackness: The last set of first order conditions, which reduce to $s_i\lambda_i = 0$ after dividing out the constant factor, are known as complementary slackness conditions. These conditions can be satisfied in one of three ways:

1. $\lambda_i = 0$ and $s_i \neq 0$: This implies that the slack is positive and thus the constraint does not bind.

2. $\lambda_i \neq 0$ and $s_i = 0$: This implies that there is no slack in the constraint and the constraint does bind.

3. $\lambda_i = 0$ and $s_i = 0$: In this case, there is no slack but the constraint binds trivially, without changing the optimum.

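• Example: Find the critical points for the following constrained optimization:

\begin{align}
f(\mathbf{x}) &= -(x_1^2 + 2x_2^2) \tag{22} \\
\text{s.t.} \quad x_1 + x_2 &\leq 4 \tag{23} \\
x_1 &\geq 0 \tag{24} \\
x_2 &\geq 0 \tag{25}
\end{align}

1. Begin by writing the Lagrangian:

$$L(x_1, x_2, \lambda_1, \lambda_2, \lambda_3, s_1, s_2, s_3) = -(x_1^2 + 2x_2^2) - \lambda_1(x_1 + x_2 + s_1^2 - 4) - \lambda_2(-x_1 + s_2^2) - \lambda_3(-x_2 + s_3^2)$$

2. Take the partial derivatives and set equal to zero:

\begin{align}
\frac{\partial L}{\partial x_1} &= -2x_1 - \lambda_1 + \lambda_2 = 0 \tag{26} \\
\frac{\partial L}{\partial x_2} &= -4x_2 - \lambda_1 + \lambda_3 = 0 \tag{27} \\
\frac{\partial L}{\partial \lambda_1} &= -(x_1 + x_2 + s_1^2 - 4) = 0 \tag{28} \\
\frac{\partial L}{\partial \lambda_2} &= -(-x_1 + s_2^2) = 0 \tag{29} \\
\frac{\partial L}{\partial \lambda_3} &= -(-x_2 + s_3^2) = 0 \tag{30} \\
\frac{\partial L}{\partial s_1} &= -2s_1\lambda_1 = 0 \tag{31} \\
\frac{\partial L}{\partial s_2} &= -2s_2\lambda_2 = 0 \tag{32} \\
\frac{\partial L}{\partial s_3} &= -2s_3\lambda_3 = 0 \tag{33}
\end{align}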

3. This is a huge mess: a system of 8 non-linear equations. We only have to look at the various ways that we can satisfy the complementary slackness conditions:

| Hypothesis | s1 | s2 | s3 | λ1 | λ2 | λ3 | x1 | x2 | f(x1, x2) |
|---|---|---|---|---|---|---|---|---|---|
| s1 = s2 = s3 = 0 | No solution | | | | | | | | |
| s1 ≠ 0, s2 = s3 = 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| s2 ≠ 0, s1 = s3 = 0 | 0 | 2 | 0 | −8 | 0 | −8 | 4 | 0 | −16 |
| s3 ≠ 0, s1 = s2 = 0 | 0 | 0 | 2 | −16 | −16 | 0 | 0 | 4 | −32 |
| s1 ≠ 0, s2 ≠ 0, s3 = 0 | No solution | | | | | | | | |
| s1 ≠ 0, s3 ≠ 0, s2 = 0 | No solution | | | | | | | | |
| s2 ≠ 0, s3 ≠ 0, s1 = 0 | 0 | √(8/3) | √(4/3) | −16/3 | 0 | 0 | 8/3 | 4/3 | −96/9 |
| s1 ≠ 0, s2 ≠ 0, s3 ≠ 0 | No solution | | | | | | | | |

4. This method has identified the four critical points of the function in the region consistent with the constraints. The constrained maximum is located at $(x_1, x_2) = (0, 0)$, which is the same as the unconstrained max. The constrained minimum is located at $(x_1, x_2) = (0, 4)$, while there is no unconstrained minimum for this problem.
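• Numerical sketch: The four solutions found by working through the complementary slackness cases can be verified by plugging them back into the eight first order conditions (26)-(33). The snippet below is a check only, assuming NumPy is available; it does not solve the non-linear system, and the function name `foc_residuals` is an illustrative choice. Each candidate is entered as $(x_1, x_2, (\lambda_1, \lambda_2, \lambda_3), (s_1, s_2, s_3))$, taking the positive root for each non-zero slack.

```python
import numpy as np


def f(x1, x2):
    """Objective f(x) = -(x1^2 + 2*x2^2)."""
    return -(x1 ** 2 + 2 * x2 ** 2)


def foc_residuals(x1, x2, lam, s):
    """Left-hand sides of conditions (26)-(33); all zero at a critical point."""
    l1, l2, l3 = lam
    s1, s2, s3 = s
    return np.array([
        -2 * x1 - l1 + l2,               # dL/dx1
        -4 * x2 - l1 + l3,               # dL/dx2
        -(x1 + x2 + s1 ** 2 - 4),        # dL/dlambda1
        -(-x1 + s2 ** 2),                # dL/dlambda2
        -(-x2 + s3 ** 2),                # dL/dlambda3
        -2 * s1 * l1,                    # dL/ds1
        -2 * s2 * l2,                    # dL/ds2
        -2 * s3 * l3,                    # dL/ds3
    ])


candidates = [
    (0.0, 0.0, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
    (4.0, 0.0, (-8.0, 0.0, -8.0), (0.0, 2.0, 0.0)),
    (0.0, 4.0, (-16.0, -16.0, 0.0), (0.0, 0.0, 2.0)),
    (8 / 3, 4 / 3, (-16 / 3, 0.0, 0.0), (0.0, np.sqrt(8 / 3), np.sqrt(4 / 3))),
]

values = [f(x1, x2) for x1, x2, _, _ in candidates]
for candidate, value in zip(candidates, values):
    print(candidate[:2], np.abs(foc_residuals(*candidate)).max(), value)
```

Comparing the objective values across the candidates then picks out the constrained maximum and minimum among the critical points.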


3.9.4 Kuhn-Tucker Conditions

• The process described above will identify the critical points of a function subject to some constraints, but it can be a pain to implement. In particular, explicitly including the non-negativity constraints makes the problem significantly more complex.

• Kuhn-Tucker conditions: Because the problem of maximizing a function subject to inequality and non-negativity constraints arises frequently in economics, the Kuhn-Tucker approach provides a method that often makes it easier to both calculate the critical points and identify points that are (local) maxima.

1. Setup: We want to maximize a function $f(x_1, \ldots, x_n)$ subject to inequality constraints $g_1(x_1, \ldots, x_n) \leq b_1, \ldots, g_m(x_1, \ldots, x_n) \leq b_m$ and non-negativity constraints $x_1, \ldots, x_n \geq 0$.

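2. Lagrangian function: We use the same Lagrangian as if we were dealing with equality constraints (be careful with the signs!!!):

$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) - \sum_{i=1}^{m} \lambda_i \left(g_i(x_1, \ldots, x_n) - b_i\right)$$

3. Kuhn-Tucker conditions for maximum:

\begin{align}
\frac{\partial L}{\partial x_1} \leq 0, \; &\ldots, \; \frac{\partial L}{\partial x_n} \leq 0 \tag{35} \\
\frac{\partial L}{\partial \lambda_1} \geq 0, \; &\ldots, \; \frac{\partial L}{\partial \lambda_m} \geq 0 \tag{36} \\
x_1 \geq 0, \; &\ldots, \; x_n \geq 0 \tag{37} \\
\lambda_1 \geq 0, \; &\ldots, \; \lambda_m \geq 0 \tag{38} \\
x_1 \frac{\partial L}{\partial x_1} = 0, \; &\ldots, \; x_n \frac{\partial L}{\partial x_n} = 0 \tag{39} \\
\lambda_1 \frac{\partial L}{\partial \lambda_1} = 0, \; &\ldots, \; \lambda_m \frac{\partial L}{\partial \lambda_m} = 0 \tag{40}
\end{align}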

4. The last two sets of conditions are analogs to the complementary slackness conditions discussed in the previous section.

5. Kuhn-Tucker conditions for minimum: To minimize the function $f(x_1, \ldots, x_n)$, the simplest thing to do is maximize the function $-f(x_1, \ldots, x_n)$; all of the conditions remain the same after reformulating as a maximization problem.

6. There are additional assumptions (notably, f(x) is quasi-concave and the constraints are convex) that are sufficient to ensure that a point satisfying the Kuhn-Tucker conditions is a global max; if these assumptions do not hold, you may have to check more than one point.

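• Example: Consider the example from the previous section; we want to maximize:

\begin{align}
f(\mathbf{x}) &= -(x_1^2 + 2x_2^2) \tag{41} \\
\text{s.t.} \quad x_1 + x_2 &\leq 4 \tag{42} \\
x_1 &\geq 0 \tag{43} \\
x_2 &\geq 0 \tag{44}
\end{align}

1. This time, we begin by writing the Kuhn-Tucker Lagrangian:

$$L(x_1, x_2, \lambda) = -(x_1^2 + 2x_2^2) - \lambda(x_1 + x_2 - 4)$$

2. The Kuhn-Tucker conditions for this problem are:

\begin{align}
\frac{\partial L}{\partial x_1} = -2x_1 - \lambda &\leq 0 \tag{45} \\
\frac{\partial L}{\partial x_2} = -4x_2 - \lambda &\leq 0 \tag{46} \\
\frac{\partial L}{\partial \lambda} = -(x_1 + x_2 - 4) &\geq 0 \tag{47} \\
x_1 &\geq 0 \tag{48} \\
x_2 &\geq 0 \tag{49} \\
\lambda &\geq 0 \tag{50} \\
x_1 \frac{\partial L}{\partial x_1} = x_1(-2x_1 - \lambda) &= 0 \tag{51} \\
x_2 \frac{\partial L}{\partial x_2} = x_2(-4x_2 - \lambda) &= 0 \tag{52} \\
\lambda \frac{\partial L}{\partial \lambda} = -\lambda(x_1 + x_2 - 4) &= 0 \tag{53}
\end{align}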

3. The same four points are identified using just the equality conditions: $(x_1, x_2, \lambda) = (0, 0, 0)$, $(4, 0, -8)$, $(0, 4, -16)$, and $\left(\frac{8}{3}, \frac{4}{3}, -\frac{16}{3}\right)$. Three of these points, however, violate the requirement that λ ≥ 0, so the point (0, 0, 0) is the maximum.
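• Numerical sketch: The screening of the four candidate points can be written as a small Python function that tests every Kuhn-Tucker condition (45)-(53) for this specific problem. This is an illustration under stated assumptions: the function name `kkt_satisfied` and the tolerance `tol` are hypothetical, and the partial derivatives are hard-coded from the Lagrangian of this example rather than derived automatically.

```python
def kkt_satisfied(x1, x2, lam, tol=1e-9):
    """Check conditions (45)-(53) for max -(x1^2 + 2*x2^2) s.t. x1 + x2 <= 4, x >= 0."""
    dL_dx1 = -2 * x1 - lam
    dL_dx2 = -4 * x2 - lam
    dL_dlam = -(x1 + x2 - 4)
    return (
        dL_dx1 <= tol and dL_dx2 <= tol          # (45), (46)
        and dL_dlam >= -tol                      # (47)
        and x1 >= -tol and x2 >= -tol            # (48), (49)
        and lam >= -tol                          # (50)
        and abs(x1 * dL_dx1) <= tol              # (51)
        and abs(x2 * dL_dx2) <= tol              # (52)
        and abs(lam * dL_dlam) <= tol            # (53)
    )


# Candidate points (x1, x2, lambda) from the equality conditions (51)-(53).
points = [
    (0.0, 0.0, 0.0),
    (4.0, 0.0, -8.0),
    (0.0, 4.0, -16.0),
    (8 / 3, 4 / 3, -16 / 3),
]
checks = [kkt_satisfied(*p) for p in points]
for p, ok in zip(points, checks):
    print(p, ok)
```

Only a point that passes every condition is a Kuhn-Tucker candidate for the maximum; the others are ruled out without having to compare objective values.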

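• Exercise: Maximize

\begin{align}
f(\mathbf{x}) &= \frac{1}{3}\log(x_1 + 1) + \frac{2}{3}\log(x_2 + 1) \tag{55} \\
\text{s.t.} \quad x_1 + 2x_2 &\leq b \tag{56} \\
x_1 &\geq 0 \tag{57} \\
x_2 &\geq 0 \tag{58}
\end{align}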


4 Computing Handouts

VNC Sessions

4.1 Starting/Connecting to a VNC Session

4.1.1 Opening a new VNC session

We will be using special FAS Linux servers (icegov1.unix.fas.harvard.edu through icegov4.unix.fas.harvard.edu) that have been configured for the Government Department for use in methods courses.¹⁴ These servers are connected to the regular FAS computing system, so that you have access to the e-mail and filespace that you would normally use by logging into fas.harvard.edu. You can use any SSH program to connect to the servers, but we suggest that you connect via VNC (Virtual Network Computing).

Connecting on a Windows machine

There is a special script that you can use to connect to the servers using VNC, which automates most of the process:

1. If you are at an HMDC lab computer, double-click on the folder called HMDC VNC, which should be on the desktop.

2. If you are using a different Windows computer, point a web browser to http://www.hmdc.harvard.edu/HMDC_VNC.EXE and download and install the script.

3. Double-click on the icon that says “Double Click ME” and follow the instructions. Ignore any warning messages from anti-spyware programs.

4. While the script is creating your VNC session, it will ask you for the server name (icegov1 through icegov4), then your username, and finally your password. This is your FAS username and password (the same ones as your email). It will then tell you what your session number is; try to remember this number.

5. A window will open entitled “plink.exe”; you can minimize this window, but don't close it while you are using your VNC session or the link to the servers will be broken.

Once this is done, you are ready to go! Open a terminal window by clicking on “Applications” in the upper left-hand corner and moving the mouse over “System Tools>>Terminal”.

Connecting on a Mac

The process for connecting on a Mac is slightly more complicated.

1. Find and open the Mac Terminal. The simplest way is to type “terminal” into Spotlight at the upper right corner. It is also located under “Utilities” in the Finder window.

2. At the terminal prompt, type in

ssh <username>@<servername>.unix.fas.harvard.edu

¹⁴ Compiled from material written by Patrick Lam, Mike Kellermann, Olivia Lau, Ryan T. Moore, Dan Hopkins, Ian Yohai, and probably a bunch of other people.

<username> is your FAS username (same as your FAS email).

<servername> is either icegov1, icegov2, icegov3, or icegov4.

For example,

ssh [email protected]

Press Enter.

3. The computer should respond with something like this:

The authenticity of host ’icegov3.unix.fas.harvard.edu (140.247.34.56)’ can’t be established.

RSA key fingerprint is 4b:c4:2e:fb:5f:ac:71:45:77:ee:fa:a8:14:8c:81:03.

Are you sure you want to continue connecting (yes/no)?

Type yes and hit Enter.

4. It will then prompt you for your password. Type in your FAS password and hit Enter. Then hit Enter again.

5. You should see a prompt similar to the following:

icegov3:plam [~]>

Type in vncserver once.

6. It will then give you a session number. Remember this number! Log out by typing logout and then hit Enter.

7. At the terminal prompt, type in the following:

ssh -L 59xx:localhost:59xx <username>@<servername>.unix.fas.harvard.edu

where <username> and <servername> are the same as before and xx represents your two-digit session number. If your session number is one digit, then add a 0 before it.

For example,

ssh -L 5906:localhost:5906 [email protected]

Then hit Enter.

8. When prompted, enter your FAS password and press Enter. Hit Enter again. Leave the Terminal open.

9. Download, install, and open Chicken of the VNC from http://sourceforge.net/projects/cotvnc/.

• On the left side, click on New Server.

• In the Host field, make sure it says localhost.

• In the Display field, type in your session number (no leading 0 needed).

• In the Password field, type in your FAS password.

Click Connect.

Once this is done, you are ready to go!


4.1.2 Transferring files from VNC to your desktop

Windows: The HMDC connection script comes with WinSCP when you connect, or you can use a program like SecureFX.

Mac: Download Fugu from http://rsug.itd.umich.edu/software/fugu/.

• Connect to: <servername>.unix.fas.harvard.edu

• Enter FAS username and click Connect.

• Enter FAS password when prompted.

4.1.3 What to do when you are done

You have two options for disconnecting from the VNC server:

Suspend the Session: If you are still working on something or running something computational, you will want to suspend the VNC session. The programs will stay open and running even though you've exited the VNC server.

• To suspend a VNC session, simply click the X on the VNC viewer window, similar to how you would exit a program. The VNC window will close, but everything on the server will be left untouched for when you reconnect.

Terminate the Session: If you are completely finished with everything, you will want to terminate the VNC session so that you are not wasting resources on the server. All running programs will be ended and anything that is not saved will be lost.

• To terminate a VNC session, type

vncserver -kill :<session number>

in the VNC terminal. You can also type it in the PuTTY window (for Windows) or the Mac terminal (for Mac). For example,

vncserver -kill :6

Note the space between -kill and :6. If you forgot your session number, type in vncfinger.

4.1.4 Reconnecting to an existing (suspended) session

Reconnecting to your suspended VNC session is easy.

Windows

If you are using the same computer from which you created the session, look in the “HMDC VNC” folder. There should be an icon with the name of the server followed by your username. For example:


icegov3.unix.fas.harvard.edu-plam

will connect to my VNC session on icegov3. Double-click on the icon and enter your password when asked.

If you are working from a different computer, you will have to double-click the “Double Click ME” icon again to connect. Enter the name of the server on which your existing session is located, followed by your username and password. The script should be smart enough to figure out whether you have an existing VNC session on that server.

Mac

On Mac, begin with step 7 from above. If you forgot your session number, start from the beginning and type in vncfinger for step 5.

4.1.5 Troubleshooting

How do I print? It is possible to print directly from your VNC session, but it is likely to print in the basement of the Science Center. The easiest way to print in CGIS is to move files from your home directory to your local computer.

Why is the screen only partially filled? When you launch a VNC session for the first time, it records the screen resolution. It will remain at this resolution even if you connect to the same session from a different computer. Unfortunately, you will need to 1) return to a computer with the same resolution as the original, 2) kill your VNC session and start over, or 3) deal with it.

Why do I have multiple VNC sessions? There are two ways that you can have multiple VNC sessions. If you have sessions on multiple servers, then you have been connecting to the servers incorrectly. Each time that you connect to a server, the script will create a new session if it can't find one already active. It doesn't know that you meant to type “icegov2” but accidentally typed “icegov3”. If you have multiple sessions on a single server, then you have probably created them manually by typing “vncserver” at the prompt. In either case, you need to pick one session and kill all of the others, lest all of your sessions be killed for you.

Home Computer

4.2 Configuring Your Computer

4.2.1 Downloading/Installing R

1. Go to http://www.r-project.org.

2. Click on CRAN (left-hand side of screen).

3. Select a mirror (e.g. http://cran.cnr.Berkeley.edu).

4. Click on your OS platform.

5. For Windows, click on Base and then download and run the executable file (R-2.9.1-win32.exe), accepting all the defaults. For Mac, download the image file (R-2.9.1.dmg) and follow the installation instructions.

6. You should be all set. A good introduction to using R can be found at http://people.hmdc.harvard.edu/~mathpre/r/RMathCamp04.pdf.

4.2.2 Downloading/Installing LaTeX

In order to get LaTeX up and running, you need to download and install two programs:

1. A TeX distribution (which you will almost never touch)

• Windows: Download the basic MikTeX system (http://miktex.org/).

• Mac: Download the MacTeX distribution (http://www.tug.org/mactex/).

2. A LaTeX Editor (which you will do almost everything in)

• I recommend Texmaker (http://www.xm1math.net/texmaker/), which is free and available for both Windows and Mac.

• Other editors used by faculty and graduate students in the department include XEmacs (free), WinEdt (free from Harvard Computing), and LyX (free).

Once you've downloaded and installed these two programs, you should be all set. Open up your LaTeX editor to begin.

A useful resource for getting started with LaTeX can be found at http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf.
