Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Stat 451 Lecture Notes 0112
Introduction
Ryan MartinUIC
www.math.uic.edu/~rgmartin
1Based on parts of: Dalgaard’s ISwR book, Chapter 1 in Givens & Hoeting,and Chapter 7 of Lange
2Updated: January 13, 20161 / 56
What to compute?
Stat 451 is a course about computational statistics.
Therefore, it is important first to discuss what we want tocompute in a statistics problems.
Here, we are basically concerned with two kinds of things:
maximizing the likelihood functionintegrating a “posterior distribution”
The former notion should be familiar from your experiencewith maximum likelihood in Stat 411.
The latter may be new to you — it’s “Bayesian”.
Next is a brief introduction to these concepts, along with anon-trivial illustration.
2 / 56
Maximum likelihood
Suppose we have n independent observations, Y1, . . . ,Yn, andthe density/mass function pθ for these observations dependson an unknown parameter θ.
The likelihood and log-likelihood functions are
L(θ) =n∏
i=1
pθ(Yi ) and `(θ) =n∑
i=1
log pθ(Yi ).
The maximum likelihood estimator (MLE) θ of θ, based ondata, maximizes the likelihood, i.e.,
θ = arg maxθ
L(θ) ⇐⇒ ˙(θ) = 0.
Need to be able to optimize and/or find roots of functions.
3 / 56
Maximum likelihood (cont)
Besides producing an estimate of the unknown parameter, wemight also like to assess its uncertainty.
In Stat 411 you learn that, under some conditions, when thesample size n is large, the distribution of θ is approximatelynormal with mean θ and variance I (θ)−1, where I (θ) is theFisher information matrix:
I (θ) = Eθ{ ˙(θ) ˙(θ)>} = −Eθ{¨(θ)}.
Then an approximate 95% confidence interval for θj is
θj ± 1.96 ·√
[I (θ)−1]jj , j = 1, . . . , d .
So, computing derivatives and inverting matrices is important.
4 / 56
Bayesian approach
The Bayesian approach is based on using the rules ofprobability for inference.
Start with a prior distribution for θ, with density/massfunction π(θ), basically just a weight function.
Yields a conditional distribution for θ, given Y , as
π(θ | Y ) =L(θ)π(θ)∫L(u)π(u) du
∝ L(θ)π(θ).
Now we treat π(θ | Y ) as the object of interest and the goalis to produce various summaries, such as mean, variance,quantiles, probabilities, etc.
So, integrating functions will be important.
5 / 56
Example: probit regression
Y1, . . . ,Yn are independent (not iid) binary observations.
Specifically, Yiind∼ Ber
(Φ(x>i θ)
), i = 1, . . . , n, where:
“Ber” denotes a Bernoulli distribution;x1, . . . , xn are fixed d-dimensional covariates;θ is a d-dimensional parameter vector; andΦ is the standard normal distribution function.3
Exercise:
write out log-likelihood functioncalculate Fisher information matrix...
3Other cdfs can be used, but then the model isn’t called “probit”...6 / 56
Remarks
This course will mainly study how to solve certain optimizationand integration problems that arise in statistics applications.
We’ll need some background on general numerical methods.
Software will also be important — we will use R.
Some of what we discuss in the class will be simple, otherthings more difficult.
My goal is that students completing the course will havesufficient background to read current papers on computationalstatistics and implement their methods.
7 / 56
Outline
1 Review of statistical inference
2 Introduction to RBasicsR sessionR graphicsR programmingData entry
3 Math and stat tools
8 / 56
General facts about R
R is a free version of the S-PLUS software.
Can be downloaded for free (http://cran.r-project.org)for Windows, Mac, and Unix computers.
Environment is interactive by default—like a calculator—butusers can create files of R code (called scripts) on the sidewhich can be run all at once within R.
It is possible to write code that works together with lower-levelprogramming languages like C and FORTRAN (for speed).
R is powerful because of its flexibility — users can easilydefine their own functions or modify existing functions to suittheir needs.
9 / 56
Arithmetic
Among other things, R can do arithmetic like a calculator.
Basic binary (arithmetic) operations are:
+ Addition ^ or ** Exponentiation- Subtraction %/% Integer division* Multiplication %% Modulus (remainder)/ Division
10 / 56
Variables and assignments
Even with fairly routine calculations it is helpful to be able tostore some intermediate values.
R allows users to assign a value to a particular variable.
Syntax: x <- 7
This means that the value 7 is assigned to the variable x.
Note: The assignment symbol <- is to be treated as a singlecharacter, an arrow pointing to the left.
One can use the underscore symbol or an equal sign in placeof the assignment character — not recommended.
Underscore symbol cannot be used in variable names; use aperiod instead, e.g., pred.value.
11 / 56
Expressions and objects
In R, the user enters an expression and the system evaluates itand produces output.
These expressions need not be formulas — they can generategraphs, output data sets, etc.
Expressions work on objects, basically anything that can beassigned to a variable.
But the syntax used is expression/object specific.
In what follows we will discuss several important types ofexpressions and objects.
Use the str(X) to view the “structure” of the object X.
12 / 56
Functions and arguments
Functions in R can take many forms:
There’s the kind that look like mathematical functions, saylog(x),and the kind that don’t, say plot(x, y, pch=2).
The common feature is that there is a set of paranthesescontaining those arguments that the fuction applies to.
Two “types” of arguments:
Positional – variable recognized by position in the list.Named – variable recognized by name.
Some functions don’t have arguments, some have defaultarguments, and some allow “arbitrary” arguments.
R has an extensive list of built-in function that can do all sortsof things – and it’s easy to write your own functions since thefunction syntax in R is the same as ordinary R syntax.
13 / 56
Vectors
Numeric vectors are fairly straightforward.
There are basically4 two other kinds of vectors:
CharacterLogical
Character vectors have elements made up of character strings;e.g. names <- c(’Small’, ’Medium’, ’Large’)
Logical vectors have elements TRUE or FALSE, and are veryuseful for indexing data sets.
An example of how to get a logical vector:
> gpa <- c(3.0, 2.8, 3.4, 3.7, 3.9, 3.3)
> gpa > 3.5
[1] FALSE FALSE FALSE TRUE TRUE FALSE
4Complex vectors also exist14 / 56
Vectors (cont.)
Three functions to create vectors:
c() – concatenateseq() – patterned sequencerep() – repeat something
A vector must contain elements of the same “type”, so whathappens if two variables x and y of different types areconcatenated?
The general (and non-informative) answer is that they arecoerced into types that match.
For example:
> c(FALSE, 7)
[1] 0 7
> c(11.7, ’abc’)
[1] ’11.7’ ’abc’
15 / 56
Vectors (cont.)
An interesting feature of R is that it does “vectorizedarithmetic.”
That is, R will apply arithmetic operations (and some otherfunctions) in a natural way.
For example:
> x <- c(7, 10, 11)
> y <- seq(5, 3, by=-1)
> x + y
[1] 12 14 14
If the two vectors are not of the same length, the shorter onegets “recycled” — error message if the length of the longervector is not a multiple of the length of the shorter vector.
When defining your own functions, remember to be carefulabout assuming it will vectorize how you want it to!
16 / 56
Matrices and arrays
A natural extension of a vector is a matrix, which is just avector with a double index.
Example: M <- matrix(1:6, nrow=3, ncol=2).
In R, matrices are almost always5 treated just like vectors.
rbind() and cbind() functions can be used to append twoor more matrices by rows and columns, respectively.
Can name the rows and columns with rownames() andcolnames() functions.
More generally, R can work with an array (a vector with nindices), but these are a bit less common, perhaps becausethey’re hard to visualize.
5The only time R treats matrices in a linear algebra sort of way is when theuser asks R to do something “linear algebra like” such as matrix multiplication.
17 / 56
Data frames
A data frame is R’s version of what we think of as a datamatrix or data set.
The columns represent variables and the rows represent cases.
This idea is similar to a matrix, but matrices must be entirelyof the same type, while data frames can have a mixture ofnumeric, character, and logical variables.
To create a data frame: D <- data.frame(list-of-variables)
We’ll talk about reading files into a data frame later.
Many statistical routines in R (e.g., linear regression) aredesigned to operate on data frames.
18 / 56
Lists
The list structure is quite different and (as far as I know)unique to R.
A list in R is exactly as its name suggests — a list of objects.
The distinguishing feature is that a list can contain (almost?)any kind of object in R.
For example, objects in the list can be vectors, matrices, andeven functions.
Syntax: mylist <- list(list-of-objects).
For example:
> M <- matrix(c(2, 5, 7, 7), nrow=2)
> f <- function(x) log(x)+x^2
> mylist <- list(mymat=M, myfun=f)
> mylist$myfun(mylist$mymat)
19 / 56
Indexing
Given a vector/matrix/array/data frame/list, we would like tobe able to pick off certain values.
Need understand how these objects are indexed.
For example, if M is a matrix, then M[i,j] refers to the valuein the i-th row and j-th column of M.
Also, M[,j] returns the j-th column of M as a vector.
Data frames are indexed similarly, and vectors are justone-dim matrices.
We’ve seen that objects of a list are indexed by their nameand a $ sign.
Example: What is mylist$mymat[2,2]?
Example: What is mymat[-1,]?
20 / 56
Subsetting
By thinking of indexing in terms of logical variables, we canextend this idea to allow for subsetting of our object.
For example, consider the code M[1,2].
This is equivalent to defining two logical vectors:
> row.log <- (1:nrow(M)) == 1
> col.log <- (1:ncol(M)) == 2
Then M[1,2] is equivalent to M[row.log, col.log].
To generalize, we can define any sort of logical variable we likeand apply it as above.
For example, suppose a data frame D has a variable age. Toget only those rows for adults, use the code D[,D$age > 19]
There’s another generalization of the row/column indexing:
> x <- seq(5, 25, by=5)
> x[c(2,3)]
[1] 10 15
21 / 56
Implicit loops
In many cases, we want to obtain some kind of summary ofthe rows and/or columns of a matrix or data frame (or list).
Such a process requires scanning the particular dimension ofthe object and applying the function each time.
R functions apply() and its variations do this directly.
For lists, use lapply() or sapply(); for matrices or dataframes use apply().
Syntax: Suppose x and y are two numeric vectors.
> mylist <- list(var1=x, var2=y)
> lapply(mylist, mean)
> sapply(mylist, mean)
> mymat <- cbind(var1=x, var2=y)
> apply(mymat, 2, mean, na.rm=TRUE)
22 / 56
Sorting
Sorting a single vector is easy — use sort(x).
But what if you want to sort the rows of a matrix by aparticular column?
Goal is to sort the rows data frame D by column 1.
Use the order() function:
> o <- order(D[,1])
> D.sorted <- D[o,]
Using o <- order(D[,1], D[,2]) would sort D by the firstcolumn and then the second.
23 / 56
Outline
1 Review of statistical inference
2 Introduction to RBasicsR sessionR graphicsR programmingData entry
3 Math and stat tools
24 / 56
Workspace and directories
When you fire up R, you’ll have a working directory.
To view this directory, type getwd().
Change the directory by typing setwd(’mydir’).
To view the objects in the workspace, type ls().
To remove an object X from workspace, type rm(X).
25 / 56
Workspace and directories (cont.)
Save workspace to the working directory: save.image().
This command saves all the objects in the current workspaceto a file .Rdata in the working directory — you can specify adifferent filename if you like.
To save objects x, y and z in file myfile.Rd, type
save(x, y, z, file=’myfile.Rd’)
Load the saved file: load(file=’myfile.Rd’)
Note: Saving the workspace does not save the output!
26 / 56
Saving input and output
The save command saves objects in the workspace, but doesnot save either the R input or output.
R input (commands) can be stored in an external file, called ascript, say myscript.R.
Run commands in script: source(’myscript.R’).
To begin a session where the output is stored in a file myfile,type sink(’myfile’).
When an expression is evaluated, nothing is printed to theoutput terminal — instead, everything is printed to the filemyfile until the user types sink().
27 / 56
Getting help
In R, type help(mean) to see some documentation for thefunction mean. You can also type ?mean for short.
Typing help.start(’mean’) will open a HTML help filewith searching capabilities, etc.
Google searches are very helpful too.
Extensive documentation online or in your installation — see“Introduction to R” and “Writing R Extensions” (probablymore than you’ll ever need to know about R).
28 / 56
Packages
Thousands of extra packages are available that containspecialized functions and data sets.
These functions often contain compiled (C or Fortran) code.
Look at the CRAN repository for a list (with descriptions) ofavailable packages.
To install a package pkg, type install.packages(’pkg’)
and follow the instructions.
Once the package is installed, to access its contents typelibrary(pkg).
The objects within this package are now available for use.
29 / 56
Outline
1 Review of statistical inference
2 Introduction to RBasicsR sessionR graphicsR programmingData entry
3 Math and stat tools
30 / 56
Introduction
One of the main advantages of R is its graphical capabilities.
This includes having a number of built-in graphicalprocedures, as well as giving the user the flexibility to producehis/her own plots.
Here we’ll see a few examples of the available graphical tools,with some focus on how to annotate graphs for presentation.
Note: It is possible to directly produce PDF or Postscriptgraphics for inclusion in LaTeX files.
31 / 56
Scatterplots
The following code contains lots of new ideas.
Take notice of where the labels are printed!
x <- runif(50, 0, 2)
y <- runif(50, 0, 2)
plot(x, y, xlab=’x-label’, ylab=’y-label’,
main=’Main Title’, sub=’subtitle’)
text(0.6, 0.6, ’text at (0.6,0.6)’)
abline(h=0.6, v=0.6, lty=2)
for(s in 1:4) mtext(-1:4, side=s, at=0.7, line=-1:4)
mtext(paste(’side’, 1:4), side=1:4, line=-1, font=2)
32 / 56
Histograms
Histograms are very easy in R.
The basic command is hist(X), where X is the variable youwant to draw the histogram of.
There are a number of options to customize this plot.
You can add lines to the plot with curve().
Add a legend to label the various curves.
See the mean.med.hist function in the code online.
33 / 56
Boxplots
Nice way to visualize the location and spread of a distribution.
Especially useful for comparing two or more distributions.
Basic syntax is boxplot(X) where X is a numeric vector or alist that contains multiple numeric vectors.
Can do some similar sorts of customization.
See the function mean.med.comp in the code online.
34 / 56
Outline
1 Review of statistical inference
2 Introduction to RBasicsR sessionR graphicsR programmingData entry
3 Math and stat tools
35 / 56
Flow control 1: If-then-else
An important part of programming is conditional execution ofcommands, and this is accomplished through the if-then-elsestructure.
Basic syntax:
if(condition1) {
## Do something
} else if(condition2) {
## Do something else
} else {
## Do another thing
}
Inside the if() is a logical variable, taking values TRUE orFALSE.
Logical variables can be “combined” with the usual Booleanoperators: & (and), | (or), ! (not).
To compare two variables, type A == B or A != B.
36 / 56
Flow control 2: Loops
The three major players are for(), while(), and repeat.
Illustrate while() and repeat with an example of computingthe square root of a non-negative number.
y <- 12345
x <- y / 2
while(abs(x*x - y) > 1e-10) x <- (x + y / x) / 2
print(x) # based on while()
x <- y / 2
repeat {
x <- (x + y / x) / 2
if(abs(x*x - y) < 1e-10) break
}
print(x) # based on repeat
37 / 56
Flow control 2: Loops (cont.)
The for() loop is by far the most common looping structure.
It is typical to run the loop with a counter, stopping once thecounter reaches it maximum value:
for(i in 1:n) { ## Do something }
In particular:
x <- seq(0, 1, by=0.05)
plot(x, x, type=’’l’’)
for(j in 2:5) lines(x, x^j)
But there’s a bunch of other things one can do too; e.g.,
for(i in (1:10)^4)
for(j in c(2,5,7))
for(var in names(data))
for(f in c(sin, cos, tan))
38 / 56
Avoiding loops
Knowing when (and how) to avoid loops is as important asknowing how to use them.
Loops are very easy to program in R, but can run very slowdepending on the application.
Often a version of the apply function is better.
Example: find the maximum value in each column of X
Do this
max.X <- apply(X, 2, max)
don’t do this
max.X <- rep(NA, ncol(X))
for(j in 1:ncol(X)) max.X[j] <- max(X[,j])
apply is sometimes much faster, other times not much faster;but it’s always much cleaner!
39 / 56
Outline
1 Review of statistical inference
2 Introduction to RBasicsR sessionR graphicsR programmingData entry
3 Math and stat tools
40 / 56
Reading data 1: scan
Aside from typing in data directly with c(...), the simplestway to read in data is with the scan command.
Can read in vector or list objects.
If file.dat is a text file containing numeric or characterdata, typing X <- scan(file=’file.dat’) will read thedata from this file and store it in the vector X.
Lists can also be done but the syntax is weird; go tohelp(scan) for details.
41 / 56
Reading data 2: read.table
In statistical applications, we usually have several variablesmeasured on a number of cases.
In such cases, a data frame is the most convenient data type.
By default, R looks for data points arranged in columns with asingle space separating two values.6
If two spaces (delimiters) appear next to one another, Rassumes the value is missing, and enters NA.
Suppose we have a file data.dat that contains data pointsseparated by commas, with a header row containing thevariable names. Then the syntax is
read.table(file=’data.dat’, header=TRUE, sep=’,’)
6It’s easy to change this default!42 / 56
Reading data 2: read.table (cont.)
Things to consider when reading a file:
header lineseparator/delimiterquotesmissing valuesunfilled lineswhite space in character fieldscomments...
Check out help(read.table) for details.
There are also some special shortcut functions, such asread.csv and read.delim, that read comma and tabdelimited data files by default.
43 / 56
Writing data to a file
In some cases we will have an output data set that we wouldlike to write to a file, perhaps for someone else using adifferent software to analyze.
For a “rectangular” data object X in R, we can write this to atext file with the write.table command.
The syntax is basically the same as that of read.table.
Note that R will first coerce X to a data frame, so that it’spossible to include headers.
44 / 56
Outline
1 Review of statistical inference
2 Introduction to R
3 Math and stat toolsProbability stuffStatistical methodsLinear algebra
45 / 56
Combinatorics
For “counting problems,” we’d like to have built-in functionsto calculate “combinations” and factorial.
factorial(x) returns x! for integer x.
choose(n,k) returns(nk
).
Related functions are gamma, lgamma, digamma, etc — theseare the gamma function, the log-gamma function, thederivative of the log-gamma function, respectively.
46 / 56
Random sampling
The sample function can be used to take random samplesfrom a finite set.
If X is a vector, then sample(X) will generate a randompermutation of the elements in X.
For integer n, sample(n) and sample(1:n) are equivalent.
Options: size=k or replace=TRUE or...
If X is a matrix/data frame with 10 columns, thenX[,sample(10,size=7)] will create new matrix containing 7of the original columns in random order.
47 / 56
Probability distributions
R has a number of built-in functions to do probabilitycalculations for random variables.
Built in stuff for normal, binomial, Poisson, exponential,gamma, uniform, hypergeometric,...
Let dist be the abbreviation for a generic distribution; forexample norm for normal.
ddist = compute pdf of dist
pdist = compute cdf of dist
qdist = compute inverse cdf of dist
rdist = generate random variables from dist
dist can be norm, binom, pois, exp, gamma, unif,...
Look at the help files for the parametrizations.
48 / 56
Probability distribution example
Draw plots of a binomial pdf (technically a pmf) and cdf.
n <- 25
p <- 0.4
plot(x=0, y=0, type=’n’, xlim=c(0,n), ylim=c(0,1),
xlab=’x’, ylab=’PDF and CDF’)
lines(x=0:n, y=pbinom(0:n, n, p), type=’s’, lwd=2,
col=’gray’)
lines(x=0:n, y=dbinom(0:n, n, p), type=’h’, lwd=2)
legend(’right’, inset=0.05, lwd=2, col=c(’black’,’gray’),
c(’PDF’, ’CDF’))
49 / 56
Outline
1 Review of statistical inference
2 Introduction to R
3 Math and stat toolsProbability stuffStatistical methodsLinear algebra
50 / 56
Quick summary
R is designed for statistical analysis so, naturally, it has built-infunctions that do many of the standard statistical methods.
For example:
t.test (obviously) does t-testslm does linear models (e.g., ANOVA, regression, etc)glm for generalized linear models (e.g., logistic regression)
Some examples in the code online.
Here, in Stat 451, the goal is to learn how to carry out thesecomputations, so we will avoid using the built-in functions,except for checking our answers.7
7Of course, outside Stat 451, it is best to use built-in functions to do thesestandard things.
51 / 56
Outline
1 Review of statistical inference
2 Introduction to R
3 Math and stat toolsProbability stuffStatistical methodsLinear algebra
52 / 56
Matrix arithmetic
Let A and B be two matrices of suitable dimension.8
Adding and subtracting matrices is obvious.
What about A * B or A / B?
Matrix multiplication requires a different symbol: A %*% B.
We’ll talk about matrix inversion below.
8You need to be careful about making sure the matrix dimensions arecorrect, since some of the arithmetic operations will “vectorize” and can giveunexpected results...
53 / 56
More matrix things
det(M) will return the determinant of M.
diag(M) will do one of two things:
If M is a matrix, then diag(M) is a vector filled with thediagonal entries of M;if M is a vector, then diag(M) will be a diagonal matrix withvector M on the diagonal.
Solving a linear system, Ax = b for x : solve(A, b).
Matrix inversion:
If M is invertible, then solve(M) is the inverse;if M is not invertible, then ginv(M) returns a generalizedinverse.9
9Requires the MASS library.54 / 56
Matrix decompositions
Spectral theorem says that if M is a symmetric d × d positivedefinite matrix, then there exists a diagonal matrix Λ and anorthonormal matrix U such that M = UΛU>.
Diagonal entries of Λ are eigenvalues of M and the columns ofU are the corresponding eigenvectors.
R gives this decomposition of M with the function eigen(M).
There are other matrix decompositions of interest:
Cholesky decomposition: chol(M)
singular value decomposition: svd(M)
...
55 / 56
Neat example: sweep operator
Let M = (Mij) be a symmetric positive definite matrix.
Sweeping on the kth diagonal entry returns a new matrixM = (mij) defined by
mkk = − 1
mkk, mik =
mik
mkk, mkj =
mkj
mkk, mij = mij −
mikmkj
mkk.
Sweeping gives lots of nice properties of the matrix; seeChapter 7.5 in Lange.
In particular, sweeping M successively along each diagonalentry (in any order) returns the inverse M−1.
See the function sweep in the online code.
56 / 56