8/2/2019 R Tutorial.pdf
An Easy Introduction To R for
IE 460, IE 508 and IE 586 Course Participants
Ismail Basoglu
February 23, 2012
1 Introduction
This document contains an easy introduction to the programming language R. With the help of the examples given in this document, you should be able to gain a basic knowledge of R, which will help you
To use predefined functions of statistical forecasting models and carry out an effective analysis of given data or time series in the IE 460 Statistical Forecasting and Time Series course,
To run statistical tests, build statistical models and apply inferential methods on the topics of the IE 508 Statistical Inference course,
To create financial applications and implement Monte Carlo methods in the IE 586 Quantitative Finance course.
In order to learn this programming language, we recommend that you try each and every step of the applications presented in this document.
You can download the latest version of R from http://cran.r-project.org/. Windows users should click the Windows link, then the base link, and they will see the download link for the *.exe file.
Once you have installed R, we recommend that you write your code in script files. Just click File in the menu bar, then New script, and you can write your code inside this script. When your script contains complete code, you can press Ctrl+A (select all) and then Ctrl+R to run it quickly in the R console. You can save your script files and open them again later by clicking File and then Open script in the menu bar.
Have fun!
2 R Works with Vectors
2.1 Creating Vectors
In order to assign a value to a specified variable (e.g. 3 to x), we do the following:
x <- 3
or
x = 3
We will use the operator
x
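A minimal sketch of both assignment forms, plus c() and the colon operator for building vectors; the values are arbitrary examples chosen for illustration, not the ones from the original pages.

```r
x <- 3               # assign 3 to x with the arrow operator
y = 3                # the same assignment with =
identical(x, y)
# [1] TRUE
v <- c(2, 4, 6, 8)   # c() combines values into a vector
length(v)            # number of elements in v
# [1] 4
1:5                  # a:b gives the integer sequence a, a+1, ..., b
# [1] 1 2 3 4 5
```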
If we are interested in the step size rather than the length of the sequence, we can use the by parameter instead of length.out.
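A short sketch of the two ways of controlling seq(), using the interval from 0 to 1 as an illustrative assumption:

```r
seq(0, 1, length.out = 5)   # 5 equally spaced points from 0 to 1
# [1] 0.00 0.25 0.50 0.75 1.00
seq(0, 1, by = 0.25)        # the same points, specified through the step size
# [1] 0.00 0.25 0.50 0.75 1.00
seq(1, 10, by = 3)          # stops at the last value that does not exceed 10
# [1]  1  4  7 10
```

With by, the upper bound is only reached if it lies exactly on the grid; with length.out, both end points are always included.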
x

>=: greater than or equal to
==: equal to (do not forget that a single = symbol is used for assigning values)
!=: not equal to
In the following sequence of examples, we create a vector and use it in different logical expressions. For each vector element, the expression returns TRUE in the corresponding index if the element satisfies it, and FALSE otherwise. You can combine logical expressions element-wise with & (and) and | (or); && and || are their counterparts for single TRUE/FALSE values.
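A sketch of element-wise logical expressions on an illustrative vector (the values are assumptions, not the original example), including the use of a logical vector to select elements:

```r
x <- c(1, 5, 3, 8)
x > 4                             # element-wise comparison
# [1] FALSE  TRUE FALSE  TRUE
x >= 3 & x != 8                   # element-wise "and"
# [1] FALSE  TRUE  TRUE FALSE
x < 2 | x == 8                    # element-wise "or"
# [1]  TRUE FALSE FALSE  TRUE
x[x > 4]                          # keep only the elements where the condition is TRUE
# [1] 5 8
(length(x) > 3) && (x[1] == 1)    # && combines two single TRUE/FALSE values
# [1] TRUE
```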
x
x
x
vec
x
y
x
x
exp(4.60517) # must yield 100, maybe with a rounding error
# [1] 99.99998
exp(log(100)) # no visible rounding error
# [1] 100
exp(seq(-2,2,0.4))
# [1] 0.1353353 0.2018965 0.3011942 0.4493290 0.6703200 1.0000000 1.4918247
# [8] 2.2255409 3.3201169 4.9530324 7.3890561
gamma(5) # equivalent to factorial(4)
# [1] 24
gamma(5.5) # equivalent to factorial(4.5)
# [1] 52.34278
x
3 Probability and Statistical Basis of R
3.1 Probability Functions in R
There are four functions related to each of the distributions that are well known and commonly used in probability theory and statistics. Let us give the definitions of these functions for the normal distribution and then talk about the other probability distributions that are available in R.
dnorm(x,y,z): returns the pdf (probability density function) value of x in a normal distribution with mean y and standard deviation z.
pnorm(x,y,z): returns the cdf (cumulative distribution function) value of x in a normal distribution with mean y and standard deviation z.
qnorm(x,y,z): returns the inverse cdf value of x in a normal distribution with mean y and standard deviation z. Clearly x must be in the unit interval (x ∈ [0, 1]).
rnorm(x,y,z): returns a vector of random variates (RVs) of length x. The variates follow a normal distribution with mean y and standard deviation z.
Check out the following examples about normal distribution:
dnorm(0.5) # if no parameter is defined, R assumes a std. normal distribution
# [1] 0.3520653
dnorm(0,2,1)
# [1] 0.05399097
dnorm(3,3,5)
# [1] 0.07978846
pnorm(0) # the area below the curve
# on the left side of "0" in a std. normal distribution
# [1] 0.5
pnorm(2)
# [1] 0.9772499
pnorm(5,3,1)
# [1] 0.9772499
# following are the inverse of the previous "pnorm()" functions
qnorm(0.5)
# [1] 0
qnorm(0.9772499)
# [1] 2.000001
qnorm(0.9772499,3,1)
# [1] 5.000001
rnorm(20,2,1) # will generate 20 RVs which follow normal dist.
# with mean 2 and std. dev. 1
# [1] 2.31502453 0.37445729 2.04994863 1.89381118 0.63099383 1.50837615
# [7] 0.57363369 2.84601422 2.54003868 3.43652548 0.88941281 3.36373629
# [13] 0.58945290 2.44678124 -0.05360271 2.73920472 2.73643684 1.79465998
# [19] 1.30906099 2.18648566
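A sketch that checks the relationships between the normal-distribution functions; the argument values and the seed 460 are arbitrary choices. Because rnorm() draws new random numbers on every call, output like the one above will differ on your machine unless you fix the seed with set.seed().

```r
p <- pnorm(1.5, mean = 3, sd = 2)   # cdf value of 1.5
qnorm(p, mean = 3, sd = 2)          # the inverse cdf recovers 1.5
# [1] 1.5
integrate(dnorm, -Inf, 2)$value     # area under the std. normal pdf left of 2
pnorm(2)                            # matches the cdf value (up to integration error)
# [1] 0.9772499
set.seed(460)                       # fix the state of the random number generator
rv1 <- rnorm(5, 2, 1)
set.seed(460)                       # the same seed again
rv2 <- rnorm(5, 2, 1)
identical(rv1, rv2)                 # the two samples are the same
# [1] TRUE
```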
Here is a list of useful distributions that are available for computation in R. R also provides other distributions that are not in this list. (For each distribution below, you can obtain the cdf function by changing the initial letter d to p, the inverse cdf by changing it to q and the random variate generator by changing it to r.) Apart from the normal distribution, please practice and learn the d, p, q and r functions of the first nine distributions in this list.³
dpois(x,y) : returns the pmf (probability mass function) value of x in a Poisson distribution with mean (rate) y.
dbinom(x,y,z) : returns the pmf value of x in a binomial distribution with number of trials (size) y and success probability z.
dgeom(x,y) : returns the pmf value of x in a geometric distribution with success probability y. (In R, x counts the failures before the first success.)
dunif(x,y,z) : returns the pdf value of x in a uniform distribution with lower bound y and upper bound z.
dexp(x,y) : returns the pdf value of x in an exponential distribution with rate parameter y.
dgamma(x,y,scale=z) : returns the pdf value of x in a gamma distribution with shape parameter y and scale parameter z. (If you do not write scale in the parameter definition, R treats z as the rate parameter, which is equal to 1/scale.)
dchisq(x,y,z) : returns the pdf value of x in a chi-square distribution with y degrees of freedom and non-centrality parameter z.
dt(x,y,z) : returns the pdf value of x in a t-distribution with y degrees of freedom and non-centrality parameter z.
df(x,y,z,a) : returns the pdf value of x in an F-distribution with numerator degrees of freedom y, denominator degrees of freedom z and non-centrality parameter a.
dcauchy(x,y,z) : returns the pdf value of x in a Cauchy distribution with location parameter y and scale parameter z.
dnbinom(x,y,z) : returns the pmf value of x in a negative binomial distribution with dispersion (size) parameter y and success probability z.
dhyper(x,y,z,a) : returns the pmf value of x (number of white balls drawn) in a hypergeometric distribution with y white balls, z black balls and a balls drawn from the whole population.
dlnorm(x,y,z) : returns the pdf value of x in a log-normal distribution with log-mean y and log-standard deviation z.
dbeta(x,y,z) : returns the pdf value of x in a beta distribution with shape-1 parameter y and shape-2 parameter z.
dlogis(x,y,z) : returns the pdf value of x in a logistic distribution with location parameter y and scale parameter z.
dweibull(x,y,z) : returns the pdf value of x in a Weibull distribution with shape parameter y and scale parameter z.
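A sketch applying the d, p, q and r prefixes to a few distributions from the list, with illustrative parameter values; the last two lines check that a gamma distribution with scale 4 is the same as one with rate 1/4.

```r
dpois(2, 3)                        # P(X = 2), Poisson with mean 3
# [1] 0.2240418
ppois(2, 3)                        # P(X <= 2)
# [1] 0.4231901
qpois(0.5, 3)                      # smallest x with P(X <= x) >= 0.5
# [1] 3
dbinom(1, size = 3, prob = 0.5)    # P(exactly 1 success in 3 trials)
# [1] 0.375
dgeom(0, 0.3)                      # P(0 failures before the first success)
# [1] 0.3
rexp(3, rate = 2)                  # three exponential RVs with rate 2
dgamma(5, 2, scale = 4)            # shape 2, scale 4
dgamma(5, 2, rate = 1/4)           # shape 2, rate 1/4: the same value
```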
³ For IE 586 students, it is sufficient to practice and learn about the first five distributions in the list.
3.2 Statistical Functions in R
You can find the mean of a vector with the function mean(), its standard deviation with sd(), its variance with var() and its median with median(). The function summary() reports the minimum, the 25 and 75 percent quantiles (which, together with the median, are called quartiles), the median, the mean and the maximum.
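A sketch of these functions on a small illustrative vector (the values are assumptions chosen so the results are easy to check by hand); quantile() is an additional base R function for arbitrary quantiles:

```r
x <- c(2, 4, 4, 5, 7, 9)
mean(x)
# [1] 5.166667
median(x)                    # average of the two middle values 4 and 5
# [1] 4.5
var(x)                       # sample variance (divides by n - 1)
# [1] 6.166667
sd(x)                        # equals sqrt(var(x))
quantile(x, c(0.25, 0.75))   # the 25 and 75 percent quantiles
summary(x)                   # min, quartiles, median, mean and max
```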
x
4 Creating Functions and Defining Loops in R
4.1 Creating Functions in R
We use the following structure in order to create a specific function which is not already defined in R.
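A minimal sketch of that structure, using a hypothetical function circle_area() that is not part of the original document:

```r
# General form:
# name <- function(argument1, argument2, ...) {
#   statements
#   return(value)
# }

# Hypothetical example: area of a circle with radius r
circle_area <- function(r) {
  area <- pi * r^2   # pi is a built-in constant in R
  return(area)
}
circle_area(2)
# [1] 12.56637
```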
# f
# EXAMPLE 02
# A function that yields the perimeter and the area of a triangle
# given corner coordinates in R^2
# Check "www.mathopenref.com/coordtriangleareabox.html" for the explanation
triangle
coora
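The original code for this example is not fully preserved here. A hypothetical sketch, triangle_sketch(), returns the same two quantities: it takes each corner as a vector c(x, y) and computes the area with the shoelace (determinant) formula, which gives the same result as the box method on the linked page but may not be how the original function did it.

```r
# Hypothetical sketch: perimeter and area of a triangle with corners pa, pb, pc,
# each given as a numeric vector c(x, y)
triangle_sketch <- function(pa, pb, pc) {
  side <- function(p, q) sqrt(sum((p - q)^2))   # Euclidean distance between two points
  perimeter <- side(pa, pb) + side(pb, pc) + side(pc, pa)
  # shoelace formula for the area spanned by the three corners
  area <- abs(pa[1] * (pb[2] - pc[2]) +
              pb[1] * (pc[2] - pa[2]) +
              pc[1] * (pa[2] - pb[2])) / 2
  return(list(perimeter = perimeter, area = area))
}
triangle_sketch(c(0, 0), c(3, 0), c(0, 4))   # a 3-4-5 right triangle
```

Returning a list lets one function hand back several named results, which are then read with res$perimeter and res$area.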
# EXAMPLE 04
f
simmax2unif(100000)
# expectation
# 0.665354266
system.time(x
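The code of simmax2unif() is not preserved here. Assuming, from the printed estimate 0.665 being close to 2/3, that it estimates E[max(U1, U2)] for two independent U(0,1) variables (whose exact value is 2/3), a hypothetical sketch compares a for-loop version with a vectorised version under system.time(). The function names and the seed 586 are illustrative.

```r
# Hypothetical loop version: one pair of uniforms per iteration
sim_max2unif_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + max(runif(2))
  }
  return(total / n)
}

# Hypothetical vectorised version: draw all uniforms at once
sim_max2unif_vec <- function(n) {
  u1 <- runif(n)
  u2 <- runif(n)
  return(mean(pmax(u1, u2)))   # pmax() takes element-wise maxima
}

set.seed(586)
system.time(est_loop <- sim_max2unif_loop(100000))
system.time(est_vec <- sim_max2unif_vec(100000))
c(est_loop, est_vec)   # both should be close to the exact value 2/3
```

The vectorised version is usually much faster, because the loop is carried out inside compiled code instead of in R.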
Here is a basic root-finding algorithm that uses a while-loop:
# a root finding algorithm
# finds the unique real root of a continuous function in an interval
# the function should cross the x-axis, not merely touch it as a tangent
findroot
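The body of findroot is not preserved here. A hypothetical sketch of a while-loop root finder uses the bisection method, which needs exactly the conditions stated in the comments (a continuous function that changes sign in the interval); the original may use a different update rule.

```r
# Hypothetical bisection sketch: f must be continuous on [lower, upper]
# and change sign there
findroot_sketch <- function(f, lower, upper, tol = 1e-8) {
  if (f(lower) * f(upper) > 0) stop("f must change sign on [lower, upper]")
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (f(lower) * f(mid) <= 0) {
      upper <- mid   # the root lies in [lower, mid]
    } else {
      lower <- mid   # the root lies in [mid, upper]
    }
  }
  return((lower + upper) / 2)
}
findroot_sketch(function(x) x^2 - 2, 0, 2)   # close to sqrt(2)
```

Each pass halves the interval, so the loop stops after roughly log2((upper - lower) / tol) iterations.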
5 Drawing Plot Diagrams and Histograms in R
We would like to draw a plot diagram of the density function of the standard normal distribution in the interval (-4, 4). We create a dense vector of points on the x-axis (it should be dense in order to give a good approximation of the curve) and evaluate the function at those points to obtain a second vector.
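A sketch of these steps; the grid of 801 points (step 0.01) is an illustrative choice. The pdf() line sends the plot to a temporary file so the code also runs without a screen; leave it and the dev.off() line out to see the plot in an R graphics window.

```r
x <- seq(-4, 4, length.out = 801)          # dense grid on [-4, 4] with step 0.01
y <- dnorm(x)                              # standard normal density at each grid point
pdf(file = tempfile(fileext = ".pdf"))     # optional: draw into a temporary file
plot(x, y, type = "l", xlab = "x", ylab = "density",
     main = "Standard normal density")     # type = "l" joins the points with lines
invisible(dev.off())                       # close the file device
```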
x
Figure 1: Plot diagrams for the density function of standard normal distribution
Figure 2: Histograms of a vector of normal RVs with mean 3 and standard deviation 1.5
Figure 3: Adding lines on existing diagrams with lines() (1-2) and abline() (3) commands
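The figures are not reproduced in this copy. A sketch of commands that produce a histogram like the one described in Figure 2 and add lines as in Figure 3, assuming 1000 variates and the arbitrary seed 508:

```r
set.seed(508)                               # arbitrary seed for reproducibility
v <- rnorm(1000, mean = 3, sd = 1.5)
pdf(file = tempfile(fileext = ".pdf"))      # optional: draw into a temporary file
h <- hist(v, breaks = 20, freq = FALSE,     # freq = FALSE puts densities on the y-axis
          main = "Normal RVs with mean 3 and sd 1.5")
grid_x <- seq(-3, 9, length.out = 400)
lines(grid_x, dnorm(grid_x, mean = 3, sd = 1.5), col = "red")  # overlay the true density
abline(v = mean(v), lty = 2)                # dashed vertical line at the sample mean
invisible(dev.off())
```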
6 Basic User Information
6.1 Scanning and Printing Data
Assume that you have data⁷ written in a text file in the following format.
3 25 94.9 12547 32556 56
89 567
435 342.1
76.5 983.2
0 343
# There are 15 real values
You can use the command scan() to store this data in a vector by scanning it from left to right and from top to bottom. Spaces and new lines separate the values, and each value is stored in a new index.
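A sketch that writes the first few values of the data above to a temporary file and reads them back with scan(); quiet = TRUE only suppresses the "Read n items" message, and the file name is generated by tempfile().

```r
f <- tempfile(fileext = ".txt")           # temporary file for the illustration
writeLines(c("3 25 94.9", "89 567"), f)   # two lines of space-separated values
x <- scan(f, quiet = TRUE)                # reads left to right, top to bottom
x
# [1]   3.0  25.0  94.9  89.0 567.0
print(x[3])                               # print a single element
# [1] 94.9
cat("There are", length(x), "values\n")   # cat() prints without the [1] prefix
# There are 5 values
```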
x
x
apropos("exp")
# [1] ".__C__expression" ".expand_R_libs_env_var" ".Export"
# [4] ".mergeExportMethods" ".standard_regexps" "as.expression"
# [7] "as.expression.default" "char.expand" "dexp"
# [10] "exp" "expand.grid" "expand.model.frame"
# [13] "expm1" "expression" "getExportedValue"
# [16] "getNamespaceExports" "gregexpr" "is.expression"
# [19] "namespaceExport" "path.expand" "pexp"
# [22] "qexp" "regexpr" "rexp"
# [25] "SSbiexp" "USPersonalExpenditure"
If you need to see all the objects that you have created in your work session, simply write objects().
objects()
# [1] "a" "b" "circle" "coora"
# [5] "coorb" "coorc" "error" "f"
# [9] "findroot" "fixedcost" "func" "int"
# [13] "lbound" "marginalcost" "n" "orderingcostlist"
# [17] "res" "simmax2unif" "simmax2unif_2" "totalcost"
# [21] "triangle" "ubound" "units" "vec"
# [25] "x" "xest" "xinv" "y"
# [29] "y1" "y2" "y3" "y4"
# [33] "y5" "y6" "z"
You can save your R session together with the objects that you have created by clicking File, then Save Workspace in the menu bar. You can reopen a saved workspace later by double-clicking the saved file.
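From a script, the base R functions save.image() and load() do the same job as the menu items. A sketch, assuming a temporary file as the workspace location and a hypothetical object demo_vec:

```r
demo_vec <- c(1, 2, 3)                 # hypothetical object in the workspace
ws <- tempfile(fileext = ".RData")     # illustrative location for the saved workspace
save.image(file = ws)                  # save every object in the workspace
rm(demo_vec)                           # remove the object from the session
load(ws)                               # restore the saved objects
demo_vec
# [1] 1 2 3
```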
Recommended