R Programming

R: THE TRUE BASICS

R, also called the Language for Statistical Computing, was developed by Ross Ihaka and Robert Gentleman at the University of Auckland in the nineties. It is considered an open source implementation of the S language, which John Chambers and colleagues began developing at Bell Laboratories in the 1970s.

R provides a wide variety of statistical techniques and visualization capabilities. Another very important feature of R is that it is highly extensible. Because of this, and more importantly because R is open source, it became the vehicle that brought the power of S to a larger community. As with every programming language, there are pros and cons.

ADVANTAGES:

1) It is open source and free.

2) Master at graphics

3) Command-line interface

4) Reproducibility through R scripts

5) R packages: Extensions of R

DISADVANTAGES:

1) Easy to learn, harder to master

2) Poorly written code hard to read/maintain

3) Command-line interface daunting at first

4) Poorly written code is slow

One of the most important components of R, and where most of the action happens, is the R console. It's a place where you can execute R commands. You simply type something at the prompt in the console, hit Enter, and R interprets and executes your command.

Let's start our experiments by having R do some basic arithmetic; we'll calculate the sum of 1 and 2. We simply type `1 + 2` in the console and hit Enter. R interprets what you typed, calculates the result, and prints that result as a numerical value.
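As a minimal sketch, this is what the arithmetic described above looks like when typed at the console prompt. The comment shows what R prints in response.

```r
# Type the expression and hit Enter; R evaluates it and prints the result.
1 + 2   # prints [1] 3
```

The `[1]` in front of the result is R's way of indexing the first element of the output.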

Now let's try to type some text in the console. We use double quotes for this. You can also simply type a number and hit Enter. R understands your character string and your numerical value, and simply prints each one back as output.

This brings me to the first super important concept in R: the variable. A variable allows you to store a value or an object in R. You can then later use this variable's name to easily access the value or object stored within it. To create a variable, you use the assignment operator `<-`, a less-than sign followed by a dash.
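The text and number entries mentioned above could look like the following sketch. The string and the number are illustrative values chosen here, not taken from the original material.

```r
# A character string, entered in double quotes; R echoes it back.
"Hello, R"   # prints [1] "Hello, R"

# A plain number; R echoes it back as well.
42           # prints [1] 42
```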

Suppose the number 2 is the height of a rectangle. Let's assign this value 2 to a variable `height`. We type `height <- 2`, that is: height, less-than sign, dash, 2. This time, R does not print anything, because it assumes that you will be using this variable in the future. If we now simply type and execute `height` in the console, R returns 2. We can do a similar thing for the width of our imaginary rectangle. We assign the value 4 to a variable `width`. If we type `width`, we see that it indeed contains the value 4.

As you're assigning variables in the R console, you're actually accumulating an R workspace. It's the place where variables and information are stored in R.
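The assignments described above can be sketched as follows, using the same names and values as the text.

```r
height <- 2   # assignment: R stores the value and prints nothing
height        # typing the name prints the stored value: [1] 2

width <- 4
width         # prints [1] 4
```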

You can access the objects in the workspace with the `ls()` function. Simply type `ls` followed by empty parentheses and hit Enter. This shows you a list of all the variables you have created in the R session. If you have followed all the examples up to now, you should see "height" and "width". This tells you that there are two objects in your workspace at the moment. When you type `height` in the console, R looks for the variable `height` in the workspace, finds it, and prints the corresponding value. If, however, we try to print a non-existing variable, `depth` for example, R throws an error, because `depth` is not defined in the workspace and thus not found.

The principle of accumulating a workspace through variable assignment makes these variables available for further use. Suppose we want to find out the area of our imaginary rectangle, which is height multiplied by width. Let's go ahead and type `height * width`. The result is 8, as you'd expect. We can take it one step further and also assign the result of this calculation to a new variable, `area`. We again use the assignment operator. If you now type `area`, you'll see that it contains 8 as well. Inspecting the workspace again with `ls()` shows that the workspace now contains three objects: area, height and width.
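A short sketch of the workspace steps described above. It assumes a fresh session in which only the variables from this walkthrough exist; in a session with other objects, `ls()` lists those too.

```r
height <- 2
width <- 4

ls()             # lists the names of the objects created so far

# depth          # uncommenting this line stops with an "object 'depth' not found" error

height * width   # prints [1] 8

area <- height * width   # store the result under a new name
area                     # prints [1] 8

ls()             # the listing now also includes "area"
```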

Basic Data Types

This section looks at R's fundamental data types, also called atomic vector types. Throughout our experiments, we will use the function `class()`. This is a useful way to see what type a variable is. Let's head over to the console and start with `TRUE`, in capital letters. `TRUE` is a logical. That's also what `class(TRUE)` tells us. Logicals are so-called boolean values, and can be either `TRUE` or `FALSE`. Well, actually, `NA`, which denotes a missing value, is also a logical.
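The `class()` checks on logical values described above can be tried like this:

```r
class(TRUE)    # prints [1] "logical"
class(FALSE)   # prints [1] "logical"
class(NA)      # prints [1] "logical": a plain NA is a logical missing value
```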

Another basic type is the numeric. We can perform all sorts of operations on numerics, such as addition, subtraction, multiplication, division and many more. A special type of numeric is the integer. It is a way to represent whole numbers like 1 and 2. To specify that a number is an integer, you can append a capital L to it. We don't see the difference between the integer 2 and the numeric 2 from the output. However, the `class()` function reveals the difference. Instead of asking for the class of a variable, you can also use the `is.*()` functions to see whether variables are actually of a certain type. To see if a variable is a numeric, we can use the `is.numeric()` function. It appears that both 2 and 2L are numeric.
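A small sketch of the numeric and integer checks described above:

```r
2               # prints [1] 2
2L              # prints [1] 2 as well; the output looks the same

class(2)        # prints [1] "numeric"
class(2L)       # prints [1] "integer"

is.numeric(2)   # prints [1] TRUE
is.numeric(2L)  # prints [1] TRUE
```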

To see if a variable is an integer, we can use `is.integer()`. This shows us that integers are numeric, but that not all numerics are integers, so there's some kind of type hierarchy going on here. Last but not least, there's the character string. The class of this type of object is "character".
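The integer test and the character class can be checked as in the sketch below. The example string is an arbitrary choice made here.

```r
is.integer(2)    # prints [1] FALSE: a plain 2 is numeric, but not an integer
is.integer(2L)   # prints [1] TRUE

class("I love R")   # prints [1] "character"
```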

It's important to note that there are other data types in R, such as double, the double-precision type in which R stores numerics internally; complex, for handling complex numbers; and raw, to store raw bytes.
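As a brief illustration of these additional types, the sketch below uses `typeof()`, which reports the internal storage type, alongside `class()`. The values are arbitrary examples.

```r
typeof(2.5)          # prints [1] "double": how a numeric is stored internally
class(1 + 2i)        # prints [1] "complex"
class(as.raw(255))   # prints [1] "raw"
as.raw(255)          # prints [1] ff, the byte shown in hexadecimal
```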

Vectors

1) Create and name vectors: A vector is nothing more than a sequence of data elements of the _same_ basic data type. First things first: creating a vector in R! You use the `c()` function for this, which allows us to combine values into a vector. Suppose you're playing a basic card game, and record the suit of 5 cards you draw from a deck. A possible outcome can be stored in a character vector of five suit names. Of course, we could also assign this character vector to a new variable, `drawn_suits` for example. We now have a character vector, `drawn_suits`. We can assert that it is a vector by typing `is.vector(drawn_suits)`.

Likewise, we could create a vector of integers, for example to store how many cards of each suit remain after we drew the 5 cards.

Let's call this vector `remain`. There are 11 spades, 12 hearts, 11 diamonds, and all 13 clubs remaining.
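The two vectors described above might be created as follows. The particular draw in `drawn_suits` is an assumption chosen to be consistent with the remaining counts given in the text (two spades, one heart and two diamonds drawn); the original draw is not shown.

```r
# Assumed draw of 5 cards, consistent with the remaining counts below
drawn_suits <- c("hearts", "spades", "diamonds", "diamonds", "spades")
drawn_suits
is.vector(drawn_suits)   # prints [1] TRUE

# Cards left per suit, in the order spades, hearts, diamonds, clubs
remain <- c(11L, 12L, 11L, 13L)
remain                   # prints [1] 11 12 11 13
```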

To give names to the elements of a vector, we can use the `names()` function. Let's first create another character vector, `suits`, that contains the strings "spades", "hearts", "diamonds", and "clubs", the names we want to give our vector elements.
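A minimal sketch of naming the `remain` vector with `suits`, using the counts given earlier in the text:

```r
remain <- c(11, 12, 11, 13)
suits <- c("spades", "hearts", "diamonds", "clubs")

names(remain) <- suits   # attach one name per element, in order
remain                   # the counts are now printed under their suit names

remain["hearts"]         # a named element can be looked up by name
```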

2) Vector Arithmetic:

We learned that we can use variables to perform arithmetic. Remember how you summed apples and oranges? From the previous section, we also know that these variables, `my_apples` and `my_oranges`, are actually simply vectors. This means that we can perform arithmetic with vectors in R.
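To make the point concrete, the sketch below shows that a single number is itself a vector of length one. The values assigned to `my_apples` and `my_oranges` are hypothetical, since the original values are not given here.

```r
my_apples <- 5    # hypothetical value
my_oranges <- 6   # hypothetical value

is.vector(my_apples)     # prints [1] TRUE
length(my_apples)        # prints [1] 1

my_apples + my_oranges   # prints [1] 11
```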

The most important thing to remember about operations with vectors in R is that they are applied element by element. This means that standard mathematics is extended to vectors in an element-wise fashion.

Imagine you have a vector containing your gambling earnings for the past 3 days. Not bad for a few days in the desert, is it? Imagine a well-dressed gentleman approaches you and offers to triple your earnings for the past three days, if you beat him in one round of poker. If you want to calculate the expected earnings for each of the past three days, you can easily do it in R. R multiplies each element in the `earnings` vector by 3, resulting in 150 dollars of promised earnings on the first day, 300 on the second day and 90 on the third day.

Likewise, division, subtraction, summation and many more are all carried out element-wise, just as if you were carrying out the operation between two scalars three times. In such code you don't see anything different from what we've done before, because of course, you were working with vectors all along. The mathematics naturally extends to vectors that contain more than one element.

Let's go back to your Vegas adventures. To enjoy your earnings, you also decided to go shopping and spend some money every day on the Las Vegas Strip. You recorded a vector of expenses. Because you are a very conscientious programmer in training, you decide to compute whether your luck in the casino was sufficient to pay for your expenses.
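The element-wise calculations described above can be sketched as follows. The `earnings` values follow from the tripled amounts given in the text (150, 300 and 90); the `expenses` values are hypothetical, since the text does not state them.

```r
earnings <- c(50, 100, 30)   # implied by the tripled amounts in the text
earnings * 3                 # prints [1] 150 300  90

expenses <- c(30, 40, 80)    # hypothetical daily expenses

bank <- earnings - expenses  # element-wise subtraction, day by day
bank                         # prints [1]  20  60 -50

sum(bank)                    # prints [1] 30: overall balance across the three days
```

A positive total means the winnings covered the spending, even though the third day on its own ended in the red.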

MATRICES

Creating and naming matrices: A matrix is kind of like the big brother of the vector. Where a vector is a sequence of data elements, which is one-dimensional, a matrix is a similar collection of data elements, but this time arranged into a fixed number of rows and columns. Since we are only working with rows and columns, a matrix is called two-dimensional.

A matrix can contain only one atomic vector type. This means that you can't have logicals and numerics in the same matrix, for example. There's really not much more theory about matrices than this: it's really a natural extension of the vector, going from one to two dimensions. Of course, this has its implications for manipulating and subsetting matrices, but let's start with simply creating and naming them.

To build a matrix, you use the `matrix()` function. Most importantly, it needs a vector containing the values you want to place in the matrix, and at least one matrix dimension. You can choose to specify the number of rows or the number of columns. Consider a 2-by-3 matrix containing the values 1 to 6, created by specifying the vector and setting the `nrow` argument to 2. R sees that the input vector has length 6 and that there have to be two rows. It then infers that you'll probably want 3 columns, such that the number of matrix elements matches the number of input vector elements.
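The 2-by-3 matrix described above can be built as in this sketch. The variable name `by_col` is chosen here for illustration.

```r
# Six values and two rows: R infers three columns and fills column by column.
by_col <- matrix(1:6, nrow = 2)
by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(by_col)   # prints [1] 2 3
```

Specifying `ncol = 3` instead of `nrow = 2` gives the same matrix.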

If you prefer to fill up the matrix in a row-wise fashion, such that the 1, 2 and 3 are in the first row, you can set the `byrow` argument of `matrix()` to `TRUE`. Can you spot the difference?

Remember how R did recycling when you were subsetting vectors using logical vectors? The same thing happens when you pass the `matrix()` function a vector that is too short to fill up the entire matrix. Suppose you pass a vector containing the values 1 to 3 to the `matrix()` function, and explicitly say you want a matrix with 2 rows and 3 columns. R fills up the matrix column by column and simply repeats the vector.
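Both behaviours described above, row-wise filling and recycling, are shown in the sketch below. The variable names are chosen here for illustration.

```r
# Row-wise filling: 1, 2, 3 end up in the first row.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Recycling: three values repeated to fill six cells, column by column.
recycled <- matrix(1:3, nrow = 2, ncol = 3)
recycled
#      [,1] [,2] [,3]
# [1,]    1    3    2
# [2,]    2    1    3
```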

If you try to fill up the matrix with a vector whose multiple does not nicely fit in the matrix, for example when you want to put a 4-element vector in a 6-element matrix, R generates a warning message.

Apart from the `matrix()` function, there's yet another easy way to create matrices that is more intuitive in some cases. You can paste vectors together using the `cbind()` and `rbind()` functions. `cbind()`, short for column bind, takes the vectors you pass it and sticks them together as if they were columns of a matrix. The `rbind()` function, short for row bind, does the same thing but takes the input as rows and makes a matrix out of them. These functions can come in pretty handy, because they're often easier to use than the `matrix()` function.
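A small sketch of `cbind()` and `rbind()` as described above. The input vectors and variable names are illustrative choices, since the original calls are not shown.

```r
# Two vectors bound as columns: a 3-by-2 matrix.
as_cols <- cbind(1:3, 1:3)
as_cols
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3

# The same vectors bound as rows: a 2-by-3 matrix.
as_rows <- rbind(1:3, 1:3)
as_rows
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    1    2    3
```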

If you want to add another row, containing the values 7, 8, 9, to an existing matrix such as `m`, you can simply call `rbind()` on the matrix and the new values. You can do a similar thing with `cbind()` to add a column.

Next up is naming the matrix. In the case of vectors, you simply used the `names()` function, but in the case of matrices, you can assign names to both columns and rows. That's why R provides the `rownames()` and `colnames()` functions. Their use is pretty straightforward. Retaking the matrix `m` from before, we can set the row names in just the same way as we named vectors, but this time with the `rownames()` function.
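The steps above can be sketched as follows. The starting matrix `m`, the added column values, and the row and column names are all assumptions made for illustration; only the added row 7, 8, 9 comes from the text.

```r
m <- matrix(1:6, byrow = TRUE, nrow = 2)   # assumed starting matrix

with_row <- rbind(m, 7:9)         # add a third row: now 3-by-3
with_col <- cbind(m, c(10, 11))   # add a fourth column: now 2-by-4

# Hypothetical names, assigned just like names() on a vector
rownames(m) <- c("row1", "row2")
colnames(m) <- c("a", "b", "c")
m
#      a b c
# row1 1 2 3
# row2 4 5 6
```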
