4
An introduction to WS 2018/2019 Dr. Sonja Grath Dr. Eliza Argyridou Special thanks to: Prof. Dr. Martin Hutzenthaler and Dr. Benedikt Holtmann for significant contributions to course development, lecture notes and exercises Getting Started with R 2 What you should know after day 2 Part I: Getting started What is R How R is organized Installation of R and Rstudio Organize your R session Part II: Basics R as calculator What is a function? What is an assignment? How to plot a continuous function and how to make a scatterplot Getting help 3 Part I Getting started 4 What is R? R is a comprehensive statistical environment and programming language for professional data analysis and graphical display. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories. Webpage: http://www.r-project.org Advantages: R is free New statistical methods are usually first implemented in R Lots of help due to active community Disadvantages: R has a long learning phase No 'undo' Work with scripts 5 R commands are organized in packages (also called libraries) Examples: stats, datasets, ggplot2, dplyr To use a package, it has to be installed AND loaded! Which packages are loaded at start? library(lib.loc=.Library) Which packages are installed? installed.packages() Load package: library(packagename) How to get help: library(help="package") ??package How R is organized Try yourself: library(help="ggplot2") ??ggplot2 6 Pre-Defined Datasets R comes with a huge amount of pre-defined datasets, available in the package ‘datasets’ (usually available at start) Examples: 'cars', 'mtcars', 'chickwts', ... → can be used for exercises, demonstration of in-built functions... How to use a dataset: data(cars) How to get help on a dataset: ?cars We will use pre-defined datasets in some of the exercises

Getting Started with R - Evolutionary Biologyevol.bio.lmu.de/_statgen/Rcourse/ws1819/slides/... · Getting Started with R 2 What you should know after day 2 Part I: Getting started

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Getting Started with R - Evolutionary Biologyevol.bio.lmu.de/_statgen/Rcourse/ws1819/slides/... · Getting Started with R 2 What you should know after day 2 Part I: Getting started

An introduction toWS 2018/2019

Dr. Sonja GrathDr. Eliza Argyridou

Special thanks to: Prof. Dr. Martin Hutzenthaler and Dr. Benedikt Holtmann for significant contributions to

course development, lecture notes and exercises

Getting Started with R

2

What you should know after day 2

Part I: Getting started

● What is R● How R is organized● Installation of R and Rstudio● Organize your R session

Part II: Basics

● R as calculator● What is a function?● What is an assignment?● How to plot a continuous function and how to make a scatterplot● Getting help

3

Part IGetting started

4

What is R?● R is a comprehensive statistical environment and programming

language for professional data analysis and graphical display.

● It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories.

● Webpage: http://www.r-project.org

Advantages:● R is free● New statistical methods are

usually first implemented in R● Lots of help due to active

communityDisadvantages:● R has a long learning phase● No 'undo'➔ Work with scripts

5

R commands are organized in packages (also called libraries)

Examples: stats, datasets, ggplot2, dplyr

To use a package, it has to be installed AND loaded!

Which packages are loaded at start?library(lib.loc=.Library)Which packages are installed?installed.packages()

Load package:library(packagename)

How to get help:library(help="package")??package

How R is organized

Try yourself:library(help="ggplot2")??ggplot2

6

Pre-Defined Datasets

R comes with a huge amount of pre-defined datasets, available in the package ‘datasets’ (usually available at start)

Examples: 'cars', 'mtcars', 'chickwts', ...→ can be used for exercises, demonstration of in-built functions...

How to use a dataset:data(cars)

How to get help on a dataset:?cars

➔ We will use pre-defined datasets in some of the exercises

Page 2: Getting Started with R - Evolutionary Biologyevol.bio.lmu.de/_statgen/Rcourse/ws1819/slides/... · Getting Started with R 2 What you should know after day 2 Part I: Getting started

7

What is R Studio?

● Powerful Integrated Development Environment (IDE) for R

● It is free and open source, and works on Windows, Mac, Linux and over the web

● Webpage: https://www.rstudio.com/

● We will work with RStudio throughout the course, but everything we do would also work in the R console

RProgramming.net

Some benefits:Syntax highlighting, code completion, and smart indentationExecute R code directly from the source editorEasily manage multiple working directories using projectsPlot history, zooming, and flexible image and PDF exportIntegrated R help and documentationSearchable command history

8

Organize your R session

General advice:

● Organize your work in folders (e.g., “Rcourse/Day2”)

● Save your commands in scripts

What is a script?

● A computer-readable text file (do not use MS Word or similar)

● For R, the conventional extension is .R

Example:

example.R (see webpage)

9

Questions:1) What does “#” stand for?2) Who wrote the script and when?3) Where does the data come from?4) Which command is used to compute the clutch size mean for the European green woodpecker?5) How do we get information on the current session?

10

How to organize your work and a R sessionA recipe in 15 steps

1) On your computer, create a folder for your project and keep all your material there (e.g. Rcourse/day2)

2) Start Rstudio or a R console

3) Open a new script file or a pre-existing script

4) Save the file to the folder you created in step 1 (file extension .R), save it regularly (!) ('Ctrl+S')

5) Write all your instructions for R into this script file, execute and test your instructions (with button or 'Ctrl+Enter' in RStudio)

6) Always put comments (#) in your script

7) Clean R at the beginning of your work

8) Set your working directory with setwd("path2directory")or over menu in RStudio (Session -> Set working directory)

9) Check your working directory with getwd()

10) Load (and install) required packages● Install with install.packages("name") - only once, need to specify CRAN mirror● Load with library(name) – each session if required

11) Import your data and perform your analyses

12) Output is saved in your working directory (if folder unspecified)

13) Consider sessionInfo() and Sys.time()

14) Save your final script

15) Quit your session (q() in console) and save workspace if required

11

Part IIBasics

12

R as calculator

Basic arithmetic operations

2+3

7­4

3*5

7/3; 2^6

Integer vs. modulo division

5 %/% 3 # “5 divided by 3 without decimal positions” 

6 %% 2 # “if you divide 6 by 2 – what's the rest?” 

Caution: German (Spanish, French..) decimal notation does not work!> 1,2Error: unexpected ',' in “1,”

> 1.2

Page 3: Getting Started with R - Evolutionary Biologyevol.bio.lmu.de/_statgen/Rcourse/ws1819/slides/... · Getting Started with R 2 What you should know after day 2 Part I: Getting started

13

Functions/Commands

General form:

function()

Examples:

sqrt()exp()sum()prod()...

Functions can have pre-defined parameters/arguments with default settings

→ help page of the function

?read.table()

14

Examples for functions in R

Important functions

exp()sin()cos()max()min()sum()prod()sqrt()factorial()choose()

factorial()“4 factorial”, 4!→ 4*3*2*1

choose()“5 choose 2”, (ab)

Try yourself:exp(1); exp(log(5))sin(pi/2)cos(pi/2)max(4,2,5,1); min(4,2,5,1) sum(4,2,5,1); prod(4,2,5,1)sqrt(16)factorial(4)choose(5,2)

15

How to plot a continuous function

You need

● Plotting function● The continuous function you want to plot● Range [a, b]

If fun is a function, then plot(fun, from = a, to = b) plots fun in the range [a, b]

Examples:plot(sin, from = -2*pi, to = 2*pi)plot(dnorm, from = -3, to = 3)

As plotting function you can use

plot()

16

17

AssignmentsGeneral form:variable <- value

Example:x <­ 5“The variable 'x' is assigned the value '5'”

Valid variable names: may contain numbers, '_', charactersAllowed:my.variable, my_variable, myVariable, favourite_color, a, b, c, data2, 2test …NOT allowed:

– '.' followed by number at the beginning: .4you

– reserved words, e.g.: if, else, repeat, while, function, for, FALSE, TRUE, …

R is case-sensitive: data, DATA, Data – all different variable names

In general: Use IDE (RStudio) and consider following a style guide for your code

Examples for style guides: http://adv-r.had.co.nz/Style.html Hadley Wickhamhttps://google.github.io/styleguide/Rguide.xml Google Style Guide for R

18

Assignments

You can write an assignment in three different ways:x <­ 5 # preferred version in scripts5 ­> x

x = 5

But – have a look here:plot(dnorm, from = ­3, to = 3)

Works with longer expressions:x <­ 2y <­ x^2 + 3z <­ x + y

And with complete vectors (more on vectors tomorrow):x <­1:10> x [1]  1  2  3  4  5  6  7  8  9 10y <­x^2  > y [1]   1   4   9  16  25  36  49  64  81 100

Page 4: Getting Started with R - Evolutionary Biologyevol.bio.lmu.de/_statgen/Rcourse/ws1819/slides/... · Getting Started with R 2 What you should know after day 2 Part I: Getting started

19

How to plot numerical vectors

You need

● Plotting function● Two vectors

If x and y are numerical vectors, then plot(x, y) produces a scatterplot of y against x

Examples:x <- 1:10y <- x^2plot(x, y)

As plotting function you can use

plot()

20

Help!

R console

help(solve) # help page for command "solve"

?(solve) # same as help(solve)

help("exp")

help.start()

help.search("solve") # list of commands which could                         # be related to string "solve"

??solve #same as help.search("solve")

example(exp) #examples for the usage of 'exp'

example("*") # special characters have to be in# quotation marks

21

What you should do now

If you have your own laptop or computer1. Install R and RStudio (see web tutorials)2. Read the first chapter of “R in Action” (course web page)http://www.manning.com/kabacoff/SampleCh-01.pdf 3. Open a R session and try the commands we learned today (lecture slides)→ if you have trouble with installing R, ask us/the tutors4. Work on the first exercise sheet 5. Go over the solutions of exercise sheet 1

If you don't have your own laptop or computer1. Read the first chapter of “R in Action” (course web page)http://www.manning.com/kabacoff/SampleCh-01.pdf 2. This afternoon in the exercise session: open a R session and try the commands we learned 3. Work on the first exercise sheet 4. Go over the solutions of exercise sheet 1

22

R is programming...

… you can only be a good programmer if you practice a lot!

23

Literature

R in Action Data Analysis and Graphics with R2nd edition (2011)Robert I. Kabacoffhttps://www.manning.com/books/r-in-action-second-edition

Homework: Read Chapter 1 (freely available online as PDF)

Webpagehttp://www.statmethods.net/

… and many, many more (also free web tutorials)

Getting started with RAn Introduction for Biologists(2017)Andrew P. Beckerman, Dylan Z. Childs & Owen L. Petchey