Inian Eco Lecture 02


  • Getting Started with R Alok Srivastava Lecture - 02

    Getting started with R

    Alok Srivastava, CRRAO-AIMSCS, Hyderabad, INDIA

    Jan 08, 2015

  • Topics

    Basics of R Programming Alok Srivastava 101212

    Topics

    1 How to use R

    2 Data types in R

    3 Data creation

    4 Data curation

    Basics of R programming Lecture - 02Getting Started with R Lecture 02


  • Topic 1 : How to use R


  • Check your Working Path

    R installation directory

    R.home() # R installation directory (compare `which R` in a shell)

    Check your working path

    getwd() # Get the location of the current working directory

    Linux: /home/alok/WorkShop/2014/Workshop_UoH_14_Jan/Lecture2

    Windows: C:/Users/Alok/Documents

    It'll help when loading files by path.

  • Change your Working Path

    Change your working path

    setwd(dir) # Change the location of the working directory; dir is the new path

    Recheck your working directory

    getwd()

    Strings
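A minimal sketch of the check, change and recheck cycle described above. Using tempdir() as the target directory and the names old_dir, new_dir, current_dir and restored_dir are illustrative assumptions, not part of the lecture.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# tempdir() is used here only because it is a directory that certainly exists
new_dir <- tempdir()
setwd(new_dir)            # the path is passed as a character string

# Recheck: the working directory now points at the temporary directory
current_dir <- getwd()

# Restore the original working directory
setwd(old_dir)
restored_dir <- getwd()
```

Restoring the original directory at the end keeps later relative file paths working as before.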

  • Working with Text editor


    - Use hash # to comment

  • Use R as Calculator

    Arithmetic Operators

    Addition             +
    Subtraction          -
    Multiplication       *
    Division             /
    Exponent             ^ or **
    Modulus (x mod y)    x %% y
    Integer division     x %/% y

    Variable

    Multiple variables
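The operators above can be tried directly at the console. The following is a small sketch; all variable names (sum_val, radius, width and so on) are made up for illustration.

```r
# Arithmetic operators
sum_val  <- 7 + 3      # addition
pow_val  <- 2 ^ 10     # exponent (2 ** 10 gives the same result)
mod_val  <- 17 %% 5    # modulus: remainder of 17 divided by 5
idiv_val <- 17 %/% 5   # integer division: whole part of 17 divided by 5

# A single variable used in a calculation
radius <- 2
area <- pi * radius^2

# Multiple variables combined in one expression
width <- 4
height <- 2.5
rectangle_area <- width * height
```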

  • Workspace in R

    Save workspace

    save.image()                 # Save the workspace; default file is .RData
    unlink(".RData")             # Remove the saved workspace file
    save.image("mywork.Rdata")   # Save to a specific file
    load("mywork.Rdata")         # Load previous work
    savehistory(file="abc")      # Save command history to a text file; default is .Rhistory
    loadhistory(file="abc")      # Load history from file

    Quit session

    q() # Asks whether to save the workspace image: [y/n/c]
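A hedged sketch of a save and reload round trip. It uses save() on a single object instead of save.image() so that it does not write the whole workspace, and it writes to a temporary file; both choices, and the names scores and ws_file, are assumptions made for this example.

```r
# An object worth keeping between sessions
scores <- c(10, 20, 30)

# Save it to a temporary file (save.image() would store every object instead)
ws_file <- tempfile(fileext = ".RData")
save(scores, file = ws_file)

# Simulate a fresh session by removing the object, then load it back
rm(scores)
load(ws_file)
restored_ok <- exists("scores") && identical(scores, c(10, 20, 30))

# Clean up the file, as unlink(".RData") does for the default workspace file
unlink(ws_file)
file_removed <- !file.exists(ws_file)
```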

  • Character variable in R

    - Used to store names or categorical variables

    - Use double quotes around the value to store it as a character variable

    - Use the c() function to store multiple values

  • Getting help in R

    Within R:

    The ? command can be used to get help on a specific command within R

    ?keyword or help(keyword)             # Help on a known command
    ??keyword or help.search("keyword")   # Search when the function name is not known
    apropos("mean")                       # List all functions containing the string "mean"

    Search library functions

    library(help=base)   # List the base functions available with the R console
    library(help=pamr)   # List the functions available in package pamr
    library(help=samr)   # List the functions available in package samr

    To display the help page of a package function, first load the library.

    Documentation

    Help files can be accessed in text or HTML format. Manuals, reference cards, tutorials and news about recent developments are available at http://www.r-project.org/other-docs.html

    Online help

    R-help: https://www.stat.math.ethz.ch/pipermail/r-help/
    Bioconductor-help: https://stat.ethz.ch/mailman/listinfo/bioconductor

  • Practice Session: 1

  • Topic 2 : Data Types in R


  • Variable types in R

    Numeric

    Integer

    x = 6
    is.double(x)    # TRUE (is.real(x) is no longer available in current R)
    is.integer(x)   # FALSE

    Logical

    x = c(1,2,3,4,5); y = (x
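A short sketch of the numeric, integer and logical types. The comparison `values > 3` is an arbitrary choice made for this example and is not taken from the slide; the variable names are likewise illustrative.

```r
x <- 6     # a plain number is stored as a double, not as an integer
n <- 6L    # the L suffix creates an integer

x_is_numeric <- is.numeric(x)   # TRUE
x_is_integer <- is.integer(x)   # FALSE
n_is_integer <- is.integer(n)   # TRUE

# A comparison on a numeric vector produces a logical vector
values <- c(1, 2, 3, 4, 5)
above_three <- values > 3
```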

  • Vectors in R

    Vectors may have mode logical, numeric or character.

    Examples of vectors

    x = c(45, 90, 135)
    y = c("Kinjal","Madhav","Roopa","Suraj")
    z = c("gene1", "gene2", "gene3", "gene4")

  • Arrays in R

    Arrays may also have mode logical, numeric or character. A two-dimensional array is the same as a matrix.

    Examples of two-dimensional arrays

    x = array(data, dim)      # general form
    x = array(1:3, c(2,4))    # 2 x 4 array; the values 1:3 are recycled to fill it

    Examples of three-dimensional arrays

    x = array(1:3, c(2,4,2))  # the last 2 is the size of the third dimension
    x = array(1:3, c(2,4,3))  # the last 3 is the size of the third dimension
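The array calls above can be inspected as follows. This is a sketch; the names m2 and a3 are made up for the example.

```r
# Two-dimensional array: 2 rows, 4 columns, filled column by column from 1:3
m2 <- array(1:3, c(2, 4))
m2_dim <- dim(m2)
m2_is_matrix <- is.matrix(m2)   # a two-dimensional array is a matrix

# Three-dimensional array: two layers, each of size 2 x 4
a3 <- array(1:3, c(2, 4, 2))
a3_dim <- dim(a3)
second_layer <- a3[, , 2]       # the second 2 x 4 layer
```

Because 1:3 is shorter than the array, R reuses the values in order until every cell is filled.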

  • Matrices in R

            Col1  Col2  Col3  Col4
    Row1
    Row2
    Row3

    The grid above is a matrix.

    Dimension: 3 x 4

    Row names: Row1, Row2, Row3

    Column names: Col1, Col2, Col3, Col4

    Number of rows: 3

    Number of columns: 4

  • Data frames in R

            Col1  Col2  Col3  Col4
    Row1
    Row2
    Row3

    A data frame is a generalization of a matrix.

    Different columns may have different data types.

    All elements of any one column must have the same data type, i.e. all numeric, or all factor, or all character, or all logical.

    Data frames are used by R's modeling and graphical functions.

    If the data is read in using read.csv, read.table etc., it is automatically stored as a data frame.
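A small sketch of the points above: columns of different types in one data frame, and read.csv returning a data frame. The column names and values are invented for illustration, and the CSV content is supplied inline through the text argument instead of a file.

```r
# One data frame with a character, a numeric and a logical column
df <- data.frame(
  gene = c("g1", "g2", "g3"),
  expr = c(0.5, 1.2, 0.9),
  significant = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep gene as character on older R versions too
)
col_types <- sapply(df, class)

# read.csv returns a data frame; here it reads from an inline string
csv_df <- read.csv(text = "gene,expr\ng1,0.5\ng2,1.2")
csv_is_df <- is.data.frame(csv_df)
```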

  • Data lists in R

    Row1
    Row2
    Row3

    A data list is an arrangement of different components, shown here as rows.

    Different rows may have different numbers of elements.

    When a row is a vector, all of its elements must have the same data type, i.e. all numeric, or all factor, or all character, or all logical.
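A sketch of a list whose components differ in length and type. The component names and values are illustrative assumptions.

```r
gene_sets <- list(
  small  = c("abc1", "abc2"),   # character vector of length 2
  counts = c(2, 4, 7, 9),       # numeric vector of length 4
  flag   = TRUE                 # logical vector of length 1
)

set_sizes <- sapply(gene_sets, length)   # length of each component
set_types <- sapply(gene_sets, class)    # type of each component

second_component <- gene_sets[[2]]       # select by position
small_component  <- gene_sets$small      # select by name
```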

  • Topic 3 : Data Creation



  • Data Creation : Vectors and Arrays

    c(1,2,3,4)             # combine arguments to create a vector
    from:to                # create a sequence running from 'from' to 'to'
    seq(from,to,by=diff)   # create an arithmetic series with step 'diff'
    rep(c(1,2,3,4),4)      # replicate the elements of a vector
    array(1:3, c(2,4))     # create an array of size 2 x 4
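The creation functions listed above, sketched with small inputs. The specific numbers and variable names are arbitrary choices for this example.

```r
colon_seq <- 3:7                         # 3 4 5 6 7
step_seq  <- seq(2, 20, by = 3)          # 2 5 8 11 14 17 20
rep_whole <- rep(c(1, 2), times = 3)     # repeat the whole vector: 1 2 1 2 1 2
rep_each  <- rep(c(1, 2), each = 2)      # repeat each element: 1 1 2 2
```

The times argument repeats the vector as a whole, while each repeats every element before moving to the next one.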

  • Arrays in R

    Arrays may also have mode logical, numeric or character. A two-dimensional array is the same as a matrix.

    Examples of two-dimensional arrays

    x = array(data, dim)
    x = array(1:3, c(2,4))

    Examples of three-dimensional arrays

    x = array(1:3, c(2,4,2))  # creates two arrays of size 2 x 4; the last 2 is the third dimension
    x = array(1:3, c(2,4,3))  # creates three arrays of size 2 x 4; the last 3 is the third dimension

  • Data creation : Matrices in R

    a = 1:3
    b = 4:6
    c = 7:9

    # cbind combines objects by column
    X = cbind(a,b,c)

    # rbind combines objects by row
    Y = rbind(a,b,c)

    # Matrix by defining the number of rows and columns
    Z = matrix(c(1,4,6,2,3,7.8), nrow=2, ncol=3, byrow=TRUE)    # filled row by row
    Z = matrix(c(1,4,6,2,3,7.8), nrow=2, ncol=3, byrow=FALSE)   # filled column by column

    expression_data = matrix(c(1,2,3, 11,12,13), nrow=2, ncol=3, byrow=TRUE,
                             dimnames = list(c("gene1", "gene2"), c("Sample.1", "Sample.2", "Sample.3")))

    # To generate a random matrix of 10 rows and 5 columns
    replicate(5, rnorm(10))
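A sketch contrasting byrow=TRUE with byrow=FALSE, and cbind with rbind. The input values and the names by_row, by_col, p and q are illustrative.

```r
# Same data, different fill order
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)    # rows: 1 2 3 / 4 5 6
by_col <- matrix(1:6, nrow = 2, ncol = 3, byrow = FALSE)   # rows: 1 3 5 / 2 4 6

# Binding two vectors of length 3
p <- 1:3
q <- 4:6
col_bound <- cbind(p, q)   # 3 rows, 2 columns
row_bound <- rbind(p, q)   # 2 rows, 3 columns
```

Transposing the cbind result gives the same values as the rbind result, which is a quick way to check which orientation is needed.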

  • Data Creation : Data frames

    go.term = c("GO0009117","GO0009253","GO0009354")
    gene.count = c(15,18,25)
    avg.gene.expression = c(0.5432,0.2371,0.7867)
    go.term.rank = c(2,1,3)

    mydata = data.frame(go.term, gene.count, avg.gene.expression, go.term.rank)

    mydata2 = data.frame(rank=1:4, gene_name=c("ddr1","apr2","bac","p53"), n=c(.90,.75,.52,.31))

  • Data Creation : Data Lists

    genelist1 = c("abc1","abc2")
    genelist2 = c("brca1","brca2","tp53","mdm2")
    genelist3 = c("apr","erpn","myc")

    mylist = list(genelist1, genelist2, genelist3)

    mylist2 = list(rank=1:4, gene_name=c("ddr1","apr2","bac","p53"), n=c(.90,.75,.52,.31))

    List: a collection of several objects of any type

    x1 = c("gene1","gene2","gene3","gene4","gene5")
    x2 = c(2,4,7,9,11)
    x = list(x1,x2)

  • Practice Session: 2

  • Topic 4 : Data Curation


  • Variable Information

    is.na(x)           # identify missing values
    is.array(x)        # is x an array (one, two or more dimensions)?
    is.vector(x)       # is x a vector (one dimension)?
    is.matrix(x)       # is x a matrix (two dimensions)?
    is.data.frame(x)
    is.numeric(x)
    is.complex(x)
    is.character(x)

  • Variable conversion

    as.vector(x)
    as.matrix(x)
    as.data.frame(x)
    as.character(x)

  • Variable attributes

    Attributes

    length(x)     # length of a vector
    dim(x)        # dimensions of a matrix
    dimnames(x)   # dimension names
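A sketch that combines the checks, conversions and attribute functions from the last three slides. as.numeric() is not listed on the slides but belongs to the same family of base R conversion functions; the data and names are invented for the example.

```r
# Check and convert a character vector
text_numbers <- c("1", "2", "3")
was_character <- is.character(text_numbers)   # TRUE
numbers <- as.numeric(text_numbers)           # now numeric
back_to_text <- as.character(numbers)         # "1" "2" "3"

# Convert a data frame to a matrix and inspect its attributes
df <- data.frame(a = 1:2, b = 3:4)
m <- as.matrix(df)
m_dim <- dim(m)                  # number of rows and columns
m_cols <- dimnames(m)[[2]]       # column names
n_len <- length(numbers)         # length of the vector
```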

  • Missing Values

    Variables of each data type (numeric, character, logical) can also take the value NA: not available.

    - NA is not the same as 0

    - NA is not the same as ""

    - NA is not the same as FALSE

    For any operation (calculation, comparison) that involves NA, we have to state explicitly whether missing values should be kept or removed.

    > NA==1

    [1] NA

    > 1+NA

    [1] NA

    > max(c(NA, 4, 7))

    [1] NA

    > max(c(NA, 4, 7), na.rm=T)

    [1] 7

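A sketch of common ways to detect and handle NA in a vector. The vector expr and its values are made up for illustration.

```r
expr <- c(2.5, NA, 4.0, NA, 7.5)

missing_flags <- is.na(expr)        # TRUE where a value is missing
n_missing <- sum(missing_flags)     # number of missing values

mean_with_na <- mean(expr)                      # NA, because NA propagates
mean_without_na <- mean(expr, na.rm = TRUE)     # mean of the three known values

complete_values <- expr[!is.na(expr)]   # drop the missing values

# Comparing with == does not detect NA; the result is itself NA
na_equals_na <- NA == NA
```

This is why is.na(x) is used to test for missing values instead of x == NA.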

  • Data selection and manipulation

    Slicing and extracting data: vectors

    x[n]                     # nth element
    x[-n]                    # all but the nth element
    x[1:n]                   # first n elements
    x[-c(1:n)]               # elements from n+1 to the end
    x[c(2,5,7)]              # specific elements
    x[x>5]                   # all elements greater than 5
    x[x>5 & x<9]             # all elements between 5 and 9
    x[x %in% c("ab","sh")]   # elements that occur in the given vector

    Data selection from lists and data frames

    x[[n]]          # nth element of the list
    x$name          # extract the component or column called name
    attributes(x)   # attributes of the data frame
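The indexing forms above, applied to a small vector and list. The data and names are illustrative assumptions.

```r
x <- c(3, 8, 6, 12, 5, 9, 1)

second       <- x[2]             # 8
without_2nd  <- x[-2]            # 3 6 12 5 9 1
first_three  <- x[1:3]           # 3 8 6
after_three  <- x[-c(1:3)]       # 12 5 9 1
picked       <- x[c(2, 5, 7)]    # 8 5 1
above_five   <- x[x > 5]         # 8 6 12 9
between      <- x[x > 5 & x < 9] # 8 6

# Selection from a list by position and by name
info <- list(ids = c("ab", "cd", "sh"), n = 3)
first_component <- info[[1]]
n_value <- info$n
matching_ids <- info$ids[info$ids %in% c("ab", "sh")]   # "ab" "sh"
```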

  • Basic Matrix operation

    Matrix curation:

    x[r,c]          # element at the rth row and cth column
    x[r,]           # row r
    x[,c]           # column c
    x[,c(2,5,8)]    # select specific columns

    Matrix operation:

    dim(x)        # dimension of the matrix
    x+y           # sum of matrices x and y
    t(x)          # transpose of the matrix
    diag(x)       # diagonal elements of the matrix
    nrow(x)       # number of rows
    rownames(x)   # row names
    rowSums(x)    # row sums
    rowMeans(x)   # row means

    cor(x)        # correlation matrix
    var(x)        # variance matrix
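A sketch of the matrix selections and operations above on a small 2 x 3 matrix. The values and names are chosen only for illustration.

```r
m <- matrix(1:6, nrow = 2)     # rows: 1 3 5 / 2 4 6

element   <- m[2, 3]           # 6
first_row <- m[1, ]            # 1 3 5
second_col <- m[, 2]           # 3 4
two_cols  <- m[, c(1, 3)]      # columns 1 and 3

doubled   <- m + m             # element-wise sum of two matrices
t_dim     <- dim(t(m))         # transpose has 3 rows and 2 columns
row_totals <- rowSums(m)       # 9 12
row_avgs   <- rowMeans(m)      # 3 4

square_diag <- diag(matrix(1:4, nrow = 2))   # diagonal of a 2 x 2 matrix: 1 4
```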

  • Data selection and manipulation

    x*2              # scalar multiplication
    length(x)        # length of the vector
    sum(x)           # sum of the elements of the vector
    max, min         # max and min values
    rev              # reverse the order
    sort             # sorting
    unique           # unique values
    rle              # run length encoding
    table(a,b)       # comparison table
    sample(x)        # random sampling of the data
    which.max(x)     # index of the max element of x
    which.min(x)     # index of the min element of x
    which(x == a)    # vector of indices of x where the comparison is TRUE
    which(x %in% a)  # indices of x that match an element of a
    choose(n,k)      # number of combinations of k items chosen from n
    rank(x)          # ranking
    round(x,3)       # round the elements of x to 3 decimal places
    order            # ordering
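A sketch of several of the functions listed above. The vectors scores and labels are invented for this example.

```r
scores <- c(30, 10, 20, 10)

idx_max   <- which.max(scores)               # 1
idx_min   <- which.min(scores)               # 2 (first of the tied minima)
idx_ten   <- which(scores == 10)             # 2 4
idx_in    <- which(scores %in% c(20, 30))    # 1 3

sorted    <- sort(scores)      # 10 10 20 30
ordering  <- order(scores)     # 2 4 3 1: indices that would sort the vector
ranks     <- rank(scores)      # 4 1.5 3 1.5: ties share the average rank
distinct  <- unique(scores)    # 30 10 20
reversed  <- rev(scores)       # 10 20 10 30

# Run length encoding: consecutive repeats are collapsed into (value, length) pairs
labels <- c("a", "a", "b", "b", "b", "a")
runs <- rle(labels)
label_counts <- table(labels)

n_pairs <- choose(5, 2)        # 10 ways to choose 2 items from 5
```

Note the difference between sort, order and rank: sort returns the values, order returns positions in the original vector, and rank returns the position each value would take after sorting.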

  • Basic Maths and Statistics functions

    sqrt(x)                           # square root of x
    sin(x), cos(x)                    # trigonometric functions
    asin(x), acos(x)                  # inverse trigonometric functions
    log(x), log10(x), log(x,base)     # logarithms
    exp(2)                            # exponential function
    max(x), min(x)                    # max and min values
    range(x)                          # range
    sum(x)                            # sum of x

    mean(x)     # mean of the elements of x
    median(x)   # median of the elements of x
    var(x)      # variance of the elements of x
    sd(x)       # standard deviation of x
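A worked sketch of the summary functions on a small vector; the data are made up so that the results are easy to verify by hand.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

total   <- sum(x)       # 40
average <- mean(x)      # 40 / 8 = 5
middle  <- median(x)    # (4 + 5) / 2 = 4.5
spread  <- range(x)     # 2 9

# var() and sd() use the sample formula, dividing by n - 1
variance <- var(x)      # 32 / 7
std_dev  <- sd(x)       # sqrt(32 / 7)

root  <- sqrt(16)           # 4
log2_ <- log(8, base = 2)   # 3
log10_ <- log10(1000)       # 3
```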

  • Advanced R built-in functions

    abs(x)              # absolute value
    ceiling(x)          # smallest integer not less than x
    floor(x)            # largest integer not greater than x
    trunc(x)            # drop the fractional part
    round(3.4578, 2)    # round to 2 decimal places
    signif(3.4578, 2)   # round to 2 significant digits
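A sketch showing how the rounding functions differ, including on a negative number where floor and trunc disagree. The input values are arbitrary.

```r
rounded     <- round(3.4578, 2)    # 3.46: two decimal places
significant <- signif(3.4578, 2)   # 3.5: two significant digits

up      <- ceiling(3.2)    # 4
down    <- floor(-3.2)     # -4: floor moves toward minus infinity
cut_off <- trunc(-3.2)     # -3: trunc moves toward zero
size    <- abs(-3.2)       # 3.2
```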

  • Practice Session: 3

  • Exercise: 1

  • Exercise 1: Round off the number 3.543321 to three decimal places.

    Exercise 2: Generate a sequence x = seq(1,524,d), where d is a random number between 2 and 9. Find:

    - length(x)

    - sum(x)

    - the cube root of x

    - the 5th and 7th elements of vector x

    - the 2nd to 5th elements of vector x

    - a vector without the 2nd to 5th elements of vector x

    - which elements of vector x are greater than 10

    - a vector whose elements are greater than 10

    - a vector whose elements are greater than 10 and less than 50

    - max, min, rev, sort, unique, range

  • Exercise 3: Explore the commands

    a = rep(2,5)
    b = rep(3,7)
    c = rep(4,2)
    z2 = c(a,b,c)
    z = sample(z2)   # analyze z
    u = rle(z)
    sort(z)
    unique(z)

    - What does the sample command do?

    - Find the attributes of u

    - Analyze u and interpret what rle does

    - Find mean, median, var, sd

    - Convert z into log scale

  • Exercise 4: Generate 20 replicates of a TRUE sample, denoted by T, and 10 replicates of a FALSE sample, denoted by F.

    Exercise 5: Generate a vector z1 from 2 to 5 and a second vector z3 from 12 to 15, and combine them into a new vector z.

    Exercise 6: Write the sequence expression for 5 10 15 20 25 30 35 40 45 50

    Exercise 7: Generate a sequence from 19 to 957 with a difference of 17.

    Exercise 8: Generate any 3 x 4 matrix using the command matrix.

    Exercise 9: Create 3 vectors a, b, c of size 5, generate a matrix using cbind and rbind, and find the dimension of each matrix.

  • Exercise 10: A class of 20 students appeared for maths and biology exams and secured marks between 20 and 90. Generate random marks satisfying the above criteria in a matrix that has the student names S1, S2, ..., S20 as the first row and the subjects math1 and bio1 as the first column. Save the names and marks of the students who:

    - secured more than 70% marks in either of the two subjects, and

    - failed in either of the two.

    Also find the average marks secured by the students in both subjects.

  • Don't forget to save the workspace.

  • THANK YOU .........

    Alok Srivastava, Assistant Professor, CRRAO-AIMSCS, Hyderabad, INDIA

    Date 8-01-15
