R for Absolute Beginners


  • 7/31/2019 R for Absolute Beginners


    R for absolute beginners A gentle introduction Duncan Golicher

    How can I break down the calculations of a mean step by step?

    How do I calculate a sample standard deviation?

    How do I calculate a standard error?

    What does the t-distribution look like and how do I use it to test a null hypothesis?

    Introduction

    Don't do Sudokus, use R

    This document is very loosely based on material that I have taught in Spanish to masters students since 2004. I have translated it and brought it up to date in the hope that it might be useful as an introduction to R for a general audience of postgraduate students and researchers. I am an ecologist, not a statistician, and have no formal training in mathematics. I feel this is probably an advantage when teaching introductory material. It allows me to appreciate the difficulties many have with R and to draw on my own experiences in order to help. My first contact with R was far from encouraging. A statistical colleague mentioned R in an email, claiming it to be the most useful piece of software around. I downloaded and installed R version 1.7. The icon on my desktop was removed by Windows after about six months due to lack of use. I simply couldn't see what to do with it. Eventually I did find some time to experiment. I followed the precise but rather terse introductory material available at the time. This was hard, but fortunately previous experience with computer languages helped me to get started. I began to use R and quickly found that my colleague had been right all along. Using R instead of a typical combination of Excel and SPSS does increase the productivity of non-statisticians. I was amazed to find that a single line of R could do things that were very difficult to achieve in other ways. I quickly became a convert and wanted to spread the word.

    My subsequent experiences have been both encouraging and frustrating. I have aimed at helping students and researchers to follow my path and adopt R as their preferred platform for data analysis. Overall I have been pleasantly surprised by the high number of successes. I am often contacted for advice by those I taught several years ago as they begin to use R for serious research. At the same time I have been frustrated by the difficulty in persuading more people to adopt and use R on a routine basis. I have noticed two quite distinct classes of recalcitrants. Students who have never previously seen a command line are quite understandably put off at the start. The R style of working does not offer immediate attractions. R has a very steep learning curve. With some gentle encouragement this barrier can often be overcome, but it is always a struggle. The second class of potential R users is much more difficult to persuade. They are experienced researchers who have invested considerable effort in learning how to use alternative statistics packages such as SPSS or SAS. For a researcher, time is the resource in shortest supply. Learning a new, superficially more complicated, way to analyse data appears to be a luxury they simply can't afford. This is understandable but regrettable, because this is the class of users that could potentially benefit most from a working knowledge of R. While students can be forced to use R by giving them assessed course work, there is little that can be used to encourage researchers apart from demonstrating results that they would like to produce themselves.

    This course aims to help absolute beginners of both types to move up the initially steep gradient of the learning curve and begin to enjoy using R. Once the hard barriers are overcome, potential users should find that developing skills in R is a satisfying experience in its own right. At the same time they will find that knowledge of R can result in much greater scientific productivity. I have aimed for an informal style as much as possible in order to engage with the reader and help to soften the impact. I have also adopted a question and answer format for most of the document. At each stage I will try to identify the key FAQs and provide my own answers to them.

    What can R do for me?

    An example of potential productivity gains is provided by this document itself. I have written the material using Sweave in LaTeX (LyX). This is a great combination of tools for scientific writing. It allows me to embed R code directly within the text. This code is run when the document is compiled. The output is thus incorporated into quite a professional looking pdf document with no effort. Once the document is set up I don't need to think about the formatting or typesetting. I can simply work on the content. This detail alone shows the immense power of R and its associated tools. The potential for productivity gains once one has learnt how to use them is endless.

    I will be explaining what R code is later on. However here is an example of a simple line of code that produces output.

    library(fortunes)
    fortune("pizza")

    Roger D. Peng: I don't think anyone actually believes that R is designed to
    make *everyone* happy. For me, R does about 99% of the things I need to do, but
    sadly, when I need to order a pizza, I still have to pick up the telephone.
    Douglas Bates: There are several chains of pizzerias in the U.S. that provide
    for Internet-based ordering (e.g. www.papajohnsonline.com) so, with the
    Internet modules in R, it's only a matter of time before you will have a
    pizza-ordering function available.
    Brian D. Ripley: Indeed, the GraphApp toolkit (used for the RGui interface
    under R for Windows, but Guido forgot to include it) provides one (for use in
    Sydney, Australia, we presume as that is where the GraphApp author hails from).
    Alternatively, a Padovian has no need of ordering pizzas with both home and
    neighbourhood restaurants ....
       -- Roger D. Peng, Douglas Bates and Brian D. Ripley
          R-help (June 2004)

    Perhaps pizza lovers outside Australia will be disappointed. However this exchange on the R-help list makes a very serious point. I frequently use R to bring in data from the Internet and analyse it. R is often used in finance for producing automated reports on the state of the stock market. Geneticists can use R to consult huge data banks. R is very well integrated within the contemporary cloud style of scientific computing.

    As an example of the unusual ways that R can be used, this function fetches the daily sudoku puzzle from http://www.sudoku.org.uk/

    library(sudoku)

    puz


         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
    [1,]    6    4    9    7    1    3    2    5    8
    [2,]    1    5    8    9    4    2    3    7    6
    [3,]    2    3    7    5    6    8    1    9    4
    [4,]    9    2    4    1    8    6    7    3    5
    [5,]    5    8    1    3    9    7    4    6    2
    [6,]    3    7    6    2    5    4    9    8    1
    [7,]    7    1    5    6    2    9    8    4    3
    [8,]    8    6    3    4    7    1    5    2    9
    [9,]    4    9    2    8    3    5    6    1    7

    Solving Sudokus may seem trivial, but it certainly shows the power of the R language to save at least someone's time. It is now very difficult to identify forms of serious scientific computing that haven't been implemented in R. However there is an important caveat that is very neatly summed up in this exchange.

    fortune("Yoda")

    Evelyn Hall: I would like to know how (if) I can extract some of the
    information from the summary of my nlme.
    Simon Blomberg: This is R. There is no if. Only how.
       -- Evelyn Hall and Simon 'Yoda' Blomberg
          R-help (April 2005)

    In other words, although you can do almost anything in R, it is often far from obvious how. The main aim of this course is to help you get to the stage where you can begin to find ways to solve your own problems using R.

    Some very advanced analysis can only be achieved using R. As Frank Harrell notes, R is at the cutting edge.

    fortune(10)

    Overall, SAS is about 11 years behind R and S-Plus in statistical capabilities

    (last year it was about 10 years behind) in my estimation.

    -- Frank Harrell (SAS User, 1969-1991)

    R-help (September 2003)

    fortune(120)

    Rene M. Raupp: Does anybody know any work comparing R with other (charged)
    statistical software (like Minitab, SPSS, SAS)? [...] I have to show it's as
    good as the others.
    Kjetil Brinchmann Halvorsen: Sorry. That will be difficult. Couldn't it do to
    prove it is better?
       -- Rene M. Raupp and Kjetil Brinchmann Halvorsen
          R-help (May 2005)

    What exactly is R?

    This is not an easy question to answer. In one sense the flexibility and power of R mean that it becomes something different to every user. The conventional answer is that R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. In other words R is both a computer language and a set of procedures that have already been implemented in order to carry out specific tasks.


    Mingzhai Sun: When you use it [R], since it is written by so many authors, how
    do you know that the results are trustable?
    Bill Venables: The R engine [...] is pretty well uniformly excellent code but
    you have to take my word for that. Actually, you don't. The whole engine is
    open source so, if you wish, you can check every line of it. If people were out
    to push dodgy software, this is not the way they'd go about it.
       -- Mingzhai Sun and Bill Venables
          R-help (January 2004)

    How do I install R?

    R can be run under Windows, Linux, Mac or Solaris and probably most other platforms. A quote from Barry Rowlingson that still has not made it into the fortunes package: "I'd like to see a Nintendo Wii port, just so I can play Super Mario Generalised Linear Modelling by waving the controller around."

    At the time of writing R for Windows is available by clicking on the link below.

    http://cran.r-project.org/bin/windows/base/R-2.8.0-win32.exe

    Versions change over time. You should ensure that you install the latest version (for example R-2.9.0-win32.exe when it becomes available). Linux users now find versions of R in the standard repositories of most popular distributions. In my own case I prefer to run R on Ubuntu. This Linux distribution is laptop friendly and handles most dependency issues very cleanly. Packages can be installed using Synaptic or aptitude. As a minimum r-base, r-base-dev, r-base-core, r-base-html and r-base-latex are needed. A search for r-cran in the Synaptic Package Manager will show a large number of extension packages. It is probably worth installing all of them right from the start in order to save time when they are found to be needed. Linux users can also install RKWard, which provides a sophisticated graphical interface for R. RKWard also includes a large number of scripts for routine statistical analysis. Together they form a user friendly free alternative to SPSS.

    Windows users can obtain a graphical interface from the sciviews project.

    http://www.sciviews.org/

    Even if the full GUI is not used I recommend the program Tinn-R for editing R scripts with syntax highlighting if you work under Windows.

    http://www.sciviews.org/Tinn-R/index.html

    That said, I will not be using any graphical interface to R in this course. All the material will be generic, cross platform and robust to any future changes in versions of R. In the next section I will explain how to work with the material, along with the general procedure for extending R using packages, which is the same for users on any platform.

    Exercises and activities

    1. Install the latest version of R on your computer.

    2. Investigate the contents of CRAN (http://cran.r-project.org/)

    3. Download all the relevant manuals and courses from the Contributed section of CRAN.

    4. Browse the R graphics gallery (http://addictedtor.free.fr/graphiques/)

    Getting started in R. Finding help and extending R with packages

    Just to get started I will assume that you are running R in Windows. Mac users have a similar experience. In Linux, R is run by typing R in the terminal, and by default no GUI is produced. I will assume that Linux users are already hardened to this style of working. On starting R in Windows you are presented with the superficially unhelpful looking interface shown below.

    What do I do now?

    This is always where the panic starts. The Windows GUI version at least gives you a few things to click on (the console version in Linux just has a cursor), but they don't seem to do very much. There are no immediate indications that R can do any statistics. The experience for many is quite off-putting.

    The first thing to point out is that this interface is in fact more helpful than it looks at first sight. But let's return to that later. First, just to break the ice, let's make R do something. Anything.

    At this point I will introduce the convention that will be followed throughout this course. All lines that appear in the format below can (and should) be either typed directly into the console or copied and pasted from this document.

    demo(graphics)

    They will then run. So if you type demo(graphics) things will start happening. In this case R runs through a number of scripts that give some examples of the sort of graphical output it is capable of. This particular demo is now quite old and doesn't really do full justice to R's graphical potential. As the script runs you will be prompted to press return to get more output. I have included the first graph of several such demos below.

    demo(graphics)

    [Figure: first plot from demo(graphics), titled "Simple Use of Color In a Plot", with the label "Just a Whisper of a Label".]

    demo(image)

    [Figure: image plot titled "Maunga Whau Volcano", axes x and y, drawn with col=terrain.colors(100).]

    demo(persp)

    [Figure: perspective surface with axes x, y and z, titled z = Sinc(sqrt(x^2 + y^2)).]

    demo(plotmath)

    [Figure: plotmath reference table with the sections Arithmetic Operators, Sub/Superscripts, Juxtaposition, Lists, Radicals, Relations and Typeface, showing expressions such as x[i], x^2, sqrt(x), x != y and italic(x) beside their rendered forms.]

    How do I work with R?

    Fear of the command line seems to be the biggest barrier to using R. At the same time adopting a script based approach to data analysis is the greatest advantage of R. So it is worth taking some time at this stage to explain carefully how to work with the R console. Why is the R interface so minimalist? When you realise what R can do and think carefully, the reason becomes obvious. A menu based GUI for a statistics program is simply a way to trace a path to a function and prompt the user for inputs to that function. So, in SPSS or Excel, you might drill through a couple of layers of menu in order to find the function to produce a boxplot of a vector of data called, say, treediams. In R you write boxplot(treediams). Once you have learnt basic commands this is an efficient way of working. Statistical analysis is a vast subject. How many different function calls do you need in order to be able to apply any of the multitude of statistical methods that are available in R? The answer is unknown, but it must easily be in the tens of thousands. A menu based GUI that provided access to all of them would be huge. It would no doubt look much more threatening to new users than the command line. Even if it were to be built, few would use it. In many cases it would still be quicker to type commands once they are known.

    Speed isn't the only advantage of using written commands to run analyses. The biggest advantage is that every step of the analysis is documented. You can collect the steps you took to produce a figure or table together into a script and reproduce the results exactly. So the typical method of working is to open R and also open a text editor like Notepad. I recommend Tinn-R for Windows users as it has built in syntax highlighting. I usually experiment with a command in the console first. When I find it does what I intended I copy and paste it to the script I am building. At the end of a session using R I have a complete record of what I have been doing.
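The script-based workflow can be sketched in a few lines. A script is just a plain text file of commands, and source() runs it from start to finish. In this sketch the file is written with writeLines() to a temporary location only so that the example is self-contained; in practice you would save it from your text editor. The treediams values are made up for illustration.

```r
# Commands that would normally be saved from a text editor
# (the treediams values are invented for this sketch)
steps <- c(
  "treediams <- c(12.1, 15.3, 9.8, 20.4, 17.6)",
  "med <- median(treediams)"
)

# Write the commands to a temporary script file
script <- tempfile(fileext = ".R")
writeLines(steps, script)

# Re-run the whole analysis exactly as recorded
source(script)
med
```

Running source() on the same file tomorrow, or on a colleague's computer, repeats the same steps in the same order.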

    How do I get help?

    So, how do you know what commands are available? There are two complementary ways. The first is to follow a book or course like this one that introduces you to commands in a logical sequence. The other is to use the comprehensive R help system. The R help system will not teach you any statistics, nor will it explain why you might want to run a function. However it will show you how to run almost all the functions in R and also provide an example of their use. If it takes a little effort to find out how to run a function, users might be encouraged to spend more time finding out why and whether they need it.

    fortune(51)

    The documentation level of R is already much higher than average for open
    source software and even than some commercial packages (esp. SPSS is notorious
    for its attitude of "You want to do one of these things. If you don't
    understand what the output means, click help and we'll pop up five lines of
    mumbo-jumbo that you're not going to understand either.")
       -- Peter Dalgaard
          R-help (April 2002)

    You can often simply substitute your own data for the data used in the example to get results. You can open a web browser interface to the help system from the console.

    This will take you to the page shown below. You can also do the same by writing a command to call a function, as that is all the GUI really does.

    help.start()


    The most used links on this page are Packages and Search Engine and Keywords. To use the search engine you need Java installed. Try searching for histogram.

    This will then show a number of links to functions associated with the production of histograms. Try looking at the function hist. You will find a standard page layout which is the same for all functions, including sections labelled Description, Usage, Arguments, Details, Value, References, See Also and Examples. Probably the most important section at this stage is Examples. This provides you with a template for using the function. All the examples can be run in the console either by copying and pasting the code on the help page or by typing example(thefunctionyouwant). If you have a good idea of the name of the function, the help page will be shown by simply typing ?hist or help(hist).

    help("hist")

    example("hist")

    [Figure: output of example("hist"): histograms titled "Histogram of islands" (x axis islands, y axis Frequency) and "Histogram of sqrt(islands)" (x axis sqrt(islands), y axis Frequency and then Density).]

    How do I use the R-help list?

    If you have real problems with R you can get direct help from the best in the business. These are the programmers, developers and long time R users on the R-help list. To subscribe or unsubscribe visit https://stat.ethz.ch/mailman/listinfo/r-help or, via email, send a message with subject or body 'help' to r-help-request@r-project.org. However you should think carefully before using this fantastic resource directly. All the previous answers to questions are held online and are found easily by the usual search engines. Trivial questions that have already been answered are rarely tolerated. R developers are extremely busy people who have neither time nor inclination to help with homework. The posting guide states:

    1. Do help.search("keyword") and apropos("keyword") with different keywords (type this at the R prompt).

    2. Do RSiteSearch("keyword") with different keywords (at the R prompt) to search R functions, contributed packages and R-Help postings. See ?RSiteSearch for further options and to restrict searches.

    3. Read the online help for relevant functions (type ?functionname, e.g., ?prod, at the R prompt)

    4. If something seems to have changed in R, look in the latest NEWS file on CRAN for information about it.

    5. Search the R-faq and the R-windows-faq if it might be relevant (http://cran.r-project.org/faqs.html)

    6. Read at least the relevant section in An Introduction to R

    7. If the function is from a package accompanying a book, e.g., the MASS package, consult the book before posting

    8. It helps to provide a small example that someone can actually run.
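The first step of the posting guide can be tried at the prompt. A small sketch: apropos() lists objects whose names match a pattern, and help.search() looks through the help pages of the installed packages. The keywords used here are arbitrary examples.

```r
# Objects on the search path with "hist" in their name
matches <- apropos("hist")
matches

# Help pages whose name or title mention "histogram";
# the result is an object listing the matching pages
found <- help.search("histogram", fields = c("name", "title"))
nrow(found$matches)
```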

    fortune("demigod")

    You may have not been long enough on this list to see that some of the old-time
    gurus have reached a demigod like status. Demigods have all rights to be rude
    (that's almost a definition of a demi-deity).
       -- Jari Oksanen (in a discussion on whether answers on R-help should be more
          polite)
          R-help (December 2004)

    How do I extend R by installing packages?

    A fundamental concept of R is the idea of packages. The initial installation of R provides a base of functions, most of which have been developed and maintained by a small core team of programmers. However R is capable of carrying out a huge number of additional analytical techniques. These are often written in R itself. Fortran or C code can also be linked into R and run as R commands. It is this extensibility that has led to R becoming the lingua franca of statistical computing. The biggest challenge is keeping up with the vast number of packages and being aware of what is available. It is safe to assume that someone has implemented almost any standard technique you might need in R.

    The list of packages with a short description of what they do can be found on CRAN. CRAN means Comprehensive R Archive Network and is mirrored throughout the world. Because the list of packages is now so vast, a Task Views section has been set up that helps users to find packages associated with specific types of work. For example the Spatial section would be a first stop if you are interested in using R for processing geographical information. Environmetrics shows some of the most useful packages for ecologists and resource managers.

    The important element to remember with regard to packages is the difference between installing a package and making it available for use during an R session. When a package is installed it is downloaded to your hard disk so that it can be used. This needs to be done only once, with the exception of updating packages as new versions become available[1].

    Packages can be installed under Windows from the graphical interface by choosing Install package(s) under the Packages menu. Again the job can also be done through a command. This is my preferred way of installing packages. The following line will install the package vegan, a key tool for multivariate analysis in ecology, and vcd for visualising categorical data. The addition of dep=T tells R to install all other packages upon which these packages depend.

    install.packages(c("vegan", "vcd"), dep = T)

    The notion of dependencies is well known to those who use open source software.

    There are some key points about R to mention at this point. First R is case sensitive. The line below will not work.

    Install.packages(c("vegan", "vcd"), dep = T)

    The next point is that you only need to install a package to the hard disk once. However you must load it into memory every time you need to use a function from the package. This is achieved using the command library. For example

    library(vegan)

    Makes the vegan package available for use. This will become clearer over time.
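The difference between installing and loading can be checked from the console. This sketch uses splines, a package that ships with R but is not attached at start-up, so that it runs without downloading anything. The same calls should apply to vegan once it has been installed.

```r
pkg <- "splines"

# Installed: the package exists somewhere in the library on disk
is_installed <- nzchar(system.file(package = pkg))

# Attached: the package appears on the search path of this session
attached_before <- paste0("package:", pkg) %in% search()

library(splines)   # load it for the current session
attached_after <- paste0("package:", pkg) %in% search()

c(installed = is_installed, before = attached_before, after = attached_after)
```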

    Exercises and activities

    1. Make a list of the packages on CRAN that are potentially useful for multivariate analysis.

    2. Run examples of canonical correspondence analysis and non metric multidimensional scaling using the package vegan. (Note, you do not necessarily have to understand the analysis at this stage, the exercise is aimed at practice in using the help system and examples)

    3. Install the packages nortest and moments. What do they do and how might they be useful?

    4. Run an example of a test for normality

    Vectors: Working with one variable at a time

    The first goal when you begin working with R is to become sufficiently comfortable with the underlying concepts of the R language to be able to manipulate data easily. This ability does not come overnight. You will need to practise with a lot of examples. At first it may seem difficult to achieve simple results. The payback is that with experience it will become simple to achieve difficult results.

    [1] Linux users will already be familiar with this concept. Debian users also have the advantage that packages installed with aptitude will be automatically updated.


    How can I use R as a calculator?

    R can be used as a scientific calculator. Any operation written in the console will be evaluated and the result returned to the console.

    1 + 1

    [1] 2

    More complex operations follow the usual order of operator precedence. Be careful to use brackets correctly. An extra pair of brackets doesn't do any harm, but leaving one out may give results you don't expect.

    1 + 1 * 3

    [1] 4

    (1 + 1) * 3

    [1] 6

    3 * 100/10 + 5

    [1] 35

    3 * 100/(10 + 5)

    [1] 20

    (3 * 100)/(10 + 5)

    [1] 20

    10 * (3 - 1)

    [1] 20

    10 * (3 - 1)^2

    [1] 40

    10 * 3 - 1^2

    [1] 29


    How do I assign values to a variable?

    You may have noticed that the file menu in the Windows R console does not provide an obvious way of getting data into R. On the introductory R help page there is a link to a document called R Data Import/Export. This is a comprehensive and useful document for experienced R users written by Brian Ripley, a well known R guru.

    fortune(47)

    Seldom are prizes, credit, and gratitude given, else Brian would be drowning in

    them.

    -- Anthony Rossini (about the merits of implementing software)

    R-help (May 2004)

    However I do not recommend it to beginners. This course will contain a whole section devoted to importing and exporting data from other statistical packages such as SPSS and from spreadsheets and databases. For the time being we will enter data by hand in the console. If you type

    x
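A minimal sketch of entering data by hand, assuming the usual approach of combining values with c() and storing them under a name with the assignment operator <-. The numbers are invented.

```r
# Combine five made-up measurements into a vector called x
x <- c(2.5, 3.1, 4.7, 5.0, 6.2)

# Typing the name on its own prints the contents
x

# The vector knows how many values it holds
length(x)
```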


    Where do the numbers go?

    It wasn't until I had taught two courses on R and heard this question several times that I realised that those who ask it are expecting a completely non-technical answer. They don't want to know details about the way R uses memory. The problem that arises in some students' minds is related to the almost ubiquitous use of spreadsheets.

    fortune(59)

    Let's not kid ourselves: the most widely used piece of software for statistics

    is Excel.

    -- Brian D. Ripley (Statistical Methods Need Software: A View of

    Statistical Computing)

    Opening lecture RSS 2002, Plymouth (September 2002)

    Spreadsheet users are used to typing in numbers. The numbers remain staring at them until they move away. The notion of more abstract data objects is natural to anyone who has had even rudimentary contact with programming languages. However the idea is not intuitive for everyone.

    This is an unexpected barrier to communication between those already used to the R way of doing things and the beginner. It needs dealing with carefully.

    My explanation is that as I work on an R session I produce a collection of objects held in the computer's memory. The basic properties of these objects should also be held in my own memory. I need to have a good idea of what I have put into R and why. However I really don't want to be looking at the numbers themselves all the time. This just causes clutter and confusion. As you think more about it this seems reasonable. What is the difference in R between a vector called x containing 10 numbers and one containing 10,000? The answer is essentially nothing. The following line produces a vector with ten thousand numbers. They are then multiplied by 2. The second line is identical regardless of the size of the vector.

    x
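As an illustration of the point, here is a sketch in which the vector is filled with random normal numbers. The use of rnorm() is an assumption; any way of producing ten thousand numbers would do.

```r
# A vector holding ten thousand random numbers
x <- rnorm(10000)

# Exactly the same line would work for a vector of ten numbers
x2 <- x * 2

length(x2)
head(x2, 3)   # look at a few values rather than all of them
```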


    How do I transform a vector?

    Any operation will be performed on the whole vector. Try these

    x
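A few transformations of this kind, sketched with a small invented vector. Each function or operator is applied to every element in turn and a vector of the same length is returned.

```r
x <- c(1, 4, 9, 16)   # invented values

x * 10          # multiply every element
sqrt(x)         # square root of every element
log10(x)        # base 10 logarithm of every element
x - mean(x)     # deviation of each element from the mean

roots <- sqrt(x)
```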


    library(moments)

    agostino.test(x)

    D'Agostino skewness test

    data: x

    skew = 0.2962, z = 0.3482, p-value = 0.7277

    alternative hypothesis: data have a skewness

    anscombe.test(x)

    Anscombe-Glynn kurtosis test

    data: x

    kurt = 2.2753, z = -0.0702, p-value = 0.944

    alternative hypothesis: kurtosis is not equal to 3

    Notice that testing for significant deviations from the desired normal properties in the case of small samples is not particularly useful. The null hypothesis is much less likely to be rejected when there is little power available. There is nothing wrong with our assumption that the numbers were drawn from a normal distribution. In this special case we know it to be absolutely correct because that is what we told R to do for us. However a histogram of the small sample itself does not look particularly normal. In fact if we repeat the process 36 times we can see that the histogram of a sample of ten numbers very rarely looks normal even when they are drawn from a normal population.

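    # 6 x 6 grid of small panels with narrow margins
    par(mfcol = c(6, 6), mar = c(0.5, 0.5, 0.5, 0.5))
    # 36 histograms, each of 10 values from a normal population (mean 1, sd 2)
    invisible(replicate(36, hist(rnorm(10, 1, 2), col = "grey", xlab = "",
        ylab = "", main = "", axes = FALSE)))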

    This explains why testing for the normality of small samples is far less important than having a good justification for assuming that they could have been drawn from a population with normal properties. In this case, if negative values were not possible the whole process would be completely meaningless. Let's assume that they represent the differences between two paired values, which could indeed have either sign.
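The remark about low power can be explored by simulation. In this sketch the samples are deliberately drawn from a skewed (exponential) population, and the Shapiro-Wilk test from base R is used in place of the tests in the moments package so that nothing extra needs installing. The seed, the number of repetitions and the 5% threshold are arbitrary choices.

```r
set.seed(42)   # arbitrary, only so the run can be repeated

# Proportion of samples of size n for which normality is rejected at 5%
reject_rate <- function(n, reps = 2000) {
  pvals <- replicate(reps, shapiro.test(rexp(n))$p.value)
  mean(pvals < 0.05)
}

rate_small <- reject_rate(10)    # ten values per sample
rate_large <- reject_rate(100)   # one hundred values per sample
c(n10 = rate_small, n100 = rate_large)
```

With only ten values the test should miss a clearly non-normal population much more often than it does with one hundred.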

    How do I run a t-test in R?

    Now the idea of statistical hypothesis testing is to try to estimate the probability attached to various statements we could make about some underlying population from which these ten numbers were drawn. In this case we know what this population is, but in real life we do not. A fairly reasonable statement to test would be that the true difference between the pairs of observations is zero, in other words that nothing much is happening.

    In a practical research setting you run a t-test in R quickly with a single function. There are various inputs to the test, but we will use the default two-tailed option.

    t.test(x)

    One Sample t-test

    data: x

    t = 2.5612, df = 9, p-value = 0.03063

    alternative hypothesis: true mean is not equal to 0

    95 percent confidence interval:

    0.1476105 2.3812007

    sample estimates:

    mean of x

    1.264406
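The result of t.test can also be stored and its parts extracted by name. The sketch below assumes that x was produced with set.seed(1) followed by rnorm(10, 1, 2). That call is a guess, although it does give the mean and t value printed above. The arguments mu and alternative are two of the inputs to the test.

```r
set.seed(1)
x <- rnorm(10, mean = 1, sd = 2)   # assumed origin of x

res <- t.test(x)    # two tailed test of mean = 0 (the defaults)
res$statistic       # t value
res$parameter       # degrees of freedom
res$p.value
res$conf.int        # 95 percent confidence interval

# One tailed alternative: is the true mean greater than zero?
res_greater <- t.test(x, mu = 0, alternative = "greater")
res_greater$p.value
```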

    How can I break down the calculations of a mean step by step?

    So now R has told us what the result should be. Let's do all the calculations step by step. The mean of the sample is calculated by

    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

    To have each step of this simple operation calculated in turn we calculate the sum in R and then divide by n

    n
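The two steps can be sketched as follows, assuming x came from set.seed(1) and rnorm(10, 1, 2) (a guess that happens to reproduce the mean reported by t.test). The names total and xbar are illustrative.

```r
set.seed(1)
x <- rnorm(10, mean = 1, sd = 2)   # assumed origin of x

n <- length(x)        # number of observations
total <- sum(x)       # the sum in the formula
xbar <- total / n     # divide by n to get the mean

xbar
mean(x)               # the built in function gives the same answer
```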


    We can even simulate this 100,000 times, look at the results as a histogram and calculate the proportion of the replicated samples that have a mean below zero. The theoretical basis of our null hypothesis test is based on this concept.

    samps
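One way to read this step, sketched under assumptions: treat the sample mean and standard deviation as if they described the population, draw 100,000 new samples of the same size, and record the mean of each. The vector x is assumed to come from set.seed(1) and rnorm(10, 1, 2). This is one plausible version, not necessarily the calculation the course uses.

```r
set.seed(1)
x <- rnorm(10, mean = 1, sd = 2)   # assumed origin of x
n <- length(x)

# Means of 100,000 simulated samples of size n
samps <- replicate(100000, mean(rnorm(n, mean = mean(x), sd = sd(x))))

hist(samps, breaks = 50, col = "grey", main = "Simulated sample means")

# Proportion of simulated means that fall below zero
prop_below <- mean(samps < 0)
prop_below
```

Because this simulation treats the standard deviation as known, the proportion comes out smaller than the one tailed p-value from the t-test, which allows for the extra uncertainty in a standard deviation estimated from only ten values.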


    rootmeansquare


    plot(function(x) dt(df = n - 1, x), -4, 4)

    xvals
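The remaining steps listed in the contents (sample standard deviation, standard error and the use of the t-distribution) can be sketched together and checked against t.test. Here x is assumed to come from set.seed(1) and rnorm(10, 1, 2), and the variable names are illustrative.

```r
set.seed(1)
x <- rnorm(10, mean = 1, sd = 2)   # assumed origin of x
n <- length(x)
xbar <- sum(x) / n

# Sample standard deviation: root of the sum of squares over n - 1
sdev <- sqrt(sum((x - xbar)^2) / (n - 1))

# Standard error of the mean
se <- sdev / sqrt(n)

# t statistic for the null hypothesis that the true mean is zero
tval <- xbar / se

# Two tailed p-value from the t-distribution with n - 1 degrees of freedom
pval <- 2 * pt(-abs(tval), df = n - 1)

# Density of the t-distribution with the observed value marked
curve(dt(x, df = n - 1), from = -4, to = 4, ylab = "density")
abline(v = c(-tval, tval), lty = 2)

c(sd = sdev, se = se, t = tval, p = pval)
```

The values of t and p should agree with those printed by t.test(x), which is a useful check that each step has been understood.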