23
Introduc)on to R David Smith, REvolu)on Compu)ng LUGOD, Nov 16 2009

Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Introduc)on  to  R  

David  Smith,  REvolu)on  Compu)ng  LUGOD,  Nov  16  2009  

Page 2: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Open-­‐source  sta)s)cal  analysis  and  visualiza)on  

2  LUGOD,  Nov  16  2009  

Page 3: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

What  is  R?  

•  Open-­‐source  soJware  for  advanced  sta)s)cal  analysis  and  data  visualiza)on  

•  Interpreted  language  designed  for  sta)s)cal  programming  –  Virtually  all  sta)s)cal  and  predic)ve  analy)c  methods  available  

–  Create  analysis  in  a  frac)on  of  the  )me  compared  to  C++,  Python,  SAS,  SPSS,  …  

•  www.r-­‐project.org    

LUGOD,  Nov  16  2009   3  

Page 4: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

R  History  

•  Late  70’s  /  Early  ’80s:  S  language  invented  at  AT&T  Bell  Labs  

•  1983-­‐1992:  S  version  3    – 1998:  Chambers  wins  ACM  SoJware  Systems  award  for  S  

•  1993:  R  Gentleman  and  R  Ihaka  create  R  – 1995:  Released  under  GPL  v2  

•  1997:  R  Core  Group  formed  

•  2001:  R  1.0.0  released  

LUGOD,  Nov  16  2009   4  

Page 5: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

R  Today  

•  Oct  2009:  R  2.10.0  released  •  2M+  users  (es)mated)  

– Google,  Facebook,  Pfizer,  Bank  of  America,  New  York  Times  

•  2000+  contributed  packages  – cran.r-­‐project.org  

•  Annual  interna)onal  user  group  mee)ng  

•  Commercially  supported  by  Revolu)on  Compu)ng  

LUGOD,  Nov  16  2009   5  

Page 6: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Momentum  behind  R  

LUGOD,  Nov  16  2009   6  

Page 7: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

What  is  Sta)s)cs?  

http://xkcd.com/552/

7  LUGOD,  Nov  16  2009  

Page 8: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

REvolu)on  Compu)ng:  The  “R”  Company  

•  REvolu)on  R  –  Free,  high-­‐performance  binary  distribu)on  of  R  

•  REvolu)on  R  Enterprise  –  Subscrip)on-­‐based  version,  bundled  with  proprietary  extensions  

–  Fully  supported,  validated  •  R  Consul)ng  and  Training  •  R  Community  Development  

–  R  Evangelism  (blog.revolu)on-­‐compu)ng.com)  –  R  community  portal  –  Technical  and  financial  contribu)ons  to  R  Project  

LUGOD,  Nov  16  2009   8  

Page 9: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

What  can  you  do  with  R?  

•  Mash-­‐up  messy  data  sources  to  analyze  the  foreclosure  crisis  

From O'Reilly's Data Mash-ups in R. 9  LUGOD,  Nov  16  2009  

Page 10: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

What  can  you  do  with  R?  

•  Find  a  clean  place  to  surf  in  the  Bay  Area  

John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

10  LUGOD,  Nov  16  2009  

Page 11: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

What  can  you  do  with  R?  

•  Compare  baseball  player  performance  

PitchFX Viewer, by Mike Driscoll: labs.dataspora.com/gameday/ 11  LUGOD,  Nov  16  2009  

Page 12: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

R  saves  )me  for  the  New  York  Times  

•  Published  3  hours  aJer  Jackson’s  death:  

nyt.com, June 25 2009 12  LUGOD,  Nov  16  2009  

Page 13: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Gelng  Started  with  R  on  Ubuntu  (Karmic)  

$ sudo apt-get revolution-r $ R

•  R  is  interac)ve:  type  statements  at  the  prompt,  and  the  result  is  printed:  

> x <- rnorm(10, mean=1, sd=2)

> mean(x)

[1] 1.695564

LUGOD,  Nov  16  2009   13  

Page 14: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Learning  R  

•  Introduc-on  to  R  (free)  – 20-­‐minute  tutorial:  Appendix  A  

•  Introductory  Sta-s-cs  with  R  – Peter  Dalgaard  

•  R  in  a  Nutshell  (O’Reilly)  

•  Other  resources  for  beginners:  – www.revolu)on-­‐compu)ng.com/community/resources.php  

– blog.revolu)on-­‐compu)ng.com/beginner-­‐)ps/  LUGOD,  Nov  16  2009   14  

Page 15: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

ESS:  An  emacs-­‐based  GUI  for  R  

•  sudo apt-get install ess •  Start  R  in  Emacs:  M-x R

•  hnp://ess.r-­‐project.org  •  ESS  Reference  Card:  

– hnp://ess.r-­‐project.org/refcard.pdf  

LUGOD,  Nov  16  2009   15  

Page 16: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Let’s  do  a  simula)on!  

•  Is  this  your  birthday?  

6  

January

16  LUGOD,  Nov  16  2009  

Page 17: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Simula)ng  birthdays  

•  A  simple  simulation:  

birthday <- function(n) { ntests <- 10000 pop <- 1:365 anydup <- function(i) any(duplicated( sample(pop, n, replace=TRUE))) sum(sapply(seq(ntests), anydup)) / ntests }

x <- foreach (j=1:100) %dopar% birthday (j)  

17  LUGOD,  Nov  16  2009  

Page 18: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Birthday  Simula)on  

> x <- foreach (j=1:100) %dopar% birthday (j)!> plot(1:100, unlist(x),type="l") !

18  LUGOD,  Nov  16  2009  

Page 19: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

More  Examples  

•  Plolng  unemployment  data  on  a  map  •  Displaying  financial  data  as  a  calendar  heatmap  

•  Analyzing  Twiner  hashtags  

LUGOD,  Nov  16  2009   19  

Page 20: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Unemployment  data  map  

LUGOD,  Nov  16  2009   20  

Page 21: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Calendar  heat  map  

LUGOD,  Nov  16  2009   21  

Page 22: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Social  network  analysis  with  Twiner  

LUGOD,  Nov  16  2009   22  

Page 23: Introduc)on*to*R...John Oram, a scientist at the San Francisco Estuary Institute (SFEI) uses R to collect and monitor environmental data from the waters and wetlands of the Bay Area

Thank  You!  

•  David  Smith  –   david@revolu)on-­‐compu)ng.com,  @revodavid  

•  Revolu-ons,  the  R  blog  – blog.revolu)on-­‐compu)ng.com    

•  R  Project  – www.r-­‐project.org  

•  Links  for  this  talk:  – )nyurl.com/lugod-­‐nov09    

23  LUGOD,  Nov  16  2009