A Brief Introduction to R
Introduction to Econometrics, Fall 2020
Zhaopeng Qu
Nanjing University
9/16/2020
Zhaopeng Qu (Nanjing University) A Brief Introduction to R 9/16/2020 1 / 65
Section 1
Introduction to R
What is R?
R is not only a statistical programming language but also an environment for statistical computing and graphics.
The R language has its roots in the S language developed at AT&T Bell Laboratories, where the C language was also developed.
It is not the only domain-specific language for statistical analysis: there are many others, such as SAS, SPSS, or Stata, and statistics can also be done with mathematical software or general-purpose languages such as Java or Python.
Why R: A Free but Powerful Tool
Free and open source
Powerful programming and brilliant visualization
Popularity
  Used across a wide variety of disciplines, in both academia and business.
  Tremendous online resources: books, blogs, forums, videos, and online courses.
Why R: Top 5 Programming Languages in 2019
Top 10 Programming Languages in 2019 by IEEE
Figure 1: Top 10 Languages in Data Science
Why R: Top 5 Software in 2019
The Popularity of Data Science Software in 2019 by Jobs
Figure 2: Top10
Why R: Top 11 Increases during 2017-2019
The Change during 2017-2019 by Jobs
Figure 3: Top10
Wrap up
The number-crunching language R rounds out the top 5.
Despite being a much more specialized language than the others, it has maintained its popularity in recent years because the world is awash in an ever-growing pile of big data.
Setup
Assumed operating system environment
  Windows: Windows 7 or Windows 10 (preferred)
  Mac: macOS 10.15 Catalina (preferred) or 10.14 Mojave
Installing
  R
  RStudio
  LaTeX (not required)
  Pandoc (not required)
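Installing R: Notes (for Windows)
If R was not installed according to these notes, it is best to uninstall it first and reinstall the latest version.
The installation path should not contain Chinese characters.
  Otherwise character-encoding problems may arise, and later package updates become error-prone.
When choosing the installation language, it is best to choose English rather than Chinese (if Chinese is the system default, it is best to switch to English afterwards).
Remove the R version number from the installation directory (to cope with compatibility between old and new versions of R).
  For example, do not install into C:/Program Files/R/R-3.6.1/; change it to C:/Program Files/R/ so that an update automatically overwrites the previous version.
It is also advisable to install in a different directory, not under C:/Program Files/, or simply not on drive C at all, for example D:/Application/R/.
  This avoids the Windows-specific administrator permission problems that arise when R needs to read or write its installation directory.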
Section 2
How to learn R
Tips
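Start with the basic operations: get familiar with the interface and master the basic commands.
Reference books are not especially important; one or two are enough, mainly for getting started and for looking things up later.
"Learning by doing": learn in a targeted way, with the goal of completing an assignment or a research project.
Make good use of the help files in R or RStudio: help()
Make good use of search engines and keywords.
  Google is best, followed by Yahoo or Bing; use Baidu with caution, and avoid other domestic search engines.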
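A minimal sketch of the built-in help tools, assuming nothing beyond base R. The functions looked up here (mean, sd) are only illustrative choices. At the console, help("mean") or ?mean displays the page directly; the sketch stores the result instead so that it also runs as a script.

```r
# Look up the help page for a function.
# Typing help("mean") or ?mean at the console displays the page;
# storing the result keeps this sketch usable in a non-interactive script.
h <- help("mean")

# Find objects whose names contain a keyword (case-insensitive by default).
mean_like <- apropos("mean")

# Inspect the argument names of a function without opening its help page.
sd_args <- names(formals(sd))

print(head(mean_like))
print(sd_args)
```

When the function name is unknown, help.search("keyword") (or ??keyword) searches the installed documentation by topic.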
The Basic R Interface
Figure 4: R Interface
Basic Operations in R
Two ways to talk to R
  Type commands in the command window (Console)
  Write them in an R script (R Script)
Basic Operations in R
1 + 2
## [1] 3
1 / (2 + 3) == .2
## [1] TRUE
1:5
## [1] 1 2 3 4 5
as.matrix(1:3)
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
Basic Operations in R
Assignment: in R, <- is generally used as the assignment operator rather than =.
v <- 2
w <- 3
v
## [1] 2
w/v
## [1] 1.5
Basic Operations in R
Assigning a vector
cnumber <- c(2, 3)
cnumber
## [1] 2 3
Basic Operations in R
Assigning the result of a function
x <- rnorm(10000, mean = 100, sd = 36)
hist(x, breaks = 51, col = "orange", main = "rnorm")
[Figure: histogram titled "rnorm"; x-axis: x, y-axis: Frequency]
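Basic Operations in R
Object naming rules
  Names may contain upper- and lower-case letters (a-z, A-Z), digits, the underscore _ and the period .; no other symbols are allowed.
  Names are case-sensitive.
  A name must start with a letter or a period, not with a digit or an underscore; if it starts with a period, that period must not be followed immediately by a digit.
  It is best not to reuse the names of built-in commands or functions.
var.name2 <- cnumber
var.name2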
## [1] 2 3
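The naming rules can be checked mechanically. The sketch below relies on the base function make.names(), which rewrites any string into a syntactically valid name, so a candidate is valid when make.names() leaves it unchanged. The candidate names are made up for illustration.

```r
# Candidate names (illustrative); only the first follows the naming rules.
candidates <- c("var.name2", "2var", "_var", ".2var", "my-var")

# make.names() turns a string into a syntactically valid name,
# so a candidate is valid when it comes back unchanged.
is_valid <- candidates == make.names(candidates)

print(data.frame(candidates, is_valid))
```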
Basic Operations in R: Entering a Set of Data
Figure 5: R Interface
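Basic Operations in R:
Use an R script file (.R) to enter a set of data and compute statistics
age <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)
weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)
mean(age)           # sample mean
sd(age)             # sample standard deviation
cor(age, weight)    # correlation coefficient
plot(age, weight)   # scatter plot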
Basic Operations in R: R Script
Select the command lines, then press Command+Enter to run them.
Figure 6: R Interface
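Basic Operations in R:
Because R is free, open-source software, the team that develops and maintains it has limited people and resources, so R's own interface and operation feel rather primitive.
As a result, applications built on top of R keep appearing. Such software is usually called an integrated development environment (IDE).
An IDE is an all-in-one development suite that integrates code editing, analysis, compilation, debugging, and related functions.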
Section 3
Using IDE: RStudio
Intro RStudio
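The most popular IDE for R
  If R is the engine, the IDE is the control panel; together they make up the "chariot" of data analysis.
  The engine can always be R, while control panels (IDEs) from different brands or companies can be used with it.
  Of all the control panels, RStudio is the easiest to get started with and has the most users.
Also free (for the basic version)
Combines with Markdown and LaTeX to make scientific writing and presentations easier
Download it from here: RStudio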
Installing RStudio
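Notes (for Windows)
  Download it from the official website.
  The installation path should not contain Chinese characters.
  The directory should preferably not be Program Files; otherwise "administrator permission" issues may interfere with later use.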
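The Basic RStudio Interface
Three ways to talk to R
  Type commands in the command window (Console): for simple commands
  Write them in an R script (R Script): for a complex series of commands, functions, and so on
  Write them in an R Markdown document to produce a report that combines analysis code, results, and text.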
Figure 7: RStudio Interface
The Basic RStudio Interface
Console
R Script or R Markdown
Environment
  Import Data
  History
  Connections: data sources
  Git
Others
  Files
  Plots
  Packages
  Help
  Viewer
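Packages
A package is a dedicated module, built on R's basic functionality, that performs some more advanced task.
Packages can be downloaded and installed online from mirror sites around the world.
.packages(TRUE)   # list the names of all installed packages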
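Packages
Packages fall into three basic categories:
  Base packages
  Add-on packages
  Personal packages
Because R is open source, the bar for submitting a package is relatively low, so package quality is uneven. It is therefore best to learn something about how widely a package is used before relying on it.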
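Packages
Basic commands
  Install: install.packages("package name")
  After installing, a package must still be loaded before use: library(package name)
  Sometimes a package needs to be unloaded: detach("package:package name", unload = TRUE)
  Remove completely: remove.packages("package name")
In RStudio, the same can be done by pointing and clicking through the menus.
Install AER, a package commonly used in econometrics
install.packages("AER", repos = "http://mirrors.ustc.edu.cn/CRAN/")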
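Scripts that are run repeatedly often install a package only when it is missing. A minimal sketch of that pattern follows; the helper name missing_packages is hypothetical (not part of R), and the actual install call is left commented out so that running the sketch never downloads anything.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R, so only AER can show up as missing here.
to_install <- missing_packages(c("stats", "AER"))
print(to_install)

# Install only what is missing (uncomment to actually install).
# if (length(to_install) > 0) install.packages(to_install)
```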
Packages
Choosing a suitable mirror
  Tools → Global Options → Packages → under Primary CRAN repository, choose a suitable mirror in China.
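The mirror can also be set in code for the current session through the repos option. This is a sketch, reusing the USTC mirror URL from the AER example; any other CRAN mirror URL could be substituted.

```r
# Remember the current setting in case it needs to be restored later.
old_repos <- getOption("repos")

# Use the USTC mirror for the rest of this session.
options(repos = c(CRAN = "http://mirrors.ustc.edu.cn/CRAN/"))

print(getOption("repos"))
```

With this option set, install.packages("AER") no longer needs its own repos argument.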
Packages
Viewing a package's contents
Loading a package
library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
data(STAR)
Packages
library(ggplot2)
p1 <- ggplot(STAR, aes(readk)) +
  geom_histogram(bins = 30, colour = "black", fill = "white")
p2 <- ggplot(STAR, aes(x = gender, y = readk)) +
  geom_boxplot()
Packages
library(gridExtra)
grid.arrange(p1, p2, ncol = 2, nrow = 1)
[Figure: left panel, histogram of readk (y-axis: count); right panel, boxplot of readk by gender (male, female, NA)]
Section 4
Directory Management
Workspace
The workspace is your current R working environment and includes any user-defined objects (vectors, matrices, functions, data frames, and lists).
When you quit R, you can save an image of the current workspace that is automatically reloaded the next time R starts.
save.image("myfile")
load("myfile")
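A sketch of the save-and-reload round trip. According to R's documentation, save.image() is a shortcut for calling save() on everything in the global environment; the sketch uses save() on a scratch environment and a temporary file instead, so that running it does not touch the real workspace. The names demo_env, restored, and ws_file are illustrative.

```r
# A scratch environment stands in for the workspace.
demo_env <- new.env()
assign("v", 2, envir = demo_env)
assign("w", 3, envir = demo_env)

# Save every object in it to a temporary file.
ws_file <- tempfile(fileext = ".RData")
save(list = ls(demo_env), envir = demo_env, file = ws_file)

# Reload the objects into a fresh environment and check that they came back.
restored <- new.env()
load(ws_file, envir = restored)
print(ls(restored))
print(get("v", envir = restored))
```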
Working directory
The current working directory is the directory from which R will read files and to which it will save results by default.
getwd()                          # show the current working directory
setwd("~/Dropbox/R/2020/Lab1")   # set the working directory
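setwd() invisibly returns the previous working directory, which makes it easy to switch back. A minimal sketch, with tempdir() standing in for a real project folder:

```r
# Switch to another directory and remember where we came from.
old_dir <- setwd(tempdir())   # tempdir() stands in for a real project folder
inside <- getwd()
print(inside)

# Restore the original working directory.
setwd(old_dir)
```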
Building your own directory system
Project name directory
  RawData
  WorkData
  Figures
  Tables
  ……
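The folder layout above can be created from R itself. In this sketch the project name MyProject is hypothetical, and the folders are created under tempdir() so that nothing is written to a real project location.

```r
# Hypothetical project root, placed under the session's temporary directory.
project_dir <- file.path(tempdir(), "MyProject")
subdirs <- c("RawData", "WorkData", "Figures", "Tables")

# Create each subfolder; recursive = TRUE also creates the project root.
for (d in file.path(project_dir, subdirs)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

print(list.files(project_dir))
```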
My own (imperfect) example:
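Where are packages installed?
1 Is R's installation directory writable? If you have write permission, packages are installed under the R installation directory, for example C:/Software/R/library/
2 If the installation directory is not writable, R will ask to create a new folder in which to install packages.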
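To see which case applies on a given machine, the library directories and their write permissions can be inspected directly. A small sketch using base R only:

```r
# Directories in which R looks for (and installs) packages.
lib_dirs <- .libPaths()

# file.access(mode = 2) returns 0 where the directory is writable and -1 otherwise.
writable <- file.access(lib_dirs, mode = 2) == 0

print(data.frame(lib_dirs, writable))
```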
User’s directory
normalizePath('~')
## [1] "/Users/byelenin"
list.files('~', all.files = TRUE) # list all files in the HOME directory
##  [1] "."                 ".."                  ".android"
##  [4] ".bash_history"     ".bash_profile"       ".bash_sessions"
##  [7] ".cache"            ".CFUserTextEncoding" ".config"
## [10] ".cups"             ".dropbox"            ".DS_Store"
## [13] ".gitconfig"        ".idlerc"             ".local"
## [16] ".oracle_jre_usage" ".python_history"     ".r"
## [19] ".Rapp.history"     ".rcp"                ".Renviron"
## [22] ".Rhistory"         ".Rprofile"           ".rstudio-desktop"
## [25] ".serverauth.31468" ".ShadowsocksX-NG"    ".ssh"
## [28] ".start_app_logs"   ".subversion"         ".Trash"
## [31] ".viminfo"          ".Xauthority"         ".zshrc"
## [34] "Agent"             "Applications"        "Calibre Library"
## [37] "Desktop"           "Documents"           "Downloads"
## [40] "Dropbox"           "Library"             "Mega"
## [43] "Mirror"            "Movies"              "Music"
## [46] "OneDrive"          "Parallels"           "Pictures"
## [49] "Public"            "test.R"              "v2ray_ins.log"
.Renviron and .Rprofile
file.edit('~/.Renviron') # open the file and edit
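Restart R (Cmd+Shift+F10), then try to install a package.
The benefit of a custom package installation path: after reinstalling the operating system or switching computers, these packages do not need to be reinstalled.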
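The slide does not show what to write in the file. One common approach, assumed here rather than taken from the source, is to set the R_LIBS_USER environment variable, which R reads at startup to locate the user package library. The path in the comment is hypothetical.

```r
# A line such as the following could go in ~/.Renviron (hypothetical path):
#   R_LIBS_USER=~/R/library

# After restarting R, check what R picked up.
user_lib <- Sys.getenv("R_LIBS_USER")
print(user_lib)

# The user library is listed here only if the directory actually exists.
print(.libPaths())
```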
.Renviron and .Rprofile
The .Rprofile file, by contrast, is a file of R code; if it exists, it is executed first when R starts.
file.edit('~/.Rprofile') # open the file and edit
The Rconsole file on Windows
file.path(R.home('etc'),'Rconsole') # find the path
Change the language to English
language = en # set the language to English
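The system environment on macOS
Sys.setenv(LANGUAGE = 'en')   # set the language to English for this session
Sys.unsetenv('LANGUAGE')      # remove the setting again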
Section 5
Data Practice
Importing Data: From CSV & Excel
caschool_csv <- read.csv("Data/caschool.csv")
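Since Data/caschool.csv may not be at hand, the following sketch writes a tiny data frame to a temporary CSV file and reads it back. The column name district is made up for illustration; the three testscr values are the first three shown later in these slides.

```r
# A toy data frame (the district column is illustrative).
toy <- data.frame(district = c("A", "B", "C"),
                  testscr  = c(690.8, 661.2, 643.6))

# Write it to a temporary CSV file, then read it back.
csv_file <- tempfile(fileext = ".csv")
write.csv(toy, csv_file, row.names = FALSE)
toy_back <- read.csv(csv_file)

str(toy_back)
```

Excel files are not covered by base R; they need an add-on package such as readxl.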
Importing Data: From Other Statistical Tools
Function    Format
read.spss   SPSS
read.dta    Stata
read.ssd    SAS
read.mtp    Minitab
Importing Data: From Stata
library("foreign")
caschool <- read.dta("Data/caschool.dta")
View Data:
View(caschool)   # open the data in a spreadsheet-style viewer
head(caschool)   # print the first six rows
Drop Variables
# keep only these six columns; all other variables are dropped
ca_data_small <- subset(caschool, select = c(observat, testscr,
                                             str, expn_stu, el_pct, avginc))
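The select argument of subset() works in both directions: listing columns keeps them, and a minus sign drops them. A sketch on a made-up data frame (the column names id, x, y, z are illustrative):

```r
# A toy data frame with four columns.
toy <- data.frame(id = 1:3, x = c(2, 4, 6), y = c(1, 0, 1), z = c("a", "b", "c"))

kept    <- subset(toy, select = c(id, x))   # keep only id and x
dropped <- subset(toy, select = -c(z))      # drop z, keep everything else

print(names(kept))
print(names(dropped))
```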
Generate New Variables
ca_data_small$logexp <- log(ca_data_small$expn_stu)
ca_data_small$el_high <- ca_data_small$el_pct
head(ca_data_small)
##   observat testscr      str expn_stu    el_pct    avginc   logexp   el_high
## 1        1  690.80 17.88991 6384.911  0.000000 22.690001 8.761693  0.000000
## 2        2  661.20 21.52466 5099.381  4.583333  9.824000 8.536874  4.583333
## 3        3  643.60 18.69723 5501.955 30.000002  8.978000 8.612859 30.000002
## 4        4  647.70 17.35714 7101.831  0.000000  8.978000 8.868108  0.000000
## 5        5  640.85 18.67133 5235.988 13.857677  9.080333 8.563311 13.857677
## 6        6  605.55 21.40625 5580.147 12.408759 10.415000 8.626970 12.408759
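In the output above, el_high is a plain copy of el_pct. If a high/low indicator was intended (an assumption, since the slides do not say), a comparison produces one. The sketch uses the first three rows shown above and a hypothetical cutoff of 10 percent.

```r
# First three rows of expn_stu and el_pct from the output above.
toy <- data.frame(expn_stu = c(6384.911, 5099.381, 5501.955),
                  el_pct   = c(0, 4.583333, 30.000002))

# Log of expenditure per student, as on the slide.
toy$logexp <- log(toy$expn_stu)

# Indicator equal to 1 when el_pct is at least 10 (hypothetical cutoff), else 0.
toy$el_high <- as.numeric(toy$el_pct >= 10)

print(toy)
```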
Drop Variables
ca_data_small <- subset(ca_data_small, select = -c(el_high))
head(ca_data_small)
##   observat testscr      str expn_stu    el_pct    avginc   logexp
## 1        1  690.80 17.88991 6384.911  0.000000 22.690001 8.761693
## 2        2  661.20 21.52466 5099.381  4.583333  9.824000 8.536874
## 3        3  643.60 18.69723 5501.955 30.000002  8.978000 8.612859
## 4        4  647.70 17.35714 7101.831  0.000000  8.978000 8.868108
## 5        5  640.85 18.67133 5235.988 13.857677  9.080333 8.563311
## 6        6  605.55 21.40625 5580.147 12.408759 10.415000 8.626970
Summarize the Data
summary(ca_data_small)
##     observat        testscr           str           expn_stu
##  Min.   :  1.0   Min.   :605.5   Min.   :14.00   Min.   :3926
##  1st Qu.:105.8   1st Qu.:640.0   1st Qu.:18.58   1st Qu.:4906
##  Median :210.5   Median :654.5   Median :19.72   Median :5215
##  Mean   :210.5   Mean   :654.2   Mean   :19.64   Mean   :5312
##  3rd Qu.:315.2   3rd Qu.:666.7   3rd Qu.:20.87   3rd Qu.:5601
##  Max.   :420.0   Max.   :706.8   Max.   :25.80   Max.   :7712
##      el_pct           avginc           logexp
##  Min.   : 0.000   Min.   : 5.335   Min.   :8.275
##  1st Qu.: 1.941   1st Qu.:10.639   1st Qu.:8.498
##  Median : 8.778   Median :13.728   Median :8.559
##  Mean   :15.768   Mean   :15.317   Mean   :8.571
##  3rd Qu.:22.970   3rd Qu.:17.629   3rd Qu.:8.631
##  Max.   :85.540   Max.   :55.328   Max.   :8.950
Summarize a Variable
summary(ca_data_small$testscr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   605.5   640.0   654.5   654.2   666.7   706.8
Attach a Data Frame
If the data frame is attached, its variables can be referenced by name alone:
attach(ca_data_small)
summary(testscr)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   605.5   640.0   654.5   654.2   666.7   706.8
detach(ca_data_small)
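An attached data frame stays on the search path until it is detached, and its column names can clash with other objects (the column str, for instance, shares its name with the function str()). As an alternative sketch, with() evaluates a single expression inside the data frame and leaves nothing to detach. The toy values are the first four test scores shown earlier.

```r
# First four test scores from the output shown earlier.
toy <- data.frame(testscr = c(690.8, 661.2, 643.6, 647.7))

# with() looks up testscr inside toy for this one expression only.
m <- with(toy, mean(testscr))
print(m)
```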
Section 6
Plot
Scatter Plot
attach(ca_data_small)
plot(str, testscr)
Scatter Plot
[Figure: scatter plot of testscr (y-axis) against str (x-axis)]
Scatter Plot
plot(str, testscr)
abline(lm(testscr ~ str, data = ca_data_small), col = "red")
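The line that abline() draws comes from the two coefficients of the lm() fit. As a sketch of what those coefficients are, the example below reuses the age and weight vectors from the earlier R script slide (the caschool data may not be at hand) and checks the fitted slope against the textbook formula cov(x, y) / var(x).

```r
# Data from the earlier R script example.
age    <- c(1, 3, 5, 2, 11, 9, 3, 9, 12, 3)
weight <- c(4.4, 5.3, 7.2, 5.2, 8.5, 7.3, 6.0, 10.4, 10.2, 6.1)

# Fit the simple regression weight = b0 + b1 * age.
fit <- lm(weight ~ age)
b <- coef(fit)   # b[1] is the intercept, b[2] the slope
print(b)

# The OLS slope equals cov(x, y) / var(x).
slope_by_hand <- cov(age, weight) / var(age)
print(slope_by_hand)
```

Passing fit to abline(), as in abline(fit, col = "red"), draws exactly this intercept and slope on an existing plot.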
ggplot2
[Figure: scatter plot of testscr (y-axis) against str (x-axis)]
ggplot2
library("ggplot2")
ggplot(data = ca_data_small, aes(x = str, y = testscr)) +
  geom_point(shape = 1) +      # Use hollow circles
  geom_smooth(method = lm)     # Add linear regression line
ggplot2
[Figure: ggplot2 scatter plot of testscr (y-axis) against str (x-axis)]
Reference
Jared P. Lander (2013), R for Everyone: Advanced Analytics and Graphics
Robert I. Kabacoff (2011), R in Action: Data Analysis and Graphics With R
Yihui Xie et al. (2018), R 语言忍者秘籍