37
An Example. The question Data Analyses Conclusio ns

An Example. The question Data Analyses Conclusions

Embed Size (px)

DESCRIPTION

Finding the data. Google with various key words: temperature, Recife,... "Eventually lead" to: cdiac.ornl.gov/ftp/ndp041 Carbon dioxide information analysis center! Had to discover Recife Curado station id (Looked at Brasil sites too, but that didn't turn up the data) Years Searched an inappropriate ste for a long time

Citation preview

Page 1: An Example. The question Data Analyses Conclusions

An Example.

The question

Data

Analyses

Conclusions

Page 2: An Example. The question Data Analyses Conclusions

Monthly mean air temperature at Recife 1953-1962

The question:

What is the relationship between temperatures in Recife and El Nino?

Objectives

- to layout analyses

- to explore the data for surprises

- predicted values

- signal + noise?

- ...

Page 3: An Example. The question Data Analyses Conclusions

Finding the data.

Google with various key words: temperature, Recife, ..."Eventually lead" to:

cdiac.ornl.gov/ftp/ndp041

Carbon dioxide information analysis center!

Had to discover Recife Curado station id - 3068290000

(Looked at Brasil sites too, but that didn't turn up the data)

Years 1949-1988Searched an inappropriate ste for a long time

Page 4: An Example. The question Data Analyses Conclusions

30682900001941 274 279 268 267 260 250 246 245 256 260 262 266

30682900001942 273 268 270 270 256 247 236 233 252 260 270 265

30682900001943 270 270 273 261 253 245 236 238 247 260 264 268

30682900001944 269 272 275 263 254 247 239 232 241 256 267 273

30682900001945 278 268 278 271 256 240 236 243 251 258 268 267

30682900001946 268 277 271 259 258 247 247 247 249 257 261 266

30682900001947 269 270 268-9999 253 246 244 245 249 262 268-9999

30682900001948 273 271 270 269 255 249 243 240 248-9999 264 270

30682900001949 272 274 278 266 252 246 240 236 250 261 265 271

30682900001950 273 280 269 251 248 243 235 234 248 258 264-9999

30682900001951 269 267 275 265 254 238 233 238 247 259 263-9999

30682900001952 272 278 267 262 252 245 240 238 253 256 263 268

The web data. monthly

notice -9999 replace by NA file: recifecurado

Page 5: An Example. The question Data Analyses Conclusions

How to handle missing values? Interpolate? Model? ...?

junk<-scan("recifecurado")

junk1<-matrix(junk,ncol=48)

junk2<-junk1[2:13,] # years in first row

series<-c(junk2)/10 # for degrees centigrade

length(series[is.na(series)]) #17 - need to understand missingness

Interpolation

series1<-series

for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1] +.5*series[i+1]}

Page 6: An Example. The question Data Analyses Conclusions

plot(xaxis,series1,type="l",xlab="year",ylab="mean temp (degrees C)",las=1)

title("Mean monthly temperatures 1949-88 Recife Curado")

abline(h=mean(series1))

Page 7: An Example. The question Data Analyses Conclusions

There is seasonality and variability

Restricted range in mid-sixties - nonconstant mean level?

ylim<-range(series1)

par(mfrow=c(2,1))

plot(lowess(xaxis,series1),type="l",ylim=ylim,xlab="year",ylab="degrees C",main="Smoothed Recife series")

abline(h=mean(series1))

junk20<-lowess(xaxis,series1)

plot(xaxis,series1-junk20$y,type="l",xlab="year",ylab="degrees C",main="Residuals")

abline(h=mean(series1-junk20$y))

Page 8: An Example. The question Data Analyses Conclusions
Page 9: An Example. The question Data Analyses Conclusions

par(mfrow=c(1,1))

acf(series1,las=1,xlab="lag(mo)",ylab="",main="autocorrelation recife temperatures",lag.max=50,ylim=c(-1,1))

Page 10: An Example. The question Data Analyses Conclusions

More confirmation of period 12

Note that nearby values are highly correlated

REMEMBER the interpretation of the error lines

Page 11: An Example. The question Data Analyses Conclusions

spectrum(series1,xlab="frequency (cycles/month)",las=1)

Page 12: An Example. The question Data Analyses Conclusions

Note peaks at frequency 1/12 and harmonics

Further confirmation of period 12

Note log scale for y-axis

Note vertical line in upper right

Gives uncertainty

Page 13: An Example. The question Data Analyses Conclusions

What is the shape of the seasonal?

junk4<-matrix(series1,nrow=12)

junk5<-apply(junk4,1,mean)

plot(junk5,type="l",las=1)

abline(h=mean(junk5))

Page 14: An Example. The question Data Analyses Conclusions

Cooler in July-Aug

Southern Hemisphere

Uncertainty?

Page 15: An Example. The question Data Analyses Conclusions

Cooler in July-August. Southern hemisphere

Part of a longer cycle? El Nino explanatory?

After "removing" trend middle has been pulled up

Need uncertainties

Back to original data

Page 16: An Example. The question Data Analyses Conclusions

Remove seasonal

series2<-series1

for(i in 1:48){

for(j in 1:12){

series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j]

}

}

par(mfrow=c(2,1))

plot(xaxis,series2,type="l",xlab="year",ylab="residual",main="Series after removing seasonal",las=1)

abline(h=0)

ylim<-range(series2)

plot(xaxis,series1-mean(series1),type="l",xlab="year",ylab="degreesC",main="Mean removed series",las=1,ylim=ylim)

abline(h=mean(series1-mean(series1)))

Page 17: An Example. The question Data Analyses Conclusions

original variance 1.342 adjusted .248

Page 18: An Example. The question Data Analyses Conclusions

par(mfrow=c(2,1))

acf(series2,lag.max=50,las=1,xlab="lag (mo)",main="Ajusted by removing monthly means",las=1)

acf(diff(series1,lag=12),lag.max=50,xlab="lag (mo)",main="Order 12 differenced series")

Page 19: An Example. The question Data Analyses Conclusions

Frequency domain analysis.

par(mfrow=c(2,1))

junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,plot=F)

ylim<-range(junk9$spec)

junk9<-spec.pgram(series1,taper=0,detrend=F,demean=F,spans=5,xlab="frequency (cycles/mo)",las=1,main="Original series")

junk10<-spec.pgram(series2,taper=0,detrend=F,demean=F,spans=5,ylim=ylim,main="Monthly means removed",las=1)

Page 20: An Example. The question Data Analyses Conclusions

Work remains on seasonal

Residual "not" white noise

Page 21: An Example. The question Data Analyses Conclusions

Time domain

distributions

Page 22: An Example. The question Data Analyses Conclusions

Parametric model. SARIMA ?

Thinking about prediction, consider

Yt = αYt-1 + βYt-12 + Nt

with some ARMA for Nt

Check seasonal residuals for normality

Hope to end up with white noise

Page 23: An Example. The question Data Analyses Conclusions

Call:

arima(x = series1, order = c(1, 0, 1), seasonal = list(order = c(1, 0, 1), period = 12))

Coefficients:

ar1 ma1 sar1 sma1 intercept

0.7604 -0.3344 0.9990 -0.9201 25.6117

s.e. 0.0610 0.0968 0.0007 0.0220 0.6100

sigma^2 estimated as 0.1771: log likelihood = -337.48, aic = 686.97

Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))

Page 24: An Example. The question Data Analyses Conclusions
Page 25: An Example. The question Data Analyses Conclusions

tsdiag(Junk,gof.lag=25)

Page 26: An Example. The question Data Analyses Conclusions
Page 27: An Example. The question Data Analyses Conclusions

Junk<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12))

postscript(file="recifeplots1a.ps",paper="letter",hor=T)

Junk2<-predict(Junk,n.ahead=24)

Junk3<-c(series1,Junk2$pred)

Junk3a<-c(rep(0,576),2*Junk2$se)

Junk3b<-c(rep(0,576),-2*Junk2$se)

Junk4a<-Junk3+Junk3a;Junk4b<-Junk3+Junk3b

ylim<-range(Junk4a,Junk4b)

par(mfrow=c(1,1))

xaxis1<-1941+(1:length(Junk3)/12)

plot(xaxis1[xaxis1>1983],Junk4a[xaxis1>1983],type="l",las=1,ylim=ylim,col="red",xlab="year",ylab="degrees C",main="Data + predictions")

lines(xaxis1[xaxis1>1983],Junk4b[xaxis1>1983],col="red")

lines(xaxis1[xaxis1>1983],Junk3[xaxis1>1983],col="blue")

lines(xaxis[xaxis>1983],series1[xaxis>1983])

Page 28: An Example. The question Data Analyses Conclusions

Two series

Bivariate case {Xt, Yt} - jointly distributed

Linear time invariant / transfer function model

tkktkt NXhY

nonparametric/parametric approaches

Page 29: An Example. The question Data Analyses Conclusions

Southern Oscillation Index

El Niño: global coupled ocean-atmosphere phenomenon.

The Pacific ocean signatures, El Niño and La Niña are important temperature fluctuations in surface waters of the tropical Eastern Pacific Ocean

Page 30: An Example. The question Data Analyses Conclusions

Southern Oscillation reflects monthly or seasonal fluctuations in the air pressure difference between Tahiti and Darwin

www.cpc.ncep.noaa.gov/data/soi

Page 31: An Example. The question Data Analyses Conclusions

junk<-scan("recifecurado")

junk1<-matrix(junk,ncol=48)

junk6<-junk1[1,]

junk1<-junk1[,junk6>1950]

junk2<-junk1[2:13,]

series<-c(junk2)/10

length(series[is.na(series)]) #13

xaxis<-1951+(1:length(series)/12)

series1<-series

junk4<-matrix(series1,nrow=12)

junk5<-apply(junk4,1,mean)

for(i in 2:(length(series)-1)){if(is.na(series[i]))series1[i]<-.5*series[i-1]+.5*series[i+1]}

junk4<-matrix(series1,nrow=12)

junk5<-apply(junk4,1,mean)

series2<-series1

for(i in 1:38){

for(j in 1:12){

series2[(i-1)*12+j]<-series1[(i-1)*12+j]-junk5[j]}}

Page 32: An Example. The question Data Analyses Conclusions

kunk<-scan("SOIa.dat")

kunk1<-matrix(kunk,ncol=58); kunk6<-kunk1[1,]

kunk1<-kunk1[,kunk6<1989]

kunk2<-kunk1[2:13,]

teries<-c(kunk2)

length(teries[is.na(teries)]) #0

teries1<-teries; teries2<-teries1

postscript(file="recifeplots3.ps",paper="letter",hor=T)

par(mfrow=c(2,1))

plot(xaxis,series2,type="l",las=1,xlab="year",ylab="",main="Seasonally adjusted Recife temps")

plot(xaxis,teries2,type="l",las=1,xlab="year",ylab="",main="Southern Oscillation Index")

Page 33: An Example. The question Data Analyses Conclusions

postscript(file="recifeplots2.ps",paper="letter",hor=F)

par(mfrow=c(1,1))

acf(cbind(series2,teries2))

Page 34: An Example. The question Data Analyses Conclusions
Page 35: An Example. The question Data Analyses Conclusions

junk10<-cbind(series2,teries2)

junk11<-spec.pgram(junk10,plot=F,taper=0,detrend=F,demean=F,spans=11)

par(mfcol=c(2,2))

plot(junk11$freq,10**(.1*junk11$spec[,2]),log="y",main="SOIspectrum", xlab="frequency", ylab="", las=1,type="l")

plot(junk11$freq,junk11$coh,main="Coherence",xlab="frequency",ylab="",las=1,ylim=c(0,1),type="l")

junkh<-1-(1-.95)**(1/(.5*junk11$df-1))

abline(h=junkh)

plot(junk11$freq,10**(.1*junk11$spec[,1]),log="y",main="Seasonally corrected Recife spectrum",xlab="frequency", ylab="",las=1, type="l")

Page 36: An Example. The question Data Analyses Conclusions

SARIMAX

Yt = αYt-1 + βYt-12 + γXt + Nt

ar1 ma1 sar1 sma1 intercept teries1

0.7885 -0.3717 0.9996 -0.9474 25.5572 -0.0255

s.e. 0.0610 0.1014 0.0006 0.0321 0.6403 0.0149

sigma^2 estimated as 0.1792: log likelihood = -275.68, aic = 565.35

Junk1<-arima(series1,order=c(1,0,1),seasonal=list(order=c(1,0,1),period=12),xreg=teries1)

Page 37: An Example. The question Data Analyses Conclusions

The answer to the question:

There is a hint of a linear time invariant relationship.