

Atmospheric Environment 44 (2010) 4252-4265

Contents lists available at ScienceDirect

Atmospheric Environment

journal homepage: www.elsevier.com/locate/atmosenv

GARCH modelling in association with FFT-ARIMA to forecast ozone episodes

Ujjwal Kumar*, Koen De Ridder

Environmental Modelling Unit, VITO-Flemish Institute for Technological Research, Boeretang 200, 2400 Mol, Belgium

Article info

Article history:
Received 6 April 2010
Received in revised form 24 June 2010
Accepted 28 June 2010

Keywords:
GARCH
FFT
ARIMA
O3-episodes
Air quality modelling
Air pollution

* Corresponding author. Tel.: +32 14 336761. E-mail addresses: [email protected], ujjwal.ku

1352-2310/$ - see front matter © 2010 Elsevier Ltd.
doi:10.1016/j.atmosenv.2010.06.055

Abstract

In operational forecasting of surface O3 by statistical modelling, it is customary to assume that the O3 time series is generated by a homoskedastic process. In the present work, we have taken the heteroskedasticity of the O3 time series explicitly into account and have shown how this results in O3 forecasts with improved forecast confidence intervals. Moreover, it also enabled us to make more accurate probability forecasts of ozone episodes in urban areas. The study has been conducted on daily maximum O3 time series for four urban sites in two major European cities, Brussels and London. The sites are: Brussels (Molenbeek) (B1), Brussels (PARL.EUROPE) (B2), London (Brent) (L1) and London (Bloomsbury) (L2). The Fast Fourier Transform (FFT) has been used to model the periodicities exhibited by the time series (the annual periodicity is especially distinct). The residuals obtained by subtracting the corresponding FFT component from the actual data exhibited stationarity and have been modelled using an ARIMA (Autoregressive Integrated Moving Average) process. The MAPEs (mean absolute percentage errors) of FFT-ARIMA for 100 one-day-ahead out-of-sample forecasts were 20%, 17.8%, 19.7% and 23.6% at sites B1, B2, L1 and L2, respectively. The residuals obtained through FFT-ARIMA have been modelled using a GARCH (Generalized Autoregressive Conditional Heteroskedastic) process. The conditional standard deviations obtained from GARCH have been used to estimate improved forecast confidence intervals and to make probability forecasts of ozone episodes. At sites B1, B2, L1 and L2, probability forecasts of ozone episodes (for 30 one-day-ahead out-of-sample forecasts) were correct 91.3%, 90%, 70.6% and 53.8% of the time using GARCH, as against 82.6%, 80%, 58.8% and 38.4% without GARCH. The incorporation of GARCH also significantly reduced the number of false alarms raised by the models.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Surface O3 is one of the six criteria air pollutants and a critical air quality indicator (Masters, 1998). Forecasting O3 concentrations and investigating their statistical nature in the ambient urban environment have therefore been the subject of many studies (e.g., Prior et al., 1981; Simpson and Layton, 1983; Robeson and Steyn, 1990; Hubbard and Cobourn, 1998, 2007; Slini et al., 2002; Kumar et al., 2009; Tsai et al., 2009; Demuzere and van Lipzig, 2010). Prior et al. (1981) applied a regression model at St. Louis to forecast the daily maximum ozone occurring later in the day from solar radiation intensity, temperature, wind speed and NOx data measured earlier in the day. Although their model had an overall accuracy of 83% in predicting the daily maximum O3 concentration, it was not very successful in predicting high-ozone days (O3 > 120 ppb). To develop a probabilistic forecast of ozone concentrations, Robeson and Steyn (1989) suggested exploiting the inherent seasonality and autocorrelation of O3 time series: a nonstationary, autocorrelated stochastic process is used to simulate a conditional probability density function (p.d.f.) that quantifies the effects of seasonality and autocorrelation. Robeson and Steyn (1990) used three models, namely (1) a univariate deterministic/stochastic model, (2) a univariate Autoregressive Integrated Moving Average (ARIMA) model, and (3) a bivariate temperature- and persistence-based regression model, to estimate the daily maximum O3 concentration in the lower Fraser Valley of British Columbia. They concluded that the ARIMA model had nearly the same predictive capability as the persistence model, while the mixed deterministic/stochastic model performed worst. Hubbard and Cobourn (1998) used a 10-parameter multiple linear regression model to predict the daily domain-level peak O3 and found that 50% of the forecasts were within ±7.6 ppb and 80% within ±14.8 ppb. Slini et al. (2002) applied ARIMA to forecast maximum ozone concentrations in Athens, Greece, using a 9-year air quality observation record; their results show a good index of agreement, accompanied by a weakness in forecasting alarms. Cobourn (2007) applied a Takagi-Sugeno fuzzy system and a nonlinear regression (NLR) model and reported their performance in terms of mean absolute error (8.0 ppb for the Takagi-Sugeno fuzzy system and 8.1 ppb for NLR). Tsai et al. (2009) applied two cost-sensitive artificial neural network (ANN) methods to forecast O3 episodes and found that they perform better than the standard ANN.

U. Kumar, K. De Ridder / Atmospheric Environment 44 (2010) 4252-4265

It should be noted that a simple multiple linear regression model is often difficult to construct for air pollutants, as there is often no direct linear relationship among the atmospheric variables. Various studies have also used neural networks for air quality forecasting (e.g., Tsai et al., 2009), and some compare the forecasting performance of ARIMA and neural networks (e.g., Choon and Chuin, 2008; Chatfield, 2004; Ho et al., 2002; Shabri, 2001; Tang et al., 1991). However, there is no single conclusive result. In general, ARIMA works as well as neural networks and often outperforms them for one-step-ahead forecasts [Chatfield (2004, pp. 230-235) provides a review]. Even multivariate methods most often do not improve the forecasts. When the purpose is forecasting and the variable in question is affected by innumerable factors whose physical relations are not well structured (as is the case with air pollutants), univariate time series techniques most often outperform multivariate methods in terms of forecasting (Chatfield, 2004, pp. 90-103).

There are also various chemical transport models (CTMs) available [e.g., LOTOS (van Loon et al., 2000), CHIMERE (Schmidt et al., 2001)] that take meteorology, atmospheric processes, chemical reactions, etc. into account in order to produce the air quality scenario of a region. These models most often also need to undergo a statistical adjustment called "data assimilation" (see e.g., van Loon et al., 2000; Denby et al., 2008). This adjusts and improves the spatial scenario; however, producing good future forecasts with CTMs is still a subject of intensive research (see e.g., Denby et al., 2008; Honoré et al., 2008).

The present study focuses on forecasting the daily maximum O3 concentration in urban areas. As noted above, when the focus is purely on forecasting, stochastic models often perform well. Notably, earlier studies that forecast surface O3 concentration with stochastic models have mostly ignored the heteroskedasticity of the O3 time series. In the present study, the deterministic part of the time series has been modelled using FFT (Fast Fourier Transform) and the stochastic part using ARIMA (Autoregressive Integrated Moving Average). In addition to the stationary stochastic ARIMA model, we have taken the heteroskedasticity of the O3 time series explicitly into account and modelled it through GARCH (Generalized Autoregressive Conditional Heteroskedastic). The GARCH models have been used to reconstruct the forecast confidence intervals and to make probability forecasts of O3 episodes. Section 2 describes the data and sites used in the study, Section 3 discusses the methodology, Section 4 presents the results and discussion, and Section 5 concludes the study.

2. Data and sites

In the present work, four sites from two major European cities, viz. London and Brussels, have been studied, two sites from each city. The O3 data for the two London sites have been obtained from the UK air quality archive (http://www.airquality.co.uk). The Automatic Networks in the UK produce hourly pollutant concentrations, with data being collected from individual sites by modem. The website also provides statistics for the daily maximum O3 concentration computed from the hourly average data, and these have been used in the current study. The data for the Brussels region have been obtained from the European air quality database, i.e., the AIRBASE data archive (http://air-climate.eionet.europa.eu/databases). The daily maximum O3 data for the Brussels sites have been extracted from the hourly average O3 AIRBASE data files of the corresponding sites. The study sites are as follows:

1. Brussels (Molenbeek) (B1 henceforth) (AIRBASE station code: BE0184A) (Lat: 50°51′01″, Lon: 4°20′06″). The AIRBASE data archive defines the characteristics of this site as "traffic urban residential". The monitor is located very close to a busy traffic road.

2. Brussels (PARL.EUROPE) (B2 henceforth) (AIRBASE station code: BE0403A) (Lat: 50°50′33″, Lon: 4°22′32″). In terms of characteristics, this site has been defined as "urban background residential" by the AIRBASE data archive.

3. London (Brent) (L1 henceforth) (Lat: 51.589618°, Lon: −0.275519°). The UK air quality archive categorizes this as a "suburban" site (a residential area).

4. London (Bloomsbury) (L2 henceforth) (Lat: 51.522287°, Lon: −0.125848°). The characteristics of this site have been defined as "urban background" by the UK air quality data archive. The monitoring station is within a self-contained, air-conditioned housing located in the north-east corner of central London gardens. The gardens are generally laid to grass with many mature trees. All four sides of the gardens are surrounded by a 2/4-lane one-way road system, which is subject to frequent traffic.

EU air quality standards (http://ec.europa.eu/environment/air/quality/standards.htm) prescribe a threshold of 120 µg m⁻³ for the daily maximum 8-h mean O3 concentration. UK air quality standards (http://www.airquality.co.uk/standards.php) prescribe that the 8-hourly running or hourly mean O3 concentration should not exceed 100 µg m⁻³ more than 10 times a year. The WHO air quality guidelines (http://www.who.int/phe/health_topics/outdoorair_aqg/en/) suggest a threshold of 100 µg m⁻³ (8-h mean) of O3 concentration for adequate protection of public health.

In the present study, we have used a threshold of 100 µg m⁻³ (WHO air quality guidelines and UK air quality standards) when calculating probability forecasts of O3 episodes.
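If the one-day-ahead forecast is treated as Gaussian, with its mean given by the model forecast and a standard deviation σ (constant under a homoskedastic FFT-ARIMA model, or the GARCH conditional standard deviation), then the probability of exceeding the 100 µg m⁻³ threshold and an approximate 95% forecast interval follow directly. The sketch below assumes that Gaussian predictive distribution; the function names are hypothetical, and it is not presented as the authors' exact procedure.

```python
import math


def exceedance_probability(mean, sd, threshold=100.0):
    """P(X > threshold) for X ~ N(mean, sd**2); assumes a Gaussian forecast error."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (threshold - mean) / sd
    # Standard normal survival function written with the complementary error function
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def forecast_interval(mean, sd, z_crit=1.96):
    """Approximate 95% interval mean +/- 1.96*sd under the same Gaussian assumption."""
    return mean - z_crit * sd, mean + z_crit * sd


# Same point forecast (90 ug m-3) with two different forecast standard deviations
for sd in (10.0, 20.0):
    lo, hi = forecast_interval(90.0, sd)
    p = exceedance_probability(90.0, sd)
    print(f"sd={sd:4.1f}  interval=({lo:.1f}, {hi:.1f})  P(O3 > 100)={p:.3f}")
```

For a point forecast just below the threshold, a larger σ (for example a GARCH σ on a volatile day) widens the interval and raises the exceedance probability, while a smaller σ lowers it.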

3. Methodology

3.1. Fast Fourier Transform (FFT)

The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT); the only difference is that the FFT is computationally faster (Press et al., 2002). Direct evaluation of the DFT requires $O(N^2)$ operations, whereas the FFT requires $O(N \log_2 N)$. If $(x_0, x_1, x_2, \dots, x_{N-1})$ denotes a time series, its DFT $H_n$ is given by

$$H_n = \sum_{k=0}^{N-1} x_k \, e^{2\pi i k n / N} \qquad (1)$$

which can be inverted by the inverse Fourier transform as follows:

$$x_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n \, e^{-2\pi i k n / N} \qquad (2)$$
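Equations (1) and (2) can be checked numerically. Note that eq. (1) uses a positive exponent (the convention of Press et al.), whereas NumPy's `numpy.fft.fft` uses a negative one; for a real series the two transforms are complex conjugates, so their magnitudes, and hence the periodogram, coincide. The helper names below are illustrative.

```python
import numpy as np


def dft_eq1(x):
    """Direct O(N^2) evaluation of eq. (1): H_n = sum_k x_k exp(+2*pi*i*k*n/N)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    k = np.arange(N)
    n = k.reshape(-1, 1)                      # rows index n, columns index k
    return (x * np.exp(2j * np.pi * k * n / N)).sum(axis=1)


def idft_eq2(H):
    """Direct evaluation of eq. (2): x_k = (1/N) sum_n H_n exp(-2*pi*i*k*n/N)."""
    H = np.asarray(H, dtype=complex)
    N = H.size
    n = np.arange(N)
    k = n.reshape(-1, 1)                      # rows index k, columns index n
    return (H * np.exp(-2j * np.pi * k * n / N)).sum(axis=1) / N


x = np.random.default_rng(0).normal(size=64)
H = dft_eq1(x)
print(np.allclose(H, len(x) * np.fft.ifft(x)))        # eq. (1) equals N * numpy's inverse FFT
print(np.allclose(np.abs(H), np.abs(np.fft.fft(x))))  # same magnitudes as numpy's forward FFT
print(np.allclose(idft_eq2(H).real, x))               # eq. (2) recovers the series
```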

Equation (1) is periodic in n with period N, so the frequencies $f_n = n/N$ cover the range from $-1/2$ to $1/2$ (in cycles per sampling interval) in discrete steps of $1/N$. The periodogram estimates of the power spectrum at the different frequencies are then given by (Press et al., 2002)

$$P(0) = P(f_0) = \frac{1}{N^2} |H_0|^2, \qquad P(f_k) = \frac{1}{N^2} \left[ |H_k|^2 + |H_{N-k}|^2 \right] \qquad (3)$$

where $f_k$ $(= k/N)$ is defined only for the zero and positive frequencies. For a real-valued series, $H_{-k}$ $(= H_{N-k})$ is the complex conjugate of $H_k$, so both have the same magnitude. In the present study, we have plotted power against period $(1/f_k)$.
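A sketch of eq. (3) and of the power-vs-period inspection, assuming NumPy. For even N the Nyquist bin $H_{N/2}$ is its own mirror, so it is counted once (as in Press et al.); that detail is not spelled out in eq. (3). The synthetic series, four years of daily values with an annual cycle, is purely illustrative.

```python
import numpy as np


def periodogram(x):
    """One-sided periodogram of eq. (3); returns frequencies f_k = k/N and powers P(f_k)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    H = np.fft.fft(x)                 # |H_k| does not depend on the sign convention
    half = N // 2
    power = np.empty(half + 1)
    power[0] = np.abs(H[0]) ** 2 / N**2
    k = np.arange(1, half + 1)
    power[1:] = (np.abs(H[k]) ** 2 + np.abs(H[N - k]) ** 2) / N**2
    if N % 2 == 0:
        power[half] = np.abs(H[half]) ** 2 / N**2   # Nyquist bin is its own mirror
    return np.arange(half + 1) / N, power


def dominant_periods(x, n_peaks=3):
    """Periods (in samples) of the n_peaks largest non-zero-frequency powers."""
    freqs, power = periodogram(x)
    order = np.argsort(power[1:])[::-1] + 1         # skip f = 0, which only reflects the mean
    return 1.0 / freqs[order[:n_peaks]]


t = np.arange(4 * 365)                              # four years of daily values
rng = np.random.default_rng(1)
x = 80 + 30 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 15, t.size)
print(dominant_periods(x, n_peaks=1))
```

Because the one-sided powers together account for every Fourier bin, they sum to the mean of $x_k^2$ (Parseval), which is a convenient sanity check.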


Inspection of the power spectrum helps identify and select the frequencies (periods) that appear dominant in the time series. Retaining only the selected dominant frequencies of a particular time series, we constructed the Fourier transform and took its inverse. In the present study, the time series obtained by this inverse Fourier transform over the selected frequencies is called the FFT component of the time series.
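The construction of the FFT component can be sketched as follows, assuming NumPy: keep the `n_freqs` largest positive-frequency bins together with their mirrored negative-frequency bins (so that the inverse transform is real), zero everything else, and invert. Whether the zero-frequency (mean) term is retained is not stated in the text, so it is exposed here as the hypothetical `keep_mean` option.

```python
import numpy as np


def fft_component(x, n_freqs=3, keep_mean=True):
    """Periodic component rebuilt from the n_freqs dominant Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    N = x.size
    H = np.fft.fft(x)
    k = np.arange(1, N // 2 + 1)                        # positive frequencies
    top = k[np.argsort(np.abs(H[k]))[::-1][:n_freqs]]   # largest spectral magnitudes
    mask = np.zeros(N, dtype=bool)
    mask[top] = True
    mask[N - top] = True                                # mirrored bins keep the result real
    mask[0] = keep_mean                                 # optional mean (f = 0) term
    return np.fft.ifft(np.where(mask, H, 0)).real


t = np.arange(4 * 365)
periodic = 80 + 30 * np.sin(2 * np.pi * t / 365) + 10 * np.cos(2 * np.pi * t / 182.5)
noise = np.random.default_rng(2).normal(0, 12, t.size)
x = periodic + noise
fft_part = fft_component(x, n_freqs=3)
fft_residuals = x - fft_part            # the series passed on to ARIMA modelling
print(np.std(fft_residuals), np.std(x))
```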

3.2. ARIMA modelling

A time series $\{x_t;\ t = 0, \pm 1, \pm 2, \dots\}$ is ARMA(p, q) if it is covariance stationary and can be represented as

$$x_t = \phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (4)$$

(Shumway and Stoffer, 2006), with $\phi_p \ne 0$, $\theta_q \ne 0$, and innovations $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon^2 > 0$. The parameters p and q are called the autoregressive [AR(p)] and moving average [MA(q)] orders, respectively. When a time series does not appear covariance stationary, differencing may be applied to make it stationary. The ARMA(p, q) model can then be applied to the stationary differenced series; the model so constructed is called an ARIMA(p, d, q) model, where d denotes the order of differencing (Shumway and Stoffer, 2006; Brockwell and Davis, 2002). In the present study, the parameters $\phi$ and $\theta$ have been estimated by maximum likelihood (Brockwell and Davis, 2002).
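To make eq. (4) concrete for the ARIMA(1,0,2) models selected in Section 4.3, the sketch below recovers the innovations recursively and forms the one-step-ahead forecast $\hat{x}_{T+1} = \phi_1 x_T + \theta_1 \varepsilon_T + \theta_2 \varepsilon_{T-1}$. It assumes a zero-mean series and sets pre-sample values to zero (a conditional approach, not the exact maximum likelihood used in the paper), and it only evaluates given coefficients rather than estimating them. The coefficients are the Brussels (Molenbeek) values from Table 1; the simulated data are illustrative.

```python
import numpy as np


def arma_residuals(x, phi, theta):
    """Innovations of eq. (4): e_t = x_t - sum_i phi_i x_{t-i} - sum_j theta_j e_{t-j}."""
    x = np.asarray(x, dtype=float)
    e = np.zeros_like(x)
    for t in range(x.size):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        e[t] = x[t] - ar - ma                  # pre-sample x and e treated as zero
    return e


def arma_one_step_forecast(x, phi, theta):
    """Conditional mean of x_{T+1} given x_1..x_T (the future innovation is set to zero)."""
    x = np.asarray(x, dtype=float)
    e = arma_residuals(x, phi, theta)
    T = x.size
    ar = sum(phi[i] * x[T - 1 - i] for i in range(len(phi)))
    ma = sum(theta[j] * e[T - 1 - j] for j in range(len(theta)))
    return ar + ma


phi, theta = [0.8042], [-0.2065, -0.1545]      # B1 coefficients, Table 1
rng = np.random.default_rng(3)
eps = rng.normal(0, 15, 500)
x = np.zeros_like(eps)
for t in range(eps.size):                       # simulate eq. (4) with zero pre-sample values
    x[t] = eps[t]
    if t >= 1:
        x[t] += phi[0] * x[t - 1] + theta[0] * eps[t - 1]
    if t >= 2:
        x[t] += theta[1] * eps[t - 2]
print(arma_one_step_forecast(x, phi, theta))
```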

Inspecting the autocorrelation function (ACF) and the partial autocorrelation function (PACF) helps identify the AR(p) and MA(q) orders. In addition, more objectively defined criteria such as the Akaike Information Criterion (AIC), the Hannan-Quinn Information Criterion (HIC), the Bayesian Information Criterion (BIC) and the Final Prediction Error (FPE) can be used to identify appropriate orders p and q (Brockwell and Davis, 2002; Kumar and Jain, 2009).
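As a simplified illustration of criterion-based order selection, the sketch below fits pure AR(p) models by ordinary least squares on a common estimation sample and compares Gaussian AIC and BIC values of the form $n \ln \hat{\sigma}^2 + \text{penalty}$. This is an assumption-laden stand-in: the paper selects full ARMA(p, q) orders by maximum likelihood and also uses HIC and FPE, which are omitted here. NumPy is assumed and the simulated series is illustrative.

```python
import numpy as np


def ar_information_criteria(x, max_p=5):
    """{p: (AIC, BIC)} for AR(p) models fitted by OLS, p = 0..max_p."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    y = x[max_p:]                       # same target sample for every p
    n = y.size
    out = {}
    for p in range(max_p + 1):
        if p == 0:
            resid = y
        else:
            # column i-1 holds lag i of y
            X = np.column_stack([x[max_p - i : x.size - i] for i in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
        sigma2 = np.mean(resid ** 2)
        k = p + 1                       # AR coefficients plus the innovation variance
        out[p] = (n * np.log(sigma2) + 2 * k, n * np.log(sigma2) + k * np.log(n))
    return out


rng = np.random.default_rng(4)
e = rng.normal(size=2000)
x = np.zeros_like(e)
for t in range(1, e.size):
    x[t] = 0.8 * x[t - 1] + e[t]        # AR(1) with persistence similar to Table 1
crit = ar_information_criteria(x, max_p=5)
best_bic = min(crit, key=lambda p: crit[p][1])
print(best_bic)
```

Using the same target sample for every candidate order keeps the criteria comparable; BIC's $\ln n$ penalty favours smaller models more strongly than AIC's constant penalty.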

3.3. GARCH modelling

Let $\varepsilon_t$ denote a real-valued discrete-time stochastic process. In this study, $\varepsilon_t$ are the innovations of the ARMA process in equation (4). Engle (1982) modelled them as an autoregressive conditional heteroskedastic (ARCH) process in which every $\varepsilon_t$ is of the form

$$\varepsilon_t = z_t \sigma_t \qquad (5)$$

where $z_t$ is an independent and identically distributed (i.i.d.) process with zero mean and unit variance. Although $\varepsilon_t$ is serially uncorrelated by construction, its conditional variance equals $\sigma_t^2$, which may be autocorrelated and may therefore change over time.

The variance equation of the GARCH(p, q) model can be expressed as (Bollerslev, 1986; Aradhyula and Holt, 1988; Shumway and Stoffer, 2006; Brockwell and Davis, 2002)

$$z_t \sim D_\vartheta(0, 1) \qquad (6)$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 \qquad (7)$$

$$\sigma_t^2 = \alpha_0 + \alpha(B)\, \varepsilon_{t-1}^2 + \beta(B)\, \sigma_{t-1}^2 \qquad (8)$$

where $\alpha(B)$ and $\beta(B)$ are the appropriate polynomials in the lag operator B, $D_\vartheta(0, 1)$ is the probability density function of the standardized innovations $z_t$, with zero mean and unit variance, and

$$p \ge 0, \quad q \ge 0 \qquad (9)$$

$$\alpha_0 > 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \dots, q \qquad (10)$$

$$\beta_i \ge 0, \quad i = 1, 2, \dots, p \qquad (11)$$

Note that for p = 0 the process reduces to an ARCH(q) process, and for p = q = 0 the conditional variance is constant, as in ARMA, so that the innovation $\varepsilon_t$ simply reduces to white noise.

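As an ARMA analogue, the GARCH process can be justified through a Wold's decomposition type of argument as a more parsimonious description. Bollerslev (1986) shows that the GARCH(1,1) process is wide-sense stationary, with $E(\varepsilon_t) = 0$, $\operatorname{var}(\varepsilon_t) = \alpha_0 / (1 - \alpha(1) - \beta(1))$ and $\operatorname{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \ne s$, if and only if $\alpha(1) + \beta(1) < 1$, where $\alpha(1)$ and $\beta(1)$ are the lag polynomials evaluated at B = 1 (i.e., the sums of the respective coefficients). The GARCH model parameters have also been estimated by maximum likelihood (Shumway and Stoffer, 2006; Brockwell and Davis, 2002).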
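The GARCH(1,1) case of eqs. (5)-(7) can be sketched as a simple recursion, assuming NumPy and Gaussian $z_t$ (one admissible choice of $D_\vartheta(0,1)$). The function names and parameter values are illustrative, not estimates from the paper, and maximum likelihood estimation is not shown.

```python
import numpy as np


def garch11_variance(eps, alpha0, alpha1, beta1, sigma2_init=None):
    """Conditional variances from eq. (7) with p = q = 1.

    Returns len(eps) + 1 values: sigma2[t] is the variance of eps[t], and the
    last entry is the one-step-ahead variance forecast.
    """
    eps = np.asarray(eps, dtype=float)
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0:
        raise ValueError("constraints (10)-(11) violated")
    if alpha1 + beta1 >= 1:
        raise ValueError("alpha1 + beta1 < 1 is required for wide-sense stationarity")
    sigma2 = np.empty(eps.size + 1)
    # start from the unconditional variance unless a starting value is given
    sigma2[0] = alpha0 / (1 - alpha1 - beta1) if sigma2_init is None else sigma2_init
    for t in range(1, eps.size + 1):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
    return sigma2


def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Draw eps_t = z_t * sigma_t (eq. 5) with z_t ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = alpha0 / (1 - alpha1 - beta1)
    for t in range(n):
        eps[t] = z[t] * np.sqrt(sigma2)
        sigma2 = alpha0 + alpha1 * eps[t] ** 2 + beta1 * sigma2
    return eps


eps = simulate_garch11(100_000, alpha0=0.1, alpha1=0.1, beta1=0.8)
print(eps.var(), 0.1 / (1 - 0.1 - 0.8))   # sample vs. theoretical unconditional variance
```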

The key insight of GARCH lies in the distinction between the conditional and unconditional variances of the innovations process $\{\varepsilon_t\}$. The term conditional implies explicit dependence on a past sequence of observations. The term unconditional concerns the long-term behaviour of a time series and assumes no explicit knowledge of the past. When the conditional variance parameters satisfy the inequalities in equations (9)-(11) and $\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1$, the unconditional (i.e., time-independent, or long-run) variance of the innovations process $\{\varepsilon_t\}$ is

$$\sigma^2 = E\left(\varepsilon_t^2\right) = \frac{\alpha_0}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{j=1}^{p} \beta_j}$$

Equivalently, the long-run expectation of the conditional variance, which depends explicitly on the GARCH model parameters, becomes equal to the unconditional variance of the innovations process $\{\varepsilon_t\}$.
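For GARCH(1,1), taking expectations in eq. (7) gives $E_T[\sigma_{T+h}^2] = \sigma^2 + (\alpha_1 + \beta_1)^{h-1}(\sigma_{T+1}^2 - \sigma^2)$, so multi-step variance forecasts decay geometrically towards the unconditional variance; this is the convergence invoked in Section 4 to limit the GARCH evaluation to 30 out-of-sample points. A NumPy sketch with illustrative parameter values:

```python
import numpy as np


def garch11_variance_forecast(sigma2_next, alpha0, alpha1, beta1, horizon):
    """E_T[sigma^2_{T+h}] for h = 1..horizon, given the one-step forecast sigma2_next."""
    persistence = alpha1 + beta1
    if persistence >= 1:
        raise ValueError("no finite unconditional variance when alpha1 + beta1 >= 1")
    uncond = alpha0 / (1 - persistence)
    h = np.arange(1, horizon + 1)
    return uncond + persistence ** (h - 1) * (sigma2_next - uncond)


# A volatile day: next-day conditional variance three times the long-run level
path = garch11_variance_forecast(3.0, alpha0=0.1, alpha1=0.1, beta1=0.8, horizon=60)
print(path[[0, 9, 29, 59]])
```

The closer $\alpha_1 + \beta_1$ is to 1, the slower this decay, and the longer a GARCH-based interval differs noticeably from a constant-variance one.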

3.3.1. Test for the presence of ARCH/GARCH effects (Engle, 1982)

Since an ARCH model requires iterative estimation procedures, it may be desirable to test whether it is appropriate before going to the effort of estimating it. The Lagrange multiplier (LM) test is ideal for this, as in many similar cases (e.g., Breusch and Pagan, 1978, 1980; Godfrey, 1978; Engle, 1979).

Under the null hypothesis, $\alpha_1 = \alpha_2 = \dots = \alpha_p = 0$. The test is based on the score and the information matrix, both evaluated under the null. Consider the ARCH model with $\sigma_t^2 = h(z_t \alpha)$, where h is some differentiable function (which therefore includes the linear and exponential cases as well as many others), $z_t = (1, \varepsilon_{t-1}^2, \dots, \varepsilon_{t-p}^2)$, and $\varepsilon_t$ are the ordinary least squares residuals. Under the null, $\sigma_t^2$ is a constant denoted by $\sigma_0^2$. Engle (1982) shows that the LM test statistic can be consistently estimated by

$$\xi^* = \tfrac{1}{2} f_0' z (z'z)^{-1} z' f_0$$

where $z' = (z_1', \dots, z_T')$ and $f_0$ is the column vector of $(\varepsilon_t^2 / \sigma_0^2 - 1)$.

Note that $f_0' f_0 / T$ converges in probability to 2, because normality has been assumed. Thus, an asymptotically equivalent statistic is

$$\xi = T \, \frac{f_0' z (z'z)^{-1} z' f_0}{f_0' f_0} = T R^2$$

where $R^2$ is the squared multiple correlation between $f_0$ and z. Since adding a constant and multiplying by a scalar do not change the $R^2$ of a regression, this is also the $R^2$ of the regression of $\varepsilon_t^2$ on an intercept and p lagged values of $\varepsilon_t^2$. The statistic is asymptotically distributed as chi-square with p degrees of freedom when the null hypothesis is true. Thus, the test procedure is to run the ordinary least squares regression and save the residuals, regress the squared residuals on a constant and p lags, and test $TR^2$ as a $\chi_p^2$.
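The $TR^2$ procedure just described can be sketched in a few lines, assuming NumPy and SciPy (`scipy.stats.chi2` for the $\chi_p^2$ tail probability). The function name and the default of five lags are illustrative choices; the paper's test was run with the MATLAB toolboxes listed in Section 4, and the simulated residuals below are not study data.

```python
import numpy as np
from scipy.stats import chi2


def engle_arch_lm(resid, p=5):
    """Engle's LM test: regress e_t^2 on a constant and p lags; return (T*R^2, p-value)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[p:]
    lags = [e2[p - i : e2.size - i] for i in range(1, p + 1)]   # lag i of y
    X = np.column_stack([np.ones(y.size)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ coef) ** 2)
    r2 = 1.0 - ssr / np.sum((y - y.mean()) ** 2)
    stat = y.size * r2                  # T is the number of usable observations
    return stat, chi2.sf(stat, df=p)    # asymptotically chi-square with p d.o.f. under H0


# Residuals with GARCH(1,1) effects (illustrative parameters)
rng = np.random.default_rng(5)
z = rng.standard_normal(5000)
e = np.empty_like(z)
s2 = 0.1 / (1 - 0.2 - 0.7)
for t in range(z.size):
    e[t] = z[t] * np.sqrt(s2)
    s2 = 0.1 + 0.2 * e[t] ** 2 + 0.7 * s2
print(engle_arch_lm(e, p=5))
```

A small p-value rejects the null of no ARCH/GARCH effects, which corresponds to the H = 1 outcome reported for each site in Section 4.5.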


4. Results and discussion

All the computations in the present work have been carried out on the MATLAB® 7.7 platform, using the freely available MATLAB toolboxes ucsd_garch (available from http://www.kevinsheppard.com/wiki/UCSD_GARCH) and the econometrics toolbox (available from http://www.spatial-econometrics.com/). To validate the forecasting performance of an FFT-ARIMA model, the last 100 data points of each time series have been kept out of the modelling procedure. The models' out-of-sample forecasting performance has been evaluated on the basis of the mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE). When evaluating the performance of the GARCH models in making probabilistic forecasts of O3 episodes, 30 out-of-sample data points have been used. This is because long-term conditional variances essentially converge to the unconditional variance (Section 3.3), so using too many out-of-sample data points cannot properly exhibit the effectiveness of the GARCH model. Hence, we limit ourselves to 30 out-of-sample data points when using GARCH. Section 4.1 presents a general description of the time series at the four urban sites. Section 4.2 carries out FFT modelling for each time series. ARIMA modelling results for the residuals obtained after subtracting the FFT component from the observed data are presented in Section 4.3. Section 4.4 discusses the forecasting performance of the FFT-ARIMA models applied to each time series. GARCH modelling outcomes and their effect on the models' performance are discussed in Section 4.5.

Fig. 1. Time series plots of daily maximum O3 concentration at (a) Brussels (Molenbeek) (B1) for the period 1-Jan-98 to 31-July-07, (b) Brussels (PARL.EUROPE) (B2) for the period 1-Jan-02 to 30-Jun-07, (c) London (Brent) (L1) for the period 1-May-96 to 30-Sep-07, and (d) London (Bloomsbury) (L2) for the period 1-Jan-00 to 30-Sep-08.

4.1. Time series of daily maximum ambient O3 concentration

Fig. 1(a)-(d) presents the time series of daily maximum ambient O3 concentration at the four urban sites in Brussels and London. In total, 65, 48, 138 and 221 missing values were encountered out of 3499, 2191, 4170 and 3196 data points at Brussels (Molenbeek) (B1), Brussels (PARL.EUROPE) (B2), London (Brent) (L1) and London (Bloomsbury) (L2), respectively. These missing values have been filled using a linear interpolation technique based on the state space method (Ljung, 1999). A visual inspection of these time series clearly reveals an annual cycle in each of them, a feature that has been exploited using the FFT technique.

4.2. FFT modelling

Fig. 2(a)-(d) shows the power vs. period plot of each time series, with the most dominant periods marked. The frequencies corresponding to these dominant periods have been chosen to construct the FFT component of the time series: for each site, the first three predominant frequencies (the corresponding periods are marked in Fig. 2(a)-(d)) have been used to reconstruct the periodic (FFT) component. Fig. 3(a)-(d) shows the FFT component of each time series along with the original time series.
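Gap filling can be illustrated with plain linear interpolation between the nearest observed neighbours, assuming NumPy. This is a simpler stand-in for the state-space-based interpolation cited above (Ljung, 1999), not a reproduction of it; the short series is illustrative.

```python
import numpy as np


def fill_gaps_linear(x):
    """Replace NaNs by linear interpolation; leading/trailing gaps take the nearest value."""
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    if missing.all():
        raise ValueError("series contains no observed values")
    idx = np.arange(x.size)
    # np.interp holds the end values constant outside the observed range
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x


daily_max = np.array([62.0, np.nan, 70.0, np.nan, np.nan, 85.0])
print(fill_gaps_linear(daily_max))
```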


Fig. 3. FFT component (in red) of the original daily maximum O3 time series (in blue) at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). (For the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Power vs. time-period plots for the daily maximum O3 time series at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2).


Fig. 5. ACF (autocorrelation function) of FFT residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. The FFT residuals (original data minus the corresponding FFT component) at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2).


Fig. 6. PACF (partial autocorrelation function) of FFT residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

4.3. ARIMA modelling

In the next step, the FFT component has been subtracted from the original time series to obtain the FFT residuals, which are presented in Fig. 4(a)-(d). A visual examination of Fig. 4(a)-(d) suggests that these time series can be considered covariance stationary, as there is no apparent trend (upward, downward, exponential or sinusoidal) and no diminishing or expanding variability. The autocorrelation function (ACF) of these FFT residuals is shown in Fig. 5(a)-(d). The ACF in each panel clearly exhibits a pattern similar to an exponential decline, which favours an AR(1) process. In addition, the PACF pattern (Fig. 6(a)-(d)) also shows that the AR(1) coefficient is the most significant part of the process. We also applied the AIC, BIC, HIC and FPE criteria in order to ascertain the orders p and q of the ARMA(p, q) models as reliably as possible (Section 3.2). Based on these criteria and on tests of the stationarity and invertibility of the ARIMA models (Kumar and Jain, 2009), the final selected model for the FFT residuals is ARIMA(1,0,2) at all the study sites. Table 1 lists the AR and MA coefficients with their statistics at the different sites. The t-statistics and p values in Table 1 clearly indicate that these coefficients are statistically significant.

Table 1
The AR and MA coefficients of the ARIMA models applied to the FFT residuals at the different sites.

Site / term        Coefficient   Std. error   t-statistic   P value
Brussels (Molenbeek) (B1)
  AR(1)             0.8042       0.0230        34.9         <0.0001
  MA(1)            −0.2065       0.0298        −6.9         <0.0001
  MA(2)            −0.1545       0.0241        −6.4         <0.0001
Brussels (PARL.EUROPE) (B2)
  AR(1)             0.7764       0.0317        24.5         <0.0001
  MA(1)            −0.1942       0.0401        −4.8         <0.0001
  MA(2)            −0.0886       0.0319        −2.8          0.0052
London (Brent) (L1)
  AR(1)             0.8141       0.0194        41.9         <0.0001
  MA(1)            −0.2529       0.0257        −9.8         <0.0001
  MA(2)            −0.1214       0.0210        −5.8         <0.0001
London (Bloomsbury) (L2)
  AR(1)             0.8275       0.0232        35.7         <0.0001
  MA(1)            −0.2998       0.0305        −9.8         <0.0001
  MA(2)            −0.1556       0.0243        −6.4         <0.0001
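The ACF-based identification step can be sketched as a sample autocorrelation function with the usual approximate 95% white-noise bounds ±1.96/√N (assumed here to be the kind of bounds drawn in the ACF figures, whose captions only say "95% confidence bounds"). Applying the same function to squared residuals gives the squared-residual check used in Section 4.5. NumPy is assumed and the simulated AR(1) series is illustrative.

```python
import numpy as np


def sample_acf(x, max_lag=20):
    """Sample autocorrelations r_0..r_max_lag and the approximate 95% white-noise bound."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n = d.size
    denom = np.dot(d, d)
    acf = np.array([np.dot(d[: n - k], d[k:]) / denom for k in range(max_lag + 1)])
    return acf, 1.96 / np.sqrt(n)


rng = np.random.default_rng(6)
e = rng.normal(size=20_000)
x = np.zeros_like(e)
for t in range(1, e.size):
    x[t] = 0.8 * x[t - 1] + e[t]        # exponentially decaying ACF, as for an AR(1)
acf, bound = sample_acf(x, max_lag=10)
print(np.round(acf[:4], 3), bound)
```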

4.4. Forecasting performance of FFT-ARIMA models

The forecasting performance of the applied FFT-ARIMA models has been evaluated on 100 one-day-ahead out-of-sample forecasts. The appropriateness of the applied FFT-ARIMA models has been tested against the whiteness of their residuals (Shumway and Stoffer, 2006; Kumar and Jain, 2009). Fig. 7(a)-(d) shows the ACF of the ARIMA residuals. Almost all the ACF values are within the confidence bounds, and hence the residuals can effectively be considered to follow a white noise process. Table 2 shows the forecasting performance of the selected FFT-ARIMA models in terms of MAE, RMSE, MAPE and the observed/predicted mean. The MAPE values for the 100 out-of-sample forecasts were 20%, 17.8%, 19.7% and 23.6% at Brussels (Molenbeek), Brussels (PARL.EUROPE), London (Brent) and London (Bloomsbury), respectively. The first 20 one-day-ahead out-of-sample FFT-ARIMA forecasts with their forecast confidence intervals are shown in Figs. 9 and 10. Only the first 20 forecasts are shown so that the forecast confidence intervals obtained from the FFT-ARIMA and FFT-ARIMA-GARCH models can be clearly compared (Section 4.5).
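The three error indicators reported in Table 2 can be computed as below, assuming NumPy; MAPE is expressed in percent and is undefined if any observed value is zero. The arrays are illustrative, not the study's data.

```python
import numpy as np


def forecast_errors(observed, predicted):
    """MAE, RMSE and MAPE (%) of a set of out-of-sample forecasts."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = obs - pred
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / obs)),
    }


scores = forecast_errors([100.0, 80.0, 50.0], [90.0, 88.0, 50.0])
print(scores)
```

RMSE is always at least as large as MAE because it squares the errors before averaging, so it weights occasional large misses more heavily.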

Page 8: GARCH Modelling in Association With FFT ARIMA to Forecast Ozone Episodes 2010 Atmospheric Environment


Fig. 7. ACF (autocorrelation function) of ARMA residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

U. Kumar, K. De Ridder / Atmospheric Environment 44 (2010) 4252e4265 4259

4.5. GARCH modelling, forecast confidence intervals and probability forecasts of O3-episodes

Fig. 7(a)-(d) shows the ACF of the residuals of the applied FFT-ARIMA models, while Fig. 8(a)-(d) shows the ACF of their squared residuals. In Fig. 7(a)-(d), all the ACF values are within the confidence bounds, i.e., there is no autocorrelation among the FFT-ARIMA residuals; in other words, the FFT-ARIMA residuals follow a white noise process. However, Fig. 8(a)-(d) clearly shows that a significant number of ACF values fall outside the confidence bounds, i.e., the squared residuals do not obey the white noise assumption and exhibit correlation in the variances. This indicates that heteroskedasticity exists in the process. The correlation structure of the squared residuals can be exploited using GARCH modelling. To confirm that the residuals exhibit ARCH/GARCH effects, we also applied “Engle’s hypothesis test to detect ARCH/GARCH effects” (Section 3) to the FFT-ARIMA residuals. Table 3 presents the test results: the null hypothesis of “no ARCH/GARCH effects” is rejected in each case (H = 1 in each case). Hence, GARCH modelling may be useful to remove the ARCH/GARCH effects present in the corresponding time series.

Table 2. Forecasting performance of FFT-ARIMA models for 100 out-of-sample forecasts.

Indicator                  B1     B2     L1     L2
MAE (µg m⁻³)               15.5   15.5   12.6   10.9
RMSE (µg m⁻³)              21.2   20.9   16.7   14.6
MAPE (%)                   20.0   17.8   19.7   23.6
Observed mean (µg m⁻³)     80.2   92.8   65.6   53.5
Predicted mean (µg m⁻³)    77.0   92.5   67.6   52.2

B1 = Brussels (Molenbeek), B2 = Brussels (PARL.EUROPE), L1 = London (Brent), L2 = London (Bloomsbury).
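Engle’s test, as reported in Table 3, can be sketched with the standard T·R² form of the Lagrange multiplier statistic: regress the squared residuals on a constant and their own M lags, multiply the R² by the number of regression observations, and compare the result with the χ²(M) critical value (H = 1 when the statistic exceeds it, matching the convention of Tables 3 and 5). This is a minimal, dependency-free sketch, not the software used in the study; the simulated series, seeds and function names are hypothetical, and only the critical values are taken from Table 3.

```python
import random

# Chi-square(M) 0.95 quantiles, as printed in Table 3.
CRIT_05 = {10: 18.307, 15: 24.996, 20: 31.410}


def solve_linear(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def engle_arch_lm(resid, lags):
    """Engle's LM statistic T*R^2 from regressing e_t^2 on 1, e_{t-1}^2, ..., e_{t-lags}^2."""
    e2 = [e * e for e in resid]
    y = e2[lags:]
    X = [[1.0] + [e2[t - k] for k in range(1, lags + 1)] for t in range(lags, len(e2))]
    p = lags + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)  # OLS via the normal equations
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y)
    return len(y) * (1.0 - ss_res / ss_tot)


def simulate_garch11(omega, alpha, beta, n, seed=0, burn_in=500):
    """Hypothetical GARCH(1,1) shocks: h_t = omega + alpha*e_{t-1}^2 + beta*h_{t-1}."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # start from the unconditional variance
    e = 0.0
    out = []
    for t in range(n + burn_in):
        h = omega + alpha * e * e + beta * h
        e = rng.gauss(0.0, 1.0) * h ** 0.5
        if t >= burn_in:
            out.append(e)
    return out


rng = random.Random(42)
white = [rng.gauss(0.0, 1.0) for _ in range(3000)]
clustered = simulate_garch11(omega=0.1, alpha=0.2, beta=0.5, n=3000, seed=7)
for name, series in [("white noise", white), ("GARCH(1,1)", clustered)]:
    stat = engle_arch_lm(series, lags=10)
    print(f"{name}: LM = {stat:.1f}, H = {int(stat > CRIT_05[10])}")
```

Residuals with volatility clustering, like the FFT-ARIMA residuals in Fig. 8, give a statistic far above the critical value, whereas homoskedastic white noise typically does not.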

GARCH models have been estimated on the assumption that the conditional distribution of the FFT-ARIMA residuals is Gaussian. For each time series of squared FFT-ARIMA residuals, GARCH(1,1) models were estimated first because they are parsimonious and are often the most likely candidates in applied analysis (Aradhyula and Holt, 1988). After these initial estimates were obtained, several alternative specifications of the conditional variance equation were examined, each for improvements in model fit and parameter significance relative to the GARCH(1,1) process. Following this identification and selection process, a GARCH(1,1) process was found adequate for explaining the conditional variances at sites B2, L1 and L2, while a GARCH(2,1) model is a good fit for the residual time series at site B1. Table 4 reports the GARCH model parameters and their statistics for each time series. The t-statistics and p-values show that each coefficient of the respective GARCH models is statistically significant. It can easily be verified that $\alpha(1) + \beta(1) < 1$ [$\alpha(1) \equiv \mathrm{GARCH}(1)$, $\beta(1) \equiv \mathrm{ARCH}(1)$] in each case (for the GARCH(2,1) model at site B1 the GARCH(2) coefficient is included in the sum: 0.381 + 0.363 + 0.161 = 0.905), i.e., the fitted GARCH processes are stationary at every site. To verify that no further ARCH/GARCH (heteroskedastic) effects remain in the model results, we applied Engle’s hypothesis test for ARCH/GARCH effects (Engle, 1982) to the standardized residuals of the GARCH models. Table 5 reports the test results for 10, 15 and 20 lags of the squared sample residuals at the 0.01 and 0.05 levels of significance. For site B1, the null hypothesis of “no ARCH/GARCH effects” is accepted at 10 lags but rejected at 15 and 20 lags at the 0.05 level of significance; however, it is accepted at all lags at the 0.01 level. Moreover, at lag 15 the test statistic is close to the 0.05 critical value (26.866 against 24.9958). Thus, for practical purposes, we can accept that the GARCH(2,1) model explains the heteroskedasticity present in the squared FFT-ARIMA residuals at site B1. For all the other sites, Table 5 shows that the null hypothesis of “no ARCH/GARCH effects” is accepted in each case (H = 0 in each case); hence, the selected GARCH(1,1) models sufficiently explain the heteroskedasticity present in the time series of the squared FFT-ARIMA residuals at sites B2, L1 and L2.

Fig. 8. ACF (autocorrelation function) of squared ARMA residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 9. One-day-ahead forecasts of daily maximum O3 concentration for the first 20 out-of-sample days at Brussels (Molenbeek) (B1) and Brussels (PARL.EUROPE) (B2) using FFT-ARIMA and FFT-ARIMA-GARCH.

Fig. 10. One-day-ahead forecasts of daily maximum O3 concentration for the first 20 out-of-sample days at London (Brent) (L1) and London (Bloomsbury) (L2) using FFT-ARIMA and FFT-ARIMA-GARCH.

Table 3. Engle’s hypothesis test to detect the presence of ARCH/GARCH effects on the residuals of FFT-ARIMA models [H = 0 means accepting the null hypothesis and H = 1 means rejecting the null hypothesis].

Site   Lag (M)   H (0.05)   H (0.01)   p value   Stat      Critical value (0.05)   Critical value (0.01)
B1     10        1          1          0         233.732   18.307                  23.209
B1     15        1          1          0         252.176   24.996                  30.577
B1     20        1          1          0         259.465   31.410                  37.566
B2     10        1          1          0         122.432   18.307                  23.209
B2     15        1          1          0         142.610   24.996                  30.577
B2     20        1          1          0         154.404   31.410                  37.566
L1     10        1          1          0         303.498   18.307                  23.209
L1     15        1          1          0         306.662   24.996                  30.577
L1     20        1          1          0         318.340   31.410                  37.566
L2     10        1          1          0         186.150   18.307                  23.209
L2     15        1          1          0         192.948   24.996                  30.577
L2     20        1          1          0         196.209   31.410                  37.566

Since the GARCH model estimates the conditional variances from their own lagged values and the lagged squared residuals, the conditional standard deviations are readily obtained as the square roots of the estimated conditional variances. These conditional standard deviations have also been estimated for the 100 out-of-sample one-day-ahead forecasts and subsequently used to estimate the 95% forecast confidence intervals. The first 20 out-of-sample forecasts with the newly estimated 95% forecast confidence intervals are presented in Figs. 9 and 10 for the sites B1, B2, L1 and L2. Since the long-term conditional variances tend to the unconditional variance in the limit (Section 3), the forecast confidence intervals obtained from FFT-ARIMA and FFT-ARIMA-GARCH tend to merge in the long run. The difference between the two intervals is therefore practically indiscernible in long time-series plots, so only the first 20 out-of-sample forecasts are plotted.

Table 4. GARCH model parameters and their statistics.

Site / term                   Coefficient   Std error   t-statistic   p value
Brussels (Molenbeek) (B1)
GARCH(1)                      0.381         0.115       3.3           0.0010
GARCH(2)                      0.363         0.102       3.6           0.0003
ARCH(1)                       0.161         0.018       8.9           <0.0001
Brussels (PARL.EUROPE) (B2)
GARCH(1)                      0.866         0.023       36.7          <0.0001
ARCH(1)                       0.090         0.015       5.8           <0.0001
London (Brent) (L1)
GARCH(1)                      0.733         0.017       41.5          <0.0001
ARCH(1)                       0.171         0.012       13.9          <0.0001
London (Bloomsbury) (L2)
GARCH(1)                      0.833         0.013       60.8          <0.0001
ARCH(1)                       0.118         0.011       10.7          <0.0001
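The variance recursion behind these intervals can be sketched for the fitted models of Table 4. In the paper’s labelling, the GARCH terms multiply lagged conditional variances and the ARCH term multiplies the lagged squared FFT-ARIMA residual, so a one-step forecast reads $h_{t+1} = \kappa + \sum_i \alpha_i h_{t+1-i} + \beta_1 \varepsilon_t^2$. The constant κ is not reported in Table 4, so the value below is hypothetical, as are the lagged inputs and the point forecast; only the coefficients come from Table 4. The sketch also checks Bollerslev’s (1986) stationarity condition by summing all GARCH and ARCH coefficients, and computes the long-run variance κ/(1 − Σ) towards which multi-step variance forecasts revert.

```python
import math

# Table 4 estimates: "GARCH" terms multiply lagged conditional variances,
# "ARCH" terms multiply lagged squared FFT-ARIMA residuals.
TABLE4 = {
    "B1": ([0.381, 0.363], [0.161]),  # GARCH(2,1)
    "B2": ([0.866], [0.090]),
    "L1": ([0.733], [0.171]),
    "L2": ([0.833], [0.118]),
}


def next_conditional_variance(kappa, garch, arch, past_variances, past_sq_resid):
    """h_{t+1} = kappa + sum_i garch[i]*h_{t-i} + sum_j arch[j]*e_{t-j}^2 (most recent first)."""
    return (kappa
            + sum(g * h for g, h in zip(garch, past_variances))
            + sum(a * e2 for a, e2 in zip(arch, past_sq_resid)))


def is_weakly_stationary(garch, arch):
    """Bollerslev (1986): the sum of all GARCH and ARCH coefficients must be below 1."""
    return sum(garch) + sum(arch) < 1.0


def interval_95(point, std_dev):
    """Gaussian 95% forecast interval: point +/- 1.96 * std_dev."""
    return point - 1.96 * std_dev, point + 1.96 * std_dev


for site, (g, a) in TABLE4.items():
    print(site, round(sum(g) + sum(a), 3), is_weakly_stationary(g, a))

# Hypothetical inputs for B1: kappa is NOT reported in Table 4.
g_b1, a_b1 = TABLE4["B1"]
kappa = 40.0
h_next = next_conditional_variance(kappa, g_b1, a_b1,
                                   past_variances=[180.0, 200.0], past_sq_resid=[100.0])
low, high = interval_95(57.0, math.sqrt(h_next))  # conditional sd replaces the constant sd
long_run_var = kappa / (1.0 - sum(g_b1) - sum(a_b1))  # level the variance forecasts revert to
print(f"h_next = {h_next:.2f}, 95% interval = ({low:.1f}, {high:.1f}), "
      f"long-run variance = {long_run_var:.1f}")
```

When the recent conditional variance is below the long-run level, as in these hypothetical inputs, the GARCH interval is narrower than one built from the unconditional variance, which is the short-term effect visible in Figs. 9 and 10.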

The newly constructed forecast confidence intervals from the GARCH models can be compared with the earlier forecast confidence intervals from FFT-ARIMA (Figs. 9 and 10). Figs. 9 and 10 show that the forecast confidence intervals obtained from FFT-ARIMA-GARCH are narrower than those obtained from FFT-ARIMA. Although both intervals tend to merge after a sufficiently long period, there is a clear short-term improvement in the forecast confidence intervals. The narrower the forecast confidence interval, the greater the reliability of the forecast. For example, the information that “the next-day maximum O3 concentration at site B1 is likely to be 57, and with 95% confidence it will lie between 31.1 and 82.1” is more precise than the information that “the next-day O3 concentration at site B1 is likely to be 57, and with 95% confidence it will lie between 19.3 and 94.6”. The first statement comes from the FFT-ARIMA-GARCH model and the second from the FFT-ARIMA model. Thus, the reliability of forecasts improves if we are able to provide better, narrower forecast confidence intervals. This has been made possible by exploiting the correlation structure in the variances, as in the FFT-ARIMA-GARCH models (Figs. 9 and 10). The usefulness of the FFT-ARIMA-GARCH model becomes even more evident when we make probability forecasts of ozone episodes.

Table 5. Engle’s hypothesis test to detect the presence of ARCH/GARCH effects on the standardized residuals of GARCH models [H = 0 means accepting the null hypothesis and H = 1 means rejecting the null hypothesis].

Site   Lag (M)   H (0.05)   H (0.01)   p value   Stat     Critical value (0.05)   Critical value (0.01)
B1     10        0          0          0.141     14.755   18.3070                 23.209
B1     15        1          0          0.029     26.866   24.9958                 30.577
B1     20        1          0          0.017     35.570   31.4104                 37.566
B2     10        0          0          0.353     11.049   18.3070                 23.209
B2     15        0          0          0.114     22.734   24.9958                 30.577
B2     20        0          0          0.223     24.453   31.4104                 37.566
L1     10        0          0          0.053     18.097   18.3070                 23.209
L1     15        0          0          0.068     23.816   24.9958                 30.577
L1     20        0          0          0.133     27.056   31.4104                 37.566
L2     10        0          0          0.878     5.188    18.3070                 23.209
L2     15        0          0          0.766     10.806   24.9958                 30.577
L2     20        0          0          0.853     13.908   31.4104                 37.566

Table 6. O3-episode analysis results: percentage of O3-episodes correctly forecasted and percentage of false alarms for 30 out-of-sample one-day-ahead forecasts (O3 critical limit = 100 µg m⁻³).

Site (out-of-sample period)    Correctly forecasted (FFT-ARIMA)   Correctly forecasted (FFT-ARIMA-GARCH)   False alarms (FFT-ARIMA)   False alarms (FFT-ARIMA-GARCH)
B1 (28-Jun-06 to 27-Jul-06)    82.6%    91.3%    9.5%     8.7%
B2 (9-Jun-06 to 8-Jul-06)      80%      90%      20%      18.2%
L1 (18-Jun-06 to 17-Jul-06)    58.8%    70.6%    9.1%     7.7%
L2 (3-Jul-06 to 1-Aug-06)      38.4%    53.8%    16.7%    12.5%

The main purpose of air pollutant forecasts is to warn the public in advance whether the air pollutant concentration will exceed the prescribed threshold. Thus, information on whether the next-day O3 is likely to exceed the prescribed threshold might be more useful than point forecasts of O3 concentration. With this in mind, we make probability forecasts of O3-episodes first with the FFT-ARIMA models and then with the FFT-ARIMA-GARCH models. To test the utility of such probability forecasts, we estimate them for 30 out-of-sample days as well. In this study, we have followed the WHO air quality guidelines and the UK air quality standards, which prescribe an ambient O3 concentration below 100 µg m⁻³ as safe (details in Section 2). Since the models have been constructed on the assumption that the conditional distribution of the residuals is Gaussian, the probability of O3-episode occurrence (≥100 µg m⁻³) has also been calculated using the Gaussian probability function. To validate the approach, 30 consecutive days were selected from each time series in a period when O3-episodes were more frequent. These 30 days were treated as “out of sample”: they were kept out of the modelling procedure, and the models were constructed using all the time-series data before these 30 days. Note that the model structure essentially remains the same, since the models were constructed from a sufficiently long time series and the series remaining after removing these days is still sufficiently long. For site B1, these 30 days were from 28-Jun-06 to 27-Jul-06; for site B2, from 9-Jun-06 to 8-Jul-06; for site L1, from 18-Jun-06 to 17-Jul-06; and for site L2, from 3-Jul-06 to 1-Aug-06. When GARCH modelling was ignored, the probability (p) of O3-episode occurrence was calculated on the basis of the unconditional standard deviation obtained from the FFT-ARIMA models. When GARCH modelling was taken into account, the conditional standard deviations were used to calculate the probability (p) of O3-episode occurrence. We say that an O3-episode is likely to occur, i.e., that the O3 concentration will be ≥100 µg m⁻³, if the calculated p ≥ 0.6. A forecast is said to be successful if both conditions (p ≥ 0.6 and an observed O3 concentration ≥100 µg m⁻³) occur together the next day. The criterion p ≥ 0.6 is qualitative but reasonable, in the sense that p ≥ 0.6 represents a greater likelihood of occurrence of a phenomenon than of its non-occurrence. We define a forecast to be a “false alarm” if the model gives p ≥ 0.6 but the observed O3 concentration turns out to be <100 µg m⁻³ the next day. The total percentages of correctly forecasted O3-episodes and of false alarms have then been calculated as follows:

$$\text{Percentage of successful forecasts} = 100 \times \frac{\text{No. of successful forecasts}}{\text{Total no. of O}_3\text{-episodes that actually occurred}}$$

$$\text{Percentage of false alarms} = 100 \times \frac{\text{No. of false alarms}}{\text{Total no. of O}_3\text{-episodes forecasted}}$$

Both quantities have been calculated for the 30 out-of-sample days considered. The results for each site are presented in Table 6. The observations, forecasts and forecast confidence intervals (of FFT-ARIMA and FFT-ARIMA-GARCH) for the corresponding days are depicted in Figs. 11 and 12. Table 6 shows that the percentage of correct probability forecasts of O3-episodes ranges from 53.8% to 91.3% with GARCH, an improvement of 8.7 to 15.4 percentage points over FFT-ARIMA. Table 6 also allows a comparison of the percentage of false alarms raised by the two models: the false alarms raised by FFT-ARIMA-GARCH are fewer than or comparable to those raised by the FFT-ARIMA models at all the sites.

Fig. 11. One-day-ahead daily maximum O3 concentration forecasts for 20 out-of-sample days at Brussels (Molenbeek) (B1) and Brussels (PARL.EUROPE) (B2), for which probabilistic forecasts of O3-episodes have been made using FFT-ARIMA and FFT-ARIMA-GARCH.


Fig. 12. One-day-ahead daily maximum O3 concentration forecasts for 20 out-of-sample days at London (Brent) (L1) and London (Bloomsbury) (L2), for which probabilistic forecasts of O3-episodes have been made using FFT-ARIMA and FFT-ARIMA-GARCH.


Thus, at all the sites, introducing the GARCH modelling procedure brings significant improvements both in the number of correctly forecasted episodes and in the reduction of the number of false alarms.
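The probability forecast and the two scores defined above can be sketched as follows. This is a minimal sketch assuming, as stated in the text, a Gaussian forecast distribution, the 100 µg m⁻³ threshold and the p ≥ 0.6 warning rule; the function names and all input values are hypothetical. Passing the unconditional FFT-ARIMA standard deviation or the conditional GARCH standard deviation as `std_dev` gives the two variants compared in Table 6.

```python
import math


def exceedance_probability(forecast, std_dev, threshold=100.0):
    """P(O3 >= threshold) if next-day O3 ~ Normal(forecast, std_dev**2); equals 1 - Phi(z)."""
    z = (threshold - forecast) / std_dev
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def episode_scores(probabilities, observed, threshold=100.0, p_cut=0.6):
    """Percentages of successful forecasts and of false alarms, following the text's definitions."""
    warned = [p >= p_cut for p in probabilities]
    occurred = [o >= threshold for o in observed]
    hits = sum(w and o for w, o in zip(warned, occurred))
    false_alarms = sum(w and not o for w, o in zip(warned, occurred))
    n_occurred, n_warned = sum(occurred), sum(warned)
    pct_success = 100.0 * hits / n_occurred if n_occurred else float("nan")
    pct_false = 100.0 * false_alarms / n_warned if n_warned else float("nan")
    return pct_success, pct_false


# Same point forecast, two spreads: a narrower (e.g. conditional) sd can change the decision.
p_narrow = exceedance_probability(104.0, 10.0)  # about 0.655 -> warn (p >= 0.6)
p_wide = exceedance_probability(104.0, 20.0)    # about 0.579 -> no warning
print(f"p with sd 10: {p_narrow:.3f}, p with sd 20: {p_wide:.3f}")

# Hypothetical five-day check.
probs = [0.7, 0.65, 0.2, 0.8, 0.1]
obs = [110.0, 95.0, 105.0, 120.0, 60.0]
success, false_alarm = episode_scores(probs, obs)
print(f"successful: {success:.1f}%, false alarms: {false_alarm:.1f}%")  # 66.7%, 33.3%
```

The first example shows the mechanism behind Table 6: with the same point forecast, a smaller standard deviation concentrates probability mass and can push p across the 0.6 cut-off.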

5. Conclusion

The present study has applied the GARCH modelling technique in association with FFT-ARIMA to forecast daily maximum O3 concentration and to make probabilistic forecasts of ozone episodes at four urban sites in two major European cities (London and Brussels). In the modelling process, the ARIMA model structure [ARIMA(1,0,2)] is the same for all the sites; however, the GARCH model structure differs at site B1, where it is GARCH(2,1), whereas it is GARCH(1,1) at all the other sites. This might be related to the fact that site B1 is a busy traffic site, while the other sites have urban background characteristics. At a busy traffic site, numerous factors govern air pollutant concentrations, and local disturbances such as traffic jams may also play a significant role in the transport and dispersal of air pollutants. These factors might introduce characteristics into the O3 time series that are absent at urban background sites. In other words, there are many more random perturbations at a traffic site than at urban background sites. This possibly makes the nature of heteroskedasticity at a traffic site different from that at urban background sites, since heteroskedasticity in a time series is mainly an outcome of short-term random perturbations. Thus, the GARCH model structure differs at a busy traffic site. On the other hand, FFT captures the long-term cyclic trends in a time series, while ARIMA exploits its long-term stationary characteristics, which for a traffic site might remain similar or comparable to those of urban background sites. This possibly explains why the FFT-ARIMA model structure remains the same at all the sites.

The results clearly reveal that modelling the heteroskedastic effects in O3 time series using GARCH not only improves the short-term forecast confidence intervals but also yields more accurate short-term probability forecasts of O3-episodes. At all the sites, the introduction of GARCH models has significantly improved the probability forecasts of ozone episodes: the number of correctly forecasted O3-episodes has increased and the number of false alarms has decreased. Although the present study has been conducted for four urban sites, the methodology is general and can be extended to other sites where the daily maximum O3 time series has a similar structure.

References

Aradhyula, S.V., Holt, M.T., 1988. GARCH time-series models: an application to retail livestock prices. Western Journal of Agricultural Economics 13, 365-374.

Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.

Breusch, T.S., Pagan, A.R., 1978. A simple test for heteroscedasticity and random coefficient variation. Econometrica 46, 1287-1294.

Breusch, T.S., Pagan, A.R., 1980. The Lagrange multiplier test and its application to model specification. Review of Economic Studies 47, 239-254.

Brockwell, P.J., Davis, R.A., 2002. Introduction to Time Series and Forecasting. Springer-Verlag Inc, New York.

Chatfield, C., 2004. The Analysis of Time Series: An Introduction. Chapman & Hall/CRC, New York, Washington, D.C. (also published in the Taylor & Francis e-Library, 2009).


Choon, O.H., Chuin, J.L.T., 2008. A comparison of neural network methods and Box-Jenkins model in time series analysis. From Proceeding (605) Advances in Computer Science and Technology - 2008, 605-024.

Cobourn, W.G., 2007. Accuracy and reliability of an automated air quality forecast system for ozone in seven Kentucky metropolitan areas. Atmospheric Environment 41, 5863-5875.

Demuzere, M., van Lipzig, N.P.M., 2010. A new method to estimate air-quality levels using a synoptic-regression approach. Part I: present-day O3 and PM10 analysis. Atmospheric Environment 44, 1341-1355.

Denby, B., Schaap, M., Segers, A., Builtjes, P., Horálek, J., 2008. Comparison of two data assimilation methods for assessing PM10 exceedances on the European scale. Atmospheric Environment 42, 7122-7134.

Engle, R.F., 1979. A General Approach to the Construction of Model Diagnostics Based upon the Lagrange Multiplier Principle. University of California, San Diego. Discussion Paper 79-43.

Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.

Godfrey, L.G., 1978. Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46, 1293-1302.

Ho, S.L., Xie, M., Goh, T.N., 2002. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering 42, 371-375.

Honoré, C., Rouil, L., Vautard, R., et al., 2008. Predictability of European air quality: assessment of 3 years of operational forecasts and analyses by the PREV’AIR system. Journal of Geophysical Research 113 (D04301). doi:10.1029/2007JD008761.

Hubbard, M.C., Cobourn, W.G., 1998. Development of a regression model to forecast ground-level ozone concentration in Louisville, KY. Atmospheric Environment 32, 2637-2647.

Kumar, U., Prakash, A., Jain, V.K., 2009. A multivariate time series approach to study the interdependence among O3, NOx and VOCs in ambient urban atmosphere. Environmental Modeling and Assessment 14, 631-643.

Kumar, U., Jain, V.K., 2009. ARIMA forecasting of ambient air pollutants (O3, NO, NO2 and CO). Stochastic Environmental Research and Risk Assessment. doi:10.1007/s00477-009-0361-8.

Ljung, L., 1999. System Identification: Theory for the User. Prentice Hall PTR, NJ.

Masters, G.M., 1998. Introduction to Environmental Engineering and Science. Pearson Education, Singapore.

Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2002. Numerical Recipes in C++: The Art of Scientific Computing. Cambridge University Press, Cambridge.

Prior, E.J., Schiess, J.R., McDougal, D.S., 1981. Approach to forecasting daily maximum ozone levels in St. Louis. Environmental Science and Technology 15, 430-436.

Robeson, S.M., Steyn, D.G., 1989. A conditional probability density function for forecasting ozone air quality data. Atmospheric Environment 23, 689-692.

Robeson, S.M., Steyn, D.G., 1990. Evaluation and comparison of statistical forecast models for daily maximum ozone concentrations. Atmospheric Environment 24B, 303-312.

Schmidt, H., Derognat, C., Vautard, R., Beekmann, M., 2001. A comparison of simulated and observed O3 mixing ratios for the summer of 1998 in Western Europe. Atmospheric Environment 35, 6277-6297.

Shabri, A., 2001. Comparison of time series forecasting methods using neural networks and Box-Jenkins models. Mathematika 17, 25-32.

Shumway, R.H., Stoffer, D.S., 2006. Time Series Analysis and Its Applications: With R Examples. Springer Science+Business Media, LLC.

Simpson, R.W., Layton, A.P., 1983. Forecasting peak ozone levels. Atmospheric Environment 17, 1649-1654.

Slini, Th., Karatzas, K., Moussiopoulos, N., 2002. Statistical analysis of environmental data as the basis of forecasting: an air quality application. The Science of the Total Environment 288, 227-237.

Tang, Z., Almeida, C. de, Fishwick, P.A., 1991. Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation 57, 303-310.

Tsai, C.-h., Chang, L.-c., Chiang, H.-c., 2009. Forecasting of ozone episode days by cost-sensitive neural network methods. Science of the Total Environment 407, 2124-2135.

van Loon, M., Builtjes, P.J.H., Segers, A., 2000. Data assimilation of ozone in the atmospheric transport chemistry model LOTOS. Environmental Modelling & Software 15, 603-609.