Detecting outliers at the end of the series using forecast intervals
Dario Buono, Eurostat, Unit B.1: Methodology and corporate architecture
Fabrice Gras, Eurostat, Unit B.1: Methodology and corporate architecture
Enrico Infante, Eurostat, Unit C.1: National Accounts methodology, Sector Accounts, Financial Indicators; Università degli Studi di Napoli Federico II, Department of Economic and Statistical Sciences
Germana Scepi, Università degli Studi di Napoli Federico II, Department of Economic and Statistical Sciences
DSSR 2016, Napoli, 17-19 February 2016
Eurostat
Content
• Introduction
  • Big Data?
  • Basic Idea
• Methodology
  • 3 Steps
  • Further Considerations
• Case Study
  • 1 – Parmigiano Reggiano
  • 2 – Compensation of Employees of Household Sector
• Research Findings
Introduction
• An earlier version of this work was presented at NTTS 2013
• It aimed to identify "market risk" for specific commodity prices by using the forecast volatility degree
• The methodology is now generalised so that it can be re-used for outlier identification and treatment whenever new observations occur
• This is a recurrent issue in the production of official statistics, where a large number of time series must be validated
Introduction – Big Data?
• Big data are normally defined by their characteristics, such as volume, velocity, variety, timeliness, exhaustivity, flexibility, etc. (Kitchin)
• From an official statistics point of view, the data used in our analysis can be considered big in terms of volume and velocity (thousands of time series, updated frequently, up to daily)
• However, the methodology is designed for time series, so it can only be applied to data with a known structure
Introduction – Basic Idea
When the observed data differ considerably from the forecast trend, an outlier is identified
Information about the type of outlier can also be derived
Step 1: Identification of the model
Step 2: Estimating forecast intervals
Step 3: Detecting the volatility degree
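Methodology: Step 1
• The first step is to model the price series $X_t$ without the last r observations, using a seasonal ARIMA(p,d,q)(P,D,Q)$_s$:

$$\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D X_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t, \qquad t = 1, \dots, t^*, \quad t^* = T - r$$

where $B$ is the backshift operator, $s$ the seasonal period, $T$ the last available observation and $\varepsilon_t$ white noise
• The model is dynamic in the sense that it is re-estimated every time new information is available
• For pre-treatment (already known outliers, calendar effects, etc.), the RegARIMA model is used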
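Step 1 can be sketched in plain Python for the non-seasonal ARIMA(p,1,0) case without a constant, which covers the SARIMA(2,1,0)(0,0,0) model of the first case study. This is a minimal illustration under that assumption, not the authors' implementation: it estimates the AR coefficients of the first differences by conditional least squares, whereas the paper relies on RegARIMA with pre-treatment. The names `difference`, `solve` and `fit_ari` are hypothetical.

```python
def difference(x):
    """First difference (1 - B) x_t."""
    return [b - a for a, b in zip(x, x[1:])]


def solve(a, b):
    """Solve the square linear system a @ beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ari(x, p, r):
    """Fit ARIMA(p,1,0) without constant to the first t* = T - r values of x.

    Returns (phi, sigma2, train): AR coefficients of the differenced series,
    residual variance, and the training span x_1..x_{t*}.
    """
    train = x[:len(x) - r]                                # drop the last r observations
    w = difference(train)
    rows = [w[t - p:t][::-1] for t in range(p, len(w))]   # [w_{t-1}, ..., w_{t-p}]
    y = w[p:]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yt for row, yt in zip(rows, y)) for i in range(p)]
    phi = solve(xtx, xty)                                 # normal equations
    resid = [yt - sum(c * v for c, v in zip(phi, row)) for row, yt in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - p)
    return phi, sigma2, train
```

On a series whose differences follow an exact AR(2) recursion, `fit_ari(x, p=2, r=3)` recovers the two coefficients and leaves the last three values out of the fit.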
Methodology: Step 2
• In the second step, for each of the h = 1, …, r observations excluded from the model in Step 1, the SARIMA forecast interval at the 5% level is computed:

$$\hat{X}_{t^*}(h) \pm z_{\alpha/2}\sqrt{\operatorname{VAR}\big[e_{t^*}(h)\big]}, \qquad \alpha = 0.05$$

where $\hat{X}_{t^*}(h)$ is the h-step-ahead forecast made at time $t^*$, $e_{t^*}(h)$ its forecast error and $z_{\alpha/2} \approx 1.96$
• The parameter r is selected by the user; we suggest r = 3 for monthly series
• r can also be selected dynamically: starting with r = 1, the algorithm moves to r = 2 if the last observation is an outlier, and keeps increasing r until no further outliers are found
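For an ARIMA(p,1,0) model the Step 2 intervals can be computed from the ψ-weights of the integrated process, since $\operatorname{VAR}[e_{t^*}(h)] = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2$. The sketch below assumes Gaussian errors and treats the estimated parameters as known (the usual Box-Jenkins simplification); it is restricted to the non-seasonal case, and `integrated_ar` and `forecast_intervals` are hypothetical names.

```python
from statistics import NormalDist


def integrated_ar(phi):
    """Coefficients a_k such that (1 - sum phi_i B^i)(1 - B) = 1 - sum a_k B^k."""
    poly = [1.0] + [-c for c in phi]          # AR polynomial phi(B)
    prod = [0.0] * (len(poly) + 1)
    for i, c in enumerate(poly):              # multiply by (1 - B)
        prod[i] += c
        prod[i + 1] -= c
    return [-c for c in prod[1:]]


def forecast_intervals(train, phi, sigma2, r, alpha=0.05):
    """(forecast, lower, upper) for h = 1..r from an ARIMA(p,1,0) fitted on train."""
    a = integrated_ar(phi)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    hist = list(train)
    psi = [1.0]                               # psi_0, psi_1, ... of the integrated process
    out = []
    for h in range(1, r + 1):
        xhat = sum(ak * hist[-k] for k, ak in enumerate(a, start=1))
        hist.append(xhat)                     # forecasts replace unknown future values
        var = sigma2 * sum(w * w for w in psi)   # uses psi_0 .. psi_{h-1}
        half = z * var ** 0.5
        out.append((xhat, xhat - half, xhat + half))
        psi.append(sum(a[k - 1] * psi[h - k] for k in range(1, min(h, len(a)) + 1)))
    return out
```

With `phi = []` the model is a pure random walk, so every forecast equals the last training value and the half-width grows as $z\sigma\sqrt{h}$, which shows why the intervals widen with the horizon.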
Methodology: Step 3
• In the third step, the observed value at each time t*+h is compared with the forecast interval computed in Step 2
• If the observed value at time t*+h lies outside the forecast interval for t*+h, an outlier is detected and should be analysed
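[Figure: observed values (axis: Value) plotted against the forecast interval; an observed value inside the interval means no outlier is detected, a value outside it means an outlier is detected.]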
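The Step 3 comparison reduces to an interval-membership test per horizon. In this sketch `outside_flags` is a hypothetical helper, and values lying exactly on a bound are treated as inside, which is an assumption the slides do not settle.

```python
def outside_flags(observed, intervals):
    """'Y' if an observed value lies outside its (lower, upper) forecast interval, else 'N'.

    Values exactly on a bound are treated as inside (an assumption).
    """
    return ["N" if lo <= x <= hi else "Y" for x, (lo, hi) in zip(observed, intervals)]


print(outside_flags([5.0, 7.5], [(4.0, 6.0), (4.5, 7.0)]))  # ['N', 'Y']
```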
Methodology: Further Considerations
When the observed value falls outside the forecast interval, it is classified as an outlier. The type of outlier is determined by looking at all r intervals together
The table below shows how the outlier type is determined in the case of r = 3
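Is the price outside the interval?

| t*+1 | t*+2 | t*+3 | Outlier |
|------|------|------|---------|
| Y | N | N | AO |
| N | Y | N | AO |
| N | N | Y | AO |
| Y | Y | N | TC |
| Y | N | Y | AO (2) |
| N | Y | Y | LS |
| Y | Y | Y | LS |
| N | N | N | none |

AO = additive outlier; TC = transitory change; LS = level shift; AO (2) = two additive outliers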
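The r = 3 table can be encoded directly as a lookup from the three Y/N flags to the outlier type. This is a transcription of the table above, not a general rule for other values of r; `R3_RULES` and `classify_r3` are hypothetical names.

```python
# Mapping transcribed from the r = 3 table: (t*+1, t*+2, t*+3) -> outlier type.
# "AO (2)" denotes two additive outliers; None means no outlier.
R3_RULES = {
    ("Y", "N", "N"): "AO",
    ("N", "Y", "N"): "AO",
    ("N", "N", "Y"): "AO",
    ("Y", "Y", "N"): "TC",
    ("Y", "N", "Y"): "AO (2)",
    ("N", "Y", "Y"): "LS",
    ("Y", "Y", "Y"): "LS",
    ("N", "N", "N"): None,
}


def classify_r3(flags):
    """Return the outlier type for three Y/N flags, following the r = 3 table."""
    return R3_RULES[tuple(flags)]


print(classify_r3(["Y", "Y", "N"]))  # TC
```

All 2³ = 8 flag patterns appear in the table, so the lookup never fails on valid input.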
Case Study: Prices of Parmigiano/1
• As a first case study, the price series of Italian Parmigiano Reggiano is analysed. The time span runs from January 2000 to June 2012 (150 observations)
Case Study: Prices of Parmigiano/2
• The first step is to model the series without the last r = 3 observations
• The estimated model is a SARIMA(2,1,0)(0,0,0)
• The forecast intervals are computed for the r = 3 forecast values of the series with 147 observations, and then compared with the observed prices
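| Month | Observed price | Interval MIN | Interval MAX |
|-------|----------------|--------------|--------------|
| Apr-12 | 9.57 | 9.66 | 10.07 |
| May-12 | 9.23 | 9.37 | 10.30 |
| Jun-12 | 9.20 | 9.10 | 10.56 |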
Case Study: Prices of Parmigiano/3
• The basic idea is that when the observed data differ considerably from the forecast trend, commodity risk may be present
The observed price is outside the forecast interval in April and May 2012, but it is inside the interval in June 2012
A transitory change has been identified in April 2012
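The Parmigiano Reggiano result can be checked against the published bounds with a few lines of Python. The numbers are copied from the case-study table; the script only re-derives the Y/N pattern and does not re-estimate the model.

```python
# Observed prices and forecast bounds copied from the Parmigiano Reggiano table.
rows = [
    ("Apr-12", 9.57, 9.66, 10.07),
    ("May-12", 9.23, 9.37, 10.30),
    ("Jun-12", 9.20, 9.10, 10.56),
]

flags = []
for month, obs, lo, hi in rows:
    side = "below MIN" if obs < lo else "above MAX" if obs > hi else "inside"
    flags.append("N" if side == "inside" else "Y")
    print(f"{month}: {obs:.2f} is {side}")

print(tuple(flags))  # ('Y', 'Y', 'N') -> TC row of the r = 3 table
```

Both April and May fall below the lower bound while June returns inside the interval, which is the Y, Y, N pattern the r = 3 table maps to a transitory change.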
Case Study: D1R_S1M/1
• As a second case study, the compensation of employees received by the household sector is analysed. The time span runs from the first quarter of 1999 to the third quarter of 2009 (43 observations)
Case Study: D1R_S1M/2
• The first step is to model the series without the last r = 3 observations
• The estimated model is a SARIMA(0,1,1)(0,1,1)
• The forecast intervals are computed for the r = 3 forecast values of the series with 40 observations, and then compared with the observed values

| Quarter | Observed value | Interval MIN | Interval MAX |
|---------|----------------|--------------|--------------|
| 2009Q1 | 1076641 | 1099909 | 1122499 |
| 2009Q2 | 1133088 | 1158914 | 1190498 |
| 2009Q3 | 1090471 | 1108281 | 1145282 |
Case Study: D1R_S1M/3
• The basic idea is that when the observed data differ considerably from the forecast trend, an outlier is detected
The observed value is outside the forecast interval in all three quarters considered
A level shift has been identified in 2009Q1... or did it start earlier?
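To quantify how far the D1R_S1M observations fall from their intervals, the relative distance to the nearest bound can be computed from the table values. This percentage measure is an illustrative addition, not a statistic used in the paper.

```python
# Observed values and forecast bounds copied from the D1R_S1M table.
rows = [
    ("2009Q1", 1076641, 1099909, 1122499),
    ("2009Q2", 1133088, 1158914, 1190498),
    ("2009Q3", 1090471, 1108281, 1145282),
]

gaps = {}
for quarter, obs, lo, hi in rows:
    if obs < lo:
        gaps[quarter] = 100 * (obs - lo) / lo   # % below the lower bound
    elif obs > hi:
        gaps[quarter] = 100 * (obs - hi) / hi   # % above the upper bound

for quarter, g in gaps.items():
    print(f"{quarter}: {g:+.2f}% outside the interval")
# 2009Q1: -2.12% outside the interval
# 2009Q2: -2.23% outside the interval
# 2009Q3: -1.61% outside the interval
```

All three quarters lie below their lower bounds, the Y, Y, Y pattern that the r = 3 table maps to a level shift.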
Case Study: D1R_S1M/4
Using the dynamic selection of r, we arrived at r = 5, which identifies when the level shift starts
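The dynamic selection of r can be sketched as a loop around a per-r outlier check. Reading the rule as "keep increasing r while the oldest held-out observation (t*+1, with t* = T − r) falls outside its interval" is an interpretation of the slides. The predicate `oldest_is_outlier`, which would refit the model without the last r values and test t*+1, and the cap `r_max` are hypothetical.

```python
from typing import Callable


def select_r(oldest_is_outlier: Callable[[int], bool], r_max: int = 12) -> int:
    """Dynamic choice of r (interpretation of the slides).

    Start with r = 1 and increase r while the oldest held-out observation
    (t* + 1, t* = T - r) lies outside its forecast interval.
    Returns the largest r at which an outlier was still found (0 if none).
    """
    r = 1
    while r <= r_max and oldest_is_outlier(r):
        r += 1
    return r - 1


# Toy predicate: a shift that began five periods before the end of the series.
print(select_r(lambda r: r <= 5))  # 5
```

The `r_max` cap stops the loop on series where every held-out value keeps falling outside its interval.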
Research Findings
• When assessing the quality of large sets of time series, it is vital to have an automatic procedure that detects outliers among the end-of-series observations, as analysts usually focus their attention on the most recent part of the time series
• National statistical offices, among other organisations, face this challenge on a daily basis
• This paper proposes an approach to identify outliers among the end-of-series observations using forecast intervals
• The model is updated every time new information is available. We aim to find the best possible estimate, as close as possible to the truth (Giovannini)
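An automatic procedure over many series amounts to running the same end-of-series check on each one and reporting only those with flagged values. To keep this sketch short and self-contained, it substitutes a random walk with drift (ARIMA(0,1,0) with constant) for the per-series SARIMA models of the paper; `rw_flags` and `screen` are hypothetical names.

```python
from statistics import NormalDist, fmean, stdev


def rw_flags(x, r, alpha=0.05):
    """Y/N flags for the last r values of x under a random walk with drift fitted on x[:-r]."""
    train = x[:len(x) - r]
    d = [b - a for a, b in zip(train, train[1:])]
    mu, sigma = fmean(d), stdev(d)              # drift and innovation s.d.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    flags = []
    for h, obs in enumerate(x[len(x) - r:], start=1):
        center = train[-1] + h * mu             # h-step forecast
        half = z * sigma * h ** 0.5             # interval half-width grows with sqrt(h)
        flags.append("N" if abs(obs - center) <= half else "Y")
    return flags


def screen(series, r=3, alpha=0.05):
    """Return {name: flags} for every series with at least one end-of-series outlier."""
    flagged = {}
    for name, x in series.items():
        flags = rw_flags(x, r, alpha)
        if "Y" in flags:
            flagged[name] = flags
    return flagged
```

For example, `screen({"A": [10.0, 11.0] * 5 + [30.0] * 3, "B": [10.0, 11.0] * 5 + [10.0, 11.0, 10.0]})` reports only series A, whose last three values jump far outside the intervals.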
Thank you for your attention!