Outlier Treatment in HCSO

Present and future


Outline

• Outlier detection – types, editing, estimation

• Description of the current method

• Alternatives

• Future work

• Introduction of a new tool: R and RStudio

UNECE Statistical Data Editing 2014 2


Outlier detection and treatment

Purpose of outlier detection

• Identify errors (editing)

• Estimation

Types of outliers

• Representative outliers

• Non-representative outliers

Treatment options

• Decreasing weights

• Changing the values

• Using robust estimation

Source: MEMOBUST


Monthly Survey of Manufacturing

• Take-all part

• Survey part:

  • fewer than 50 employees (and more than 5, because the smallest businesses are outside the scope of the survey)

  • The sampling frame is based on the Register of Enterprises (~10 thousand units)

  • The sampling ratio is about 15%

  • Stratified sample (many NACE categories, employee-size categories, and two territorial strata: the capital and everything else). (Telegdi 2004.)


Monthly Survey of Manufacturing: data

Distribution of some variables

• Skewed distribution

• Visible outliers


Current method of outlier detection

• The aim of the outlier treatment is to improve the estimation. (Csereháti 2004.)

• Steps of the method:

  1) Computing the outlier indicators

  2) Manual outlier detection by the methodologist/expert

  3) Transfer of the result to the subject matter statistician

  4) Discussion of the result with the subject matter statistician (modifications are possible); this step resembles the process of selective editing


Outlier indicators

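• LNSQRT: main indicator

• Grubbs crit. value

• Standardized value of the variables

• SQUARED: identifying highest values

• MEANX is the ratio of the observed value of the unit and the weighted mean of the stratum without this unit value.

• VALOUT indicator shows the difference between the estimation of the total with and without the given value in a given stratum.

Formulas (j: stratum, i: unit):

\[ \mathrm{LNSQRT}_{ji} = \ln Y_{ji} \cdot \sqrt{\frac{\mathrm{STANDARD}_{ji}}{G_{crit,j}}} \]

\[ \mathrm{SQUARED}_{ji} = Y_{ji} \cdot \left( \frac{\mathrm{STANDARD}_{ji}}{G_{crit,j}} \right)^{2} \]

\[ \mathrm{MEANX}_{ji} = \frac{Y_{ji}}{\left( \mathrm{MEAN}_{j} \cdot N_{j} - Y_{ji} \right) / \left( N_{j} - 1 \right)} \]

\[ \mathrm{VALOUT}_{ji} = \frac{N_{j} - n_{j}}{n_{j} - 1} \left( Y_{ji} - \mathrm{MEAN}_{j} \cdot \frac{N_{j}}{n_{j}} \right) \]

\[ \mathrm{PVALOUT}_{ji} = \frac{\mathrm{VALOUT}_{ji}}{N_{j}^{2} \cdot \mathrm{MEAN}_{j}} \cdot n_{j} \]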
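
The verbal definitions of MEANX and VALOUT can be illustrated with a short Python sketch. It is one possible reading, not the HCSO implementation: the function name `stratum_indicators` and the sample data are hypothetical, the stratum mean is taken as the plain sample mean, and "without the given value" is read as giving the unit a weight of 1 while the remaining sampled units represent the other N_j - 1 population units.

```python
def stratum_indicators(values, population_size):
    """Return MEANX-like and VALOUT-like indicators for one stratum.

    values: observed values of the n_j sampled units (hypothetical data)
    population_size: N_j, number of units in the stratum population
    """
    n = len(values)
    N = population_size
    sample_total = sum(values)
    # Expansion estimate of the stratum total: every unit has weight N / n.
    total_with = N * sample_total / n
    rows = []
    for y in values:
        # Mean of the stratum sample without this unit.
        mean_others = (sample_total - y) / (n - 1)
        # Assumed reading: the unit represents only itself, and the
        # other n - 1 units represent the remaining N - 1 units.
        total_without = y + (N - 1) * mean_others
        rows.append({
            "y": y,
            "meanx": y / mean_others,
            "valout": total_with - total_without,
        })
    return rows


rows = stratum_indicators([10.0, 12.0, 11.0, 9.0, 95.0], population_size=30)
for row in rows:
    print(row)
```

Under this reading the difference of the two totals simplifies to (N_j - n_j)/(n_j - 1) multiplied by the deviation of the unit from the sample mean, which is the same leading factor as in the VALOUT formula of the slide.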


The main indicator: LNSQRT


Outlier treatment

• Weight trimming: weights of the outliers are changed to 1

• Number of outliers: avg. 2% of the cases

• Change in the estimates:

  • Mean: -15% (on average)

  • Variance: substantial decrease
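
A minimal sketch of weight trimming, assuming a single stratum with equal design weights. The data, the weight of 6 and the helper names `estimate_total` and `trim_weights` are hypothetical. The slide does not say whether the remaining weights are rescaled afterwards, so this sketch leaves them unchanged.

```python
def estimate_total(values, weights):
    """Weighted estimate of the total."""
    return sum(w * y for y, w in zip(values, weights))


def trim_weights(weights, outlier_flags):
    """Set the weight of every flagged unit to 1; keep the others as they are."""
    return [1.0 if flagged else w for w, flagged in zip(weights, outlier_flags)]


values = [10.0, 12.0, 11.0, 9.0, 95.0]
weights = [6.0] * 5  # hypothetical design weight N_j / n_j = 30 / 5
flags = [False, False, False, False, True]  # last unit marked as outlier

before = estimate_total(values, weights)
after = estimate_total(values, trim_weights(weights, flags))
print(before, after)
```

In this toy stratum the outlier no longer stands for five other units, so the estimated total drops sharply; the size of the effect on real data depends on how many units are trimmed.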


Alternative methods

• One-dimensional methods:

  • Median absolute deviation (MAD)

  • Custom indicator: share in total

  • Quantile

  Disadvantage: they have to be applied to many variables

• Multidimensional method:

  • Mahalanobis distance-based outlier detection


Share in total, a custom indicator

• Aim: to consider the individual value and the size of the stratum in the same formula

• Inspired by the current indicators

• The possible outlier:

  • accounts for a considerably large share of the total

  • in a big stratum

• The indicator is computed for each stratum
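
The slides do not give the formula of the custom indicator, so the following Python sketch only shows the simplest possible version: the share of each unit in its stratum total, flagged above a threshold. The function names are hypothetical, the default of 0.5 is the threshold reported on the results slide, and the adjustment for stratum size mentioned above is deliberately left out because its form is not known.

```python
def share_in_total(values):
    """Share of each unit in the stratum total (assumes a positive total)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("stratum total must be positive")
    return [y / total for y in values]


def flag_by_share(values, threshold=0.5):
    """Flag units whose share of the stratum total exceeds the threshold."""
    return [share > threshold for share in share_in_total(values)]


stratum = [10.0, 12.0, 11.0, 9.0, 95.0]  # hypothetical stratum values
shares = share_in_total(stratum)
share_flags = flag_by_share(stratum)
print(shares, share_flags)
```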


Results

• Quantile method

  • Threshold: 99%

  • The method can identify almost the same outliers as the current one.

  • Easy to implement

• MAD

  • Problem of choosing k (the threshold)

  • Too many cases were selected
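
Both one-dimensional rules can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code used in the study: the function names and the data are hypothetical, the quantile is computed with the "inclusive" interpolation of `statistics.quantiles`, and the MAD is scaled by the conventional factor 1.4826 before being compared with k.

```python
from statistics import median, quantiles


def quantile_flags(values, percent=99):
    """Flag values above the given percentile (99% as on the slide)."""
    threshold = quantiles(values, n=100, method="inclusive")[percent - 1]
    return [v > threshold for v in values]


def mad_flags(values, k=3.0):
    """Flag values whose distance from the median exceeds k scaled MADs."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scaled_mad = 1.4826 * mad  # consistency factor for normal data
    if scaled_mad == 0:
        return [False] * len(values)  # no spread, nothing can be flagged
    return [abs(v - center) / scaled_mad > k for v in values]


# Hypothetical data: 99 regular values and one very large one.
data = [float(v) for v in range(1, 100)] + [1000.0]

print(sum(quantile_flags(data)))      # units above the 99% quantile
print(sum(mad_flags(data, k=3.0)))    # flagged with a strict threshold
print(sum(mad_flags(data, k=1.0)))    # a smaller k flags many more units
```

The last two calls illustrate the threshold problem noted above: the number of selected cases depends heavily on k.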


Results (2)

• Share in total

  • Threshold value: 0.5

  • Smaller number of outliers

• Mahalanobis distance

  • We used the robust Mahalanobis distance

  • 3 key variables (total revenue etc.)

    • These are not involved in the current method

    • avoiding missing values

  • Similar results (2/3 of the current outliers are detected)
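
The idea of distance-based detection can be sketched in plain Python. Note the simplifications: the study used the robust Mahalanobis distance on three variables, whereas this dependency-free sketch uses the classical mean and covariance on two hypothetical variables; a robust version would replace these estimates with robust ones. The cutoff (the 97.5% chi-square quantile with 2 degrees of freedom) is a common convention and an assumption here, as are all names and data.

```python
from math import log, sqrt
from statistics import mean


def mahalanobis_2d(points):
    """Classical Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(d2))
    return distances


# Hypothetical data: 19 points near the line y = 2x and one point far off it.
points = [(float(i), 2.0 * i + (1.0 if i % 2 else -1.0)) for i in range(1, 20)]
points.append((10.0, 60.0))

distances = mahalanobis_2d(points)
# For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
cutoff = sqrt(-2.0 * log(1.0 - 0.975))
flagged = [i for i, d in enumerate(distances) if d > cutoff]
print(flagged)
```

The last point has an unremarkable value on each variable taken alone but breaks the joint pattern, which is the kind of case a one-dimensional rule can miss.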


Future plans

• Development of methodology:

  – More analysis of the effect on estimates

  – Winsorization

• Development of the process:

  – Automation and reproducibility

  – More informative report on the process, to help better understand and analyse the process steps
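
Winsorization is only named as a plan, so the following is a generic one-sided sketch rather than the planned method: values above a cutoff are replaced by the cutoff while the weights stay unchanged. The function name, the cutoff of 20 and the data are hypothetical.

```python
def winsorize_upper(values, cutoff):
    """Replace every value above the cutoff by the cutoff itself."""
    return [min(v, cutoff) for v in values]


raw = [10.0, 12.0, 11.0, 9.0, 95.0]  # hypothetical stratum values
capped = winsorize_upper(raw, cutoff=20.0)
print(capped)
```

Unlike weight trimming, this changes the value of the outlying unit and keeps its weight, which is why the two treatments can affect the estimates differently.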


Experimental tools

• Outlier treatment is separate from the other steps of data processing and belongs to methodology

• Possible new tool: R (with RStudio)

• Advantage: ease of development

• Ready-to-use functions for outlier detection

• Disadvantage: it needs an "expert" user and is not a commonly used tool


Thank you for your attention!
