Data Quality: Issues and Fixes

ILCS Raking

Motivate Need and Illustrate Basic Approach

Dr. Ali Mushtaq

July 3, 2009 (for academic purposes only)

RCRC

What is Raking?

• A way to adjust survey totals “t” to independent controls “T”

• Takes existing survey weights, usually wij = 1/pij, where pij is the probability of selection

• Ratios them up to each total T in turn, until results are as close as wanted
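
The bullets above can be written out for a two-way table. This is only a sketch under assumed notation that the slides do not use: i indexes the categories of the first control, j those of the second, w_ij is the summed weight in cell (i, j), and t_{i+}, t_{+j}, T_{i+}, T_{+j} denote the survey and control marginals. The survey marginals are recomputed from the current weights before each step.

```latex
\[
\begin{aligned}
w_{ij}^{(0)} &= \frac{1}{p_{ij}}, \qquad
t_{i+} = \sum_j w_{ij}, \qquad
t_{+j} = \sum_i w_{ij} \\
\text{row step:}\quad w_{ij} &\leftarrow w_{ij}\,\frac{T_{i+}}{t_{i+}} \\
\text{column step:}\quad w_{ij} &\leftarrow w_{ij}\,\frac{T_{+j}}{t_{+j}}
\end{aligned}
\]
```

The two steps alternate until every t is as close to its T as wanted.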

What is the Value?

• Can increase stability of survey results (reduce sample variance)

• Get results that are close to desired outcomes (reduce bias arising from minor operational errors)

What Results to Expect?

• If Controls are Reasonable, Raking Process will converge (“Hit” all controls)

• And improve survey results related to Control Totals

More Information Quality

• Only Weights are Changed by Raking, not Survey Data

• Data Quality is thus unchanged

• But Information Quality is usually Improved

What Does Raking Cost?

• Usually done quickly on a PC

• Independent controls need to be consistent with each other

• Sample must be reasonably large for raking to be safely applied

• Some costs incurred to explain method
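
One necessary condition for the controls to be "consistent with each other" is that each set of control totals adds up to the same grand total. A minimal sketch of that check follows; the function name `controls_consistent` and the numbers are hypothetical, and this is not the only consistency requirement that may apply in practice.

```python
import math


def controls_consistent(row_targets, col_targets, rel_tol=1e-9):
    """Return True when both sets of control totals imply the same grand total."""
    return math.isclose(sum(row_targets), sum(col_targets), rel_tol=rel_tol)


# Hypothetical control totals, for illustration only.
ok = controls_consistent([60.0, 40.0], [55.0, 45.0])   # both sum to 100
bad = controls_consistent([60.0, 40.0], [55.0, 50.0])  # 100 versus 105
```

If the grand totals disagree, raking cannot hit both sets of controls at once, so the controls would need to be reconciled first.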

Raking Made Simple

• “Fudge” Factor Intuition

• Develop a ratio of target total divided by sample total

• Repeat this process with each of the controls in turn
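
The ratio steps above can be sketched in a few lines of Python for the two-dimensional case. This is an illustrative sketch, not the procedure used in the survey: the function name `rake`, the tolerance, the iteration cap, and all numbers are assumptions, and every cell is assumed to have a positive weighted count.

```python
def rake(cells, row_targets, col_targets, tol=1e-8, max_iter=100):
    """Rake a two-way table of weighted counts to row and column control totals."""
    w = [row[:] for row in cells]  # copy so the input table is left unchanged
    for _ in range(max_iter):
        # Row step: scale each row by (target total / current weighted total).
        for i, target in enumerate(row_targets):
            factor = target / sum(w[i])
            w[i] = [x * factor for x in w[i]]
        # Column step: the same ratio adjustment, applied to each column.
        for j, target in enumerate(col_targets):
            factor = target / sum(row[j] for row in w)
            for row in w:
                row[j] *= factor
        # Columns now match exactly; stop once the rows are close enough too.
        gap = max(abs(sum(w[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return w


# Hypothetical weighted counts (rows: two age groups, columns: urban, rural).
sample = [[20.0, 30.0], [40.0, 10.0]]
raked = rake(sample, row_targets=[60.0, 40.0], col_targets=[55.0, 45.0])
```

Only the weights change. The association between the two dimensions in the sample (the cross-product ratio of the cells) is carried through unchanged, which is why the survey data themselves are not altered.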

NSS Example from ILCS

While the NSS RA survey is raked across four dimensions (age, gender, marz and urban/rural), the example here uses just two.

Table 1. Raking Example – Source Survey Data

Table 2. Desired Marginals

First Ratio Adjustment

Second Ratio Adjustment

After Second Iteration

ILCS Benefits Achieved

• Reduction in Bias

• Reduction (hopefully) in Variance

• Survey Results are Consistent with Census Projections

Again Many Thanks

Data Quality and Record Linkage Techniques

Springer 2007
