Misinterpretation of data, the importance of metadata and STC math
DLI Atlantic Training
April 2005
Vicki Crompton and Mike Sivyer
Data Misinterpretation: Crime Rates
Ebert & Roeper review of the Michael Wilson movie “Michael Moore Hates America”: Ebert doubted the claim that the Canadian crime rate is 2X the USA rate
Moorelies.com | News: Whoa; Stuart Didn't See That One Coming
Ebert conceded that the statistics supported the claim - the figures were right
BUT - a comparison of the STC and US Bureau of Justice websites shows how the statistics were misinterpreted
Crimes per 100,000 population - 2003
                   Canada     USA
All crimes          8,530   4,267
Violent crimes        958     523
Property crimes     4,275   3,744
Comparative Crime Rates
Simplistic comparison
– Similar category titles for violent and property crimes, but different definitions
– Violent crime 2-3 times higher in the US, property crimes close
– Bureau of Justice Statistics: Crime & Justice Data Online
– Canadian Statistics - Crimes by type of offence
Crimes per 100,000 population - 2002
                                         Canada     USA
Violent crime
  Homicide                                  1.9     5.6
  Robbery                                    85     146
  (Comparing US rape and aggravated assault with Cdn sexual assault and assaults is difficult)
Property crime
  B & E (Cdn) – Burglary (US)               879     746
  Theft (Cdn) - Larceny & Theft (US)      2,191   2,446
  Motor vehicle theft                       516     432
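The two tables can be compared by computing a US-to-Canada ratio for each row. The sketch below uses only the figures shown on the slides; the dictionary and function names are illustrative, and a ratio is only meaningful where the two countries define the category in a comparable way.

```python
# Rates per 100,000 population, copied from the two slide tables.
# Each tuple is (Canada, USA).
headline_2003 = {
    "All crimes": (8530, 4267),
    "Violent crimes": (958, 523),
    "Property crimes": (4275, 3744),
}
by_offence_2002 = {
    "Homicide": (1.9, 5.6),
    "Robbery": (85, 146),
    "B & E / Burglary": (879, 746),
    "Theft / Larceny & theft": (2191, 2446),
    "Motor vehicle theft": (516, 432),
}


def us_to_canada_ratio(rates):
    """Return {category: US rate / Canadian rate}, rounded to 2 decimals."""
    return {name: round(usa / canada, 2) for name, (canada, usa) in rates.items()}


headline_ratios = us_to_canada_ratio(headline_2003)
offence_ratios = us_to_canada_ratio(by_offence_2002)

for name, ratio in {**headline_ratios, **offence_ratios}.items():
    print(f"{name:<26} US/Canada = {ratio}")
```

On the headline categories the US rate looks like about half the Canadian rate, while on the individually comparable violent offences (homicide, robbery) the US rate is the higher one. The difference comes from what each country counts under the same category title.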
US Crime Data
Canadian Crime Data
Data Misinterpretation: Drinking Habits of Canadians
Initial analysis of the 1990 Health Promotion Survey indicated that Canadians enjoyed an average of 60 drinks per day….
Data Misinterpretation: Importance of Metadata
The 1990 Health Promotion Survey included a series of questions about alcohol consumption.
First they asked if the respondent EVER drank alcohol,
and if YES asked if they drank within the last 12 months,
and if YES asked for the number of drinks for each day for the past 7 days.
The code book showed number of drinks per day as:

                                                             Raw    Weighted
81  F4MON  2  0096‑0097  HOW MANY DRINKS DID YOU HAVE ON: MONDAY
    00     NONE                                             4651     7334907
    01:40  NUMBER OF DRINKS                                  403     2585080
    41     MORE THAN 40 DRINKS                                 1         106
    98     QUESTION NOT ASKED                               7648    10567910
    99     NOT STATED                                         89      155377
82  F4TUE  2  0098‑0099  HOW MANY DRINKS DID YOU HAVE ON: TUESDAY
    00     NONE                                             4608     7306101
    01:40  NUMBER OF DRINKS                                 1447     2613991
    98     QUESTION NOT ASKED                               7648    10567910
    99     NOT STATED                                         89      155377
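A rough sketch of how the inflated average can arise, using the weighted Tuesday (F4TUE) frequencies from the code book. The code book gives only a range (01:40) for people who reported drinks, so `ASSUMED_DRINKS_FOR_RANGE` is a hypothetical stand-in value, not a survey result. The sketch illustrates the mechanism and does not attempt to reproduce the published figure.

```python
# Hypothetical stand-in: the code book shows only the range 01-40 for
# respondents who reported drinks, not the actual distribution.
ASSUMED_DRINKS_FOR_RANGE = 2.0

# (code book label, value the software sees, weighted count, valid response?)
F4TUE_ROWS = [
    ("00 NONE", 0.0, 7_306_101, True),
    ("01:40 NUMBER OF DRINKS", ASSUMED_DRINKS_FOR_RANGE, 2_613_991, True),
    ("98 QUESTION NOT ASKED", 98.0, 10_567_910, False),
    ("99 NOT STATED", 99.0, 155_377, False),
]


def weighted_mean(rows, valid_only):
    """Weighted mean of the stored values, optionally dropping non-response codes."""
    kept = [(value, weight) for _, value, weight, valid in rows
            if valid or not valid_only]
    total_weight = sum(weight for _, weight in kept)
    return sum(value * weight for value, weight in kept) / total_weight


# Treating 98 and 99 as if they were numbers of drinks.
naive_mean = weighted_mean(F4TUE_ROWS, valid_only=False)
# Dropping the "not asked" and "not stated" codes first.
clean_mean = weighted_mean(F4TUE_ROWS, valid_only=True)

print(f"naive mean (codes counted as drinks): {naive_mean:.1f}")
print(f"mean over valid responses only:       {clean_mean:.2f}")
```

Because about half of the weighted population carries code 98, the naive mean is dominated by the non-response codes whatever value is assumed for the 01-40 range.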
Metadata for PUMFs
With Public Use Microdata Files, the code book is very important
– Gives the questions asked and the codes used for responses
– “Missing values”, “refusals”, “don’t know” and “not applicable” are often assigned numeric codes
– The numeric codes used are not consistent
– To most software, these numeric codes would seem to be valid responses
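Since the codes differ from variable to variable, one cautious approach is to keep a per-variable list of missing-value codes taken from the code book and recode them before any analysis. In this minimal sketch the F4MON codes come from the code book excerpt; the `AGE` entry and the sample records are invented purely for illustration.

```python
from typing import Optional

# Missing-value codes per variable, as read off a code book.
MISSING_CODES = {
    "F4MON": {98, 99},  # QUESTION NOT ASKED, NOT STATED (from the excerpt)
    "AGE": {999},       # invented example of a variable with a different code
}


def recode_missing(variable: str, value: int) -> Optional[int]:
    """Return None when `value` is a missing-value code for `variable`."""
    if variable not in MISSING_CODES:
        # Fail loudly rather than guess which codes are missing values.
        raise KeyError(f"no code book entry for {variable!r}")
    return None if value in MISSING_CODES[variable] else value


# Invented sample records: 98 is a missing code for F4MON but a valid AGE.
records = [
    {"F4MON": 0, "AGE": 34},
    {"F4MON": 98, "AGE": 98},
    {"F4MON": 3, "AGE": 999},
]
cleaned = [
    {name: recode_missing(name, value) for name, value in record.items()}
    for record in records
]
print(cleaned)
```

Raising an error for a variable with no entry forces a look at the code book instead of silently treating every stored number as a response.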
Metadata
STC Policy on Informing Users of Data Quality
In place since 1978
Tightened up in 2000 in response to the 1999 AG report
Recognition that “All statistics are to some extent estimates”
Statistics to be used with awareness of strengths and weaknesses – “fitness for use”
Key tool is the Integrated Meta Database
(Definitions, data sources and methods)
Metadata
Important to find STC metadata and use it
Definitions, Data Sources and Methods
– Questionnaire and reporting guides
Survey Description
Data sources and methodology
Data Accuracy
Documentation
Contact us
Definitions, Data Sources and Methods
Online Catalogue: Canadian Community Health Survey: public use microdata file: Product main page
DLI Website: DLI - Canadian Community Health Survey Cycle 1.1
DLI listserv: Ask and we will find out from the Division!
Data Quality Symbols
. not available for any reference period
.. not available for a specific reference period
... not applicable
p preliminary
r revised
x suppressed to meet the confidentiality requirements of the Statistics Act
A, B, C, D specific levels of data quality*
E use with caution
F too unreliable to be published
0 true zero or a value rounded to zero
0s value rounded to 0 (zero) where there is a meaningful distinction between true zero and the value that was rounded
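When table cells are read by software, these symbols have to be separated from the numbers. The sketch below is one possible parser. It assumes that ".", "..", "...", "x" and "F" stand alone in place of a value, and that the other symbols are attached to a number as a suffix (for example "12.3E"). That layout is an assumption, not something the slides specify.

```python
import re
from typing import NamedTuple, Optional

# Symbols assumed to replace the value entirely.
NO_VALUE = {
    ".": "not available for any reference period",
    "..": "not available for a specific reference period",
    "...": "not applicable",
    "x": "suppressed to meet confidentiality requirements",
    "F": "too unreliable to be published",
}
# Symbols assumed to follow a number.
SUFFIX = {
    "p": "preliminary",
    "r": "revised",
    "A": "data quality level A",
    "B": "data quality level B",
    "C": "data quality level C",
    "D": "data quality level D",
    "E": "use with caution",
    "s": "value rounded to zero",
}


class Cell(NamedTuple):
    value: Optional[float]
    note: Optional[str]


# A number (thousands separators allowed) followed by at most one symbol.
_NUMBER_WITH_FLAG = re.compile(r"^(-?[\d,]*\.?\d+)\s*([A-Eprs]?)$")


def parse_cell(text: str) -> Cell:
    """Split a published table cell into a numeric value and a quality note."""
    text = text.strip()
    if text in NO_VALUE:
        return Cell(None, NO_VALUE[text])
    match = _NUMBER_WITH_FLAG.match(text)
    if match is None:
        raise ValueError(f"unrecognised cell: {text!r}")
    number, flag = match.groups()
    return Cell(float(number.replace(",", "")), SUFFIX.get(flag))


for raw in ["8,530", "12.3E", "..", "x", "0s"]:
    print(raw, "->", parse_cell(raw))
```

Returning the note alongside the value keeps the quality flag available downstream instead of discarding it at load time.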
Use metadata to avoid key pitfalls
Collection methodology
Questionnaire
Data quality: sample size, response rates
Definitions
Conceptual changes
Survey coverage
Reweighting/rebasing
STC Math
Random rounding
Percentages and percentage points
Central tendencies (mean, median and mode)
Current vs constant dollars
Raw vs seasonally adjusted
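Four of these topics can be sketched in a few lines. The random rounding shown is a common unbiased scheme (round to a multiple of 5, rounding up with probability remainder/5), assumed here for illustration; the slides do not state the exact procedure STC uses. All function names and sample numbers are hypothetical. Seasonal adjustment is left out because it needs a full time-series model.

```python
import random
import statistics


def random_round(count, base=5, rng=random):
    """Round `count` to a multiple of `base`, up or down at random.

    Rounds up with probability remainder/base, so the expected value
    equals the original count (an assumed, generic scheme).
    """
    lower = (count // base) * base
    remainder = count - lower
    if remainder == 0:
        return count
    return lower + base if rng.random() < remainder / base else lower


def percent_change(old, new):
    """Relative change, in percent."""
    return (new - old) / old * 100


def percentage_point_change(old_pct, new_pct):
    """Absolute difference between two percentages."""
    return new_pct - old_pct


def to_constant_dollars(current_dollars, price_index, base_index=100.0):
    """Deflate a current-dollar amount to base-period dollars."""
    return current_dollars * base_index / price_index


# A rate moving from 6.0% to 7.5% rises by 1.5 percentage points,
# which is a 25 percent increase.
points = percentage_point_change(6.0, 7.5)
relative = percent_change(6.0, 7.5)

# One large value pulls the mean well above the median and mode.
incomes = [20, 25, 25, 30, 200]
centre = (statistics.mean(incomes),
          statistics.median(incomes),
          statistics.mode(incomes))

# 1,100 current dollars at a price index of 110 (base period = 100).
constant = to_constant_dollars(1100, 110)

print(random_round(12), points, relative, centre, constant)
```

Since each cell of a randomly rounded table is rounded independently, the published cells may not add up to the published total.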