Quantifying Data
Advanced Social Research (soci5013)
Peter Njuguna
Source: Course Pack, Chapter 14 (pages 383–395)
Overview
Data analysis
  Statistical
  Quantitative
  Mostly computer-aided nowadays
Process
1. Mass observations
2. Quantification (through coding)
3. Coding error reduction (data cleaning)
4. Data analysis
Introduction
Social science data (largely non-numeric)
  Machine readability, manipulation
  Logic of data manipulation in quantitative analysis
Biological and physical science data (mostly numeric attributes, e.g. counts, pH, length, temperature)
Baseline: The logic remains the same even with the development of more powerful technology
Computers are tools to enhance research. They understand only the basics
Computers in Social Research
France (1801): Joseph-Marie Jacquard (automatic loom; punched cards controlled the weaving patterns)
USA
  1790: first 10-year census, under 4 million people
  1880: about 50 million people (9 years to tabulate!)
  1890: over 62 million people; Herman Hollerith's punched card system (results reported in 6 weeks)
  Tabulating Machine Co. + mergers = IBM
Baseline: information coding, storage, retrieval
Today's computer data analysis
  Converting observations into machine-readable form
  Electronic data storage, retrieval, manipulation and presentation
  Statistical analysis (some programs are specific to social science, e.g. SPSS)
Coding for Quantitative Analysis
Social science methods (interviews, questionnaires, ...)
Open-ended and closed-ended questions: non-numeric responses
Coding reduces responses to a limited set of attributes to enable analysis
  Use pre-established coding: comparable with others
  Coding from the data set (responses): flexibility, response coverage
Coding system should be appropriate to theoretical concepts
If data are coded to maintain detail, categories can be combined where detail is not necessary, but not vice versa
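The last point can be illustrated with a minimal Python sketch. The occupation categories, code numbers and the names DETAILED_CODES, COLLAPSE, code_response and collapse are all hypothetical and do not come from the course pack. Responses are first reduced to detailed numeric codes, which can later be combined into broader groups. The reverse is not possible: a case coded only as "health" cannot be split back into nurses and physicians.

```python
# Hypothetical detailed coding scheme for an open-ended occupation question.
# The categories and code numbers are illustrative only.
DETAILED_CODES = {
    "nurse": 1,
    "physician": 2,
    "primary teacher": 3,
    "secondary teacher": 4,
    "farmer": 5,
}

# Detailed codes can be combined into broader groups when detail is not needed.
COLLAPSE = {
    1: "health",
    2: "health",
    3: "education",
    4: "education",
    5: "agriculture",
}


def code_response(response: str) -> int:
    """Reduce a free-text response to its numeric code (case-insensitive)."""
    return DETAILED_CODES[response.strip().lower()]


def collapse(code: int) -> str:
    """Combine a detailed code into its broader category."""
    return COLLAPSE[code]


responses = ["Nurse", "secondary teacher ", "Farmer", "physician"]
codes = [code_response(r) for r in responses]
broad = [collapse(c) for c in codes]
print(codes)
print(broad)
```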
Developing code categories
1. Well-developed coding scheme
   Derived from research purpose
   Existing coding scheme (comparable)
2. Generate codes from your data
   Many possible schemes (cf. pg 388, 389), specific to your research purpose
   Review for recoding as you progress
Code categories should be:
a) Exhaustive
b) Mutually exclusive
Coder reliability (including yourself) crucial
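The two category requirements and coder reliability can each be checked mechanically. The sketch below assumes hypothetical age brackets with inclusive bounds. It uses simple percent agreement as the reliability measure, which is one common choice and is not necessarily the one used in the course pack. All names are illustrative.

```python
# Hypothetical age brackets as (label, low, high), bounds inclusive.
AGE_CATEGORIES = [("young", 18, 34), ("middle", 35, 54), ("older", 55, 120)]


def matching_categories(value, categories):
    """Return the labels of every category whose range contains value."""
    return [label for label, low, high in categories if low <= value <= high]


def check_scheme(values, categories):
    """Find values that break the scheme.

    uncovered:   values no category contains (scheme not exhaustive)
    overlapping: values in more than one category (not mutually exclusive)
    """
    uncovered = [v for v in values if len(matching_categories(v, categories)) == 0]
    overlapping = [v for v in values if len(matching_categories(v, categories)) > 1]
    return uncovered, overlapping


def percent_agreement(coder_a, coder_b):
    """Share of cases to which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


ages = [18, 34, 35, 60, 17]
uncovered, overlapping = check_scheme(ages, AGE_CATEGORIES)
agreement = percent_agreement([1, 2, 2, 3], [1, 2, 3, 3])
print(uncovered, overlapping, agreement)
```

Here the age 17 falls outside every bracket, so the scheme would need an extra category (or a documented exclusion) to be exhaustive.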
Codebook construction
Codebook (describes location of variables; assignment of codes to attributes)
  Primary guide in coding process
  Guide for locating variables and interpreting codes in data file during analysis
Contains
  Variable names
  Full descriptions (cf. exact wording of questions)
  Categorized response options
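A codebook with these contents could be represented as a small data structure. This is only a sketch: the Variable class, the CODEBOOK entries, the question wording and the code values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str         # short variable name used in the data file
    column: int       # location of the variable in each record
    description: str  # full description, ideally the exact question wording
    codes: dict       # numeric code -> attribute label


# Hypothetical entries; wording and codes are illustrative only.
CODEBOOK = {
    "gender": Variable(
        "gender", 1, "Respondent's gender", {1: "Male", 2: "Female"}
    ),
    "educ": Variable(
        "educ",
        2,
        "Highest level of education completed",
        {1: "Primary", 2: "Secondary", 3: "Tertiary", 9: "No answer"},
    ),
}


def decode(variable: str, code: int) -> str:
    """Interpret a stored code using the codebook."""
    return CODEBOOK[variable].codes[code]


print(decode("gender", 1))
print(CODEBOOK["educ"].column)
```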
Coding and data entry options (1)
1. Transfer sheets
   Useful technique, especially with complex questionnaires and other data sources
   Source: Course Pack pg 391

Case #                 01 (Variable 1, e.g. gender)   02 (Variable 2, e.g. educ)   03 (Variable 3, e.g. religiosity)
Case 1 (e.g. Peter)    Attribute 1 (Male)
Case 2, etc.
Case 3
Etc.
Coding and data entry options (2)
2. Edge-coding
3. Direct data entry (pre-coded questionnaires)
4. Data entry by interviewers
   e.g. CATI (computer-assisted telephone interviewing)
   Closed-ended data ready for analysis
   Open-ended responses: additional coding step before analysis
5. Coding to optical scan sheets
   Coder error high
   Low scanner tolerance
6. Direct coding on optical scan sheets by respondent
7. Connecting with data analysis program
   e.g. SPSS: blank data sheets, entry, analysis
   Create data set (spreadsheet, etc.), import and export
   Compatibility options well developed
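As a rough illustration of option 7, coded cases can be written to a plain CSV file, a format most analysis programs and spreadsheets can import. The sketch uses Python's standard csv module and an in-memory buffer in place of a real file. The field names and the export_csv and import_csv helpers are hypothetical.

```python
import csv
import io

# Hypothetical coded data set: one row per case, one column per variable.
FIELDS = ["case", "gender", "educ"]
rows = [
    {"case": 1, "gender": 1, "educ": 3},
    {"case": 2, "gender": 2, "educ": 2},
]


def export_csv(records):
    """Write coded records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text back into records, converting codes to integers."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [{key: int(value) for key, value in row.items()} for row in reader]


text = export_csv(rows)
restored = import_csv(text)
print(text)
print(restored == rows)
```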
Screening and elimination of errors (Data cleaning)
Errors almost inevitable
  Incorrect coding
  Incorrect reading of codes
  Sensing of marks, etc.
Two types of data cleaning methods
1. Possible-code cleaning
   By checking for errors as data are entered ("beep!")
   Testing for illegitimate codes in stored data files
2. Contingency cleaning
   Checking that only cases relevant to an attribute have entries for it (cf. number of pregnancies recorded for men: inappropriate)
   Can be ignored sometimes (significance, discretion)
Remember that "dirty" data almost always produce misleading results ...
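Both cleaning methods can be sketched in a few lines of Python. The valid code sets, the variable names and the convention that gender code 1 means male are assumptions made for this example. A real check would take them from the study's codebook.

```python
# Hypothetical legitimate codes per variable (gender: 1 = male, 2 = female).
VALID_CODES = {"gender": {1, 2}, "educ": {1, 2, 3, 9}}


def possible_code_errors(records):
    """Possible-code cleaning: flag any value outside the legitimate codes."""
    errors = []
    for rec in records:
        for var, valid in VALID_CODES.items():
            if rec.get(var) not in valid:
                errors.append((rec["case"], var, rec.get(var)))
    return errors


def contingency_errors(records):
    """Contingency cleaning: men should have no entry for pregnancies."""
    return [
        rec["case"]
        for rec in records
        if rec.get("gender") == 1 and rec.get("pregnancies") is not None
    ]


records = [
    {"case": 1, "gender": 1, "educ": 3, "pregnancies": None},
    {"case": 2, "gender": 7, "educ": 2, "pregnancies": 1},  # illegitimate gender code
    {"case": 3, "gender": 1, "educ": 1, "pregnancies": 2},  # entry not applicable
]

illegitimate = possible_code_errors(records)
inconsistent = contingency_errors(records)
print(illegitimate)
print(inconsistent)
```

Flagged cases would then be checked against the original questionnaires rather than corrected automatically.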
AT LONG LAST, ….
YOUR DATA IS READY FOR ANALYSIS …
GO!