Research Methods, 9th Edition
Theresa L. White and Donald H. McBurney
Chapter 14: Data Exploration Part 1: Graphic and Descriptive Techniques
Preparing Data for Analysis
Once the data is collected:
Put the data into a summary data sheet.
Do preliminary statistics and plots.
  Check for invalid data.
  Check for missing data.
  Check for wild data.
Describe data numerically.
Describe data graphically.
Perform inferential statistics.
Data Reduction
Process of transcribing data from individual data sheets to a summary form or data file.
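Data Reduction: Transcribing to a Data Sheet
Contains all the data in a matrix format.
Rows indicate subjects.
Columns indicate variables.
Example from the previous slide: A professor teaches an ES course in two sections that meet at 8 a.m. and 11 a.m. Each section took two exams, each scored out of 20 points. The professor wants to analyze the difference in exam scores between the two sections.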
Coding Guide
List that specifies the variables of the study, columns that the variables occupy in the data file, and their possible values.
Located either on summary form, in notebook, or both!
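Creating a Data File, Method 1: Enter the data directly in SPSS (example)
Yunlin University of Science and Technology (雲林科技大學) SPSS Workshop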
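Data Coding
Converting responses into numbers or other symbols to make analysis easier.
Respondents' answers are usually sorted into a limited number of categories (classification).
Classification means dividing a set of data into several parts according to some variable, that is, the procedure of applying rules to partition the data.
Classification may sacrifice some detail in the data, but it improves the efficiency of analysis.
Both closed-ended and open-ended questionnaires require coding.
Codebook Design
A codebook is also called a coding scheme.
It covers every variable in the study and describes how the coding rules apply to each variable.
Researchers use the codebook to make data entry more accurate and more efficient.
The codebook can also record where each variable's data sits in the data file.
Whether or not it is computerized, a codebook should include the questionnaire item number, the variable name, the columns the variable occupies in the input medium (SPSS data file, Excel), a description of each response option, the wording of each question, and the data type (numeric or string).
A pretest can check whether the codebook is correct.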
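As a rough illustration of how a codebook drives data entry, the sketch below stores a small codebook as a dictionary and uses it to convert raw responses into codes. The variable names, column positions, and code values are all hypothetical and chosen only to match the two-section exam example.

```python
# Hypothetical codebook for the two-section exam example. Every variable
# name, column position, and code value here is an illustrative assumption.
CODEBOOK = {
    "section": {"column": 2, "type": "numeric",
                "values": {"8 a.m.": 1, "11 a.m.": 2}},
    "exam1": {"column": 3, "type": "numeric", "range": (0, 20)},
    "exam2": {"column": 4, "type": "numeric", "range": (0, 20)},
}


def encode(variable, response):
    """Convert one raw response into its numeric code using the codebook."""
    spec = CODEBOOK[variable]
    if "values" in spec:
        # Categorical variable: look up the code assigned to this answer.
        return spec["values"][response]
    # Numeric variable: convert and check against the allowed range.
    low, high = spec["range"]
    score = int(response)
    if not low <= score <= high:
        raise ValueError(f"{variable}={score} is outside {low}-{high}")
    return score


row = [encode("section", "11 a.m."), encode("exam1", "17"), encode("exam2", "19")]
print(row)  # [2, 17, 19]
```

Keeping the allowed values in the codebook means an out-of-range entry is rejected at entry time rather than discovered during analysis.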
Codebook Example
Data source: http://metaconnects.org/survey-analysis
Codebook Example
Types of Statistics
Descriptive statistics: Summarize a set of data.
Inferential statistics: Help us draw conclusions about populations.
Descriptive statistics
Commonly used descriptive statistics:
Average (measures of central tendency)
Variability (measures of variability)
Measures of Central Tendency
Descriptive statistics that indicate the average, or typical value, of a distribution.
Mode = Most common score
Median = Middlemost score
Mean = Sum of all the scores divided by the number of scores.
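The three definitions above can be tried out with Python's standard `statistics` module. The scores are made up for illustration.

```python
from statistics import mean, median, mode

scores = [12, 15, 15, 16, 18, 19, 20]  # hypothetical test scores out of 20

print(mode(scores))    # 15, the most common score
print(median(scores))  # 16, the middle score of the sorted list
print(mean(scores))    # 115 / 7, about 16.43
```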
Measures of Central Tendency
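Median: Unaffected by how far the other values lie from the median; it depends only on how many scores fall above and below it.
Mean: Sensitive to extreme values.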
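A small sketch of this contrast, using invented scores: replacing one value with an extreme one moves the mean a long way and leaves the median where it was.

```python
from statistics import mean, median

scores = [14, 15, 16, 17, 18]        # hypothetical scores
with_outlier = [14, 15, 16, 17, 98]  # same list with one extreme value

# Without the outlier: mean 16, median 16
print(mean(scores), median(scores))
# With the outlier: the mean jumps to 32, the median stays at 16
print(mean(with_outlier), median(with_outlier))
```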
Measures of Variability
Range: Highest score - lowest score
Percentile: Score below which a certain percentage of cases in a distribution fall
Interquartile range: 75th percentile - 25th percentile (Q3 - Q1)
Semi-interquartile range: (Q3 - Q1)/2
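The following sketch computes these measures for a made-up set of scores. Quartile conventions differ between textbooks and software; the `method="inclusive"` setting of `statistics.quantiles` is just one choice, assumed here for illustration.

```python
from statistics import quantiles

scores = [8, 10, 12, 13, 14, 15, 16, 18, 20]  # hypothetical scores

score_range = max(scores) - min(scores)  # highest score - lowest score

# n=4 asks for quartiles; other conventions give slightly different cut points.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

print(score_range, q1, q3, iqr, semi_iqr)  # 12 12.0 16.0 4.0 2.0
```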
Measures of Variability
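Range: Highly unstable, because it is determined by the two most extreme values.
Standard deviation: Also easily affected by outliers.
Variance and standard deviation: The most commonly used measures.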
Most Common Measures of Variability
Variance: Average of the squared deviations from the mean.
Standard Deviation: Square root of the variance.
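A worked sketch of both definitions, step by step, on invented scores. It follows the definition given here (dividing by the number of scores); when a sample is used to estimate a population value, the sum of squared deviations is usually divided by n - 1 instead, shown on the last line.

```python
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

m = sum(scores) / len(scores)                    # mean = 5.0
squared_devs = [(x - m) ** 2 for x in scores]    # squared deviations from the mean
variance = sum(squared_devs) / len(scores)       # average squared deviation = 4.0
sd = sqrt(variance)                              # standard deviation = 2.0

# Sample estimate: divide by n - 1 rather than n.
sample_variance = sum(squared_devs) / (len(scores) - 1)

print(variance, sd)  # 4.0 2.0
```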
Tables
Table: a display of data in a matrix format
Graphs
Graph: a representation of data by spatial relationships in a diagram
Table, Graph
Help us summarize data and understand the relationships between variables.
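A picture is worth a thousand words.
In a chart:
Horizontal axis (X axis): usually shows the values of the independent variable.
Vertical axis (Y axis): usually shows the values of the dependent variable.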
Frequency Table
The professor wants to see how many people earned each test score.
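A frequency table of this kind can be sketched with `collections.Counter`. The scores below are invented, not the professor's actual data.

```python
from collections import Counter

scores = [15, 17, 17, 18, 18, 18, 19, 20, 20]  # hypothetical test scores

freq = Counter(scores)  # maps each score to the number of people who earned it

print("Score  Frequency")
for score in sorted(freq):
    print(f"{score:>5}  {freq[score]}")
```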
Frequency Distribution
Graph that shows how many scores fall into particular bins, or divisions of the variable
Histogram
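The binning step behind a histogram can be sketched as follows. The scores and the bin width of 5 are assumptions for illustration, and the output is a rough text histogram rather than a publication-quality figure.

```python
from collections import Counter

scores = [3, 7, 8, 11, 12, 12, 14, 15, 16, 16, 17, 19, 20]  # hypothetical
bin_width = 5  # assumed width of each bin

# Map each score to the lower edge of its bin: 0-4, 5-9, 10-14, 15-19, 20-24.
bins = Counter((s // bin_width) * bin_width for s in scores)

for low in sorted(bins):
    label = f"{low}-{low + bin_width - 1}"
    print(f"{label:<6}{'#' * bins[low]}")
```

Changing `bin_width` changes the apparent shape of the distribution, so it is worth trying more than one value.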
Frequency Distribution
Frequency Polygon
A frequency distribution in which the frequencies are connected by straight lines
The Shape of Distributions
Normal, Negative Skew, Positive Skew
Which type is the distribution on the previous slide?
The Shape of Distributions
Figures from the previous slide:
A: mode = median = mean
B (left skew): mode __ median __ mean
C (right skew): mode __ median __ mean
Cumulative Frequency Distribution
A frequency distribution that shows the number of scores that fall at or below a certain score.
a = Normal curve, b = Positively skewed, c = Negatively skewed
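A cumulative frequency distribution can be built from an ordinary frequency table by keeping a running total. The sketch below does this with `itertools.accumulate` on invented scores.

```python
from collections import Counter
from itertools import accumulate

scores = [15, 17, 17, 18, 18, 18, 19, 20, 20]  # hypothetical test scores

freq = Counter(scores)
values = sorted(freq)                                    # 15, 17, 18, 19, 20
cumulative = list(accumulate(freq[v] for v in values))   # running total of counts

for value, count in zip(values, cumulative):
    print(f"{count} score(s) at or below {value}")
```

The last cumulative count always equals the total number of scores.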
Scattergram
A graph showing the responses of a number of individuals on two variables; visual display of correlational data
Scattergram
Often used with a correlation coefficient.
A correlation is a statistic indicating the strength of the relationship between two variables.
Prediction of one of the variables can be achieved with regression.
Correlation Coefficient
Measures strength of association between variables.
Does NOT indicate causation.
Most commonly used is Pearson’s r.
Value is between −1.0 and +1.0.
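As a minimal sketch, Pearson's r can be computed from the deviations of each variable from its mean. The function name `pearson_r` and the paired exam scores are assumptions made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their own variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


exam1 = [10, 12, 14, 16, 18]  # hypothetical scores on the first exam
exam2 = [11, 14, 13, 18, 19]  # hypothetical scores on the second exam

r = pearson_r(exam1, exam2)
print(round(r, 2))  # 0.93
```

A perfectly linear increasing pair gives +1.0 and a perfectly linear decreasing pair gives -1.0, which matches the stated bounds.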
Scattergram of Paired Values of x and y: (a) r = +1.00, (b) r = −1.00, (c) r = 0.50, (d) r = 0, and (e) r = 0
Correlation Coefficient (absolute value)    Correlation
Less than 0.2                               weak
0.2-0.4                                     moderately weak
0.4-0.6                                     moderate
0.6-0.8                                     moderately strong
0.8-1                                       strong
The Pearson correlation coefficient measures a linear (straight-line) relationship. It cannot capture a curvilinear relationship.
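To see this limitation, the sketch below builds a perfectly predictable U-shaped relationship (y = x squared, over x values symmetric around zero) and computes Pearson's r for it. The data and the helper name `pearson_r` are assumptions for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around each mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is fully determined by x, but not linearly

r_curved = pearson_r(xs, ys)
print(r_curved)  # 0.0
```

An r of zero here does not mean the variables are unrelated, only that no straight line describes the relationship, which is one reason to look at the scattergram first.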
Regression
Predicting the value of one variable from another using the equation for a line.
Slope of the line (m) reflects:
  Correlation
  Scale of measurement for the two variables
Squaring the correlation yields the coefficient of determination, a goodness-of-fit measure.
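A minimal least-squares sketch of these points, on invented data: it fits the line, checks that the slope equals the correlation multiplied by the ratio of the two variables' spreads, and squares the correlation. All variable names and values are assumptions.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))

slope = sxy / sxx               # m in y = m*x + b
intercept = my - slope * mx     # b in y = m*x + b
r = sxy / sqrt(sxx * syy)       # Pearson correlation
r_squared = r ** 2              # coefficient of determination

# The slope combines the correlation with the scales of the two variables.
slope_from_r = r * sqrt(syy / sxx)


def predict(value):
    """Predicted y for a given x, read off the fitted line."""
    return intercept + slope * value


print(round(slope, 2), round(intercept, 2), round(r_squared, 2))  # 0.6 2.2 0.6
```

Rescaling y (for example, from points to percentages) changes the slope but leaves r and r squared unchanged.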
Regression
Coefficient of determination: The proportion of the variability in y that is accounted for by x.
Line Graph
A graphical representation using lines to show relationships between quantitative variables
Y-axis is the dependent variable.
X-axis is the independent variable.
Bar Graph
Used to represent categorical data
Frequency Data and Graphs
Test Score as a Function of Class Membership
Frequency Distribution of Test Scores by Class Membership
Time Series Graph
X-axis represents the passage of time
Time-Series Graph Cumulative Record
Indicating Variability
Error Bars: Vertical lines above and below each point or bar on a graph that show +/- one standard deviation from the mean.
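The endpoints of such error bars can be computed before any plotting. The sketch below uses invented scores for two class sections and `statistics.pstdev`, which divides by n; using `statistics.stdev` (n - 1) instead is an equally reasonable choice.

```python
from statistics import mean, pstdev

# Hypothetical exam scores for two class sections.
classes = {
    "8 a.m.": [12, 14, 15, 17, 17],
    "11 a.m.": [15, 16, 18, 18, 18],
}

error_bars = {}
for name, scores in classes.items():
    m = mean(scores)
    sd = pstdev(scores)                  # standard deviation (dividing by n)
    error_bars[name] = (m - sd, m + sd)  # lower and upper end of the bar
    print(f"{name}: mean {m:.1f}, bar from {m - sd:.2f} to {m + sd:.2f}")
```

A plotting library would then draw each bar at the mean with these lower and upper limits.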
Box and Whisker Plot
Graph based on median and percentiles rather than mean and standard deviation.
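The values a box and whisker plot is usually drawn from can be sketched as a five-number summary. The scores are invented, and the quartile method is one convention among several; whisker rules also vary between programs.

```python
from statistics import quantiles

scores = [8, 10, 12, 13, 14, 15, 16, 18, 20]  # hypothetical scores

# Quartiles: q1 and q3 form the edges of the box, q2 is the median line.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Here the whiskers are assumed to run to the minimum and maximum.
five_number_summary = (min(scores), q1, q2, q3, max(scores))
print(five_number_summary)  # (8, 12.0, 14.0, 16.0, 20)
```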
Box and Whisker Plot
Data source: http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev6.shtml
Checking for Problem Data
Invalid Data: Outside the range of possible values. Find and correct.
Missing Data: Empty cells. If necessary, replace with a code.
Outliers: Possible, but improbable answers. Check to see if they are different enough to remove.
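The three checks can be sketched on a made-up column of exam scores with a maximum of 20. The 1.5 x IQR rule used to flag outliers is a common convention assumed here, not a rule stated in this chapter, and flagged values still need a human decision.

```python
from statistics import quantiles

MAX_SCORE = 20  # each exam is scored out of 20
raw = [17, 15, None, 23, 18, 2, 16, 19, 17, 18]  # hypothetical; None = empty cell

# Missing data: positions of empty cells.
missing = [i for i, s in enumerate(raw) if s is None]

# Invalid data: positions of values outside the possible range.
invalid = [i for i, s in enumerate(raw)
           if s is not None and not 0 <= s <= MAX_SCORE]

# Outliers: possible but improbable values among the valid scores.
valid = [s for s in raw if s is not None and 0 <= s <= MAX_SCORE]
q1, _, q3 = quantiles(valid, n=4, method="inclusive")
fence = 1.5 * (q3 - q1)
outliers = [s for s in valid if s < q1 - fence or s > q3 + fence]

print(missing, invalid, outliers)  # [2] [3] [2]
```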
Style Guide for Figures
Be clear.
Use black ink.
Label both axes.
Label units of measurement.
Provide a caption for the figure.
Beware of chartjunk (parts that aren’t necessary to understand the chart).