Research Methods, 9th Edition Theresa L. White and Donald H. McBurney Chapter 14 Data Exploration Part 1: Graphic and Descriptive Techniques


Page 1: Ch14 data exploration (i)

Research Methods, 9th Edition

Theresa L. White and Donald H. McBurney

Chapter 14 Data Exploration Part 1: Graphic and Descriptive Techniques

Page 2: Ch14 data exploration (i)

Preparing Data for Analysis

Once the data is collected:
Put the data into a summary data sheet.
Do preliminary statistics and plots.
  Check for invalid data.
  Check for missing data.
  Check for wild data.
Describe data numerically.
Describe data graphically.
Perform inferential statistics.

Page 3: Ch14 data exploration (i)

Data Reduction

Process of transcribing data from individual data sheets to a summary form or data file.

Page 4: Ch14 data exploration (i)

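Data Reduction: Transcribing to a Data Sheet

Contain all the data in a matrix format.
Rows indicate subjects.
Columns indicate variables.

Example from the previous page: a professor teaches an ES course in two class sections, one meeting at 8 a.m. and one at 11 a.m. Each section took two tests, and each test was worth 20 points. He wants to analyze the difference in test scores between the two sections.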
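The matrix layout described above can be sketched as follows. This is only an illustration: the variable names (subject, section, test1, test2) and every score are hypothetical, not values from the slides.

```python
# Hypothetical summary data sheet: one row per subject, one column per variable.
# All scores are invented for illustration (each test is out of 20 points).
columns = ["subject", "section", "test1", "test2"]
rows = [
    [1, "8am", 15, 17],
    [2, "8am", 12, 14],
    [3, "11am", 18, 16],
    [4, "11am", 16, 19],
]

def column(name):
    """Return every value of one variable (one column of the matrix)."""
    index = columns.index(name)
    return [row[index] for row in rows]

def section_mean(section, test):
    """Mean score on one test for one class section."""
    test_index = columns.index(test)
    section_index = columns.index("section")
    scores = [row[test_index] for row in rows if row[section_index] == section]
    return sum(scores) / len(scores)

print(column("test1"))
print(section_mean("8am", "test1"), section_mean("11am", "test1"))
```

Keeping one subject per row makes it easy to pull out a whole variable or to compare the two sections on the same test.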

Page 5: Ch14 data exploration (i)

Coding Guide

List that specifies the variables of the study, columns that the variables occupy in the data file, and their possible values.

Located either on summary form, in notebook, or both!
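As a rough sketch, a coding guide can be written down as a lookup table and used to read one line of a fixed-width data file. The variable names, column positions, and allowed values below are assumptions made for illustration only.

```python
# Hypothetical coding guide: variable -> (first column, last column, possible values).
# Columns are 1-based and inclusive, as they would be written in a notebook.
coding_guide = {
    "subject": (1, 3, range(1, 1000)),
    "section": (4, 4, {1, 2}),        # assumed codes: 1 = 8 a.m., 2 = 11 a.m.
    "test1": (5, 6, range(0, 21)),    # tests are scored 0-20
    "test2": (7, 8, range(0, 21)),
}

def decode(record):
    """Split one fixed-width line of the data file into named integer values."""
    values = {}
    for name, (first, last, allowed) in coding_guide.items():
        value = int(record[first - 1:last])
        if value not in allowed:
            raise ValueError(f"{name}={value} is not a possible value")
        values[name] = value
    return values

print(decode("00711518"))
```

Because the guide lists the possible values of each variable, the same table that locates a variable can also reject an impossible entry at read time.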

Page 6: Ch14 data exploration (i)

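Creating a Data File, Method 1: Enter the Data Directly in SPSS (Example)

National Yunlin University of Science and Technology SPSS Workshop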

Page 7: Ch14 data exploration (i)
Page 8: Ch14 data exploration (i)

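Data Coding
Convert answers into numbers or other symbols to make analysis easier.
Respondents' answers are usually grouped into a limited number of categories (classification).
Classification means dividing a set of data into parts according to some variable; that is, the process of applying rules to partition the data.
Classification may sacrifice some detail in the data, but it makes analysis more efficient.
Both closed-ended and open-ended questionnaires must be coded.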

Page 9: Ch14 data exploration (i)

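Codebook Design
A codebook is also called a coding scheme.
It contains every variable in the study and describes how the coding rules apply to each variable.
Researchers use the codebook to make data entry more accurate and more efficient.
The codebook may also record where each variable's data is located in the data file.
Whether or not it is computerized, a codebook should include the questionnaire item number, the variable name, the columns the variable occupies in the input medium (SPSS data file, Excel), a description of each response option, the wording of each question, and the data type (numeric or string).
A pretest can check whether the codebook is correct.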
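A minimal sketch of how a codebook entry might drive the coding of one closed-ended item. The item wording, the variable name, the numeric codes, and the missing-data code 9 are all hypothetical.

```python
# Hypothetical codebook entry for one closed-ended questionnaire item.
codebook = {
    "q1_satisfaction": {
        "item": 1,
        "question": "How satisfied are you with the course?",
        "type": "numeric",
        "codes": {"dissatisfied": 1, "neutral": 2, "satisfied": 3},
        "missing": 9,  # assumed code for unanswered items
    }
}

def encode(variable, answer):
    """Translate a respondent's answer into its numeric code."""
    entry = codebook[variable]
    if answer is None:
        return entry["missing"]
    return entry["codes"][answer.strip().lower()]

answers = ["Satisfied", "neutral", None, "dissatisfied "]
print([encode("q1_satisfaction", a) for a in answers])
```

An answer that is not listed in the codebook raises a KeyError here, which is one way a pretest could expose a gap in the coding rules.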

Page 10: Ch14 data exploration (i)

Codebook Example


Data source: http://metaconnects.org/survey-analysis

Page 11: Ch14 data exploration (i)

Codebook Example

Page 12: Ch14 data exploration (i)
Page 13: Ch14 data exploration (i)


Page 14: Ch14 data exploration (i)

Types of Statistics

Descriptive statistics: summarize a set of data.

Inferential statistics: help us to draw conclusions about populations.

Page 15: Ch14 data exploration (i)

Descriptive statistics

Commonly used descriptive statistics:
Average (measures of central tendency)
Variability (measures of variability)

Page 16: Ch14 data exploration (i)

Measures of Central Tendency

Descriptive statistic that is the average of the distribution.

Mode = Most common score

Median = Middlemost score

Mean = Sum of all the scores divided by the number of scores.
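The three definitions above can be checked on a small set of scores with Python's standard statistics module. The scores are hypothetical.

```python
import statistics

scores = [12, 15, 15, 16, 18, 20]  # hypothetical test scores

mode = statistics.mode(scores)      # most common score
median = statistics.median(scores)  # middlemost score (average of the two middle values here)
mean = statistics.mean(scores)      # sum of all the scores divided by the number of scores

print(mode, median, mean)
```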

Page 17: Ch14 data exploration (i)

Measures of Central Tendency

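Median: not affected by how far the other values lie from the median; it depends only on how many values fall above and below it.

Mean: sensitive to extreme values.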
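A short illustration of this contrast, using invented scores: replacing the highest score with an extreme value leaves the median unchanged but pulls the mean upward.

```python
import statistics

scores = [12, 14, 15, 16, 18]        # hypothetical scores
with_extreme = scores[:-1] + [98]    # replace the top score with an extreme value

# The median only counts how many values lie above and below it, so it stays put.
medians = (statistics.median(scores), statistics.median(with_extreme))
# The mean uses the size of every value, so one extreme value drags it upward.
means = (statistics.mean(scores), statistics.mean(with_extreme))

print(medians, means)
```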

Page 18: Ch14 data exploration (i)

Measures of Variability

Range: highest score – lowest score

Percentile: score below which a certain percentage of the cases in a distribution fall

Interquartile range: 75th percentile – 25th percentile (Q3 – Q1)

Semi-interquartile range: (Q3 – Q1)/2
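These measures can be computed as in the sketch below, with hypothetical scores. Quartiles can be calculated under several conventions; this example relies on the default method of statistics.quantiles, and other software may give slightly different values for Q1 and Q3.

```python
import statistics

scores = [8, 10, 12, 14, 15, 17, 20]  # hypothetical scores

score_range = max(scores) - min(scores)          # highest score - lowest score
q1, q2, q3 = statistics.quantiles(scores, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1                                    # interquartile range
semi_iqr = iqr / 2                               # semi-interquartile range

print(score_range, q1, q3, iqr, semi_iqr)
```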

Page 19: Ch14 data exploration (i)

Measures of Variability

Page 20: Ch14 data exploration (i)

Measures of Variability

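Range: highly unstable; determined by the two extreme values.

Standard deviation: also easily affected by outliers.

Variance and standard deviation are the most commonly used.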

Page 21: Ch14 data exploration (i)

Most Common Measures of Variability

Variance: average of the squared deviations from the mean.

Standard deviation: square root of the variance.
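A worked sketch of both definitions, using hypothetical scores. It divides by the number of scores, following the "average of the squared deviations" wording; the sample version of the variance divides by one less than the number of scores instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / len(scores)  # average squared deviation from the mean
std_dev = math.sqrt(variance)                     # square root of the variance

print(mean, variance, std_dev)
```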

Page 22: Ch14 data exploration (i)

Tables

Table: a display of data in a matrix format

Page 23: Ch14 data exploration (i)

Graphs

Graph: a representation of data by spatial relationships in a diagram

Page 24: Ch14 data exploration (i)

Table, Graph

Help us summarize data and understand the relationships between variables.

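A picture is worth a thousand words.

Charts:
Horizontal axis (X axis): usually shows the values of the independent variable.
Vertical axis (Y axis): usually shows the values of the dependent variable.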

Page 25: Ch14 data exploration (i)

Frequency Table

The professor wants to see how many people earned each test score.
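A frequency table of this kind can be tallied as in the sketch below. The scores are hypothetical.

```python
from collections import Counter

scores = [15, 17, 15, 18, 20, 17, 15, 12]  # hypothetical test scores

frequency = Counter(scores)  # maps each score to the number of people who earned it
for score in sorted(frequency):
    print(score, frequency[score])
```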

Page 26: Ch14 data exploration (i)

Frequency Distribution

Graph that shows how many scores fall into particular bins, or divisions of the variable

Histogram
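The binning behind a histogram can be sketched in plain text as follows. The scores and the bin width of 5 are assumptions chosen for illustration.

```python
scores = [3, 7, 8, 11, 12, 12, 14, 16, 17, 19]  # hypothetical scores
bin_width = 5                                   # assumed bin width

# Count how many scores fall into each bin, keyed by the bin's lower edge.
counts = {}
for score in scores:
    lower = (score // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

# Draw one row of '#' marks per bin, one mark per score.
for lower in sorted(counts):
    label = f"{lower:2d}-{lower + bin_width - 1:2d}"
    print(label, "#" * counts[lower])
```

A different bin width changes the apparent shape of the distribution, so the choice is worth stating next to the graph.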

Page 27: Ch14 data exploration (i)

Frequency Distribution

Frequency Polygon

A frequency distribution in which the frequencies are connected by straight lines

Page 28: Ch14 data exploration (i)

The Shape of Distributions

Normal

Negative Skew

Positive Skew

Which type is the distribution on the previous page?

Page 29: Ch14 data exploration (i)

The Shape of Distributions

Figures on the previous page:

A: mode = median = mean

B (left skew): mode __ median __ mean

C (right skew): mode __ median __ mean

Page 30: Ch14 data exploration (i)

Cumulative Frequency Distribution

a = Normal Curve, b = Positively Skewed, c = Negatively Skewed

A frequency distribution that shows the number of scores that fall at or below a certain score
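As a small sketch of this definition, the running total of a frequency table gives the cumulative frequency for each score. The scores are hypothetical.

```python
from collections import Counter
from itertools import accumulate

scores = [12, 15, 15, 17, 17, 17, 20]  # hypothetical scores

frequency = Counter(scores)
values = sorted(frequency)
# Running total of the frequencies, taken in order of increasing score.
cumulative = list(accumulate(frequency[v] for v in values))

for value, total in zip(values, cumulative):
    print(value, total)  # number of scores at or below this value
```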

Page 31: Ch14 data exploration (i)

Scattergram

A graph showing the responses of a number of individuals on two variables; visual display of correlational data

Page 32: Ch14 data exploration (i)

Scattergram

Often used with a correlation coefficient.

A correlation is a statistic indicating the strength of the relationship between two variables.

Prediction of one of the variables can be achieved with regression.

Page 33: Ch14 data exploration (i)

Correlation Coefficient

Measures strength of association between variables.
Does NOT indicate causation.
Most commonly used is Pearson’s r.
Value is between −1.0 and +1.0.

Scattergram of Paired Values of x and y: (a) r = +1.00, (b) r = −1.00, (c) r = 0.50, (d) r = 0, and (e) r = 0
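A minimal sketch of how Pearson's r can be computed from paired scores, written out by hand instead of calling a library routine. The data are hypothetical and reproduce only the two perfect-line cases, r = +1 and r = −1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: how x and y vary together, scaled by how much each varies alone."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising straight line
r_negative = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line

print(r_positive, r_negative)
```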

Page 34: Ch14 data exploration (i)

Correlation Coefficient

Strength of correlation:
Less than 0.2: weak
0.2-0.4: moderately weak
0.4-0.6: moderate
0.6-0.8: moderately strong
0.8-1: strong

The Pearson correlation coefficient is a measure of a linear (straight-line) relationship. It cannot capture a curvilinear relationship.
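The sketch below applies the cut-offs listed above and shows the limitation with a perfectly U-shaped relationship, for which r comes out as 0. The data are hypothetical, and assigning each boundary value (0.2, 0.4, and so on) to the higher category is an assumption, since the list does not say which side a boundary belongs to.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def strength(r):
    """Verbal label for the size of r (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.2:
        return "weak"
    if size < 0.4:
        return "moderately weak"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "moderately strong"
    return "strong"

x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]  # y depends entirely on x, but along a curve
r = pearson_r(x, y)

print(r, strength(r))
```

Here y is completely determined by x, yet r labels the relationship "weak", which is why a scattergram is worth inspecting before trusting the coefficient.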

Page 35: Ch14 data exploration (i)

Regression

Predicting the value of one variable from another, using the equation for a line.

The slope of the line (m) reflects:
The correlation
The scale of measurement of the two variables

Squaring the correlation (the coefficient of determination) yields a goodness-of-fit measure.
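A minimal least-squares sketch with hypothetical data. The slope equals r multiplied by the ratio of the standard deviation of y to that of x, which is why it reflects both the correlation and the scales of the two variables. The names slope, intercept, and predict are illustrative.

```python
x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sum_xy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
sum_xx = sum((a - mean_x) ** 2 for a in x)
sum_yy = sum((c - mean_y) ** 2 for c in y)

slope = sum_xy / sum_xx                      # m in y = m*x + b
intercept = mean_y - slope * mean_x          # b in y = m*x + b
r_squared = sum_xy ** 2 / (sum_xx * sum_yy)  # squared correlation

def predict(value):
    """Predicted y for a given x on the fitted line."""
    return slope * value + intercept

print(slope, intercept, r_squared, predict(6))
```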

Page 36: Ch14 data exploration (i)

Regression

Coefficient of determination: the proportion of the variability in y that is accounted for by x.

Page 37: Ch14 data exploration (i)

Line Graph

A graphical representation using lines to show relationships between quantitative variables

Y-axis is the dependent variable.
X-axis is the independent variable.

Page 38: Ch14 data exploration (i)

Bar Graph

Used to represent categorical data

Page 39: Ch14 data exploration (i)

Frequency Data and Graphs

Test Score as a Function of Class Membership

Frequency Distribution of Test Scores by Class Membership

Page 40: Ch14 data exploration (i)

Time Series Graph

X-axis represents the passage of time

Time-Series Graph Cumulative Record

Page 41: Ch14 data exploration (i)

Indicating Variability

Error bars: vertical lines above and below each point or bar on a graph that show +/- one standard deviation from the mean.
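The end points of such error bars can be computed as in the sketch below. The group names and scores are hypothetical, and the standard deviation here divides by the number of scores; a sample standard deviation would give slightly longer bars.

```python
import statistics

# Hypothetical test scores for two class sections.
groups = {"8am": [12, 14, 15, 17], "11am": [16, 18, 18, 20]}

def error_bar(scores):
    """Return (lower end, mean, upper end) for a bar of +/- one standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # square root of the average squared deviation
    return mean - sd, mean, mean + sd

for name, scores in groups.items():
    lower, mean, upper = error_bar(scores)
    print(f"{name}: mean={mean:.2f}, bar from {lower:.2f} to {upper:.2f}")
```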

Page 42: Ch14 data exploration (i)

Box and Whisker Plot

Graph based on median and percentiles rather than mean and standard deviation.
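A sketch of the numbers such a plot is usually built from, with hypothetical scores. Drawing conventions vary; this example assumes the whiskers run to the lowest and highest scores, and the quartiles depend on the default method of statistics.quantiles.

```python
import statistics

scores = [5, 7, 8, 10, 12, 13, 18]  # hypothetical scores

q1, median, q3 = statistics.quantiles(scores, n=4)  # box edges and the line inside the box
summary = {
    "lowest": min(scores),   # end of the lower whisker (assumed convention)
    "q1": q1,
    "median": median,
    "q3": q3,
    "highest": max(scores),  # end of the upper whisker (assumed convention)
}

print(summary)
```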

Page 43: Ch14 data exploration (i)

Box and Whisker Plot

Data source: http://www.bbc.co.uk/schools/gcsebitesize/maths/statistics/representingdata3hirev6.shtml

Page 44: Ch14 data exploration (i)

Checking for Problem Data

Invalid data: outside the range of possible values. Find and correct.

Missing data: empty cells. If necessary, replace with a code.

Outliers: possible, but improbable, answers. Check to see if they are different enough to remove.
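The three checks can be sketched as follows for scores on a 0-20 test. The data, the missing-data code of -9, and the rule that flags scores more than two standard deviations from the mean are all assumptions; the slides do not say how different a score must be before it is removed.

```python
import statistics

raw = [15, 17, None, 25, 16, 2, 18, 14]  # hypothetical scores; None marks an empty cell
MISSING_CODE = -9                        # assumed missing-data code
LOW, HIGH = 0, 20                        # range of possible values on the test

# Invalid data: present but outside the range of possible values.
invalid = [x for x in raw if x is not None and not LOW <= x <= HIGH]

# Missing data: replace empty cells with the missing-data code.
coded = [MISSING_CODE if x is None else x for x in raw]

# Outliers: possible but improbable scores, flagged by an assumed 2-SD rule.
valid = [x for x in raw if x is not None and LOW <= x <= HIGH]
mean = statistics.mean(valid)
sd = statistics.pstdev(valid)
outliers = [x for x in valid if abs(x - mean) > 2 * sd]

print(invalid, coded, outliers)
```

Flagged scores are only candidates for a closer look; whether to remove one remains a judgment about the data.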

Page 45: Ch14 data exploration (i)

Style Guide for Figures

Be clear.
Use black ink.
Label both axes.
Label units of measurement.
Provide a caption for the figure.
Beware of chartjunk (parts that aren’t necessary to understand the chart).