Descriptive Methods 707.031: Evaluation Methodology Winter 2015/16 Eduardo Veas

Descriptive Methods - KTI – Knowledge Technologies Institutekti.tugraz.at/staff/eveas/courses/evalme/slides/707.031-EvalMe-07.pdf · Descriptive Methods 707.031: Evaluation Methodology

what we do with the data depends on the scales

Measurement Scales

The complexity of measurements

• Nominal

• Ordinal

• Interval

• Ratio

Crude (nominal) -> sophisticated (ratio)

Nominal data

• arbitrarily assigning a code to a category or attribute: postal codes, job classifications, military ranks, gender

• mathematical manipulations are meaningless

• mutually exclusive categories

• each category is a level

• use: frequencies and counts

Ordinal data

• ranking of an attribute

• interval between points in scale not intrinsically equal

• comparisons < or > are possible

Interval data

• equal distances between adjacent values, but no absolute zero

• temperature in C or F

• mean can be computed

• Likert scale data? (strictly ordinal, but often treated as interval)

Ratio

• absolute zero

• supports all arithmetic operations, including meaningful ratios

• time to complete a task, distance or velocity of the cursor

• count, normalized count (count per something)

Frequencies

Frequency tables

• tab.courses<-as.data.frame(freq(ordered(courses), plot=FALSE))  # freq() is from package descr; plot=FALSE is an argument of freq()

• CumFreq=cumsum(tab.courses[-dim(tab.courses)[1],]$Frequency)  # cumulative counts, leaving out the Total row

• tab.courses$CumFreq=c(CumFreq,NA)  # the Total row gets no cumulative count

• tab.courses

Interpreting frequency tables

       Frequency  Percent  CumPercent  CumFreq
1              2       20          20        2
2              3       30          50        5
3              4       40          90        9
4              1       10         100       10
Total         10      100          NA       NA
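
The same table can be built without the descr package. The following is a minimal base-R sketch, assuming the ten course counts entered in the R part of these slides; the object names counts and freq_tab are illustrative and not part of the original material.

```r
# Number of courses taken by ten students (values from the "Enter data" slide)
courses <- c(2, 1, 1, 2, 3, 3, 3, 2, 4, 3)

counts <- table(courses)                     # frequency of each distinct value
n <- length(courses)

freq_tab <- data.frame(
  Courses    = names(counts),
  Frequency  = as.vector(counts),
  Percent    = 100 * as.vector(counts) / n,
  CumPercent = 100 * cumsum(as.vector(counts)) / n,
  CumFreq    = cumsum(as.vector(counts))
)
freq_tab
```

Reading the last row of CumPercent as 100 is a quick check that no observation was lost.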

Contingency Tables

         Right-handed  Left-handed  Total
Males              43            9     52
Females            44            4     48
Totals             87           13    100
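
A contingency table like this one can be entered and totalled in base R. This is a sketch under the assumption that only the four cell counts are known; the object names handedness and with_totals are made up for the illustration.

```r
# Cell counts from the slide: rows = gender, columns = handedness
handedness <- as.table(matrix(
  c(43, 9,
    44, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Males", "Females"),
                  Hand   = c("Right-handed", "Left-handed"))
))

with_totals <- addmargins(handedness)   # appends row and column sums, labelled "Sum"
with_totals

# Row proportions: share of right- and left-handers within each gender
prop.table(handedness, margin = 1)
```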

Modelling

Statistical models

• A model has to accurately represent the real world phenomenon.

• A model can be used to predict things about the real world.

• The degree to which a statistical model represents the data collected is called the fit of the model

Frequency distributions

• plot the observed values on the x-axis and a bar showing how often each value occurs

• ideally observations fall symmetrically around the center

• skew (lack of symmetry) and kurtosis (pointedness, heaviness of the tails) describe deviations from a normal distribution

Histogram / Frequency distributions

Center of a distribution

• Mode: the score that occurs most frequently in the dataset

  • it may take several values

  • it may change dramatically with a single added score

• Median: the middle score (after ranking all scores)

  • for an even number of scores, add the two central values and divide by 2

  • good for ordinal, interval and ratio data

• Mean: the average score

  • can be influenced by extreme scores
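
The three measures of center can be tried out in base R. This sketch reuses the friend counts and course counts that are entered in the R part of these slides; modes() is a hypothetical helper written for this example (the slides themselves use mfv() from package modeest).

```r
# Data as entered later in these slides
facebook <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)
courses  <- c(2, 1, 1, 2, 3, 3, 3, 2, 4, 3)

# Hypothetical helper: returns every value tied for the highest frequency
modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

modes(courses)                      # most frequent number of courses
median(facebook)                    # middle score of the 11 ranked values
mean(facebook)                      # pulled upwards by the extreme score 252
mean(facebook[facebook != 252])     # mean once the extreme score is left out
```

Because every value in facebook occurs exactly once, modes(facebook) returns all eleven values, which illustrates why the mode can be unhelpful for such data.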

Dispersion of a distribution

• range: difference between the highest and the lowest score

• interquartile range: difference between the upper and the lower quartile (the median is the second quartile)

• example: 252 - 22 = 230 with the extreme score, 121 - 22 = 99 without it
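
A short base-R sketch of both dispersion measures, assuming the friend counts entered in the R part of these slides:

```r
facebook <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

diff(range(facebook))                    # range including the extreme score
diff(range(facebook[facebook < 252]))    # range once the extreme score is dropped

quantile(facebook)   # minimum, lower quartile, median, upper quartile, maximum
IQR(facebook)        # upper quartile minus lower quartile
```

Note that quantile() interpolates between neighbouring scores by default (type = 7), so its quartiles can differ slightly from a hand calculation that splits the ranked data at the median.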

Fit of the mean

• deviance: x - mean (difference between each score and the mean)

• sum of squared errors (SS): sum of the squared deviances

• variance = SS / (N - 1)

• stddev = sqrt(variance)
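
The chain from deviations to standard deviation can be followed step by step. This is a sketch, assuming the friend counts entered in the R part of these slides; the variable names are illustrative.

```r
facebook <- c(22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252)

deviations <- facebook - mean(facebook)     # distance of each score from the mean
sum(deviations)                             # positive and negative deviations cancel out

ss       <- sum(deviations^2)               # sum of squared errors
variance <- ss / (length(facebook) - 1)     # N - 1 in the denominator
stddev   <- sqrt(variance)                  # back in the original units

c(SS = ss, variance = variance, sd = stddev)

# The built-in functions use the same N - 1 definition
var(facebook)
sd(facebook)
```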

Assumptions

Assumptions of parametric data

• normally distributed: the sampling distribution or the errors in the model

• homogeneity of variance:

  • correlational: the variance of one variable should be stable at all levels of the other variable

  • groups: each sample comes from a population with the same variance

• interval data: data measured at least at the interval level

• independence: the behaviour of one participant does not influence that of another

Distributions for DLF

[Figure: density histograms of the hygiene scores on day 1, day 2 and day 3, each paired with a Q-Q plot of sample against theoretical quantiles]

Quantify normality

Different groups

Exam histogram

[Figure: density histogram of the exam scores (x-axis: exam, y-axis: density)]

Exam histogram

[Figure: three density histograms of exam scores: the full sample (about 25 to 100) and two further panels covering about 10 to 70 and about 60 to 100 (x-axis: exam, y-axis: density)]

Shapiro-Wilk test

• # Shapiro-Wilk test on the whole sample

• shapiro.test(rexam$exam)

• # if we are comparing groups, what is important is the normality within each group

• by(rexam$exam, rexam$uni, shapiro.test)
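
The calls above need the course data file. As a self-contained sketch, the example below uses deterministic stand-in data instead: evenly spaced quantiles of a normal and of an exponential distribution. The data frame scores and its columns are assumptions made for this illustration only.

```r
# Stand-in data (assumption): 50 normal-shaped and 50 right-skewed scores
normal_like <- qnorm(ppoints(50), mean = 50, sd = 10)
skewed      <- qexp(ppoints(50), rate = 1 / 10)

scores <- data.frame(
  exam = c(normal_like, skewed),
  uni  = factor(rep(c("A", "B"), each = 50))
)

# Whole sample first, then one test per group
shapiro.test(scores$exam)
by(scores$exam, scores$uni, shapiro.test)

# Extract W and p from a single result, e.g. for reporting
res <- shapiro.test(skewed)
res$statistic   # W
res$p.value
```

Testing within each group matters because a mixture of two normal groups can look non-normal as a whole, and the reverse can also happen.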

Reporting Shapiro-Wilk

• A Shapiro-Wilk test on the R exam scores, W=0.96, showed a significant deviation from normality (p<0.05).

Homogeneity of variance

• Levene’s test:

• leveneTest(rexam$exam, rexam$uni, center=mean)  # from package car

• Reporting: for the percentage on the R exam, the variances were similar for KFU and TUG students, F(1,98)=2.09

Homogeneity of variance

• In large datasets Levene’s test may be significant even for small differences in variance

• Double-check with the variance ratio (Hartley’s Fmax): the largest group variance divided by the smallest
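
The variance ratio needs nothing beyond base R. The sketch below uses made-up exam percentages for two groups; the numbers and the names scores, group and f_max are illustrative and do not come from the course data.

```r
# Made-up exam percentages for two groups (illustration only)
scores <- c(52, 61, 48, 70, 66, 55,
            40, 85, 30, 92, 58, 75)
group  <- factor(rep(c("KFU", "TUG"), each = 6))

group_vars <- tapply(scores, group, var)        # one variance per group
f_max <- max(group_vars) / min(group_vars)      # variance ratio, never below 1

group_vars
f_max
```

A ratio close to 1 points to similar variances; how large a ratio is still acceptable depends on the number of groups and the group sizes.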

Correlations

Everything is hard to begin with, but the more you practise the easier it gets

Relationships

• Everything is hard to begin with, but the more you practise the easier it gets

• positive relationship: increase in practice, increase in skill

• no relationship: increase in practice, but skill remains unchanged

• negative relationship: increase in practice, decrease in skill

Correlations

• Bivariate: correlation between two variables

• Partial: correlation between two variables while controlling for the effect of one or more additional variables
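
For one control variable, a partial correlation can be computed from the three bivariate coefficients with the standard first-order formula. The sketch below plugs in the Pearson coefficients that these slides report for the Exams and Anxiety data; the variable names are illustrative.

```r
# Pearson coefficients as reported for the Exams and Anxiety data in these slides
r_exam_anxiety   <- -0.4409934
r_exam_revise    <-  0.3967207
r_anxiety_revise <- -0.7092493

# Correlation of exam score and anxiety with revision time held constant
partial_r <- (r_exam_anxiety - r_exam_revise * r_anxiety_revise) /
  sqrt((1 - r_exam_revise^2) * (1 - r_anxiety_revise^2))
partial_r
```

The partial coefficient is smaller in magnitude than the bivariate one, which suggests that part of the exam-anxiety relationship is shared with revision time.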

Covariance

• are changes in one variable met with similar changes in the other variable?

• cross-product deviations (CPD): for each observation, multiply the deviations of the two variables

• covariance = sum(CPD) / (N - 1)
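
The covariance can be computed by hand and checked against cov(). The values below are purely illustrative (adverts watched and packets bought by five people) and are not one of the course datasets.

```r
# Illustrative values only
adverts <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

# Cross-product deviations: one product per observation
cpd <- (adverts - mean(adverts)) * (packets - mean(packets))
covariance <- sum(cpd) / (length(adverts) - 1)
covariance

cov(adverts, packets)           # built-in equivalent
cov(adverts * 100, packets)     # rescaling one variable rescales the covariance
```

The last line shows the scale dependence: the relationship is the same, but the covariance is 100 times larger.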

Covariance II

• Positive: both variables vary in the same direction

• Negative: variables vary in opposite directions

• Covariance depends on the measurement scale, so its values cannot be compared across datasets

Pearson correlation coefficient

• r = cov(x, y) / (s_x * s_y): the covariance divided by the product of the two standard deviations

• Data must be at least interval

• Value between -1 and 1

  • 1 -> perfect positive correlation

  • 0 -> no linear relationship

  • -1 -> perfect negative correlation
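
Standardizing the covariance removes the scale dependence. The sketch below reuses purely illustrative values (not a course dataset) and compares the hand calculation with cor().

```r
# Illustrative values only
adverts <- c(5, 4, 4, 6, 8)
packets <- c(8, 9, 10, 13, 15)

r_manual <- cov(adverts, packets) / (sd(adverts) * sd(packets))
r_manual

cor(adverts, packets)            # built-in Pearson coefficient
cor(adverts * 100, packets)      # unchanged by rescaling a variable
cor.test(adverts, packets)       # adds a significance test and confidence interval
```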

Dataset Exams and Anxiety

• effects of exam stress and revision on exam performance

• questionnaire to assess anxiety relating to exams (EAQ)

Enter data

• examData<-read.delim("ExamAnxiety.dat", header=TRUE)

• examData2<-examData[,c("Exam","Anxiety","Revise")]

• cor(examData2)

Pearson correlation

              Exam    Anxiety     Revise
Exam     1.0000000 -0.4409934  0.3967207
Anxiety -0.4409934  1.0000000 -0.7092493
Revise   0.3967207 -0.7092493  1.0000000

Confidence values

• rcorr(as.matrix(examData[,c("Exam","Anxiety","Revise")]))  # from package Hmisc; also prints the p-values

P (p-values; 0 means the value rounds to zero)

        Exam Anxiety Revise
Exam               0      0
Anxiety    0              0
Revise     0       0

Reporting Pearson’s CC

A Pearson correlation coefficient indicated a significant negative correlation between exam anxiety and exam performance, r=-.44, p<0.01

Spearman’s correlation coefficient

• non-parametric test

• first rank the data, then apply Pearson’s correlation to the ranks
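
The two-step recipe can be checked directly: ranking both variables and applying Pearson gives the same value as the built-in Spearman option. The contest data below are made up for the illustration.

```r
# Made-up contest data: finishing position (1 = winner) and creativity score
position   <- c(1, 2, 3, 4, 5, 6, 7, 8)
creativity <- c(60, 55, 58, 40, 45, 30, 35, 20)

rho_by_hand <- cor(rank(position), rank(creativity))             # Pearson on the ranks
rho_builtin <- cor(position, creativity, method = "spearman")    # same thing in one call

c(by_hand = rho_by_hand, builtin = rho_builtin)
```

A negative coefficient here means that higher creativity goes with a better (numerically smaller) finishing position.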

Liar Dataset

• contest for storytelling the biggest lie

• 68 participants, ranking, and creativity questionnaire

Spearman test

• liarData=read.delim("biggestLiar.dat", header=TRUE)

• rcorr(as.matrix(liarData[,c("Position","Creativity")]), type="spearman")  # without type="spearman", rcorr computes Pearson’s r

• output as shown on the slide (the value -0.31 differs from the -.37 reported on the next slide and may stem from a call without type="spearman"):

           Position Creativity
Position       1.00      -0.31
Creativity    -0.31       1.00

Reporting spearman

A Spearman non-parametric correlation test indicated a significant correlation between creativity and ranking in the world’s biggest liar contest, r=-.37, p<0.001

Kendall’s tau non-parametric

• preferred for small datasets, especially with many tied ranks

• cor.test(liarData$Position, liarData$Creativity, alternative="less", method="kendall")

z = -3.2252, p-value = 0.0006294
alternative hypothesis: true tau is less than 0
sample estimates:
       tau
-0.3002413

Reporting Kendall’s test

A Kendall tau correlation coefficient indicated a significant negative correlation between creativity and performance in the World’s biggest liar contest, τ=-.30, p<0.001

Biserial and point-biserial correlations

• one variable is dichotomous (categorical with 2 categories)

• point-biserial: for a discrete dichotomy (e.g., dead or alive)

• biserial: for a continuous dichotomy, where a continuum underlies the two categories (e.g., passing or failing an exam)
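
The point-biserial coefficient is simply Pearson’s r with the dichotomous variable coded 0/1. The sketch below uses made-up task times for two groups and cross-checks the result against the equivalent formula based on the group means; it does not cover the biserial coefficient, which needs a further adjustment.

```r
# Made-up data: task time in seconds and a true dichotomy coded 0/1
time  <- c(12, 15, 11, 14, 20, 22, 19, 25)
group <- c(0, 0, 0, 0, 1, 1, 1, 1)

r_pb <- cor(time, group)     # point-biserial = Pearson with a 0/1 variable
r_pb

# Same value from the group means, the proportion p and the population SD of time
p      <- mean(group)
sd_pop <- sqrt(mean((time - mean(time))^2))
r_check <- (mean(time[group == 1]) - mean(time[group == 0])) / sd_pop * sqrt(p * (1 - p))
r_check
```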

Readings

• Discovering statistics using R (Andy Field, Jeremy Miles, Zoe Field)

R

set work directory

• setwd("/new/work/directory")

• getwd()

• ls()    # list the objects in the current workspace

packages

• install.packages("package.name") # installing packages

• library(package.name) # loading a package

• package::function() # disambiguating functions

Nominal and Ordinal data

• mydata$v1 <- factor(mydata$v1, levels = c(1,2,3), labels = c("red", "blue", "green"))  # nominal: unordered categories

• mydata$v1 <- ordered(mydata$y, levels = c(1,3,5), labels = c("Low", "Medium", "High"))  # ordinal: ordered categories
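
A runnable sketch of the difference between the two types, using a small made-up data frame; the column names colour and level are illustrative and not taken from the slides.

```r
# Made-up codes (illustration only)
mydata <- data.frame(v1 = c(1, 3, 2, 1), y = c(5, 1, 3, 3))

# Nominal: labels only, no order
mydata$colour <- factor(mydata$v1, levels = c(1, 2, 3),
                        labels = c("red", "blue", "green"))

# Ordinal: labels with an order, so < and > are meaningful
mydata$level <- ordered(mydata$y, levels = c(1, 3, 5),
                        labels = c("Low", "Medium", "High"))

table(mydata$colour)      # counts are the appropriate summary for nominal data
mydata$level < "High"     # comparisons work for ordered factors
max(mydata$level)         # and so does finding the highest level
```

The same comparison on mydata$colour returns NA with a warning, which matches the point that mathematical manipulation of nominal data is meaningless.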

Missing data

• is.na(var) # tests for missing values; also works on rows

• mydata$v1[mydata$v1==99] <- NA # select rows where v1 is 99 and recode column v1

• x <- c(1,2,NA,3)

• mean(x) # returns NA

• mean(x, na.rm=TRUE) # returns 2

• newdata <- na.omit(mydata) # spawn dataset without missing data

Install and load packages

• install.packages(c("car", "ggplot2", "pastecs", "psych", "descr"))

• library(car); library(ggplot2); library(pastecs); library(psych); library(Rcmdr); library(descr)  # Rcmdr is loaded here but not installed by the line above

Enter data

• id<-c(1,2,3,4,5,6,7,8,9,10)

• sex<-c(1,1,1,1,1,2,2,2,2,2)

• courses<-c(2.0,1.0,1.0,2.0,3.0,3.0,3.0,2.0,4.0,3.0)

• sex<-factor(sex, levels=c(1:2), labels=c("M", "F"))

• example<-data.frame(ID=id,Gender=sex,Courses=courses)

Frequency Distributions

• facebook<-c(22,40,53,57,93,98,103,108,116,121,252)

• library(modeest)

• mfv(facebook)  # most frequent value(s), i.e. the mode

• mean(facebook)

• median(facebook)

Enter data

• quantile (facebook)

• IQR (facebook)

• var(facebook)

• sd(facebook)

describing your data

• # load meaningful data

• lecturerData<-read.csv("lecturerData.csv", header=TRUE)  # name must match the one used in stat.desc() below

• # get statistics

• stat.desc(lecturerData[,c("friends", "income")], basic=FALSE, norm=TRUE)

describing your data II

• # print frequency table

• tab.friends<-as.data.frame(freq(ordered(lecturerData$friends), plot=FALSE))  # plot=FALSE is an argument of freq()

• tab.friends.cumsum<-cumsum(tab.friends[-dim(tab.friends)[1],]$Frequency)

• tab.friends$CumFreq=c(tab.friends.cumsum,NA)

• tab.friends

Testing for normality

• # load DLF data

• dlf<-read.delim("DownloadFestival.dat", header=TRUE)

• Data about hygiene collected during a festival (3 days)

Enter data

• hist.day1 <- ggplot(dlf, aes(day1)) + theme(legend.position = "none") + geom_histogram(aes(y = ..density..), colour="black", fill="white") + labs(x="Hygiene score on day1", y="Density")

• hist.day1 + stat_function(fun = dnorm, args=list(mean=mean(dlf$day1, na.rm=TRUE), sd=sd(dlf$day1, na.rm=TRUE)), colour="blue", size=1)  # overlay a normal curve with the sample mean and sd

• qqplot.day1 <- ggplot(dlf, aes(sample=day1)) + stat_qq()  # Q-Q plot; replaces qplot(sample=dlf$day1, stat="qq"), whose stat argument is deprecated

Plot day 1

[Figure: density histogram of the hygiene score on day 1 (x-axis: 0 to 20, y-axis: density)]

Offending score

• # print bad score

• dlf$day1[dlf$day1>5]

• # correct bad score

• dlf$day1[dlf$day1>5]<-2.02

[Figure: Q-Q plot of the day 1 hygiene scores, sample (0 to 20) against theoretical quantiles]

Quantify normality

• describe(cbind(dlf$day1, dlf$day2, dlf$day3))  # from package psych

• stat.desc(dlf[,c("day1","day2","day3")], basic = FALSE, norm = TRUE)  # from package pastecs

Groups

• rexam<-read.delim("rexam.dat", header=TRUE)

• # obtain statistics for exam, computer, lectures and numeracy

• round(stat.desc(rexam[,c("exam","computer","lectures","numeracy")], basic=FALSE, norm=TRUE), digits=3)

• hist.exam <- ggplot(rexam, aes(exam)) + theme(legend.position = "none") + geom_histogram(aes(y = ..density..), colour="black", fill="white") + labs(x="exam", y="density") + stat_function(fun=dnorm, args=list(mean=mean(rexam$exam, na.rm=TRUE), sd=sd(rexam$exam, na.rm=TRUE)), colour="blue", size=1)

Add factors

• # set uni to be a factor

• rexam$uni <- factor(rexam$uni, levels = c(0:1), labels = c("KFU", "TUG"))

• by(rexam[, c("exam", "computer", "lectures", "numeracy")], rexam$uni, stat.desc, basic=FALSE, norm = TRUE)

Get subsets and individual histograms

• # now we create subsets of the example dataset for each level of the factor

• kfu<-subset(rexam, rexam$uni=="KFU")

• tug<-subset(rexam, rexam$uni=="TUG")

• # now we can create histograms for each subset

• hist.exam.kfu <- ggplot(kfu, aes(exam)) + theme(legend.position = "none") + geom_histogram(aes(y = ..density..), colour="black", fill="white") + labs(x="exam", y="density") + stat_function(fun=dnorm, args=list(mean=mean(kfu$exam, na.rm=TRUE), sd=sd(kfu$exam, na.rm=TRUE)), colour="blue", size=1)