View
214
Download
0
Tags:
Embed Size (px)
Citation preview
A fuzzy clustering approach to improve the accuracy of Italian students’data
An experimental procedure to correct the impact of the outliers on assessment test
scores
Claudio Quintano, Rosalia Castellano, Sergio LongobardiClaudio Quintano, Rosalia Castellano, Sergio Longobardi
UNIVERSITY OF NAPLES “PARTHENOPE” UNIVERSITY OF NAPLES “PARTHENOPE”
[email protected] [email protected] [email protected]@uniparthenope.it
-2-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
OUTLINE
This work considers data on students’ performance assessments collected by the
Italian National Evaluation Institute of the Ministry of Education (INVALSI)
• OUTLIER UNITSOUTLIER UNITS, at class level, which brings to biased , at class level, which brings to biased distributions of the average scores by class distributions of the average scores by class
• The AIM is to The AIM is to MITIGATE THE PRESENCEMITIGATE THE PRESENCE of outliers and of outliers and correcting the overestimation of children abilitycorrecting the overestimation of children ability
THE INVALSI SURVEY
3 AREASreading, mathematics and science
5 SCHOOL LEVELS
–2th and 4th year of primary school–1th year of lower secondary –1th and 3th year of upper secondary
-3-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
VAR00005100,0080,0060,0040,0020,000,00
Fre
qu
en
cy
500
400
300
200
100
0
Histogram
avergita100806040200
Fre
qu
en
cy
1.250
1.000
750
500
250
0
HistogramMATHEMATICS CLASS MEAN SCORE - S.Y 2004/05
III CLASS UPPER
SECONDARY SCHOOL
I CLASS UPPER SECONDARY
SCHOOL
I CLASS LOWER
SECONDARY SCHOOL
IV CLASS PRIMARY SCHOOL
avergita100806040200
Fre
qu
ency
1.250
1.000
750
500
250
0
Histogram
II CLASS
PRIMARY SCHOOL
VAR00005100,0080,0060,0040,0020,000,00
Fre
qu
en
cy
500
400
300
200
100
0
Histogram
VAR00002100,0080,0060,0040,0020,000,00
Fre
qu
en
cy
2.000
1.500
1.000
500
0
Histogram
DISTRIBUTIONS OF MEAN SCORESAT CLASS LEVEL (MATHEMATICS ASSESSMENT)
-4-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
avergita100806040200
Fre
qu
ency
3.000
2.000
1.000
0
Histogram
Mean =87,81Std. Dev. =8,14
N =30.031
avergita
100806040200
Fre
qu
ency
1.250
1.000
750
500
250
0
Histogram
Mean =74,71Std. Dev. =14,133
N =30.097
avergita
100806040200
Fre
qu
ency
2.500
2.000
1.500
1.000
500
0
Histogram
Mean =76,66Std. Dev. =14,615
N =30.094
CLASS MEAN SCOREII
CLA
SS
- P
RIM
AR
Y S
CH
OO
L
avergita100806040200
Freq
uenc
y
2.000
1.500
1.000
500
0
Histogram
Mean =77,72Std. Dev. =13,029
N =29.802
avergita
100806040200
Freq
uenc
y
2.000
1.500
1.000
500
0
Histogram
Mean =81,5Std. Dev. =11,439
N =29.816
avergita100806040200
Fre
qu
ency
5.000
4.000
3.000
2.000
1.000
0
Histogram
Mean =88,54Std. Dev. =8,832
N =29.831
Reading s.y. 2004/05 Mathematics s.y. 2004/05 Science s.y. 2004/05
Reading s.y. 2005/06 Mathematics s.y. 2005/06 Science s.y. 2005/06
-5-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
Deletion of micro units –students- Deletion of micro units –students- considered as considered as
““PSEUDO NON RESPONDENTS”PSEUDO NON RESPONDENTS”
Students who haven’t given the minimum number of answers to compute a performance score
The presence of these units varies The presence of these units varies from 9% to 16%from 9% to 16%
STEP I
-6-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
Class mean score Class mean score ::
N j
iji 1
jj
p
pN
COMPUTATION OF CLASS LEVEL INDICATOR
SCORE OF ISCORE OF ITHTH STUDENT OF J STUDENT OF JTHTH CLASSCLASS
NUMBER OF RESPONDENTNUMBER OF RESPONDENTSTUDENTS OF JSTUDENTS OF JTHTH CLASS CLASS
For each student class the For each student class the following indexes are computed:following indexes are computed:
Standard deviation of mean score
N j 2
jiji 1
jj
p p
N
N j
iji 1
jj
M
MCN Q
Class non response rateClass non response rateNUMBER BOTH OF ITEM NON NUMBER BOTH OF ITEM NON REPSONSES AND OF INVALID REPSONSES AND OF INVALID
RESPONSES FOR THE IRESPONSES FOR THE ITHTH STUDENT OF STUDENT OF THE JTHE JTHTH CLASS CLASS
NUMBER OF RESPONDENTNUMBER OF RESPONDENTSTUDENTS OF JSTUDENTS OF JTHTH CLASS CLASS
NUMBER OF ADMINISTERED NUMBER OF ADMINISTERED ITEMS TO JITEMS TO JTHTH CLASS CLASS
Q
sjs 1
J
E
EQ
Index of answers’ homogeneityIndex of answers’ homogeneity
GINI MEASURE OF HETEROGENEITY COMPUTED FOR EACH STH TEST QUESTION ADMINISTERED TO
EACH STUDENT OF JTH CLASS
SUMMARY
At first step the micro units At first step the micro units considered as “pseudo-non considered as “pseudo-non
respondents” have been respondents” have been dropped from dataset then dropped from dataset then the following indexes, at the following indexes, at
class level, are computed:class level, are computed:
Class mean scoreClass mean score
Standard deviation Standard deviation of mean scoreof mean score
Class non response rateClass non response rate
Index of answers’Index of answers’homogeneityhomogeneity
-7-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
PRINCIPAL COMPONENT ANALYSIS (PCA)
By the PCA we are able to describe the
answer behaviour of each student class
through two variables
CONTRAPOSITION
FIRST ComponentSECOND
Component
OUTLIERSIDENTIFICATION
AXIS
INDEX OF CLASS COLLABORATION
TO SURVEY
Class non response rate
-8-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
It is possible to detect, graphically, the outlier classes of students
Projection on the first two factorial
axes plane of second class
primary students
PRINCIPAL COMPONENT ANALYSIS (PCA)
OUTLIER CLASSES
-9-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
Computation of Computation of fuzzy fuzzy partition matrixpartition matrix where for where for
each students’ class (rows of each students’ class (rows of the matrix) the degree of the matrix) the degree of belonging to each cluster belonging to each cluster (columns of the matrix) is (columns of the matrix) is
computed computed
THE FUZZY K-MEANS APPROACHFUZZY K-MEANS APPROACH
On the basis of the two On the basis of the two factorial dimensions the factorial dimensions the
students’classes are classified students’classes are classified in 8 clusters by a in 8 clusters by a FUZZY K-FUZZY K-
MEANSMEANS algorithm algorithm
-10-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
DETECTION OF OUTLIERSDETECTION OF OUTLIERS
Projection of Projection of centroids computed centroids computed
by fuzzy k-meansby fuzzy k-means
High negative scores on High negative scores on “outliers identification “outliers identification
axis” (x-axis) that axis” (x-axis) that indicates a high class indicates a high class average scores and average scores and
minimum within minimum within variabilityvariability respect to respect to
scores and test answersscores and test answers
OUTLIER CLUSTER
Factorial scores close to zero Factorial scores close to zero respect to the “index of class respect to the “index of class
collaboration to survey”collaboration to survey”
-11-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
Indicating with “a” the outlier cluster, the degree of belonging to this cluster is:
µja
Otherwise it can be interpreted as the “outlier level” of each class
This measure is considered as the “outlier probability” of jth class
DETECTION OF OUTLIERSDETECTION OF OUTLIERS
-12-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
WWjj varies from 0 to 1 varies from 0 to 1 The students’ class with high probability to belong The students’ class with high probability to belong to outlier cluster will have a low weight while the to outlier cluster will have a low weight while the class very far from this cluster will have a weight class very far from this cluster will have a weight
close to 1close to 1
CORRECTION PROCEDURE
On the basis of the outlier cluster degree,On the basis of the outlier cluster degree, a weighting factor is developed:a weighting factor is developed:
WWj j =1 -=1 - µ µjaja
Weighting factor Outlier probability
-13-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
EFFECTS OF THE CORRECTION PROCEDURE
ORIGINAL DISTRIBUTION ADJUSTED DISTRIBUTION
-14-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
THE INSPIRATION PRINCIPLE
OUTLIEROUTLIER
NOT NOT OUTLIEROUTLIER
Go over the dichotomous logic
FUZZY FUZZY APPROACHAPPROACH
Compute an “OUTLIER LEVEL” measure for each unit to
calibrate the correction
-15-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
RELATIONSHIP BETWEEN THE SCHOOL LOCALIZATION AND THE PRESENCE OF OUTLIER CLASSES
Box plot of outlier level
µja
Degree to belonging to the outlier
cluster (cluster n.2)
-16-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
CLASS AVERAGE CLASS AVERAGE SCORE SCORE
DISTRIBUTIONS ONLY DISTRIBUTIONS ONLY FOR THE NORTHERN FOR THE NORTHERN
AND CENTRAL AND CENTRAL REGIONSREGIONS
RELATIONSHIP BETWEEN THE SCHOOL LOCALIZATION AND THE PRESENCE OF OUTLIER CLASSES
-17-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
REGIONAL SCORES
50
52
54
56
58
60
62
64
66
68
70
72
74
76
78
80
82
84
86
Media
II elem MAT 0405
50
52
54
56
58
60
62
64
66
68
70
72
74
76
78
80
82
84
86
Piem
onte
Vall
e D'A
osta
Lom
bard
ia
Tre
ntino
Alto
Adig
e
Ven
eto
Friu
li Ven
ezia
Giulia
Ligu
ria
Em
ilia R
omag
na
Tos
cana
Um
bria
Mar
che
Laz
io
Abr
uzzo
Moli
se
Cam
pania
Pug
lia
Bas
ilicat
a
Cala
bria
Sici
lia
Sar
degn
a
Media ponderata
II elem MAT 0405
NOT WEIGHTED AVERAGE
WEIGHTED AVERAGE
-18-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
denotes the ratio of students of jth class that has given the tth answer to sth question
Index of answers’ homogeneity
Index of answers’ homogeneityQ
EE
Q
ssj
j
1
Where Esj is a Gini measure of heterogeneity:
2
1
1
h
t j
tsj N
nE
j
t
N
n
The Gini measure is equal to zero when all students of jth class have given
the same answer to the sth question. It reaches the maximum value: h-1/h (h
is the number of alternative answers to question sth) when there is perfect
heterogeneity of answers to sth question in the jth class
The mean of the Q Gini indexes (Esj) computed for each sth test Question administered to each student of jth class:
-19-
A fuzzy clustering approach to improve the accuracy of Italian students’dataAn experimental procedure to correct the impact of the outliers on assessment test scores
C.Quintano, R.Castellano, S.Longobardi - University of Naples “Parthenope”
Original distribution Adjusted distribution
MEAN 74,71 71,67
MODE 100,00 68,75
I QUARTILE 64,42 63,12
MEDIAN 73,61 71,09
III QUARTILE 85,94 80,69
KURTOSIS
SKEWNESS
EFFECTS OF THE CORRECTION PROCEDURE