A three learning states Bayesian knowledge tracing model

Kai Zhang a,b,∗, Yiyu Yao b

a National Engineering Research Center for E-Learning, Central China Normal University, China
b Department of Computer Science, University of Regina, Canada

Knowledge-Based Systems (2018) 1–13. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/knosys

Article history: Received 7 September 2017; Revised 27 February 2018; Accepted 1 March 2018; Available online xxx.

Keywords: Bayesian knowledge tracing; Three-way decisions

Abstract

This paper proposes a Bayesian knowledge tracing model with three learning states, extending the original two learning states. We divide a learning process into three sections by using an evaluation function for three-way decisions. Advantages of such a trisection over the traditional bisection are demonstrated by comparative experiments. We develop a three learning states model based on the trisection of the learning process and apply it in a series of comparative experiments against the original model. Qualitative and quantitative analyses of the experimental results indicate the superior performance of the proposed model over the original model in terms of prediction accuracies and related statistical measures.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Knowledge is regarded as a collection of rules or skills in the knowledge tracing model. Knowledge tracing is the process of estimating the probability that a student has learned a rule or skill of knowledge in an intelligent tutoring system. More generally, knowledge is regarded as a collection of knowledge components (KCs). As pointed out by Koedinger et al. [8], each of the KCs is defined as "an acquired unit of cognitive function or structure that can be inferred from performance on a set of related tasks".
For example, "Addition", "Subtraction", "Multiplication" and "Division" are commonly used KCs in math. The aim of knowledge tracing is to estimate the probability that a KC has been learned.

In 1995, Corbett and Anderson [4] proposed the original Bayesian knowledge tracing (BKT) model. Many extensions [14,18,19,23] of the original BKT model have been introduced in order to fit different learning environments. Pardos and Heffernan [18] individualized the prior learning parameter of the original BKT model by adding a student node. They also introduced the difficulty of a KC by adding a node to the topology of the original BKT model, with the two performance parameters individualized by the added KC difficulty node [19]. Taking instructional interventions into account, Lin and Chi [14] proposed an Intervention-BKT model by adding an intervention node to the original BKT model. Wang et al. [23] proposed a Multi-Grained-BKT model based on the definition of multiple fine-grained KCs, and a Historical-BKT model to incorporate previous question responses.

∗ Corresponding author at: Department of Computer Science, University of Regina, Canada. E-mail address: [email protected] (K. Zhang).

However, the original BKT model introduced its two learning states intuitively, without a formal definition. Furthermore, the original BKT model and its extensions take the two learning states, unlearned-state and learned-state, for granted. The unlearned-state indicates that a student is unlikely to have mastered a KC, and the learned-state indicates that a student has certainly mastered a KC. One can easily see that two learning states may not completely reflect the evolution of learning. There exists a transitional learning state between the unlearned-state and the learned-state, in which the student has probably mastered a KC.
Motivated by the observation that this transitional learning state exists, we introduce a method to divide a learning process into three learning states using ideas from three-way decisions [24–28]. Consequently, we extend the original BKT model into a new three learning states BKT (TLS-BKT) model.

By extracting the commonly used idea of thinking in threes across many disciplines, Yao [28] proposed a theory of three-way decisions. A trisecting-and-acting model of three-way decisions involves dividing a universal set into three parts and designing effective strategies to process the three parts. Yu et al. [29] proposed a three-way decisions based clustering method to discover the transitional regions of adjacent clusters. Li et al. [11] utilized sequential three-way decisions for cost-sensitive face recognition based on a sequence of image granulations. Savchenko [22] applied sequential three-way decisions to multi-class recognition by trisecting the distance from query objects to reference objects. By using the thresholds from game-theoretic rough sets and information-theoretic rough sets, Nauman et al. [17] divided application software behaviors into three parts. Li et al. [12] presented a cost-sensitive software defect prediction method by classifying software modules into three different regions.

https://doi.org/10.1016/j.knosys.2018.03.001
0950-7051/© 2018 Elsevier B.V. All rights reserved.


Zhang and Min [30] proposed several three-way recommender systems. Lang et al. [10] proposed a three-way decisions based conflict analysis method to divide conflict probabilities into three parts. Chen et al. [3] proposed a multi-granular mining method to refine the transitional regions of the three-way decisions model. Based on an L-level similarity relation and a loss interval set, Liu et al. [15] proposed a new loss function to determine the three parts of a universal set. Qi et al. [21] proposed several algorithms for building three-way concept lattices by the connections between three-way decisions and classical concept lattices. Zhou [32] divided an email set into three parts by minimizing the misclassification cost. Liang et al. [13] proposed a relative value based loss function to divide a universal set into three parts. To satisfy three different optimization objectives, Zhang and Yao [31] proposed three different Gini functions that can divide a universal set into three parts. Results of these studies suggest that the philosophy of thinking in threes is equally applicable to modeling a learning process in intelligent tutoring systems. The introduction of the transitional state into the original BKT model offers new research opportunities and challenges.

The rest of the paper is organized as follows. Section 2 introduces the basics of the original BKT model and three-way decisions. Section 3 proposes the definitions of a learning process and a learning state; based on them, an evaluation measure for learning state partitions is derived, so that the original two learning states and the proposed three learning states become comparable. Section 4 proposes the TLS-BKT model. Section 5 investigates the performance of the proposed model by comparing it with the original BKT model on the Assistments Math data sets and discusses the implications of the experimental results. Section 6 presents our conclusions.

2. Bayesian knowledge tracing model and three-way decisions

This section introduces basic concepts of hidden Markov models, the original Bayesian knowledge tracing model and three-way decisions.

2.1. Hidden Markov models

A hidden Markov model (HMM) [2] represents the probability distributions over a discrete time sequence of observations Y = (y_1, y_2, ..., y_T). An HMM assumes that any observation in Y is determined by a Markov process sequence of states X = (x_1, x_2, ..., x_T), and any state x_i in X is hidden instead of being observed. The joint distribution of X and Y, modeled by an HMM, is factorized as follows:

P(X, Y) = p(x_1) p(y_1 | x_1) ∏_{t=2}^{T} p(y_t | x_t) p(x_t | x_{t−1}),   (1)

where p(x_1) is the probability of the hidden state at time 1, p(y_t | x_t) is the emission probability of the observation y_t at time t given the hidden state x_t, and p(x_t | x_{t−1}) is the transition probability from the hidden state x_{t−1} to the hidden state x_t.

The graphical representation of the factorization in Eq. (1) is illustrated in Fig. 1. The gray circles, labeled by x_t, t ∈ {1, 2, ..., T}, represent the hidden states at time t. The rounded rectangles, labeled by y_t, t ∈ {1, 2, ..., T}, represent the observations at time t. The arrows between the gray circles indicate the transition probabilities p(x_t | x_{t−1}). The arrows from the gray circles to the rounded rectangles indicate the emission probabilities p(y_t | x_t).

Fig. 1. The graphical representation of the HMM.

An HMM also assumes that any hidden state variable x_t, t ∈ {1, 2, ..., T}, and any observation variable y_t, t ∈ {1, 2, ..., T}, are both discrete. Specifically, suppose that x_t takes on N values, denoted as follows:

Q = {q_1, q_2, ..., q_N}.

All the transition probabilities of an HMM can be represented as an N × N transition matrix as follows:

A = [a_{ij}]_{N×N} = [p(x_t = q_j | x_{t−1} = q_i)]_{N×N},

where i = 1, 2, ..., N; j = 1, 2, ..., N. Likewise, suppose that y_t takes on M values, denoted as follows:

V = {v_1, v_2, ..., v_M}.

All the emission probabilities of an HMM can be represented as an N × M emission matrix as follows:

B = [b_i(j)]_{N×M} = [p(y_t = v_j | x_t = q_i)]_{N×M},

where i = 1, 2, ..., N; j = 1, 2, ..., M.

An instantiated HMM is shown in Fig. 2. The gray circles, labeled by q_1, ..., q_N, indicate the values that any hidden state x_t can take on at a particular time. The rounded rectangles, labeled by v_1, ..., v_M, indicate the values that the corresponding observation y_t can take on at that time. The solid arrows denote transition probabilities. The dashed arrows denote emission probabilities.

Fig. 2. The instantiated HMM.

Assume that the initial hidden state value probability vector is Π = [π(i)]_N, where π(i) = p(x_1 = q_i), i ∈ {1, 2, ..., N}. The parameters of an HMM are represented by λ = (A, B, Π), that is, a triplet that includes a transition matrix A, an emission matrix B and an initial hidden state value probability vector Π. Given a sequence of observations, the most likely λ can be estimated by a learning algorithm. Given λ and a sequence of observations, the most likely sequence of hidden states can be deduced by an inference algorithm.

2.2. The Bayesian knowledge tracing model

The original BKT model and its extensions have been implemented on the basis of HMMs. The observations of the HMM represent a sequence of a student's performances, each of which is labeled as correct or incorrect. The hidden states of the HMM represent a sequence of learning states, each of which is intuitively regarded as unlearned-state or learned-state.

The aim of the original BKT model and its extensions is to estimate the learned probability of each KC at opportunity t, denoted by a learning parameter P(L_t), t ≥ 0. P(L_t) is the probability that a KC is learned after the t-th opportunity to apply it, and P(L_0) is the initial probability of a KC before any opportunity to apply it. Furthermore, the model introduces another learning parameter P(T) and two performance parameters P(G) and P(S). The parameter P(T) is the probability that the learning state will transit from unlearned-state to learned-state after an opportunity to apply a KC, P(G) is the probability that a KC will be guessed correctly when it is in unlearned-state, and P(S) is the probability that a KC will be slipped when it is in learned-state. The probability P(L_t) is updated after each opportunity to apply a KC as follows [1]:

P(L_{t−1} | Correct_t) = P(L_{t−1})(1 − P(S)) / [P(L_{t−1})(1 − P(S)) + (1 − P(L_{t−1})) P(G)],

P(L_{t−1} | Incorrect_t) = P(L_{t−1}) P(S) / [P(L_{t−1}) P(S) + (1 − P(L_{t−1}))(1 − P(G))],

P(L_t) = P(L_{t−1} | evidence_t) + (1 − P(L_{t−1} | evidence_t)) · P(T),

where Correct_t means that the performance of the t-th opportunity to apply the KC is correct, Incorrect_t means the contrary case, and evidence_t ∈ {Correct_t, Incorrect_t} is the performance of the t-th opportunity to apply the KC.

The original BKT model is illustrated in Fig. 3. Fig. 3(a) is the graphical representation of the original BKT model. The gray circles, labeled by kc_t, t ∈ {1, 2, ..., T}, represent the learning states of a certain KC at time t. The rounded rectangles, labeled by o_t, t ∈ {1, 2, ..., T}, represent the performances at time t. The arrows between the gray circles denote the learning state transition probabilities p(kc_t | kc_{t−1}) from time t − 1 to time t. The arrows from the gray circles to the rounded rectangles denote the performance emission probabilities p(o_t | kc_t) given a learning state at time t. Fig. 3(b) is the instantiation of the original BKT model. The gray circles, labeled by u and l, represent the unlearned-state and learned-state. The rounded rectangles, labeled by c and i, represent the correct and incorrect performances.

Fig. 3. The original BKT model.
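The update equations above translate directly into code. Below is a minimal sketch in plain Python (not the authors' implementation; the parameter values are invented for illustration):

```python
def bkt_update(p_l, correct, p_t, p_g, p_s):
    """One step of the original BKT update of P(L_t).

    p_l: P(L_{t-1}); correct: whether the t-th performance is correct;
    p_t, p_g, p_s: the P(T), P(G), P(S) parameters.
    """
    if correct:
        # P(L_{t-1} | Correct_t)
        cond = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:
        # P(L_{t-1} | Incorrect_t)
        cond = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    # P(L_t) = P(L_{t-1}|evidence_t) + (1 - P(L_{t-1}|evidence_t)) * P(T)
    return cond + (1 - cond) * p_t

# Trace P(L_t) over a performance sequence with illustrative parameters.
p_l, p_t, p_g, p_s = 0.3, 0.2, 0.2, 0.1   # P(L_0), P(T), P(G), P(S)
for o in [1, 0, 0, 1, 1, 1]:
    p_l = bkt_update(p_l, o == 1, p_t, p_g, p_s)
```

Because the model assumes no forgetting, the transition term (1 − cond)·P(T) is always nonnegative; only the Bayesian evidence step can lower the estimate.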

2.3. Three-way decisions

An underlying idea of three-way decisions is thinking in threes [28]. In contrast to dichotomous thinking in terms of two options, three-way decisions introduce a third option. More specifically, we move from true/false, black/white, yes/no, etc. to true/unsure/false, black/grey/white, yes/maybe/no, etc. Three-way decisions were initially proposed to give a sound semantical explanation of the positive, boundary and negative regions in rough sets [20]. Subsequent studies showed that three-way decisions are commonly used across many disciplines and are an effective approach to complex problem solving. Three-way decisions are based on findings from human cognition [7]. Due to limited information processing capacity, we can normally process only a small number of units of information, ranging from two to seven [5,16]. Thinking in threes comes naturally in our daily lives. The added third option provides the necessary flexibility and universality of three-way decisions. There is a fast growing interest in the theory and practice of three-way decisions [24–26].

A trisecting-and-acting model of three-way decisions consists of two basic components [28], as illustrated by Fig. 4. The rounded rectangle, labeled by 'A universal set', represents a universe (U) that includes all the elements in a problem domain. The rounded rectangles, labeled by 'Region I', 'Region II', and 'Region III', represent three subsets of U. The three regions are pairwise disjoint and the union of the three regions equals U. The three regions do not necessarily form a partition of U, since one or two of them can be the empty set. Strategy I, Strategy II and Strategy III are three strategies for acting on the corresponding regions. In this paper, we only employ the trisecting part of the trisecting-and-acting model.

In order to trisect a universal set U into three regions, one may use an evaluation function and a pair of thresholds α, β on values of the evaluation function. Suppose (L, ⪯) is a totally ordered set, that is, ⪯ is reflexive, antisymmetric, transitive and comparable (i.e., for any pair of elements a, b ∈ L, either a ⪯ b or b ⪯ a holds). We can define a relation ≺ as: a ≺ b holds if a ⪯ b ∧ ¬(b ⪯ a). An evaluation function v : U → L maps each element of U to L. Given a pair of thresholds α, β ∈ L with β ≺ α, we divide the universal set U into three regions as follows:

Region I(v) = {x ∈ U | v(x) ⪯ β},
Region II(v) = {x ∈ U | β ≺ v(x) ≺ α},
Region III(v) = {x ∈ U | α ⪯ v(x)}.   (2)

A learning process about a KC is a sequence of a student's performances about the KC. The trisecting-and-acting model can be used in trisecting and interpreting a learning process. A learning process can be divided into three regions according to Eq. (2). In Region I(v) of a learning process, a student does not master a KC and the performances may be poor. In Region II(v), the student probably knows more and more about the KC and the performances become better and better. In Region III(v), the student knows about the KC and the performances may be good and stable. The details will be discussed in the next section.
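Eq. (2) is straightforward to realize for a numeric evaluation function. The following is a minimal sketch in plain Python (not from the paper; the evaluation function and thresholds are invented for illustration):

```python
def trisect(universe, v, alpha, beta):
    """Divide a universal set into three regions per Eq. (2),
    assuming v maps elements to a totally ordered (numeric) scale
    and beta < alpha."""
    region1 = [x for x in universe if v(x) <= beta]          # Region I
    region2 = [x for x in universe if beta < v(x) < alpha]   # Region II
    region3 = [x for x in universe if v(x) >= alpha]         # Region III
    return region1, region2, region3

# Illustrative trisection of the first twelve opportunity indices.
r1, r2, r3 = trisect(range(1, 13), v=lambda x: x, alpha=10, beta=5)
# r1 = [1, 2, 3, 4, 5], r2 = [6, 7, 8, 9], r3 = [10, 11, 12]
```

The three returned regions are pairwise disjoint and their union is the universe, matching the requirement stated for the trisecting-and-acting model.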

3. Interpretation of three learning states

In this section, we first formulate a learning process. Second, we present two examples to illustrate the original bisection partition of a learning process [4] and the proposed trisection partition of a learning process. Third, we formulate a learning state as an interval of a learning process and define a distinction measure for evaluating a partition. Fourth, we propose methods for the bisection partition and the trisection partition. Finally, we compare the bisection partition with the trisection partition by the distinction measure.

3.1. Examples for bisection and trisection of a learning process

In a Bayesian knowledge tracing model, we represent the history of a student's learning process by her/his performance on a series of exercises regarding a KC. If the student provides a right

answer to the exercise, it is denoted as a correct performance. Otherwise, it is denoted as an incorrect performance. Generally, incorrect performances are in the majority at the beginning of a series of exercises, and correct performances keep rising as the student continues learning. In this paper, we make the same assumption as the original bisection partition: the performances in a learning process involve no forgetting, improve gradually and are eventually good. Formally, a sequence of performances (o) is defined as a learning process (O) as follows:

O = (o_1, o_2, ..., o_n), n ≥ 1, o_i ∈ {0, 1},

where o_i = 0 indicates an incorrect performance and o_i = 1 indicates a correct performance.

Fig. 4. The trisecting-and-acting model of three-way decisions [28].

Fig. 5. Two partition examples: (a) the original bisection partition; (b) the proposed trisection partition.

The original bisection partition formulates the evolution of a learning process with two learning states: unlearned-state and learned-state. For a learning process such as (1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1), in the beginning of the learning process, a student is in the unlearned-state, and the student's performances are poor. After that, the student is in the learned-state, and the student's performances are good. This example is illustrated in Fig. 5(a).

Unlike the original bisection partition, the proposed trisection partition formulates the evolution of a learning process with three learning states: unlearned-state, learning-state and learned-state. For the same learning process, a student is in the unlearned-state in the beginning of the learning process, and the student's performances are poor. After that, the student is in the learning-state, and the student's performances improve gradually from poor to good. In the last interval of the learning process, the student is in the learned-state, and the student's performances are good. This example is illustrated in Fig. 5(b).

3.2. Evaluation of a partition

Intuitively, if adjacent learning states of a partition are similar, the partition formulates the evolution of a learning process unclearly. On the contrary, if adjacent learning states of a partition are distinct, the partition formulates the evolution of a learning process clearly. Therefore, in order to clearly formulate the evolution of a learning process, adjacent learning states of a partition should be as distinct as possible.

Based on the above idea, we propose a definition of a learning state and a measure for evaluating the distinction between different learning states. In this way, the distinction of adjacent learning states of a partition can be computed, and two partitions become comparable.

As can be seen in Fig. 5, a learning state can be characterized by an interval of a learning process. Therefore, we formulate a learning state as an interval of a learning process. Suppose a learning process is O = (o_1, o_2, ..., o_n); a learning state is an interval of O as follows:

O_{i:j} = (o_i, o_{i+1}, ..., o_j), 1 ≤ i ≤ j ≤ n.

Definition 1 (The average performance). Let O_{i:j} be a learning state in a learning process O. The average performance of O_{i:j} is defined as follows:

ap(O_{i:j}) = (o_i + o_{i+1} + ... + o_j) / (j − i + 1), 1 ≤ i ≤ j ≤ n.

It is an attribute of a learning state, and its value reflects the average performance of the learning state. For example, ap(O_{i:j}) = 0.6 reflects that the average performance of O_{i:j} is 0.6.

Definition 2 (The distinction between learning states). Let O_{a:b}, O_{c:d} be two learning states. The distinction between O_{a:b} and O_{c:d} is defined as follows:

d[O_{a:b}, O_{c:d}] = [ap(O_{a:b}) − ap(O_{c:d})]².

It is a measure for evaluating the distinction between two learning states, computed as the squared difference between their respective average performances. The larger its value is, the more distinct the two learning states are.

Definition 3 (The distinction of a partition). Let O = (o_1, o_2, ..., o_n) be a learning process, and let a partition divide it into learning states O_{1:i_1}, O_{i_1+1:i_2}, O_{i_2+1:i_3}, ..., O_{i_{m−1}+1:n}. The distinction of the partition is defined as follows:

D = (d[O_{1:i_1}, O_{i_1+1:i_2}] + d[O_{i_1+1:i_2}, O_{i_2+1:i_3}] + ... + d[O_{i_{m−2}+1:i_{m−1}}, O_{i_{m−1}+1:n}]) / (m − 1).

It is an attribute of a partition. Its value reflects the average distinction of adjacent learning states of the partition. For example, for a learning process, if the distinction of one partition p1 is 0.3 and the distinction of another partition p2 is 0.7, then adjacent learning states of p2 have larger distinction and p2 is a better partition.
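Definitions 1–3 can be sketched in a few lines of plain Python (not the authors' code; the function names are ours). The example evaluates two partitions of the learning process of Fig. 5:

```python
def ap(seg):
    """Definition 1: the average performance of a learning state."""
    return sum(seg) / len(seg)

def dist(s1, s2):
    """Definition 2: squared difference of average performances."""
    return (ap(s1) - ap(s2)) ** 2

def partition_distinction(process, cuts):
    """Definition 3: average distinction of adjacent learning states.
    cuts are the end positions i_1 < ... < i_{m-1} of the first m-1 states."""
    bounds = [0] + list(cuts) + [len(process)]
    states = [process[a:b] for a, b in zip(bounds, bounds[1:])]
    return sum(dist(s, t) for s, t in zip(states, states[1:])) / (len(states) - 1)

O_example = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1]      # the learning process of Fig. 5
bisection = partition_distinction(O_example, [5])      # states O_1:5, O_6:12
trisection = partition_distinction(O_example, [5, 9])  # states O_1:5, O_6:9, O_10:12
```

The cut positions here are fixed by hand to mirror Fig. 5; the next subsection selects them by maximizing the distinction instead.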

Page 5: ARTICLE IN PRESSstatic.tongtianta.site/paper_pdf/140a8146-a8f4-11e9-b323-00163e08… · Available online xxx Keywords: Bayesian proposedknowledge overtracing original Three-way decisions

K. Zhang, Y. Yao / Knowledge-Based Systems 0 0 0 (2018) 1–13 5

ARTICLE IN PRESS

JID: KNOSYS [m5G; March 4, 2018;8:59 ]

3

s

n

o

w

i

v

s

a

O

w

s

u

s

m

a

t

i

t

p

t

W

i

l

c

p

i

m

t

p

K

u

β

a

O

w

u

a

m

a

t

p

d

o

W

O

a

3

b

t

w

i

w

r

t

s

t

b

p

s

t

a

i

f

t

s

b

i

t

n

4

s

d

m

b

p

4

B

e

m

E

n

b

.3. Methods for the bisection and trisection

In the original bisection partition context, the two learning

tates are taken for granted, but the method to derive them is

ever mentioned before. In this subsection, we propose two meth-

ds that really divide a learning process into two learning states as

ell as three learning states.

Suppose that an evaluation function v maps entries of a learning process O = (o_1, o_2, ..., o_n) to their subscripts, that is, v(o_i) = i, 1 ≤ i ≤ n.

In the context of the original bisection partition, the two learning states divided from a learning process can be represented by using a threshold 1 ≤ γ < n as follows:

O_{1:γ} = {x ∈ O | 1 ≤ v(x) ≤ γ} = {o_1, o_2, ..., o_γ},
O_{(γ+1):n} = {x ∈ O | γ+1 ≤ v(x) ≤ n} = {o_{γ+1}, o_{γ+2}, ..., o_n}, (3)

where O_{1:γ} and O_{(γ+1):n} divide a learning process into a beginning state and an end state. The beginning state O_{1:γ} is the original unlearned-state, and the end state O_{(γ+1):n} is the original learned-state.

According to Definition 1, the corresponding average performances of Eq. (3) are represented as follows:

ap(O_{1:γ}) = (o_1 + o_2 + ... + o_γ) / γ,
ap(O_{(γ+1):n}) = (o_{γ+1} + o_{γ+2} + ... + o_n) / (n − γ). (4)

As can be seen in Eq. (3), since 1 ≤ γ < n, there exist n − 1 ways to bisect a learning process. On the other hand, it is worth noting that the optimal bisection partition has the maximum distinction among all possible bisection partitions. Therefore, a learning process can be divided into two learning states by optimizing the threshold γ as follows:

arg max_γ d[O_{1:γ}, O_{(γ+1):n}]   s.t. 1 ≤ γ < n. (5)

With the optimal γ in Eq. (5), the two learning states O_{1:γ} and O_{(γ+1):n} in Eq. (3) are solved, and so are ap(O_{1:γ}) and ap(O_{(γ+1):n}) in Eq. (4).
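The search over γ in Eq. (5) can be sketched by brute force (illustrative Python; `bisect` and the sample sequence are ours, and `d` is the squared difference of average performances from the earlier definitions):

```python
import numpy as np

def ap(seg):
    """Average performance of a segment (Definition 1)."""
    return float(np.mean(seg))

def d(a, b):
    """Distinction: squared difference of average performances."""
    return (ap(a) - ap(b)) ** 2

def bisect(process):
    """Optimal bisection (Eq. 5): the threshold gamma maximizing
    d[O_{1:gamma}, O_{(gamma+1):n}] over 1 <= gamma < n."""
    n = len(process)
    best_gamma = max(range(1, n),
                     key=lambda g: d(process[:g], process[g:]))
    return best_gamma, d(process[:best_gamma], process[best_gamma:])

gamma, dist = bisect([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])
```

For the sample 0/1 sequence, the search places the cut where the gap between the two states' average performances is widest.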

In the context of the proposed trisection partition, we divide a learning process into three learning states by following the principle of three-way decisions. In the unlearned-state, a student's performances are poor and relatively stable, which suggests that the KC is unlikely to be mastered. In the learning-state, the student's performances improve gradually from poor to good, which suggests that the KC is probably mastered. In the learned-state, the student's performances are good and relatively stable, which suggests that the KC is certainly learned. Such a trisection of a learning process naturally reflects the beginning, middle, and end of learning.

Inspired by Eq. (2), given a pair of thresholds α and β with β < α, we can divide a learning process into three learning states as follows:

O_{1:β} = {x ∈ O | 1 ≤ v(x) ≤ β} = {o_1, o_2, ..., o_β},
O_{(β+1):α} = {x ∈ O | β+1 ≤ v(x) ≤ α} = {o_{β+1}, o_{β+2}, ..., o_α},
O_{(α+1):n} = {x ∈ O | α+1 ≤ v(x) ≤ n} = {o_{α+1}, o_{α+2}, ..., o_n}, (6)

where 1 ≤ β < α < n. Obviously, the beginning state O_{1:β} is the unlearned-state, the transitional state O_{(β+1):α} is the learning-state, and the end state O_{(α+1):n} is the learned-state.

According to Definition 1, the corresponding average performances of Eq. (6) are represented as follows:

ap(O_{1:β}) = (o_1 + o_2 + ... + o_β) / β,
ap(O_{(β+1):α}) = (o_{β+1} + o_{β+2} + ... + o_α) / (α − β),
ap(O_{(α+1):n}) = (o_{α+1} + o_{α+2} + ... + o_n) / (n − α). (7)

Similar to the bisection partition, since 1 ≤ β < α < n, there exist Σ_{i=1}^{n−2} (n − i − 1) = (n − 1)(n − 2)/2 ways to trisect a learning process. The optimal trisection partition has the maximum distinction among all possible trisection partitions. Therefore, a learning process can be divided into three learning states by optimizing the pair of thresholds α, β as follows:

arg max_{α,β} ( d[O_{1:β}, O_{(β+1):α}] + d[O_{(β+1):α}, O_{(α+1):n}] ) / 2   s.t. 1 ≤ β < α < n. (8)

With the optimal α and β in Eq. (8), the three learning states O_{1:β}, O_{(β+1):α}, O_{(α+1):n} in Eq. (6) are solved, and so are ap(O_{1:β}), ap(O_{(β+1):α}), ap(O_{(α+1):n}) in Eq. (7).
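Likewise, the (β, α) search of Eq. (8) can be sketched by enumerating all threshold pairs (illustrative Python; `trisect` is our name, and `d` is the squared difference of average performances as before):

```python
from itertools import combinations
import numpy as np

def ap(seg):
    """Average performance of a segment (Definition 1)."""
    return float(np.mean(seg))

def d(a, b):
    """Distinction: squared difference of average performances."""
    return (ap(a) - ap(b)) ** 2

def trisect(process):
    """Optimal trisection (Eq. 8): search all 1 <= beta < alpha < n for
    the pair maximizing the mean distinction of adjacent states."""
    n = len(process)
    def score(beta, alpha):
        return (d(process[:beta], process[beta:alpha])
                + d(process[beta:alpha], process[alpha:])) / 2
    beta, alpha = max(combinations(range(1, n), 2),
                      key=lambda pair: score(*pair))
    return beta, alpha, score(beta, alpha)
```

The enumeration visits the (n − 1)(n − 2)/2 admissible pairs, so it is quadratic in the length of the learning process.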

3.4. Comparisons between bisection and trisection

In this subsection, contrast tests compare the distinctions of the bisection partition with those of the trisection partition. The tests here use artificial data; tests on real data are conducted in Section 5. In order to simulate various learning processes, artificial learning processes of four lengths (100, 500, 1000, 2000) are generated randomly. To balance the accuracy and the runtime of the experiment, 1000 samples are generated for each length.

The distinctions of the trisection partition are plotted as a function of each artificial learning process, together with the distinctions of the bisection partition on the same learning processes, as illustrated in Fig. 6. As can be seen from Fig. 6, the trisection partition offers a larger distinction than the bisection partition for all types of learning processes. This is because the average performance of the unlearned-state in the trisection partition is smaller than that in the bisection partition, and the average performance of the learned-state in the trisection partition is larger than that in the bisection partition.

Based on the results of this section, we conclude that the trisection partition characterizes the evolution of a learning process better than the bisection partition. Therefore, we extend the original two learning states BKT model into a new BKT model with three learning states. The specific structure of the new model is proposed in the next section.

4. A three learning states BKT model

This section proposes a new BKT model with three learning states. The probabilities of the three learning states at time t are derived from the forward and backward probabilities of HMMs. A most likely sequence of learning states from time 1 to t is derived by the inference algorithm of HMMs. The parameter λ of the proposed model is derived by the learning algorithm of HMMs.

4.1. The topology of the proposed model

As in the original BKT model, the instantiation of the TLS-BKT model has two observations: correct and incorrect. Different from the original BKT model, the instantiation of the TLS-BKT model has three hidden states: unlearned, learning and learned. Each hidden state has two transition probabilities, to itself and to the next hidden state, and each hidden state has two emission probabilities, to the two observations.

The TLS-BKT model is illustrated in Fig. 7 .



Fig. 6. The distinction comparisons between bisection and trisection.

Fig. 7. The TLS-BKT model.


Fig. 7(a) is the graphical representation of the TLS-BKT model. It is the same as Fig. 3(a), and the same parameters in the two figures have the same meaning. Fig. 7(b) is the instantiation of the TLS-BKT model. The gray circles, labeled u, e and l, represent the unlearned-state, learning-state and learned-state. The rounded rectangles, labeled c and i, represent the correct and incorrect performances. p_uu and p_ue denote the transition probabilities from the unlearned-state to itself and to the learning-state. p_ee and p_el denote the transition probabilities from the learning-state to itself and to the learned-state. p_ll denotes the transition probability from the learned-state to itself. p_uc and p_ui denote the emission probabilities from the unlearned-state to the correct and incorrect performances, respectively. p_ec and p_ei denote the emission probabilities from the learning-state to the correct and incorrect performances, respectively. p_lc and p_li denote the emission probabilities from the learned-state to the correct and incorrect performances, respectively.

In the TLS-BKT model, the learning state of a KC takes on three values, denoted as follows:

Q = {q_1 = u, q_2 = e, q_3 = l},

where u, e and l represent the unlearned-state, learning-state, and learned-state, respectively. The performance of a KC takes on two values, denoted as follows:

V = {v_1 = c, v_2 = i},

where c and i represent the correct performance and the incorrect performance, respectively. Let O = (o_1, o_2, ..., o_T) be a learning process from time 1 to T, and KC = (kc_1, kc_2, ..., kc_T) be a sequence of the learning states at the corresponding times.


The transition matrix A of the proposed model is defined by

A = [a_ij]_{3×3} =
    | p_uu  p_ue  0    |
    | 0     p_ee  p_el |
    | 0     0     p_ll |,

where a_ij = p(kc_{t+1} = q_j | kc_t = q_i), i, j ∈ {1, 2, 3}. The emission matrix B of the proposed model is defined by

B = [b_j(k)]_{3×2} =
    | p_uc  p_ui |
    | p_ec  p_ei |
    | p_lc  p_li |,

where b_j(k) = p(o_t = v_k | kc_t = q_j), k ∈ {1, 2}, j ∈ {1, 2, 3}. The initial learning state probability vector is

Π = [π(i)]_3,

where π(i) = p(kc_1 = q_i), i ∈ {1, 2, 3}. The parameters of the TLS-BKT model, namely the transition matrix A, the emission matrix B and Π, are denoted as λ = (A, B, Π).
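In code, λ = (A, B, Π) is simply three stochastic arrays whose zero pattern encodes the left-to-right topology of Fig. 7 (no transition back to an earlier state). The numeric values below are made-up illustrations, not estimates from the paper:

```python
import numpy as np

# Illustrative (made-up) TLS-BKT parameters.  Row/column order follows
# the states u (unlearned), e (learning), l (learned); observation
# order is c (correct), i (incorrect).
A = np.array([[0.7, 0.3, 0.0],    # p_uu, p_ue, 0
              [0.0, 0.6, 0.4],    # 0,    p_ee, p_el
              [0.0, 0.0, 1.0]])   # 0,    0,    p_ll
B = np.array([[0.2, 0.8],         # p_uc, p_ui
              [0.5, 0.5],         # p_ec, p_ei
              [0.9, 0.1]])        # p_lc, p_li
Pi = np.array([0.8, 0.15, 0.05])  # initial state distribution

# Every row of A and B, and Pi itself, must sum to 1.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```

The zeros in the lower triangle of A are structural: once a state is left, it is never revisited, mirroring the "no forgetting" assumption of the model.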

4.2. The probabilities of the three learning states at time t

Let (o_1, o_2, ..., o_t) be part of the learning process from time 1 to t. Given the parameter λ, the forward probability of kc_t = q_i is given as follows:

α_t(i) = P(o_1, o_2, ..., o_t, kc_t = q_i | λ). (9)

Based on Eq. (9), given part of the learning process (o_1, o_2, ..., o_{t+1}), the probability of kc_{t+1} = q_i is given as follows:



Fig. 8. The distinction comparisons between trisection and bisection in 2004–2006.


Fig. 9. The accuracy comparisons in 2004.

Table 1
The AUC comparisons in 2004–2006.

Year  Model         D-KC    G-KC    M-KC    N-KC    P-KC    All-KC
2004  Original BKT  0.6349  0.5876  0.6031  0.6477  0.6239  0.6123
      TLS-BKT       0.8036  0.7603  0.7505  0.8153  0.8367  0.8056
2005  Original BKT  0.6339  0.5732  0.5900  0.6649  0.6852  0.6338
      TLS-BKT       0.7504  0.7055  0.6790  0.7609  0.7471  0.7547
2006  Original BKT  0.6378  0.6109  0.6100  0.6663  0.6755  0.6491
      TLS-BKT       0.7898  0.7653  0.7510  0.8394  0.8222  0.7996


α_{t+1}(i) = [ Σ_{j=1}^N α_t(j) a_{ji} ] b_i(o_{t+1}),

where a_{ji} and b_i(o_{t+1}) are the corresponding entries in A and B, respectively. Similarly, let (o_{t+2}, o_{t+3}, ..., o_T) be part of the learning process from time t+2 to T. Given the parameter λ, the backward probability of kc_{t+1} = q_i is as follows:

β_{t+1}(i) = P(o_{t+2}, o_{t+3}, ..., o_T | kc_{t+1} = q_i, λ). (10)

Based on Eq. (10), given part of the learning process (o_{t+1}, o_{t+2}, ..., o_T), the probability of kc_t = q_i is given by

β_t(i) = Σ_{j=1}^N a_{ij} b_j(o_{t+1}) β_{t+1}(j),

where a_{ij} and b_j(o_{t+1}) are the corresponding entries in A and B, respectively.

Given the learning process O, the probability of the learning state at time t is given as follows:

P(kc_t = q_i | O, λ) = P(kc_t = q_i, O | λ) / P(O | λ). (11)

This equation can be derived according to Eqs. (9) and (10) as follows:

P(kc_t = q_i, O | λ) = α_t(i) β_t(i).

The probability of the learning state at time t, given by Eq. (11), can be represented as:

P(kc_t = q_i | O, λ) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j). (12)
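Eqs. (9)-(12) correspond to the standard HMM forward-backward recursions [6]. A sketch (our code, with states and observations encoded as integer indices following the matrices of Section 4.1, and Eq. (10) read as the usual conditional backward probability):

```python
import numpy as np

def forward(O, A, B, Pi):
    """Forward probabilities alpha_t(i) of Eq. (9)."""
    T, N = len(O), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

def backward(O, A, B):
    """Backward probabilities beta_t(i), via the recursion below Eq. (10)."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

def state_posteriors(O, A, B, Pi):
    """P(kc_t = q_i | O, lambda) of Eq. (12), one row per time step."""
    alpha, beta = forward(O, A, B, Pi), backward(O, A, B)
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each row of the returned array gives the posterior over unlearned, learning and learned at one time step, and rows sum to one by construction.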



Fig. 10. The accuracy comparisons in 2005.


Fig. 11. The accuracy comparisons in 2006.


The probabilities of kc_t = q_i, i ∈ {1, 2, 3}, can be solved given a learning process O and λ.

4.3. The prediction of the most likely sequence of learning states

Besides the probabilities of the three learning states at time t

in Eq. (12) , the most likely sequence of learning states from time 1

to T can be estimated by the inference algorithm of HMMs given a

learning process O from time 1 to T .

Given the learning process O = (o_1, o_2, ..., o_T) and λ, the inference procedure searches for the most likely sequence of learning


states KC, which can be computed as follows [6]:

KC = arg max_KC P(KC | O, λ) = arg max_KC P(O | KC, λ) P(KC | λ). (13)

According to Eq. (13), the estimate of a most likely sequence of the learning states from time 1 to T can be computed as follows:

KC = arg max_KC P(KC | O, λ)
   = arg max_KC P(O | KC, λ) P(KC | λ)
   = arg max_{kc_1, kc_2, ..., kc_T} Π_{t=1}^T p(o_t | kc_t, λ) p(kc_t | λ), (14)



Fig. 12. The RMSE comparisons in 2004.


Fig. 13. The RMSE comparisons in 2005.


where kc_t ∈ Q and o_t ∈ V. p(kc_t | λ) = a_{kc_{t−1} kc_t} is the conditional distribution of the learning state kc_t given the current λ, and p(o_t | kc_t, λ) = b_{kc_t}(o_t) is the conditional distribution of the performance o_t given the hidden state kc_t and the current λ. A most likely sequence of the learning states can be solved by Eq. (14).
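The maximization in Eq. (14) is exactly what the Viterbi algorithm computes. A log-space sketch (our code; A, B and Pi are the arrays of Section 4.1, observations are integer indices):

```python
import numpy as np

def most_likely_states(O, A, B, Pi):
    """Most likely learning-state sequence of Eq. (14), via the Viterbi
    algorithm in log space to avoid numerical underflow."""
    T, N = len(O), len(Pi)
    # A tiny floor turns the structural zeros into effectively -inf logs.
    logA, logB = np.log(A + 1e-300), np.log(B + 1e-300)
    delta = np.zeros((T, N))          # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)  # best predecessor for backtracking
    delta[0] = np.log(Pi + 1e-300) + logB[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, O[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

Because A is upper triangular, any returned path is non-decreasing: a student can only move from unlearned through learning to learned, never back.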

4.4. Solution of the parameters

Prior to estimating the probabilities of the three learning states at time t or a most likely sequence of the learning states from time 1 to T, the parameter λ of the proposed model needs to be solved by the learning algorithm of HMMs. Given a learning process O = (o_1, o_2, ..., o_T), the learning procedure of HMMs looks for the optimal λ satisfying the following


equation [6]:

λ = arg max_λ E_{KC|O,λ̄} [ log L(λ; O, KC) ]
  = arg max_λ Σ_KC P(KC | O, λ̄) log L(λ; O, KC), (15)

where λ̄ is the current estimate of λ, P(KC | O, λ̄) is the conditional distribution of the sequence of learning states KC given the learning process (o_1, o_2, ..., o_T) under the current estimate λ̄, and L(λ; O, KC) is the likelihood function, which is equal to P(O, KC | λ).

Since

P(KC | O, λ̄) = P(KC, O | λ̄) / P(O | λ̄),



Fig. 14. The RMSE comparisons in 2006.


Fig. 15. The SD comparisons in 2004.


where P(O | λ̄) is a constant, Eq. (15) is equivalent to:

λ = arg max_λ Σ_KC P(O, KC | λ̄) log P(O, KC | λ)
  = arg max_λ Σ_{kc_1, ..., kc_T ∈ Q} P(o_1, ..., o_T, kc_1, ..., kc_T | λ̄) log P(o_1, ..., o_T, kc_1, ..., kc_T | λ). (16)

The parameter λ can be solved by Eq. (16) .
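Eq. (16) is the Q-function maximized by the Baum-Welch (EM) algorithm. One re-estimation step can be sketched as follows (our code, unscaled for brevity, so it is suitable only for short sequences; `baum_welch_step` is our name):

```python
import numpy as np

def baum_welch_step(O, A, B, Pi):
    """One EM update of lambda = (A, B, Pi) for Eq. (16), built from the
    forward-backward quantities of Section 4.2."""
    T, N, M = len(O), A.shape[0], B.shape[1]
    # E-step: forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = Pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta                       # state posteriors, unnormalized
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))               # pairwise state posteriors
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()
    # M-step: re-estimate transition, emission and initial parameters.
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(M):
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, gamma[0]
```

Note that entries of A that start at zero stay exactly zero, so the left-to-right topology of the TLS-BKT model is preserved across EM iterations, and each step cannot decrease the likelihood P(O | λ).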

4.5. Discussions

First, the original BKT model and its extensions intuitively regard the learning states as the unlearned-state and learned-state without a clear formulation. The proposed learning state is formulated as an interval of a learning process. Based on it, the evolution of a learning process can be characterized accurately.

Second, based on the proposed learning state formulation, the trisection partition is obtained by maximizing the average distinction of adjacent learning states. Consequently, it achieves a larger distinction than the bisection partition, which indicates that the trisection partition refines the learning states more exactly than the bisection partition.

Third, compared to the original BKT model, the proposed TLS-BKT model adds the new learning-state. Correspondingly, this updates the parameters as well as the inference and prediction algorithms of the model. The parameters now include the transition and emission parameters of the new learning-state. The inference procedure of the proposed model uses the incremental information to derive the probabilities of all the three



Fig. 16. The SD comparisons in 2005.


Fig. 17. The SD comparisons in 2006.


learning states at time t, and the prediction procedure of the proposed model uses the incremental information to predict the most likely sequence of learning states.

5. Experiments

In this section, a series of experiments were conducted to evaluate the effectiveness and efficiency of the TLS-BKT model by comparison with the original BKT model. The distinctions of the trisection and bisection partitions were compared. For the comparisons of prediction accuracy, the area under the curve (AUC), root mean square error (RMSE) and standard deviation (SD) were used to examine the proposed model from statistical perspectives. Except for the comparisons on the distinction, all experiments were carried out with 10-fold cross-validation to validate the proposed model. We implemented the original BKT model and the TLS-BKT model on a Mac with macOS Sierra 10.12.6, an Intel Core i5 CPU @ 2.70 GHz and 8.0 GB of memory; the software platform is MATLAB R2016a.

5.1. Data sets

The data sets involved are WPI-Assistments [9] from DataShop, including the Assistments Math 2004–2005 (912 students), the Assistments Math 2005–2006 (3136 students) and the Assistments Math 2006–2007 (5046 students), which are denoted as 2004, 2005 and 2006, respectively. The experiments were conducted based on five KCs in the data sets. The five KCs are 'D-Data-Analysis-Statistics-Probability', 'G-Geometry', 'M-Measurement', 'N-Number-Sense-Operations' and 'P-Patterns-Relations-Algebra', de-



noted as 'D-KC', 'G-KC', 'M-KC', 'N-KC' and 'P-KC', respectively. In our experiments, learning processes containing fewer than 100 performances are excluded in order to estimate the probabilities accurately.

5.2. Comparisons of the distinction

For each learning process, the distinctions of the trisection and bisection partitions are shown in Fig. 8. The trisection partition obtains a larger distinction than the bisection partition on each real learning process. Besides its superiority on the artificial data sets in Fig. 6, the trisection partition offers a better distinction on the real data sets.

5.3. Comparisons of student performance prediction

To compare the prediction accuracies of the TLS-BKT model and the original BKT model, the accuracies of each of the five KCs were computed separately. Furthermore, the accuracies of the combination of all five KCs were evaluated. The accuracies were averaged across 1 to 100 experiments in order to display the stable prediction performance of the two models.

The accuracies of the TLS-BKT and original BKT model are depicted in Figs. 9-11. The proposed model is superior to the original BKT model in predicting the single KCs as well as the combination of KCs. This is because the TLS-BKT model has one more learning state than the original BKT model: when a student has probably, but not certainly, learned a KC, the student's learning state transitions into the learning-state instead of the unlearned-state or learned-state.

The prediction accuracies can also be illustrated by AUC, as can be seen in Table 1. The AUC maximum for each KC is annotated in bold. The proposed model achieves the maximum for the single KCs and the combination of KCs, which is consistent with Figs. 9-11. Interestingly, the AUC values of 2005 are overall lower than those of 2004 and 2006, while the trend that the AUC values of the proposed model are larger than those of the original BKT model still holds. This observation is related to the amount of data per KC: 2005 has much less data than 2004 and 2006, so the two models cannot be trained sufficiently, which leads to higher deviations.

5.4. Comparisons of statistical measures

Besides the comparisons of prediction between the two models, comparisons of statistical measures offer an evaluation from an alternative perspective. In this section, RMSE and SD are employed to assess the proposed model and the original BKT model.

The RMSE on each of the five KCs was computed separately. Furthermore, the RMSE on the combination of all five KCs was evaluated. The RMSE was averaged across 1 to 100 experiments in order to display the stable performance of the two models, and was computed for each student in order to show their detailed performance.

The RMSE comparisons between the two models are shown in

Figs. 12–14 .

The results show that the RMSE of the proposed model's predictions is lower than that of the original BKT model for the single KCs as well as the combination of KCs. This suggests that the TLS-BKT model offers closer predictions than the original BKT model, because the transitional state of the TLS-BKT model yields predictions closer to the real situation.

The SD on each of the five KCs was computed separately. Furthermore, the SD on the combination of all five KCs was evaluated. The SD was averaged across 1 to 100 experiments in order to display the stable performance of the two models.


Figs. 15-17 depict the standard deviation comparisons between the proposed model and the original BKT model.

The results show that the SD of the proposed model's predictions is lower than that of the original BKT model for the single KCs as well as the combination of KCs. This suggests that the TLS-BKT model offers more stable predictions than the original BKT model. The reason is also related to the learning-state of the TLS-BKT model: the prediction results of the proposed model are coordinated by the learning-state.

6. Conclusions

In this paper, we divided the learning process into three learning states by an evaluation function based on ideas from three-way decisions. Based on the three learning states, we have proposed the new TLS-BKT model. The results of the comparative experiments demonstrate clearly that the proposed model improves prediction accuracies and shows superior robustness on the statistical measures compared with the original BKT model. Such improvement is ascribed to the proposed three learning states, which offer a more precise formulation than the original two learning states.

Acknowledgment

This research was partially supported by the program of the China Scholarship Council (CSC) under Grant No. 201606775044, and a Discovery Grant from NSERC, Canada. We used the Assistments Math 2004–2005 (912 students), 2005–2006 (3136 students) and 2006–2007 (5046 students) datasets accessed via DataShop [9]. We would like to thank the anonymous reviewers for their constructive advice.

References

[1] R.S. Baker, A.T. Corbett, V. Aleven, More accurate student modeling through contextual estimation of slip and guess probabilities in Bayesian knowledge tracing, in: Proceedings of ITS'08, 5091, Springer, 2008, pp. 406–415.
[2] L.E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Ann. Math. Stat. 37 (6) (1966) 1554–1563.
[3] J. Chen, Y.P. Zhang, S. Zhao, Multi-granular mining for boundary regions in three-way decision theory, Knowl.-Based Syst. 91 (2016) 287–292.
[4] A.T. Corbett, J.R. Anderson, Knowledge tracing: modeling the acquisition of procedural knowledge, User Model. User-Adapt. Interact. 4 (4) (1995) 253–278.
[5] N. Cowan, The magical number 4 in short-term memory: a reconsideration of mental storage capacity, Behav. Brain Sci. 24 (1) (2001) 87–114.
[6] Z. Ghahramani, An introduction to hidden Markov models and Bayesian networks, Int. J. Pattern Recognit. Artif. Intell. 15 (01) (2001) 9–42.
[7] R.W. Keidel, Strategy made simple: thinking in threes, Bus. Horiz. 56 (1) (2013) 105–111.
[8] K. Koedinger, A.T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: bridging the science-practice chasm to enhance robust student learning, Cogn. Sci. 36 (5) (2012) 757–798.
[9] K.R. Koedinger, R.S. Baker, K. Cunningham, A. Skogsholm, B. Leber, J. Stamper, A data repository for the EDM community: The PSLC DataShop, CRC Press, Boca Raton, FL, 2010.
[10] G.M. Lang, D.Q. Miao, M.J. Cai, Three-way decision approaches to conflict analysis using decision-theoretic rough set theory, Inf. Sci. 406 (2017) 185–207.
[11] H.X. Li, L.B. Zhang, B. Huang, X.Z. Zhou, Sequential three-way decision and granulation for cost-sensitive face recognition, Knowl.-Based Syst. 91 (2016) 241–251.
[12] W.W. Li, Z.Q. Huang, Q. Li, Three-way decisions based software defect prediction, Knowl.-Based Syst. 91 (2016) 263–274.
[13] D.C. Liang, W. Pedrycz, D. Liu, Determining three-way decisions with decision-theoretic rough sets using a relative value approach, IEEE Trans. Syst., Man, Cybern.: Syst. 47 (8) (2016) 1785–1799.
[14] C. Lin, M. Chi, Intervention-BKT: incorporating instructional interventions into Bayesian knowledge tracing, in: Proceedings of ITS'16, 9684, Springer, 2016, pp. 208–218.
[15] D. Liu, D.C. Liang, C.C. Wang, A novel three-way decision model based on incomplete information system, Knowl.-Based Syst. 91 (2016) 32–45.
[16] G.A. Miller, The magical number seven, plus or minus two: some limits on our capacity for processing information, Psychol. Rev. 63 (2) (1956) 81–97.
[17] M. Nauman, N. Azam, J.T. Yao, A three-way decision making approach to malware analysis using probabilistic rough sets, Inf. Sci. 374 (2016) 193–209.
[18] Z.A. Pardos, N.T. Heffernan, Modeling individualization in a Bayesian networks implementation of knowledge tracing, in: Proceedings of UMAP'10, 6075, Springer, 2010, pp. 255–266.


[19] Z.A. Pardos, N.T. Heffernan, KT-IDEM: introducing item difficulty to the knowledge tracing model, in: Proceedings of UMAP'11, 6787, Springer, 2011, pp. 243–254.
[20] Z. Pawlak, Rough sets, Int. J. Parallel Program. 11 (5) (1982) 341–356.
[21] J.J. Qi, T. Qian, L. Wei, The connections between three-way and classical concept lattices, Knowl.-Based Syst. 91 (1) (2016) 143–151.
[22] A. Savchenko, Fast multi-class recognition of piecewise regular objects based on sequential three-way decisions and granular computing, Knowl.-Based Syst. 91 (2016) 252–262.
[23] Z. Wang, J.L. Zhu, X. Li, Z.T. Hu, M. Zhang, Structured knowledge tracing models for student assessment on Coursera, in: Proceedings of L@S'16, ACM, 2016, pp. 209–212.
[24] Y.Y. Yao, Three-way decision: an interpretation of rules in rough set theory, in: Proceedings of RSKT'09, Springer, 2009, pp. 642–649.
[25] Y.Y. Yao, Three-way decisions with probabilistic rough sets, Inf. Sci. 180 (3) (2010) 341–353.


[26] Y.Y. Yao, The superiority of three-way decisions in probabilistic rough set models, Inf. Sci. 181 (6) (2011) 1080–1096.
[27] Y.Y. Yao, An outline of a theory of three-way decisions, in: Proceedings of RSCTC'12, 7413, Springer, 2012, pp. 1–17.
[28] Y.Y. Yao, Three-way decisions and cognitive computing, Cognit. Comput. 8 (4) (2016) 543–554.
[29] H. Yu, C. Zhang, G.Y. Wang, A tree-based incremental overlapping clustering method using the three-way decision theory, Knowl.-Based Syst. 91 (2016) 189–203.
[30] H.R. Zhang, F. Min, Three-way recommender systems based on random forests, Knowl.-Based Syst. 91 (2016) 275–286.
[31] Y. Zhang, J.T. Yao, Gini objective functions for three-way classifications, Int. J. Approx. Reason. 81 (2017) 103–114.
[32] B. Zhou, Y.Y. Yao, J.G. Luo, Cost-sensitive three-way email spam filtering, J. Intell. Inf. Syst. 42 (1) (2014) 19–45.

Please cite this article as: K. Zhang, Y. Yao, A three learning states Bayesian knowledge tracing model, Knowledge-Based Systems (2018), https://doi.org/10.1016/j.knosys.2018.03.001