
Predicting Student Performance | Classification | Recommendation System | Conclusion

Personalized Recommendation System for Students by Grade Prediction

Pavan Kotha

113050010

Master Thesis

Under the guidance of Prof. D. B. Phatak

Computer Science Department, IIT Bombay


Outline

Motivation

Problem Statement

Prediction of Endsem Marks and Grades

Theoretical Model for personalized recommendation system

Conclusion and Future Work


Motivation

Different learners - different modes of learning, educational backgrounds, previous experience, intelligence

Different people process knowledge in different ways, so different people will relate to a particular learning resource in different ways.

Human instructors can learn which style of presentation suits which learner and adjust their mode of presentation.

Different learners need to focus on different material to achieve the same learning objective.

Need to classify learners into different types (Achiever, Struggler, etc.)


Problem Statement

Proposed learning system: gives learning support based on the predicted grade of the student.

Different learning proposals are provided to students as feedback according to the student's target class.


Need for Prediction | Regression

Need for Prediction

ML in education - to study the data available in the educational field and bring out the hidden knowledge in it

Instructors can identify weak students and help them to score better marks in the endsem by giving additional practice questions.

Appropriate measures are taken to improve their performance in the endsem, i.e. provide customized references so that a student can read those materials and improve his grade.

E.g.: additional customized practice questions, different text books for different learners, etc.


Learning from Data

ML, a branch of AI, deals with the study of systems that can learn from data.

ML focuses on prediction based on known properties learned from the training data. E.g.: classifying email as spam or non-spam.

So we learn from one offering of the CS101 course and predict marks and grades for another offering.


Training Data

Statistical data is represented in terms of tuples

Data consists of many attributes

Training data D = {(x1, y1), ..., (xN, yN)}

Target attribute is to be predicted for test data

Training goal: learn a function f(x) → y such that the prediction error on unseen instances is small

The learner extracts information about the underlying distribution.


Proposed System

Pipeline: raw data → pre-processing → train data and test data. The relationship between attributes is captured from the train data using linear approximation and used to predict the endsem marks of the test data. Classification algorithms then generate a model that predicts the grades of the test data, and the accuracy of the classifier is found.

Figure: Schematic diagram of system


Predicting Endsem Marks

Attributes taken: assignment, quiz, midsem. Exclude project.

Don’t have any information about the distribution of data

We can find the mean, variance, and correlation between attributes from the given training data

Relationship between the attributes, e.g.: good midsem marks mostly go with good endsem marks.

Need to capture such a relationship from the training data


Linear Model in Single Variable

Linear function: Y = a + bX which can best predict the target variable

Y - endsem marks; X - assignment, quiz, or midsem marks.

How to find the parameters a, b from the training data?

To obtain the best linear function, choose a and b s.t. the prediction error is minimised.

Optimization objective: minimise E[(Y − (a + bX))^2]

a = E[Y] − b E[X],  b = ρ σ_y / σ_x
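The closed-form coefficients above can be checked numerically. A minimal sketch, where the midsem/endsem marks are made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical midsem (X) and endsem (Y) marks for six students.
X = np.array([18.0, 22.0, 25.0, 9.0, 27.0, 15.0])
Y = np.array([40.0, 48.0, 55.0, 22.0, 60.0, 35.0])

# b = rho * sigma_y / sigma_x,  a = E[Y] - b * E[X]
rho = np.corrcoef(X, Y)[0, 1]
b = rho * Y.std() / X.std()
a = Y.mean() - b * X.mean()

predicted = a + b * X  # predicted endsem marks for the training students
```

Since b = ρ σ_y/σ_x equals cov(X, Y)/var(X), this agrees with an ordinary least-squares fit of the same line.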


Linear Model in Single Variable(Example)

Figure: Best line fit for data points

Four (x, y) data points: (1, 6), (2, 5), (3, 7) and (4, 10)


Linear Model in Single Variable(Example) contd.

We need to find a line y = a + bx that best fits these four points (the training data).

Best approximation: minimize the sum of squared differences (i.e. the error) between the predicted values and their corresponding actual values.

Error function: S(a, b) = [6 − (a + b)]^2 + [5 − (a + 2b)]^2 + [7 − (a + 3b)]^2 + [10 − (a + 4b)]^2

Equating the partial derivatives of S(a, b) with respect to a and b to zero gives a = 3.5, b = 1.4

Best fit line is y = 3.5 + 1.4x
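The hand calculation can be verified with a least-squares fit over the same four points:

```python
import numpy as np

# The four training points from the slide.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])

# Least-squares fit of y = a + b*x; polyfit returns [b, a] (highest degree first).
b, a = np.polyfit(x, y, 1)
```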


Linear Model in Single Variable(Example) contd.

endsem=f (midsem)

endsem=f (assignment)

endsem=f (quiz)

Take the average of the three predicted endsem marks


Linear Model in Multiple Variables

endsem=f (quiz , assignment,midsem)

Mathematically, b0 + Σ_{j=1}^{3} X_ij b_j = y_i,  (i = 1, 2, ..., m)

In matrix form Xb = y, where

X = [ 1 X12 X13 X14 ; 1 X22 X23 X24 ; ... ; 1 Xm2 Xm3 Xm4 ],  b = (b0, b1, b2, b3)^T,  y = (y1, y2, ..., ym)^T

Overdetermined system with m linear equations and 4 unknowns (b0, b1, b2, b3).


Linear Model in Multiple Variable contd.

Goal: find the coefficients b which fit the equations best in the sense of minimising the mean square error (= E[(actual value − predicted value)^2]).

b̂ = argmin_b S(b), where b̂ is the optimal value of b and the objective function S is given by S(b) = Σ_{i=1}^{m} |y_i − (b0 + Σ_{j=1}^{3} X_ij b_j)|^2 = ‖y − Xb‖^2.

Differentiating results in b̂ = (X^T X)^{-1} X^T y

endsem marks = b0 + b1·assignment + b2·quiz + b3·midsem
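The normal-equations solution can be sketched as follows; the (assignment, quiz, midsem) rows and endsem marks below are invented purely for illustration:

```python
import numpy as np

# Hypothetical (assignment, quiz, midsem) rows and endsem marks.
A = np.array([[8.0, 7.0, 20.0],
              [5.0, 4.0, 12.0],
              [9.0, 9.0, 26.0],
              [6.0, 5.0, 15.0],
              [7.0, 8.0, 22.0]])
y = np.array([45.0, 28.0, 58.0, 33.0, 50.0])

# Design matrix X: a leading column of ones carries the intercept b0.
X = np.hstack([np.ones((len(A), 1)), A])

# b_hat = (X^T X)^{-1} X^T y; lstsq solves the same least-squares problem stably.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

predicted_endsem = X @ b_hat
```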


Root mean square error

Endsem Prediction                    Root Mean Square Error
Linear Model in Single Variable      5.09
Linear Model in Multiple Variables   5.1

Table: Root Mean Square Error

A good value for the RMS error depends on the data we are trying to model or estimate.

Analysis result: 5% to 20% error is a good estimate.

Our error is about 5%, so we have achieved good estimates of the endsem marks using our model.
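For reference, the metric in the table is computed as follows (the actual/predicted marks here are made up for illustration):

```python
import numpy as np

# Hypothetical actual vs predicted endsem marks.
actual = np.array([45.0, 28.0, 58.0, 33.0, 50.0])
predicted = np.array([41.0, 30.0, 52.0, 37.0, 48.0])

# Root mean square error: sqrt of the mean squared prediction error.
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
```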


Decision Trees | Nearest Neighbour | LDA | SVM | Classifiers combination

Classification

Classification - a type of prediction where the predicted variable is binary or categorical.

Classification methods - decision trees, nearest neighbour, LDA, SVM - applied on educational data for predicting the grade in an examination

Classifier - a mathematical function, implemented by a classification algorithm, that maps input data to a category.

No single classifier works best on all given problems

Determining a suitable classifier for a given problem is more an art than a science

The type of classifier to be chosen depends on the problem we are trying to solve.


Dataset

Figure: Training Dataset for playing tennis

[Tom Mitchell]


Play Tennis Dataset

Target attribute to be predicted is Play Tennis

Classification problem because PlayTennis = {yes, no}

Learned function is represented by a decision tree

Learned trees can also be represented by if-then rules


Decision Tree for Dataset

Figure: Learned Decision Tree

[Tom Mitchell]


Predict Target Variable

The decision tree classifies whether the morning of a day is suitable for playing tennis or not

Attributes of the morning of the day:
• Outlook = Sunny
• Temperature = Hot
• Humidity = High
• Wind = Strong

PlayTennis = no

Given a new tuple, we predict whether one can play tennis or not by traversing the decision tree.


Statistical Test

Statistical test determines how well each attribute alone classifies thetraining examples.

Best attribute is selected to test at the root node of the tree

Training examples are split into the branch corresponding to the example's value for this attribute.


Information Gain

The attribute to be tested at a node for classifying examples depends on the information gain (a statistical property) of the attribute.

Information gain measures how well a given attribute separates thetraining examples according to their target classification.


Decision Tree Learning Algorithm

S: training examples; target attribute: c different values.

Entropy of S relative to this c-wise classification:
• Entropy(S) = Σ_{i=1}^{c} −p_i log2 p_i, where p_i is the proportion of S belonging to class i.

Information gain of attribute A:
• Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v|/|S|) Entropy(S_v), where Values(A) is the set of all possible values for attribute A
• S_v is the subset of S for which attribute A has value v.


Learning Algorithm

In given dataset, S is a collection of 14 examples.

Attribute Wind has the values Weak or Strong.

9 positive(i.e play tennis=yes) and 5 negative examples denoted as [9+,5-].

Of these 14 examples, 6 of the positive and 2 of the negative exampleshave Wind = Weak.


Learning Algorithm

Information gain from attribute Wind is calculated as follows:

Values(Wind) = {Weak, Strong}

S = [9+, 5-]

S_Weak = [6+, 2-]

S_Strong = [3+, 3-]

Gain(S, Wind) = Entropy(S) − Σ_{v ∈ {Weak, Strong}} (|S_v|/|S|) Entropy(S_v)
             = Entropy(S) − (8/14) Entropy(S_Weak) − (6/14) Entropy(S_Strong)
             = 0.940 − (8/14)(0.811) − (6/14)(1.00)
             = 0.048
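The Wind calculation above can be reproduced directly from the entropy formula:

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a two-class set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

# Gain(S, Wind) for the play-tennis data: S=[9+,5-], S_Weak=[6+,2-], S_Strong=[3+,3-]
gain_wind = entropy(9, 5) - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
```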


Information Gains of attributes

Information gain selects the best attribute at each step in constructing the tree

Information gain values for all four attributes are:
• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029


Constructing Tree

Outlook attribute provides the best prediction of the target attribute.

Outlook is selected as the decision attribute for the root node

Training examples are split into the branch corresponding to the example's value.

Every example with Outlook = Overcast is a positive example.

Outlook = Sunny and Outlook = Rain do not have all positive examples

Repeat the procedure using only the training examples associated with that node.


Partial Decision Tree

Figure: Partial Decision Tree

[Tom Mitchell]


Grades prediction using Decision Trees

Showing marks of other students is not good practice - not done in the US

Build a decision tree using historical information (the model works per professor)

Attributes are assignment, quiz, midsem, project, and predicted endsem marks.

A student yet to write the endsem can know his grade approximately

He can then improve his performance in the endsem and improve his grade


Accuracy using Decision Tree Classifier

Classifier Name   Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
DecisionTree      65.32               98.24              65.14              98.42

Table: Accuracy using Decision Trees Classifier (accuracies in %)

Figure: Accuracy using Decision Trees (bar chart of the four accuracy values in the table above)


Nearest Neighbour Algorithm

Non-parametric method for classifying objects based on closest trainingexamples.

An object is assigned to the class of its nearest neighbor

Neighbors taken from a set of objects for which the correct classification isknown

Training examples:Vectors in multidimensional feature space each with aclass label.

A test point is classified by assigning it the label of the training example closest to that point (with euclidean distance as the distance metric).
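The 1-nearest-neighbour rule is short enough to sketch in full; the (midsem, quiz) vectors and grade labels below are hypothetical:

```python
import numpy as np

def nearest_neighbour_predict(train_X, train_y, test_point):
    """Assign the test point the label of its closest training example
    (Euclidean distance)."""
    distances = np.linalg.norm(train_X - test_point, axis=1)
    return train_y[np.argmin(distances)]

# Hypothetical (midsem, quiz) feature vectors with grade labels.
train_X = np.array([[40.0, 8.0], [15.0, 3.0], [35.0, 9.0], [10.0, 2.0]])
train_y = np.array(["AA", "CC", "AB", "DD"])

grade = nearest_neighbour_predict(train_X, train_y, np.array([38.0, 8.5]))
```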


Example for Nearest Neighbour Algorithm

Figure: Example for Nearest Neighbour


Accuracy achieved using Nearest Neighbour Algorithm

Classifier Name     Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
Nearest Neighbour   66.37               97.89              66.02              98.59

Table: Accuracy using Nearest Neighbour Classifier (accuracies in %)

Figure: Accuracy using Nearest Neighbour (bar chart of the four accuracy values in the table above)


Nearest Neighbour Algorithm with Clustering

Drawback: mis-classification occurs when the class distribution is skewed

The distribution of grades is not uniform; grades like BB and BC dominate

Examples of a more frequent class tend to dominate the prediction of the new example

Reduced the data set by replacing each cluster of similar grades, regardless of their density in the original training data, with a single point: its cluster center.

Nearest Neighbour is then applied to the reduced data set.
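A minimal sketch of the reduction step, under the simplifying assumption that each grade label forms one cluster whose center is the class mean (the data is made up):

```python
import numpy as np

def reduce_to_cluster_centers(train_X, train_y):
    """Replace all training points of each grade with a single point:
    the mean of that grade's feature vectors (its cluster center)."""
    labels = np.unique(train_y)
    centers = np.array([train_X[train_y == lab].mean(axis=0) for lab in labels])
    return centers, labels

train_X = np.array([[40.0, 8.0], [42.0, 9.0], [15.0, 3.0], [13.0, 2.0]])
train_y = np.array(["AA", "AA", "DD", "DD"])

centers, labels = reduce_to_cluster_centers(train_X, train_y)
```

Nearest Neighbour is then run against `centers`/`labels` instead of the full training set, so a dominant grade contributes only one point.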


Accuracy using Nearest Neighbour Algorithm with Clustering

Classifier Name             Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
Nearest Neighbour Cluster   66.55               98.77              63.56              98.94

Table: Accuracy using Nearest Neighbour Cluster Classifier (accuracies in %)

Figure: Accuracy using Nearest Neighbour Cluster (bar chart of the four accuracy values in the table above)


Linear Discriminant Analysis

LDA - find a linear combination of features which separates two or more classes of objects

Training set: attributes ~x, known class label y

Classification problem: find a good predictor for the class y given only an observation ~x

LDA assumes the class-conditional density P(~x | y = 0) ∼ N(~x; ~µ0, Σ), where ~µ0 is the vector of means of the d attributes over samples whose target label is 0.

P(~x | y = 1) ∼ N(~x; ~µ1, Σ), where ~µ1 is the vector of means of the d attributes over samples whose target label is 1.


Linear Discriminant Analysis(contd)

The covariance Σ is a d × d matrix, where d is the number of attributes.

To classify any new observation ~x, we assign the label y = 0 to this observation if

P(y = 0 | ~x) > P(y = 1 | ~x)

Solving the above inequality results in a linear decision boundary.

For multi-class LDA, the predicted grade is argmax_i P(y = i | ~x), i = 1, ..., c
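The two-class rule can be sketched directly from the Gaussian assumptions: estimate the class means, pool a shared covariance Σ, and compare the two (log) posteriors. The (midsem, quiz) data below is invented for illustration:

```python
import numpy as np

# Hypothetical two-class data: rows are (midsem, quiz) vectors.
X0 = np.array([[10.0, 2.0], [12.0, 3.0], [14.0, 2.5]])   # label y = 0
X1 = np.array([[35.0, 8.0], [38.0, 9.0], [40.0, 8.5]])   # label y = 1

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Pooled covariance: LDA assumes a common Sigma for both classes.
S0 = np.cov(X0, rowvar=False)
S1 = np.cov(X1, rowvar=False)
sigma = ((len(X0) - 1) * S0 + (len(X1) - 1) * S1) / (len(X0) + len(X1) - 2)

def lda_predict(x, prior0=0.5, prior1=0.5):
    """Assign label 0 iff P(y=0|x) > P(y=1|x) under the Gaussian class densities."""
    inv = np.linalg.inv(sigma)
    # Linear discriminant per class: x^T S^-1 mu - 0.5 mu^T S^-1 mu + log prior
    d0 = x @ inv @ mu0 - 0.5 * mu0 @ inv @ mu0 + np.log(prior0)
    d1 = x @ inv @ mu1 - 0.5 * mu1 @ inv @ mu1 + np.log(prior1)
    return 0 if d0 > d1 else 1
```

Because the quadratic terms in x cancel under the shared Σ, the resulting boundary d0 = d1 is linear in x, as stated above.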


Accuracy using LDA(contd)

Classifier Name   Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
LDAC              63.73               98.06              64.26              98.59

Table: Accuracy using LDAC Classifier (accuracies in %)

Figure: Accuracy using LDAC (bar chart of the four accuracy values in the table above)


Support Vector Machines

SVM is based on the concept of decision planes that define decision boundaries.

A decision plane separates objects which have different class memberships.

Many hyperplanes exist that classify the data

SVM tries to find a hyperplane with maximum margin

Most classification tasks require complex decision boundaries


Support Vector Machines Example

Figure: Linearly Separable Data

Figure: Non Linearly Separable Data


Support Vector Machines Example

Training data: D = {(x_i, y_i) | x_i ∈ R^p, y_i ∈ {−1, 1}}, i = 1, ..., n

Find the hyperplane that separates the two classes and maximises the margin.

The equation of a hyperplane (e.g. line 1) is of the form w · x − b = 0

Hyperplane 2: w · x − b = 1 (assume the green objects have class label 1)

Hyperplane 3: w · x − b = −1 (assume the red objects have class label −1)

The distance between hyperplanes 2 and 3 is 2/‖w‖.

For maximum margin, minimise ‖w‖.


Support Vector Machines Example

As the training set is linearly separable, we add the constraints
w · x_i − b ≥ 1 for x_i belonging to the green class
w · x_i − b ≤ −1 for x_i belonging to the red class
Combining the above two equations, we get y_i(w · x_i − b) ≥ 1, for all 1 ≤ i ≤ n.

The optimization objective for SVM then becomes:
Minimize (in w, b): (1/2) w^T w
subject to the constraints y_i(w · x_i − b) ≥ 1 (for any i = 1, ..., n)


Working procedure of SVM

Figure: Linearly Separable in Feature Space (input space mapped to feature space)


Support Vector Machines for Linearly non seperable data

Introduce non-negative slack variables, ξi ≥ 0, one slack variable for eachtraining point.

It measures the degree of misclassification of the data point xi .

The slack variable is 0 for data points on or beyond the correct side of the margin

ξ_i = |y_i − (w · x_i − b)| for other points.

A data point on the decision boundary has w · x_i − b = 0, hence ξ_i = 1

Points which have ξi ≥ 1 are misclassified


Support Vector Machines for Linearly non seperable data

The optimization objective then becomes:
minimize (1/2) w^T w + C Σ_{i=1}^{N} ξ_i
subject to the constraints y_i(w^T Φ(x_i) − b) ≥ 1 − ξ_i, and ξ_i ≥ 0, i = 1, ..., N
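The soft-margin objective can be minimised in many ways; the thesis uses LIBSVM, but a toy per-sample subgradient-descent sketch (with Φ = identity, on made-up data) shows what the objective does:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimise (1/2)||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i - b)) by
    per-sample subgradient descent. A toy sketch, not a production solver."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w - b) < 1:       # margin violated: hinge term is active
                w -= lr * (w - C * yi * xi)
                b -= lr * (C * yi)
            else:                           # only the regulariser contributes
                w -= lr * w
    return w, b

# Two well-separated hypothetical classes.
X = np.array([[0.0, 0.5], [1.0, 0.0], [0.5, 1.0],
              [4.0, 4.5], [5.0, 4.0], [4.5, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w - b)
```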


Accuracy using SVM Classifier

Classifier Name   Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
SVMLinear         64.44               98.59              64.79              98.94
SVMRbf            61.8                97.89              66.37              98.42

Table: Accuracy using SVM Classifier (accuracies in %)

Figure: Accuracy using Linear Kernel Function (bar chart of the SVMLinear accuracy values in the table above)


Accuracy using SVM Classifier contd

Figure: Accuracy using Gaussian Kernel Function (bar chart of the SVMRbf accuracy values in the table above)


Combination of Classifiers

Previous models: single classifiers like decision trees, LDA, NN, SVM

To improve the accuracy, a combination of classifiers is used

Many classifiers are used because it is not possible to come up with a single classifier that gives good results on every problem.

The optimal classifier is dependent on the problem domain


Combining multiple classifiers(CMC)

Method 1: choose the classifier which has the least error rate on the given dataset
• The Nearest Neighbour algorithm with clustering of similar classes has the least error rate

Method 2: choose the class getting the maximum votes.

If many classifiers predict student fails, then we assign “fail” label tostudent.
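Method 2 is a plain majority vote over the individual classifiers' outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine classifier outputs by majority vote (Method 2): the class
    predicted by the most classifiers wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical grade predictions for one student from five classifiers.
votes = ["FF", "DD", "FF", "FF", "CD"]
combined = majority_vote(votes)
```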


Combination of Classifiers

Classifier Name           Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
Classifiers Combination   63.38               97.89              65.49              98.59

Table: Accuracy using Combination of Classifiers (Method 2) (accuracies in %)

Figure: Accuracy using Classifiers Combination (bar chart of the four accuracy values in the table above)


Difficult Classification Problem

Tried various ways to improve accuracy, but we always got 63 to 67%

Analysis result: with p = 11 class labels this is a difficult classification problem

Even the best methods achieve around 40% error on such test data, so we have achieved good accuracy. [18]

The only way to improve accuracy is to reduce the number of class labels.


Target Classes Reduced

Grouped them into four classes

“high” grades - AB, AA, AP

“middle” grades - CC, BC, BB

“low” grades - DD, CD

“fail” - FR and FF.
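The grouping is just a fixed mapping from the ten letter grades onto the four coarser target classes:

```python
# Map the letter grades onto the four coarser target classes above.
GRADE_GROUP = {
    "AP": "high", "AA": "high", "AB": "high",
    "BB": "middle", "BC": "middle", "CC": "middle",
    "CD": "low", "DD": "low",
    "FR": "fail", "FF": "fail",
}

def reduce_labels(grades):
    return [GRADE_GROUP[g] for g in grades]

reduced = reduce_labels(["AA", "BC", "FF", "DD"])
```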


Accuracy when target classes reduced

Classifier Name     Accurate (Single)
Nearest Neighbour   89.43

Table: Accuracy using Nearest Neighbour Classifier when Labels Reduced

Accuracy improved to 89%, as compared to 66% when there are 10 class labels.

This clearly depicts that accuracy depends on the number of target classes


Comparison Among Different Classifiers

Classifier Name             Accurate (Single)   Approx. (Single)   Accurate (Multi)   Approx. (Multi)
DecisionTree                65.32               98.24              65.14              98.42
SVMLinear                   64.44               98.59              64.79              98.94
SVMRbf                      61.8                97.89              66.37              98.42
Nearest Neighbour           66.37               97.89              66.02              98.59
Nearest Neighbour Cluster   66.55               98.77              63.56              98.94
LDAC                        63.73               98.06              64.26              98.59
Classifiers Combination     63.38               97.89              65.49              98.59

Table: Comparison Among Different Classifiers (accuracies in %)


Motivation for Recommendation System

No two students are identical: they learn at different rates, have different educational backgrounds, different intellectual capabilities, and different modes of learning.

Need to design a real-time recommendation system which captures the characteristics of each student.

Need to ensure that every student progresses through the course material so that he maximizes his learning.

It is an immensely complex task; instructors cannot give personal care to every student.


Theoretical Model for Recommendation System

Customized suggestions: need to know the student's mastery level on a topic

Consider question difficulty level, e.g. two students both got 5/10 marks

Test questions do not have uniform characteristics.

Model question-level performance instead of aggregate performance.

Give a different internal score that says the mastery level in the topic


Theoretical Model for Recommendation System contd

Extract the information each question provides about a student.

The probability of a correct response to a test question is a mathematical function of parameters such as a person's abilities and question characteristics (such as difficulty, guessability, and topic specificity).

The model helps to better understand how a student's performance on tests relates to his ability

The system learns about the user and updates its parameters about the user to make proper suggestions and help him achieve his learning objectives.
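One concrete instance of such a response function is a 3-parameter logistic (IRT-style) form; the functional form and all parameter values below are an illustrative assumption, not the model the thesis commits to:

```python
import math

def p_correct(ability, difficulty, guessability):
    """Probability of a correct response as a function of student ability and
    question parameters: a hypothetical 3-parameter logistic sketch. Even a
    student with very low ability answers correctly with probability at least
    `guessability`."""
    return guessability + (1.0 - guessability) / (1.0 + math.exp(-(ability - difficulty)))

p_strong = p_correct(ability=2.0, difficulty=0.0, guessability=0.25)
p_weak = p_correct(ability=-2.0, difficulty=0.0, guessability=0.25)
```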


Theoretical Model for Recommendation System contd

If a student struggles with a particular set of questions, our system should know where that student's weaknesses lie in relation to the concepts assessed by those questions.

The system should then deliver content to increase the student's proficiency on those concepts.

Personalized learning cannot be done just based on total marks.

Attributes for each question: difficulty level, time taken to answer, number of attempts to get the correct answer, marks secured, probability of guessing the answer


Theoretical Example

Figure: Dependency Graph for C Language Course (nodes include C Declarations and Datatypes; Decision Control Structures; For; While; Case Control Structures; Arrays; Strings; Structures; Pointers; Structures and Arrays; Functions and Pointers)


Theoretical Example contd

Figure: DAG for Internal Score (the internal score on a topic depends on the difficulty of the topic, intelligence level in prior topics, marks secured in the topic, and whether the answer was guessed)


Theoretical Example contd

Calculate the internal score of the student in that topic - his mastery of the topic.

For each node in the dependency graph, attributes are mastery level on the topic, intelligence level of the student, difficulty level of the topic, etc.

Classification algorithms can be applied on this data: classify the student as weak, average, or intelligent at the topic level and accordingly suggest personalized learning resources

Topic-level classification: new video tutorials on the topic, personalized materials, personalized problem sets, etc.


Conclusion

Predicting the endsem marks and grade of a student apriori and providing customized learning materials improves performance in the endsem and the grade.

Learners have different backgrounds and previous experience, so different learners need to focus on different material to achieve the same learning objective.

The proposed learning system gives learning support based on individual learning characteristics.

Different learning proposals are provided to students as feedback, matched to the learner's learning rate.


Future Work

Figure: Proposed System (components: main menu interface, chapter selection, learning, problem-based exam, feedback with customized references)

Personalized resource recommendation at many granularities - problem level, topic level, course level

Redirect the student to a short segment of a video lecture - problem level

Continuously learn from the system - not single-point learning

Struggling with particular questions: identify them, and how similar students improved


References I

Anna Dollar et al., Enhancing Traditional Classroom Instruction with Web-based Statics Course, 37th ASEE/IEEE Frontiers in Education Conference, 2007

Evgeniya Georgieva et al., Design of an e-Learning Content Visualization Module, 3rd E-Learning Conference, 2006

Rodica Mihalca et al., Overview of Tutorial Systems based on Information Technology, 3rd E-Learning Conference, 2006

Herbert A. Simon et al., A Game Changer: The Open Learning Initiative, from www.cmu.edu, 2012

Libing Jiang et al., Development of electronic teaching material of modern P.E. educational technology upon problem-based learning, International Conference on e-Education, Entertainment and e-Management, 2011

Stephnaos Mavromoustakos et al., Human And Technological Issues In The E-Learning Engineering Process, journal article in ResearchGate

Jodi L. Davenport et al., When do diagrams enhance learning? A framework for designing relevant representations, Proceedings of ICLS '08

Tanja Arh et al., A Case Study of Usability Testing: the SUMI Evaluation Approach of the EducaNext Portal, WSEAS Transactions on Information Science and Applications, February 2008

Mirjam Kock et al., Towards Intelligent Adaptive E-Learning Systems: Machine Learning for Learner Activity Classification, European Conference on Technology Enhanced Learning, 2010


References II

Behrouz et al., Predicting Student Performance: An Application of Data Mining Methods with an Educational Web-Based System, IEEE Frontiers in Education, 2003

Mingming Zhou et al., Data Mining and Student e-Learning Profiles, International Conference on E-Business and E-Government, 2010

S. Anupama Kumar et al., Efficiency of Decision Trees in Predicting Students' Academic Performance, CS and IT-CSCP, 2011

R. K. Kabra et al., Performance Prediction of Engineering Students using Decision Trees, International Journal of Computer Applications, December 2011

http://www.meritnation.com/ (referred on 30th Aug 2012)

http://www.topperlearning.com/ (referred on 30th Aug 2012)

http://www.learnzillion.com/ (referred on 31st Aug 2012)

www.it.iitb.ac.in/ekshiksha/ (referred on 31st Aug 2012)

The Elements of Statistical Learning by Friedman, Hastie, Tibshirani, 2001, pg 85
