Third Colloquium: Application of Data Mining in Education SITI KHADIJAH MOHAMAD FACULTY OF EDUCATION APRIL 10 & 11, 2018

Third Colloquium: Application of Data Mining in Educationsmartdigitalcommunity.utm.my/cite/files/2018/05/THIRD-COLLOQUIUM-Application-of-DM-in...Data Mining Data Mining is a technique



Third Colloquium:

Application of Data Mining in Education

SITI KHADIJAH MOHAMAD

FACULTY OF EDUCATION

APRIL 10 & 11, 2018


1. Introduction: Data Mining, Software, RQs


Data Mining

Data mining is a technique used to discover patterns in data and gain knowledge from them.

Machine learning provides the algorithms used in data mining.

Types of DM: Decision tree, Association rules, Clustering, etc.

Supervised and Unsupervised Learning?

Cross validation?
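
As a rough illustration of the cross-validation idea raised above, the sketch below splits sample indices into k folds so that each sample is used for testing exactly once. The function name `k_fold_indices` and the choice of 10 samples and 5 folds are assumptions made for illustration only; this is not how WEKA implements it internally.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Hypothetical setup: 10 samples, 5 folds.
folds = list(k_fold_indices(10, 5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: train={train} test={test}")
```

A model would be trained on `train` and evaluated on `test` in each round, and the k scores averaged.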


Software

Types: WEKA, Microsoft SQL Server 2008, RapidMiner, Clementine, R

WEKA download: http://www.cs.waikato.ac.nz/ml/weka/

WEKA supported platforms: Linux, Windows, Mac OS

WEKA created by: researchers at the University of Waikato, New Zealand


Research Question

Association rules, clustering and decision trees are NOT cause-effect analysis.

They are about relationship analysis.

E.g. of RQs:

1. To develop a decision tree model that can predict students' performance based on the mechanisms of metacognitive scaffolding prompted by the instructor in Facebook discussion.

2. To formulate learning performance pathways based on reflective thinking and types of feedback through educational blogging.

3. How do the provision of feedback and reflective thinking shape the reflection process through educational blogging?

4. To develop deaf students' learning patterns when using the e-learning environment in studying Nuclear Energy.


Decision Tree

• This example relates lifestyle to heart disease.

• The data has three attributes, Age, Smoker (y/n) and Diet (good/poor), and a label, Risk (Less Risk/More Risk).

• The biggest influence on Risk turns out to be the Smoker attribute.

• Smoker therefore becomes the first branch in our tree.

• For smokers, the next most influential attribute happens to be Age; for non-smokers, however, the data indicates that diet has a bigger influence on the risk.

• The tree keeps branching into two different nodes until a classification is reached.

• A decision tree can be a great way to visualize how a decision is derived from the attributes in your data.

Credit to: refactorthis.net
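
The branch order described above (Smoker first, then Age for smokers and Diet for non-smokers) can be sketched as nested conditions. This is only a hand-written illustration of the tree's shape: the function name `classify_risk`, the age cutoff of 40 and the labels assigned at each leaf are assumptions, not values taken from the slide or learned from data.

```python
def classify_risk(smoker, age, diet, age_cutoff=40):
    """Walk the tree in the order described: Smoker, then Age or Diet.

    age_cutoff and the leaf labels are hypothetical; a real tree would
    learn them from the training data.
    """
    if smoker:
        # For smokers, Age is the next most influential attribute.
        return "More Risk" if age >= age_cutoff else "Less Risk"
    # For non-smokers, Diet has the bigger influence.
    return "More Risk" if diet == "poor" else "Less Risk"


print(classify_risk(smoker=True, age=55, diet="good"))
print(classify_risk(smoker=False, age=55, diet="good"))
```

Reading the function top to bottom mirrors reading the tree from the root to a leaf.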


Association Rules

Q1 Q2 T1 conf: (1)

Q7 T3 conf: (0.92)

T2 Q2 conf: (0.5)

Support (coverage) and Confidence (accuracy)
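
To make support and confidence concrete, here is a minimal sketch that computes both for a single rule. The four records are made up for illustration (they only borrow item names such as Q7 and T3 from the rules above) and do not reproduce the confidence values shown on the slide; `support_and_confidence` is a hypothetical helper, not a WEKA function.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Records that contain the antecedent and the consequent together.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical records, one set of observed items per student.
records = [
    {"Q7", "T3"},
    {"Q7", "T3", "Q2"},
    {"Q7"},
    {"T2", "Q2"},
]
support, confidence = support_and_confidence(records, {"Q7"}, {"T3"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Support says how much of the data the rule covers; confidence says how often the rule holds when its left-hand side is present.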


Clustering

Credit to: Almodiel


2. WEKA Workbench


WEKA Workbench (1)

[Screenshot callouts: Performance Comparison, Graphical Interface, Classifiers, Command-line Interface]


WEKA Workbench (2): Preprocess Tab

[Screenshot callouts: Supply data here; Details of the data]

• Attributes == Variables

• Instances == No. of samples
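
The distinction between attributes and instances can be illustrated with a small file in ARFF, WEKA's native data format. The sketch below counts both in a made-up dataset; the relation name, attribute names and rows are hypothetical, and the parser only handles this simple case (it is not a full ARFF reader).

```python
# A tiny, made-up ARFF file: 4 attributes (variables), 3 instances (samples).
ARFF_TEXT = """\
@relation heart_risk
@attribute age numeric
@attribute smoker {yes,no}
@attribute diet {good,poor}
@attribute risk {less,more}
@data
45,yes,poor,more
30,no,good,less
52,no,poor,more
"""


def count_attributes_and_instances(arff_text):
    """Count @attribute declarations and data rows in simple ARFF text."""
    n_attributes = 0
    n_instances = 0
    in_data = False
    for raw_line in arff_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            n_instances += 1
        elif line.lower().startswith("@attribute"):
            n_attributes += 1
        elif line.lower().startswith("@data"):
            in_data = True
    return n_attributes, n_instances


n_attributes, n_instances = count_attributes_and_instances(ARFF_TEXT)
print(f"Attributes: {n_attributes}, Instances: {n_instances}")
```

These are the same two counts that the Preprocess tab reports once a dataset is loaded.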


WEKA Workbench (3): Classify Tab (also known as postprocessing tab)

[Screenshot callouts: 4 options to classify the data; Lists of algorithms; Results panel; Right click here to view the tree]


Application & Interpretation

What Do Precision and Recall Tell Us?

Precision: Of all the instances predicted as a given class X, how many were correctly predicted?

Recall: Of all the instances that should have the label X, how many were correctly captured?

Suppose a computer program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program's precision is 4/7 while its recall is 4/9.

True Positives and True Negatives: correct classifications.

False Positives: the outcome is incorrectly predicted as yes when it is actually no.

False Negatives: the outcome is incorrectly predicted as no when it is actually yes.

Credit to: Wikipedia
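
The dog example above can be checked with a few lines of arithmetic. In this sketch the 4 correct identifications are true positives, the 3 cats are false positives, and the 5 dogs the program missed (9 minus 4) are false negatives; the function name `precision_recall` is just an illustrative choice.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from the three counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# 7 identifications: 4 real dogs (TP) and 3 cats (FP).
# 9 dogs in the scene, so 9 - 4 = 5 were missed (FN).
precision, recall = precision_recall(
    true_positives=4, false_positives=3, false_negatives=5
)
print(f"precision={precision:.3f} recall={recall:.3f}")
```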


Application & Interpretation

                    Predicted Class
                    a     b     c     Total
Actual Class   a    10    1     1     12
               b    2     0     1     3
               c    1     0     0     1
Total               13    1     2     16

Calculate Recall for Class A:

= TP_A / (TP_A + FN_A)

= 10 / (10 + 2)

= 0.83

Calculate Precision for Class A:

= TP_A / (TP_A + FP_A)

= 10 / (10 + 3)

= 0.769
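
The same calculation can be repeated for every class directly from the confusion matrix above. The sketch below is one possible way to do it, with rows as actual classes and columns as predicted classes; the function name and the choice to report 0.0 when a class has no true positives to divide by are assumptions.

```python
LABELS = ["a", "b", "c"]
# Rows are actual classes, columns are predicted classes (from the slide).
MATRIX = [
    [10, 1, 1],  # actual a
    [2, 0, 1],   # actual b
    [1, 0, 0],   # actual c
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the row
        fp = sum(row[i] for row in matrix) - tp   # rest of the column
        # Assumption: report 0.0 when the denominator would be zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall)
    return metrics


metrics = per_class_metrics(MATRIX, LABELS)
for label, (precision, recall) in metrics.items():
    print(f"class {label}: precision={precision:.3f} recall={recall:.3f}")
```

For class a this reproduces the values worked out by hand: precision 10/13 and recall 10/12.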


Thank You! Questions?