Student Performance Analytics for Blended MOOCs on IITBombayX Submitted in partial fulfillment of the requirements of the degree of Master of Technology by Rahul Dev Parashar (Roll No. 13305R006) Supervisor: Prof. Deepak B Phatak Department of Computer Science and Engineering Indian Institute of Technology Bombay 2016

Student Performance Analytics for Blended MOOCs on IITBombayX · Rahul Dev Parashar (Roll No. 13305R006) Supervisor: Prof. Deepak B Phatak ... Final grade will be based on the composite



Student Performance Analytics for Blended MOOCs on IITBombayX

Submitted in partial fulfillment of the requirements

of the degree of

Master of Technology

by

Rahul Dev Parashar

(Roll No. 13305R006)

Supervisor:

Prof. Deepak B Phatak

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

2016


Abstract

Multiple institutes are partnering with IIT Bombay to offer blended MOOCs. Students study the online course on IITBombayX and also study the same course in the regular way at their own institute. The final grade is based on each student's composite performance in the online assessments and in the regular tests and exams at the institute. In the blended model, it is important to understand each student's learning and performance. Given the large number of students in each MOOC, manual analysis is not feasible, so an automated system is needed. The objective of this work is to design and implement a system that facilitates performance analysis of students from the different participating institutions. Using this system, a teacher from such an institute will be able to (i) compare the performance of local students with that of other students, (ii) compare the performance of students in local and online assessments, and (iii) view event-log analytics to compare the learning habits of students.


Contents

Abstract
List of Figures
1 Introduction
1.1 MOOCs (Massive Open Online Courses)
1.1.1 Advantages
1.1.2 Disadvantages and Challenges
1.1.3 Examples
1.1.4 Open EdX
1.2 Blended Learning
1.2.1 Benefits over normal MOOCs
1.2.2 Challenges
1.2.3 IITBombayX
1.3 Performance Analytics
1.3.1 Analytics on open edX
1.3.2 Objectives of performance analytics
1.4 Problem Statement
2 Literature Survey
2.1 IITBombayX Front End
2.1.1 Navigation Bar, Sequences, Units and Modules
2.1.2 Naming Schemes
2.2 IITBombayX Architecture
2.2.1 OpenEdX Components
2.2.2 Various Data Modules
2.3 Events in Tracking Logs (Student Engagement)
2.4 Student Performance
3 Proposed Approach
3.1 Problem Statement
3.2 Proposed Method
3.2.1 System Architecture
3.2.2 Available Data
3.2.3 Data Cleaning
3.2.4 Data Analytic
4 Analytic Model
4.1 Understanding events and their use
4.2 Timeline of students
4.3 Categorization of students
4.4 Other results
5 Observations
5.1 Some Observations
5.2 Reporting Issues
6 Future Work and Conclusion
6.1 Future Work
6.2 Conclusion


List of Figures

2.1 Typical courseware on IITBombayX
2.2 Sample log record for video interaction event
3.1 Architecture of data analytic system
3.2 Various data modules
3.3 Typical analytic report
5.1 Invalidated JSON object


Chapter 1

Introduction

In this preliminary chapter, we introduce MOOCs and blended learning, with examples. We also discuss what student performance analytics is and why it is useful, and finally explain the problem statement for our work. Sections 1.1 and 1.2 discuss how MOOCs and blended learning work, Section 1.3 discusses performance analytics of students, and Section 1.4 presents the problem statement.

1.1 MOOCs (Massive Open Online Courses)

Massive Open Online Courses (MOOCs) enable learners to study any topic of their choice online. They provide the flexibility to view and access content anytime, anywhere. In addition to traditional course materials such as lectures, reading materials, exams, and class discussions, a MOOC provides a discussion forum for interacting with instructors, teaching assistants, and other participants. Over the past few years, MOOCs have emerged as a popular mode of distance learning. Their signature characteristics include lectures formatted as short videos combined with formative quizzes; automated assessment and/or peer and self-assessment; and an online forum for peer support and discussion.

1.1.1 Advantages

MOOCs are delivered by top-tier institutions, not just to a few hundred students in a lecture hall on campus, but via the Internet to thousands or even millions of learners around the world. Typically, students watch short video lectures and complete assignments that are graded either by machines or by other participants of the course. In this way, a lone professor, with the help of teaching assistants (TAs), can support a very large class.

1.1.2 Disadvantages and Challenges

One of the biggest challenges is how to teach thousands of students effectively at the same time, when each student's learning style and capabilities are different. Because the class is large, the effectiveness of learning can be low if the teacher is not aware of the students' learning styles [1].

1.1.3 Examples

• edX

edX is a non-profit organization formed by Harvard University and the Massachusetts Institute of Technology (MIT) in May 2012. It differs from other MOOC providers in that it is a nonprofit organization and runs on open source software. "As of 24 March 2016, edX has more than 7 million students taking more than 700 courses online." [?]

• Coursera

Coursera is a venture-backed, for-profit organization that provides MOOCs and has a large number of registered students and courses. "As of May 2015, Coursera had more than 1000 courses from 119 institutions and 13 million users from 190 countries." [?]

• Udacity

Udacity is also a venture-backed, for-profit organization that provides MOOCs. As of 28 April 2014, it had 1.6 million users in 12 full courses and 26 free courseware offerings [?].

1.1.4 Open EdX

Open edX is the open source platform that powers edX courses. The edX code is freely available to the community: institutions can host their own instances of Open edX and offer their own classes, educators can extend the platform to build learning tools that precisely meet their needs, and developers can contribute new features to the platform. IITBombayX uses the Open edX architecture.

The main components of Open edX are:

• Open edX Studio

• The Open edX LMS (Learning Management System)

• Discussion forum

• Open edX Insights

1.2 Blended Learning

Blended learning is an education program in which a student learns through a MOOC as well as through the regular course at their institute, which involves face-to-face interaction [2].

1.2.1 Benefits over normal MOOCs

The blended model is more effective than either regular classroom learning alone or a MOOC alone. Learning only through a traditional class may be limited by the teacher's ability to teach the subject, whereas learning through MOOCs depends heavily on the motivation and self-learning of the participants. Blended learning overcomes these challenges by providing a collaborative learning experience. Students with special talents or interests outside the available curriculum can use educational technology to advance their skills, while students who have difficulty learning the material can seek help from either the class teacher or the discussion forum. This collaborative model thus overcomes the limitations of purely classroom-based or purely MOOC-based learning [3].


1.2.2 Challenges

For the best use of the blended model, MOOC and classroom learning must be in sync. Matching the course content of a MOOC can be challenging in a blended model, because faculty from various institutions might follow different syllabi, either as per their college curriculum or their own interests. Lecture recording technologies can also result in students lagging behind on the course material, since students may watch several weeks' worth of videos in one sitting.

1.2.3 IITBombayX

IITBombayX offers MOOCs that provide high-quality, IIT-style lectures to students in Indian languages. In addition, students at universities that are partner institutes of IITBombayX can register for blended learning. Initially, three courses were offered to students free of cost: Introduction to Computer Programming (CSE), Signals and Systems (EE), and Thermodynamics (Mech). The data of these three courses, along with a few other courses offered on IITBombayX, is used for the analysis.

1.3 Performance Analytics

1.3.1 Analytics on open edX

Students can participate in various courses on edX or IITBombayX, both of which use the Open edX pipeline for data analytics. A large number of activities happen in this process, including student and teacher interactions on the course page, and these are categorized into events. Events cover everything from watching a video, solving problems, and reading a textbook to closing a page, navigation, registration, etc. The analytics platform works on this record of interactions with the system. Some crucial information, such as user names and email IDs, is stored separately in MySQL tables; this data is small in size, and general information such as demographics can be obtained from it. In addition, every interaction by students and others is stored as a JSON object in the tracking logs.


For detailed analysis, and to get all the available information, we need to process these tracking logs, which capture all the events. Further analytics on this data can be used to develop models that describe student behavior. Once we learn about students and their timelines, this information can be used in personalized tutoring systems.

1.3.2 Objectives of performance analytics

There are various objectives of student performance analytics. Some of them are listed below.

• Time tracking: Time tracking indicates how much time a student has spent on a particular topic. It also shows whether a student is spending enough time, so that faculty can give suggestions to the student.

• Self-monitoring for learners: Self-monitoring is important for personal learning environments and self-regulated learning. Feedback can be given by the teacher and, given the large number of students, it can also be generated through analysis of student data. These results allow students to keep track of their progress and compare themselves with other students. They can also see where they stand on a particular topic in comparison with other students, and spend more time on it if required.

• Awareness for teachers: Awareness is also important for teachers, because they should learn about the learning behavior of their students; based on it, they can decide what to recommend to students. Generally, teachers get an idea of students' progress only at exam time. A summary of the time students spend on a particular topic, video, or problem can help them learn more about the course at an early stage. For example, students spending too much time on one exercise might indicate that they have a hard time completing it. In the same way, teachers can get an idea of which students need more attention to successfully complete the course.
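Time tracking of this kind can be approximated from the tracking logs. The sketch below is a minimal illustration under stated assumptions, not the method used by IITBombayX. It sorts one student's events by time and credits the gap before the next event to the module of the earlier event, capping long gaps as idle time. The `LogEvent` record, the module names and the 30-minute idle cap are all hypothetical (Java 17).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal view of one tracking-log event of a single student. */
record LogEvent(Instant time, String moduleId) {}

class TimeTracker {
    /** Assumed idle cap: a longer gap is treated as the student having left. */
    static final Duration IDLE_CAP = Duration.ofMinutes(30);

    /**
     * Credits the gap between consecutive events to the module of the earlier event.
     * The last event has no successor, so it contributes no time.
     */
    static Map<String, Duration> timePerModule(List<LogEvent> events) {
        List<LogEvent> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparing(LogEvent::time));
        Map<String, Duration> totals = new LinkedHashMap<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            LogEvent current = sorted.get(i);
            Duration gap = Duration.between(current.time(), sorted.get(i + 1).time());
            if (gap.compareTo(IDLE_CAP) > 0) {
                gap = IDLE_CAP; // cap idle periods instead of counting them in full
            }
            totals.merge(current.moduleId(), gap, Duration::plus);
        }
        return totals;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2016-02-01T10:00:00Z");
        List<LogEvent> events = List.of(
            new LogEvent(t0, "video-1"),
            new LogEvent(t0.plusSeconds(300), "problem-1"),
            new LogEvent(t0.plusSeconds(420), "video-2"),
            new LogEvent(t0.plusSeconds(420 + 7200), "video-2"));
        System.out.println(timePerModule(events));
    }
}
```

Capping long gaps rather than discarding them is a design choice. Dropping them entirely would be equally defensible, and the right threshold would have to be tuned on real logs.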


1.4 Problem Statement

Since it is not possible to manually keep track of every student and their learning behavior, an automated system is needed that can create a timeline for each student according to their engagement with the course. This analysis can help in the self-regulated learning environment of MOOCs. Further, blended MOOCs require monitoring students' performance both in the classroom and in the MOOC environment, and understanding how this information can best be used. Performance analysis of students can help in achieving better outcomes. In this report, performance analysis is done for students from partnering institutes that offer blended MOOCs with IITBombayX. Various specifications of the IITBombayX blended model were discussed in a meeting held at IIT Bombay on 6 June 2015 [4]. Our vision is to explore, examine, and solve pedagogical and technical issues, and to establish the best possible model for the Indian education system.

Objective: This project proposes to design and implement a system that permits performance analytics of students from different participating institutions. Using this system, a teacher from such an institute will be able to:

• Compare performance of local students with that of other students.

• Compare performance of students in local and online assessments.

• View event-log analytics to compare the learning habits of students.
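For the first objective, one simple comparison is each local student's percentile rank among all students of the course, together with the mean score inside and outside the institute. This is a sketch under stated assumptions. Each student record is assumed to carry an institute code and a course score. The `Score` record and the institute codes are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical per-student result: username, institute code, and course score. */
record Score(String username, String institute, double value) {}

class LocalComparison {
    /** Percentage of all students whose score is strictly below the given score. */
    static double percentileRank(double score, List<Score> all) {
        long below = all.stream().filter(s -> s.value() < score).count();
        return 100.0 * below / all.size();
    }

    /** Percentile rank, within the whole course, of every student from one institute. */
    static Map<String, Double> localRanks(String institute, List<Score> all) {
        return all.stream()
            .filter(s -> s.institute().equals(institute))
            .collect(Collectors.toMap(Score::username, s -> percentileRank(s.value(), all)));
    }

    /** Mean score of students inside (local = true) or outside (local = false) an institute. */
    static double mean(String institute, boolean local, List<Score> all) {
        return all.stream()
            .filter(s -> s.institute().equals(institute) == local)
            .mapToDouble(Score::value)
            .average()
            .orElse(Double.NaN); // no students in that group
    }

    public static void main(String[] args) {
        List<Score> all = List.of(
            new Score("s1", "INST-A", 72), new Score("s2", "INST-A", 55),
            new Score("s3", "INST-B", 64), new Score("s4", "INST-B", 90));
        System.out.println(localRanks("INST-A", all));
        System.out.println(mean("INST-A", true, all) + " vs " + mean("INST-A", false, all));
    }
}
```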

There are certain parameters that can be used to analyze the performance of students. Some typical questions are the following:

• How many students solve questions before going through the study material?

• How do the grades of students in a particular class compare with those of other students?

• Is there any relation between performance in the MOOC and in classroom learning (provided the teacher has submitted classroom grades on IITBombayX)? If a student's performance differs greatly between the two modes, the teacher can take corrective action.

• What other feedback might help students learn better?
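For the question on MOOC versus classroom performance, a minimal sketch assumes both scores have already been normalised to a 0 to 100 scale. It computes the Pearson correlation between the two and flags students whose scores differ by more than a chosen threshold. The `ModeScores` record and the 25-point threshold are hypothetical choices, not part of the IITBombayX specification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pair of normalised scores (0-100) for one student. */
record ModeScores(String username, double online, double classroom) {}

class BlendedComparison {
    /** Pearson correlation of online vs classroom scores; NaN if either set is constant. */
    static double pearson(List<ModeScores> rows) {
        int n = rows.size();
        double meanX = rows.stream().mapToDouble(ModeScores::online).sum() / n;
        double meanY = rows.stream().mapToDouble(ModeScores::classroom).sum() / n;
        double cov = 0, varX = 0, varY = 0;
        for (ModeScores r : rows) {
            double dx = r.online() - meanX;
            double dy = r.classroom() - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Students whose online and classroom scores differ by more than the threshold. */
    static List<String> flagGaps(List<ModeScores> rows, double threshold) {
        List<String> flagged = new ArrayList<>();
        for (ModeScores r : rows) {
            if (Math.abs(r.online() - r.classroom()) > threshold) {
                flagged.add(r.username());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<ModeScores> rows = List.of(
            new ModeScores("s1", 80, 75), new ModeScores("s2", 40, 85),
            new ModeScores("s3", 65, 60), new ModeScores("s4", 90, 88));
        System.out.printf("r = %.3f%n", pearson(rows));
        System.out.println("needs attention: " + flagGaps(rows, 25.0));
    }
}
```

The correlation summarises the whole class. The flagged list is what a teacher would act on when taking corrective action for individual students.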


Chapter 2

Literature Survey

Section 2.1 discusses the front end of IITBombayX and explains how each unit or module is identified on the platform. Section 2.2 discusses the architecture of IITBombayX. Section 2.3 describes the various event types used in IITBombayX. Section 2.4 discusses the basis on which the performance of a student is analyzed.

2.1 IITBombayX Front End

2.1.1 Navigation Bar, Sequences, Units and Modules

The main course content is under the Courseware section of the navigation bar. From the left-side panel, we can navigate through the various units and their subtopics, called sequences. Each sequence in turn contains various modules. A module might contain a video, text, a PDF, or problems. Specific content within a module is identified by a unique ID. See Figure 2.1 for an illustration.

2.1.2 Naming Schemes

In IITBombayX, URL patterns partially represent the hierarchical structure of the course, but they do not contain information about the courseware modules that appear in a sequence: all modules under a sequence share the same URL. Modules are therefore named using a separate, platform-specific naming scheme.


Figure 2.1: Typical courseware on IITBombayX

The following example shows how to identify units and sequences from a URL.

Example URL:

"https://iitbombayx.in/courses/IITBombayX/CS101.1xS16/2016_T1/courseware/bc8f4e5d7e394bec93b07c529639ca41/c38c252b50674c5e8a41e5d798c0c233/"

Here, "bc8f4e5d7e394bec93b07c529639ca41" is the unit ID and "c38c252b50674c5e8a41e5d798c0c233" is the sequence ID.
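The unit and sequence IDs can be read from such a URL by locating the `courseware` path segment and taking the next two segments. This is a minimal sketch. It assumes every courseware URL follows the pattern of the example above. The `CoursewareUrl` class and its methods are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class CoursewareUrl {
    /** Unit and sequence IDs taken from a courseware URL. */
    record Location(String unitId, String sequenceId) {}

    /** Reads the two path segments that follow "courseware" (assumed URL layout). */
    static Optional<Location> parse(String url) {
        String path = URI.create(url).getPath();
        List<String> parts = Arrays.stream(path.split("/"))
            .filter(s -> !s.isEmpty())
            .toList();
        int i = parts.indexOf("courseware");
        if (i < 0 || i + 2 >= parts.size()) {
            return Optional.empty(); // not a unit/sequence URL
        }
        return Optional.of(new Location(parts.get(i + 1), parts.get(i + 2)));
    }

    public static void main(String[] args) {
        String url = "https://iitbombayx.in/courses/IITBombayX/CS101.1xS16/2016_T1/courseware/"
            + "bc8f4e5d7e394bec93b07c529639ca41/c38c252b50674c5e8a41e5d798c0c233/";
        parse(url).ifPresent(loc ->
            System.out.println("unit=" + loc.unitId() + " sequence=" + loc.sequenceId()));
    }
}
```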

All modules under a sequence have the same URL, so separate platform-specific identifiers are used for modules.

Example module identifier:

"i4x-IITBombayX-ME209x-video-548064d63329451696e6f56a2c4a6446"

Here, "548064d63329451696e6f56a2c4a6446" is the module ID and "video" represents the type of that module.
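The module type and ID are the last two hyphen-separated fields of such an identifier. Reading them from the end of the string therefore still works if the organisation or course code contains a hyphen. This sketch assumes the `i4x-<org>-<course>-<type>-<id>` layout shown above. The class and record names are hypothetical.

```java
import java.util.Optional;

class ModuleLocator {
    /** Module type (e.g. "video", "problem") and its unique ID. */
    record ModuleRef(String type, String id) {}

    /** Parses identifiers of the assumed form i4x-<org>-<course>-<type>-<id>. */
    static Optional<ModuleRef> parse(String locator) {
        String[] parts = locator.split("-");
        if (parts.length < 5 || !parts[0].equals("i4x")) {
            return Optional.empty();
        }
        int n = parts.length;
        return Optional.of(new ModuleRef(parts[n - 2], parts[n - 1]));
    }

    public static void main(String[] args) {
        parse("i4x-IITBombayX-ME209x-video-548064d63329451696e6f56a2c4a6446")
            .ifPresent(m -> System.out.println(m.type() + " " + m.id()));
    }
}
```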


2.2 IITBombayX Architecture

Open edX is a web-based platform for creating, delivering, and analyzing online courses. IITBombayX uses the Open edX architecture [5] and also provides support for blended learning. In the blended model, a separate authentication process is used as a wrapper around Open edX.

2.2.1 OpenEdX Components:

• CMS (Content Management System): This component provides the tools for authoring course content. It is a Django application that uses MongoDB (NoSQL) for content management.

• LMS (Learning Management System): The part of Open edX that students interact with. It displays content and runs quizzes and interactive applications. Its subcomponents include the wiki, the discussion forum, etc.

• Event Tracking: Tracks events for every interaction with the system. Events are captured and stored with nested data structures in order to take full advantage of schemaless data storage systems. These event logs are stored as JSON objects.

• Open edX Insights and Analytics: Insights is a development version of a Python, Mongo, and Django framework for creating simple, pluggable analytics based on streaming events. It does not include the analysis of every event in the logs.

2.2.2 Various Data Modules

IITBombayX data is stored in various data modules. Different designs are used for ease of storage and interaction with the data. The data modules that store related data are described below.

Student Information and Progress Data

General information about students and their progress is stored in a MySQL database; this can be considered summary information about students. Open edX Insights uses this data to provide simple analytics. Information about assignments, quizzes, and exams is also stored here. Student data is organized as User Data, Courseware Progress Data, and Certificate Data.

• User Data: Student data is present in various tables, but for our use we only need to consider data from the student_courseenrollment table.

• Courseware Progress Data: Each module in the courseware is stored in the courseware_studentmodule table, with its state or score for every student. This table is used to read student progress data.

• Certificate Data: Certification-related data is in the certificates_generatedcertificate table, which tracks the state of certificates and the final grades for a course.
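Progress data of this kind lends itself to simple per-student aggregation. This is a minimal sketch. It assumes each row exposes a student ID, a module type, a grade and a maximum grade. Open edX's courseware_studentmodule table has columns along these lines, but the exact names should be checked against the deployed schema. The `StudentModuleRow` record is hypothetical and stands in for rows already read from MySQL.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical row mirroring a few assumed courseware_studentmodule columns. */
record StudentModuleRow(long studentId, String moduleType, Double grade, Double maxGrade) {}

class ProgressSummary {
    /** Per-student percentage of points earned on problem modules that carry a grade. */
    static Map<Long, Double> problemScorePercent(List<StudentModuleRow> rows) {
        Map<Long, double[]> sums = new TreeMap<>(); // studentId -> {earned, possible}
        for (StudentModuleRow r : rows) {
            // Only graded problem rows count; other module types have no grade here.
            if (!"problem".equals(r.moduleType()) || r.grade() == null || r.maxGrade() == null) {
                continue;
            }
            double[] s = sums.computeIfAbsent(r.studentId(), k -> new double[2]);
            s[0] += r.grade();
            s[1] += r.maxGrade();
        }
        Map<Long, Double> percent = new TreeMap<>();
        sums.forEach((id, s) -> percent.put(id, s[1] == 0 ? 0.0 : 100.0 * s[0] / s[1]));
        return percent;
    }

    public static void main(String[] args) {
        List<StudentModuleRow> rows = List.of(
            new StudentModuleRow(1, "problem", 2.0, 3.0),
            new StudentModuleRow(1, "video", null, null),
            new StudentModuleRow(1, "problem", 1.0, 1.0),
            new StudentModuleRow(2, "problem", 0.0, 2.0));
        System.out.println(problemScorePercent(rows));
    }
}
```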

Discussion Forum Data

IITBombayX discussion forum data is stored as collections of JSON documents in a

MongoDB database. It gives information about students' interactions with other students.

Comment threads are used to analyze this data.

Tracking Logs Data

Whenever a student interacts with the course, every action is stored in the logs and classified by event type. For example, whenever a student clicks a video to play or pause it, these events are stored in the logs with adequate information to analyze them. Events are emitted by the server, the browser, or the mobile device to capture information about interactions with the courseware and the Instructor Dashboard in the LMS, and they are stored as JSON documents.

2.3 Events in Tracking Logs (Student Engagement)

Tracking logs can be classified by the event type for which they are generated [6]. Events comprise fields that are common to all events, fields related to student activity, and fields related to course team activity.


In the sample log (Figure 2.2) for a video seek event, we can find all the required information, including the username, IP address, course name, module ID, event type, seek start time, seek end time, etc.

Figure 2.2: Sample log record for video interaction event
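The seek start and end times in such a record already say something about learning habits. A backward seek suggests re-watching, and a forward seek suggests skipping ahead. The sketch below is illustrative only. It assumes seek events have been reduced to a video ID plus start and end positions in seconds, and the `SeekEvent` record is hypothetical.

```java
import java.util.List;

/** Hypothetical seek event: video module ID plus seek start and end positions in seconds. */
record SeekEvent(String videoId, double startTime, double endTime) {}

class SeekAnalysis {
    enum Direction { BACKWARD, FORWARD, NONE }

    /** Backward seeks suggest re-watching; forward seeks suggest skipping ahead. */
    static Direction direction(SeekEvent e) {
        if (e.endTime() < e.startTime()) return Direction.BACKWARD;
        if (e.endTime() > e.startTime()) return Direction.FORWARD;
        return Direction.NONE;
    }

    /** Total seconds jumped backward in one video: a rough indicator of re-watching. */
    static double secondsRewound(String videoId, List<SeekEvent> events) {
        return events.stream()
            .filter(e -> e.videoId().equals(videoId) && direction(e) == Direction.BACKWARD)
            .mapToDouble(e -> e.startTime() - e.endTime())
            .sum();
    }

    public static void main(String[] args) {
        List<SeekEvent> events = List.of(
            new SeekEvent("v1", 120, 60), new SeekEvent("v1", 200, 260), new SeekEvent("v1", 300, 250));
        System.out.println(secondsRewound("v1", events));
    }
}
```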

These logs can be analyzed by checking the events from which they are emitted. Some events and common fields are detailed below.

• Common Fields: Fields that are common to the schema definitions of all logs.

– Context: Contains the course ID, org ID, path (the URL that generated the event), and user ID fields.

– Event: Provides information about the event for which this log was created.

– Event Source: Identifies where the event was emitted from, such as the browser or a mobile device.

– Event Type: Identifies the type of event (for example, playing a video or checking a problem); from it, one can tell whether the event was caused by a student or by a course team member.

– Page: The URL of the page the user was visiting when the event was emitted.

– Time: The UTC time at which the event was emitted.

– Username: The username of the user who caused the event to be emitted.

• Student Events

– Enrollment Events: Activities such as activation and deactivation of a course enrollment.

– Navigational Events: Events such as page close, go to position, and jump to discussion.

– Video Interaction Events: Events such as hide transcript, load video, pause video, play video, seek video, show transcript, speed change video, stop video, etc.

– Textbook Interaction Events: Events for interaction with PDFs and other text material provided.

– Problem Interaction Events: Interactions with problems in quizzes and exams. Some typical events are problem check, problem graded, problem save, problem show, save problem success, show answer, etc.

– Discussion Forum Events: Generated when a comment is created, a response is given, or a new thread is created in the discussion forums.

• Course Team Events: Events that are emitted when a teacher or administrator interacts with the system.
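The student event categories above can be turned into a lookup from the raw event-type string to a category. This is a partial sketch. It assumes the underscore-separated and dotted event-type names used in Open edX tracking logs, and both the table and the prefixes should be checked against the actual IITBombayX logs. Any unrecognised type, such as a URL-style server event, falls into `OTHER` for later study.

```java
import java.util.List;
import java.util.Map;

class EventClassifier {
    enum Category { ENROLLMENT, NAVIGATION, VIDEO, TEXTBOOK, PROBLEM, FORUM, OTHER }

    // Partial table of exact event-type names, assumed from Open edX tracking-log naming.
    static final Map<String, Category> EXACT = Map.ofEntries(
        Map.entry("page_close", Category.NAVIGATION),
        Map.entry("seq_goto", Category.NAVIGATION),
        Map.entry("seq_next", Category.NAVIGATION),
        Map.entry("seq_prev", Category.NAVIGATION),
        Map.entry("load_video", Category.VIDEO),
        Map.entry("play_video", Category.VIDEO),
        Map.entry("pause_video", Category.VIDEO),
        Map.entry("seek_video", Category.VIDEO),
        Map.entry("stop_video", Category.VIDEO),
        Map.entry("speed_change_video", Category.VIDEO),
        Map.entry("show_transcript", Category.VIDEO),
        Map.entry("hide_transcript", Category.VIDEO),
        Map.entry("problem_check", Category.PROBLEM),
        Map.entry("problem_graded", Category.PROBLEM),
        Map.entry("problem_save", Category.PROBLEM),
        Map.entry("problem_show", Category.PROBLEM),
        Map.entry("save_problem_success", Category.PROBLEM),
        Map.entry("showanswer", Category.PROBLEM));

    /** Maps an event_type string to a student-event category; unknown types become OTHER. */
    static Category classify(String eventType) {
        if (eventType == null) {
            return Category.OTHER;
        }
        Category exact = EXACT.get(eventType);
        if (exact != null) {
            return exact;
        }
        // Assumed dotted namespaces for enrollment, forum and textbook events.
        if (eventType.startsWith("edx.course.enrollment.")) return Category.ENROLLMENT;
        if (eventType.startsWith("edx.forum.")) return Category.FORUM;
        if (eventType.startsWith("textbook.")) return Category.TEXTBOOK;
        return Category.OTHER; // e.g. URL-style server event types, left for manual study
    }

    public static void main(String[] args) {
        for (String type : List.of("play_video", "problem_check", "edx.forum.thread.created", "/courses/x/y")) {
            System.out.println(type + " -> " + classify(type));
        }
    }
}
```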

2.4 Student Performance

The performance of students can be measured from their responses to quizzes, exams, etc. There are two kinds of content from which performance can be analyzed:

• Graded Content

Graded content contributes toward the final score of a student. The overall score of a student is calculated by applying the given weightage of each quiz, assignment, exam, etc.

• Ungraded Content

Ungraded content does not contribute toward the final score of a student. It can be used to understand a student's learning ability and habits, and their improvement in learning over the course.
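The weighted overall score described for graded content can be written as S = Σ(w_i × s_i) / Σ w_i, where s_i is the percentage obtained in component i and w_i is its weight. Dividing by the weight total means the weights need not add up to 1. The sketch below implements this formula. The component names and weights in `main` are illustrative only, not the grading policy of any IITBombayX course.

```java
import java.util.List;

/** Hypothetical graded component: its name, the student's percentage, and its weight. */
record GradedComponent(String name, double percent, double weight) {}

class WeightedScore {
    /** Overall score = sum(weight * percent) / sum(weight). */
    static double overall(List<GradedComponent> components) {
        double weighted = 0;
        double totalWeight = 0;
        for (GradedComponent c : components) {
            weighted += c.weight() * c.percent();
            totalWeight += c.weight();
        }
        if (totalWeight == 0) {
            throw new IllegalArgumentException("at least one component needs a positive weight");
        }
        return weighted / totalWeight;
    }

    public static void main(String[] args) {
        List<GradedComponent> graded = List.of(
            new GradedComponent("quizzes", 80, 0.2),
            new GradedComponent("assignments", 70, 0.3),
            new GradedComponent("final exam", 60, 0.5));
        System.out.println(overall(graded));
    }
}
```

In a blended course, the same formula could in principle combine online and classroom components into a composite grade. The weights would then be set by the participating institute, which is an assumption rather than something this report specifies.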


Chapter 3

Proposed Approach

3.1 Problem Statement

To understand the learning behavior of students, their performance in the course must be analyzed properly [7]. Since MOOCs are offered to large numbers of students, a teacher cannot check this manually. Moreover, each student's learning style is different, so students must be taught in different ways. Some students might grasp concepts very quickly and want to improve their learning by solving challenging problems. On the other hand, some students may find it difficult to learn even the provided material, and other resources should be suggested to them.

In the blended model, the teacher providing classroom instruction might want to know how online learning is helping their students and how those students are performing in the online course. This analysis can give the teacher insight into the learning behavior of their students.

To meet this need, performance analysis of students must be done and provided to teachers. IITBombayX uses Open edX Insights [8] to monitor course activities. This tool provides basic information about course progress, students' responses to quizzes and assignments, and other details about students. However, Open edX Insights does not consider all the events generated in the log files. In addition, there are other fields that are not classified under any event. We have tried to classify them, by identifying their properties, according to the source from which they are emitted. Such fields and their possible event sources are shown in Chapter 4.

Open edX Insights generates reports for the following components:

• Course Enrollment

This category gives information about student enrollment, demographics, geography, etc.

• Student Engagement

This category gives information about the quizzes students have answered, which choices were selected, their assignment submission status, and their interaction with the videos. Only general information about videos is used, for example, how long they were watched and which sections were watched again.

• Student Performance

Students' answers to graded and ungraded quizzes and assignments are recorded, and various reports are generated from them, for example, how many students answered a question correctly.

However, Open edX Insights does not provide a complete analysis of students' performance or a timeline of their interaction with the system, which is essential for revealing students' learning styles: for example, which students watch a video first and then attempt a quiz, and which do the reverse. Therefore, an automated system is needed that can create a timeline for each student according to their engagement with the course, and also analyze their performance.
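The video-first versus quiz-first question can be answered by comparing, per student, the time of the first video event with the time of the first problem event. This sketch assumes events have already been classified as video or problem events and restricted to one sequence. `ActivityKind`, `StudyEvent` and `StudyOrder` are hypothetical names, and counting ties as video-first is an arbitrary choice.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical kinds of classified events used in this sketch. */
enum ActivityKind { VIDEO, PROBLEM }

/** Hypothetical classified event: who acted, when, and on what kind of content. */
record StudyEvent(String username, Instant time, ActivityKind kind) {}

class StudyOrder {
    enum Order { VIDEO_FIRST, PROBLEM_FIRST, VIDEO_ONLY, PROBLEM_ONLY }

    /** Compares each student's earliest video event with their earliest problem event. */
    static Map<String, Order> orderPerStudent(List<StudyEvent> events) {
        Map<String, Instant> firstVideo = new HashMap<>();
        Map<String, Instant> firstProblem = new HashMap<>();
        for (StudyEvent e : events) {
            Map<String, Instant> first = e.kind() == ActivityKind.VIDEO ? firstVideo : firstProblem;
            first.merge(e.username(), e.time(), (a, b) -> a.isBefore(b) ? a : b); // keep earliest
        }
        Set<String> students = new HashSet<>(firstVideo.keySet());
        students.addAll(firstProblem.keySet());
        Map<String, Order> result = new HashMap<>();
        for (String s : students) {
            Instant v = firstVideo.get(s);
            Instant p = firstProblem.get(s);
            Order order;
            if (p == null) {
                order = Order.VIDEO_ONLY;
            } else if (v == null) {
                order = Order.PROBLEM_ONLY;
            } else {
                order = v.isAfter(p) ? Order.PROBLEM_FIRST : Order.VIDEO_FIRST; // ties: video first
            }
            result.put(s, order);
        }
        return result;
    }

    /** Students who attempted a problem before (or without) watching any video. */
    static long countProblemFirst(Map<String, Order> orders) {
        return orders.values().stream()
            .filter(o -> o == Order.PROBLEM_FIRST || o == Order.PROBLEM_ONLY)
            .count();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2016-03-01T09:00:00Z");
        Map<String, Order> orders = orderPerStudent(List.of(
            new StudyEvent("alice", t, ActivityKind.VIDEO),
            new StudyEvent("alice", t.plusSeconds(600), ActivityKind.PROBLEM),
            new StudyEvent("bob", t.plusSeconds(900), ActivityKind.VIDEO),
            new StudyEvent("bob", t.plusSeconds(100), ActivityKind.PROBLEM)));
        System.out.println(orders + ", problem first: " + countProblemFirst(orders));
    }
}
```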

As IITBombayX provides blended courses, proper wrappers must be provided over this system to deliver these reports to classroom faculty. In addition, to analyze the performance of their students better, students can be compared across both blended and non-blended offerings of a course.

First, the tracking logs are cleaned and preprocessed, since proper analysis requires preprocessed logs. This data is stored in HDFS (Hadoop Distributed File System) for analytics work. Once the data is in place, we can start performance analysis based on particular characteristics [9].


3.2 Proposed Method

This approach was originally conceived by Mrs. Sukla Nag of the IITBombayX team. During summer 2015, a few interns also worked on some aspects of it. As part of this project, I have built upon that work by testing and documenting all the earlier work, and then developing the final modules of the prototype.

3.2.1 System Architecture

The architecture of the data analytic system is shown in Figure 3.1.

Figure 3.1: Architecture of data analytic system

The diagram shows how the analysis is carried out. Each step is explained in detail below.


• Data: As explained in section 2.2, data is spread across several modules. Of these, the tracking logs are not structured, so they are cleaned and preprocessed. An example of a tracking log is shown in section 2.3.

• ETL: The tracking logs are cleaned and preprocessed by a Java program according to their event type.

• Storage: The cleaned data is then moved to Hive tables in HDFS, from where it can be used for analytics.

• Data Analytics for non-blended MOOCs: The stored data can now be used to analyze students’ performance.

• Data Analytics for blended MOOCs: The analysis results for non-blended MOOCs can be wrapped with the authentication mechanism of the blended model. After these results are filtered, reports can be shown to faculty, students, etc. on the web-based platform.

3.2.2 Available Data

Tracking logs provide a lot of useful information for analyzing the performance of students, but their format is semi-structured. The logs must first be cleaned and preprocessed before their patterns can be understood and they can be classified by event. Once they are classified by event, the data becomes much easier to analyze. The steps followed in the method are:

1. Identify the data modules that can be used for performance analysis.

2. Clean the data.

3. Analyze the data and generate reports on students’ performance.

As detailed earlier, IITBombayX data is held in several data modules. For performance analysis, we consider the data modules listed below, which are stored in different ways. The use of each module and its storage are explained in detail below and shown in Figure 3.2.


Figure 3.2: Various data modules

1. Student Info and Progress Data

This module holds the stateful data stored internally for each student. It contains general information about students, such as their name, username, email ID and geographical details. It also stores each student’s progress in a course. The data is organized into the following categories:

• User Data: Basic information about the user.

• Courseware Progress Data: It stores information about what material a student has covered and the student’s responses in various modules.

• Certificate Data: It contains the final grade, the certificate status, etc.

This data is stored in MySQL tables, and Open edX Insights uses it to produce various reports, for example, how many students are registered in a course. Because this data is smaller than the tracking log data, reports can be generated from it more quickly. It will be used for student performance analysis as required. A typical report produced by Open edX Insights is shown in Figure 3.3.

2. Course Content Data

Course content data provides information about course modules. For example, we can check how many quizzes and exams are conducted in the course, either overall or module-wise, along with other related information. Information about videos, quizzes, assignments, etc. is stored in JSON files.

3. Discussion Forum Data

IITBombayX discussion forum data is stored as a collection of JSON documents in a MongoDB database. The primary collection, which holds all of the discussion posts written by users, is “contents”. Comments and comment threads are created for discussions. In addition to these collections, events are emitted to track specific user activities. These events are stored in the tracking logs, which are described below.

Wiki data is also stored in MySQL tables. There are two files: one gives information about articles added to the wiki, and the other contains the modifications made to those articles.

4. Tracking Logs

Tracking logs record every activity or interaction with the system, and they can be classified by event type. These events were already discussed in section 2.3. There are various approaches to understanding and using this data for analysis. Open edX Insights performs a basic analysis and presents the resulting reports. To do so, it processes the logs, stores the required information in MySQL, and then uses this MySQL data for the analysis.

Open edX Insights captures only limited information [10]; currently, only video-based events are captured. A proper performance analysis of students also needs other events, such as problem interaction and course interaction events. For this purpose, the tracking log data is cleaned and preprocessed. The next section explains data preprocessing and classification based on events.
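
For the first data module, a report such as "how many students are registered in a course" reduces to a grouped query over the enrollment table. The sketch below runs such a query with Python's built-in `sqlite3` module as a stand-in for the production MySQL database. The table name `student_courseenrollment` and its columns `user_id`, `course_id` and `is_active` follow the Open edX LMS schema as we understand it and should be treated as assumptions. The course IDs are shortened, and the rows are invented for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL here (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student_courseenrollment (
           user_id   INTEGER NOT NULL,
           course_id TEXT    NOT NULL,
           is_active INTEGER NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO student_courseenrollment VALUES (?, ?, ?)",
    [
        (1, "CS101.1xA15", 1),
        (2, "CS101.1xA15", 1),
        (3, "CS101.1xA15", 0),  # unenrolled, so not counted
        (1, "EE210.1xA15", 1),
    ],
)


def registered_students(connection):
    """Return {course_id: number of actively enrolled students}."""
    rows = connection.execute(
        """SELECT course_id, COUNT(DISTINCT user_id)
             FROM student_courseenrollment
            WHERE is_active = 1
            GROUP BY course_id
            ORDER BY course_id"""
    )
    return dict(rows.fetchall())


print(registered_students(conn))  # {'CS101.1xA15': 2, 'EE210.1xA15': 1}
```

The SELECT statement uses only standard SQL, so it should run unchanged against MySQL. Only the connection and the parameter placeholder style of the INSERT would differ with a MySQL driver.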

3.2.3 Data Cleaning

A Java project has been created to split these logs by event type, with a separate module for each kind of event. The logs are processed one by one, and each JSON object is parsed. Once the event type is identified, the corresponding object is used to store that information. When the processing of a log is complete, the result is stored in Hive tables. The event types were studied from the edX documentation on event logs. The data models follow the edX event classification so that they are easy to change: if Open edX changes an event type, the data model can be updated to match the documentation. The objects created and the schema are described below.
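
The project itself is written in Java. The Python sketch below only illustrates the same idea: read logs one line at a time, parse each JSON object, set aside lines that fail to parse, and route valid records into buckets by event type. The event names come from the edX tracking-log documentation, but grouping them into these particular categories is our assumption, not the project's exact mapping. The function names and sample lines are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# edX event names; the grouping into categories is an illustrative assumption.
VIDEO = {"play_video", "pause_video", "seek_video", "speed_change_video",
         "load_video", "stop_video"}
PROBLEM = {"problem_check", "problem_show", "problem_reset", "problem_save",
           "problem_graded", "showanswer"}
NAVIGATION = {"seq_goto", "seq_next", "seq_prev"}


def categorize(event_type: str) -> str:
    """Map an edX event_type to a coarse category."""
    if event_type in VIDEO:
        return "video"
    if event_type in PROBLEM:
        return "problem"
    if event_type in NAVIGATION:
        return "course"
    if event_type.startswith("edx.forum."):
        return "forum"
    if event_type.startswith("edx.course.enrollment."):
        return "enrollment"
    return "unclassified"  # kept aside for manual study


def split_logs(lines: Iterable[str]) -> Tuple[Dict[str, List[dict]], List[int]]:
    """Parse lines one by one; bucket valid records and note malformed line numbers."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    malformed: List[int] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            malformed.append(line_no)
            continue
        if not isinstance(record, dict):  # valid JSON, but not a log object
            malformed.append(line_no)
            continue
        buckets[categorize(str(record.get("event_type", "")))].append(record)
    return dict(buckets), malformed


sample = [
    '{"username": "asha", "event_type": "play_video"}',
    '{"username": "asha", "event_type": "problem_check"',  # truncated JSON
    '{"username": "ravi", "event_type": "edx.forum.thread.created"}',
    '{"username": "ravi", "event_type": "goto_position"}',
]
buckets, malformed = split_logs(sample)
print(sorted(buckets), malformed)
# ['forum', 'unclassified', 'video'] [2]
```

Recording the line numbers of malformed records, rather than silently dropping them, keeps them available for later inspection.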

• Objects Created

Various objects are created to capture the events. Once an event is identified, its data is stored in the corresponding object, which is then written to HDFS. The objects used include CourseProblems, CourseQuizzes, CourseVideos, EventCourseInteract, EventEnrollment, EventForumInteract, EventProbInteract, EventVideoInteract, StudentCourseEnrolment, StudentCourseGrade, UserSession, etc.

Each object has fields for the information required about its event. Once populated, the objects are passed to the respective HDFS store.

• Tables created

Once the event data is stored in objects, the objects are transferred to Hive tables in the HDFS schema. Some of the tables and their columns are shown in Table 3.1.
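
To make the object-to-table step concrete, the sketch below fills a cut-down `EventProbInteract` record from a server-side `problem_check` event. It then serializes the record as one tab-separated line, the kind of text row that a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` can load. Several parts are assumptions. Only a subset of the Table 3.1 columns is used. The correspondence between raw log fields (`context.user_id`, `event.problem_id`, `event.grade`, and so on) and table columns is ours. The sample record is invented. Missing values are written as `\N`, which is Hive's default null marker for text tables.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class EventProbInteract:
    # Field names mirror a subset of the Table 3.1 columns (hence camelCase).
    lmsUserId: Optional[int]
    courseName: Optional[str]
    eventName: Optional[str]
    quizzSysName: Optional[str]
    newScore: Optional[float]
    maxGrade: Optional[float]
    attempts: Optional[int]
    success: Optional[str]
    createDateTime: Optional[str]

    def to_hive_row(self, sep: str = "\t") -> str:
        """Serialize as one delimited text line; None becomes Hive's \\N null marker."""
        return sep.join("\\N" if v is None else str(v) for v in astuple(self))


def from_problem_check(record: dict) -> EventProbInteract:
    """Fill the object from a parsed problem_check record (assumed field mapping)."""
    event = record.get("event")
    if not isinstance(event, dict):  # browser-side events carry a string payload
        event = {}
    context = record.get("context")
    if not isinstance(context, dict):
        context = {}
    return EventProbInteract(
        lmsUserId=context.get("user_id"),
        courseName=context.get("course_id"),
        eventName=record.get("event_type"),
        quizzSysName=event.get("problem_id"),
        newScore=event.get("grade"),
        maxGrade=event.get("max_grade"),
        attempts=event.get("attempts"),
        success=event.get("success"),
        createDateTime=record.get("time"),
    )


record = {
    "event_type": "problem_check",
    "time": "2015-08-20T09:00:00+00:00",
    "context": {"user_id": 42, "course_id": "CS101.1xA15"},
    "event": {"problem_id": "quiz1_q3", "grade": 2, "max_grade": 3,
              "attempts": 1, "success": "incorrect"},
}
print(from_problem_check(record).to_hive_row())
```

Keeping one class per table, as the project does, means that a change in an edX event definition only touches the matching class and its mapping function.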

3.2.4 Data Analytics

The data stored in the various modules can be used to analyze the performance of a student. In particular, it yields a timeline of each student’s activity, which can then be used to identify that student’s learning pattern.
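
A student timeline can be sketched as "group events by student, then sort by time". The code below assumes the records have already been parsed and carry the top-level tracking-log fields `username`, `time` (ISO 8601) and `event_type`. Collapsing consecutive repeats in `activity_sequence`, so that bursts such as repeated `play_video` events count once, is an illustrative choice of ours. It is not a documented step of the project, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Timeline = List[Tuple[datetime, str]]


def build_timelines(records: Iterable[dict]) -> Dict[str, Timeline]:
    """Group parsed log records by username and sort each group by time."""
    timelines: Dict[str, Timeline] = defaultdict(list)
    for rec in records:
        t = datetime.fromisoformat(rec["time"])
        timelines[rec["username"]].append((t, rec["event_type"]))
    for events in timelines.values():
        events.sort(key=lambda item: item[0])
    return dict(timelines)


def activity_sequence(timeline: Timeline) -> List[str]:
    """Collapse consecutive repeats of the same event into one entry."""
    return [name for name, _ in groupby(event for _, event in timeline)]


# Made-up records, deliberately out of order.
records = [
    {"username": "asha", "time": "2015-08-19T10:05:00+00:00", "event_type": "pause_video"},
    {"username": "asha", "time": "2015-08-19T10:00:00+00:00", "event_type": "play_video"},
    {"username": "asha", "time": "2015-08-19T10:02:00+00:00", "event_type": "play_video"},
    {"username": "asha", "time": "2015-08-19T10:30:00+00:00", "event_type": "problem_check"},
]
timeline = build_timelines(records)["asha"]
print(activity_sequence(timeline))
# ['play_video', 'pause_video', 'problem_check']
```

Sequences of this kind could then be compared across students to look for recurring learning patterns.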

Open edX Insights, extended with the events it does not currently consider, can be used for MOOC analysis. To use it for the blended model, the results must be wrapped with the authentication mechanism provided for the blended model, so that each faculty member sees results only for their own students. When the performances of students from different institutes are compared, the comparison data can also be shown to the faculty.
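
The wrapping step can be sketched as a filter applied before any report is shown. The code assumes that the blended-model authentication data can be turned into two mappings: faculty member to institute, and student to institute. It offers aggregated per-institute averages for cross-institute comparison so that no individual student rows leave their institute. That aggregation is our illustrative choice. All names and data here are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def rows_for_faculty(rows: List[dict], faculty: str,
                     faculty_institute: Dict[str, str],
                     student_institute: Dict[str, str]) -> List[dict]:
    """Keep only report rows for students of the faculty member's institute."""
    institute = faculty_institute.get(faculty)
    if institute is None:  # unknown faculty member: show nothing
        return []
    return [r for r in rows if student_institute.get(r["username"]) == institute]


def institute_averages(rows: List[dict], student_institute: Dict[str, str],
                       key: str = "score") -> Dict[str, float]:
    """Average a metric per institute, exposing no individual student rows."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for r in rows:
        institute = student_institute.get(r["username"])
        if institute is not None:
            scores[institute].append(r[key])
    return {inst: mean(vals) for inst, vals in sorted(scores.items())}


rows = [{"username": "asha", "score": 8}, {"username": "ravi", "score": 6},
        {"username": "meena", "score": 9}]
student_institute = {"asha": "InstA", "ravi": "InstA", "meena": "InstB"}
faculty_institute = {"prof_x": "InstA"}

print(rows_for_faculty(rows, "prof_x", faculty_institute, student_institute))
# [{'username': 'asha', 'score': 8}, {'username': 'ravi', 'score': 6}]
print(institute_averages(rows, student_institute))
```

Filtering on the server side, before results reach the web platform, keeps the access rule in one place rather than relying on each report page to enforce it.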


Table 3.1: Tables created in MySQL for various events

Table Name: Columns

Course: courseId, lmsName, orgName, courseName, courseTitle, authorUserId, currConcepts, prevConcepts, courseLang, minPrice, suggestedPrice, countryCode, endDate, startDate

CourseForums: forumId, lmsName, orgName, courseName, courseRun, commentSysId, commentType, anonymousMode, lmsAuthorId, lmsAuthorName, createDateTime, lastModDateTime, upVoteCount, totVoteCount, commentCount, threadType, title, commentableSysId, endorsed, closed, visible

CourseProblems: problemId, lmsName, orgName, courseName, chapterSysName, sessionSysName, quizSysName, quizTitle, quizType, quizWeight, noOfAttemptsAllowed, quizMaxMarks, hintAvailable, correctChoice

CourseVideos: videoId, lmsName, orgName, courseName, chapterSysName, videoSysName, videoUTubeId, videoDownload, videoTrackDownLoad, videoTitle, videoUTubeId075, videoUTubeId125, videoUTubeId15, videolength

CourseWiki: wikiId, lmsName, orgName, courseName, wikiSlug, lmsWikiId, createdDate, lastModDate, lastRevId, ownerId, groupId, groupRead, groupWrite, otherRead, otherWrite

EventCourseInteract: eventId, lmsName, orgName, courseName, courseRun, lmsUserId, eventName, eventNo, moduleType, moduleSysName, moduleTitle, chapterSysName, chapterTitle, createDateTime, modDateTime, oldPosition, curPosition, source

EventForumInteract: eventId, lmsName, orgName, courseName, eventName, commentThreadId, lmsUserId, queryText, noOfResults

EventProbInteract: eventId, lmsName, orgName, courseName, lmsUserId, eventName, eventNo, quizzSysName, quizzTitle, chapterSysName, chapterTitle, hintAvailable, hintMode, inputType, responseType, variantId, oldScore, newScore, maxGrade, attempts, maxAttempts, choice, success, source, probSubTime, done, createDateTime, lastModDateTime, courseRun

EventVideoInteract: eventId, sessionSysName, lmsName, orgName, courseName, courseRun, lmsUserId, eventName, eventNo, videoSysName, videoTitle, chapterSysName, chapterTitle, oldSeekTime, currSeekTime, videoNavigType, oldSpeed, currSpeed, source, createDateTime, lastModDateTime


We have generated a report from the data collected for blended MOOCs on IITBombayX, where three blended courses are currently running: CS101.1xA15, ME209xA15 and EE210.1xA15. The report (Figure 3.3) shows, for each of the three courses, how many users interacted with the system each week. Patterns are visible in this graph. For example, for CS101.1xA15 there is a sharp increase in the number of users in the week 19/08-25/08, because quiz 1 of the course fell in that week. The report can also be generated separately for each institute or batch. Other useful observations can also be made from this report, for example:

• The area under the curve can indicate the best interval between two quizzes for keeping students active.

• Users’ activity can show which part of the course was not engaging, that is, the point after which most users’ interest decreased.

Figure 3.3: Typical analytic report
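
A weekly active-user count of the kind plotted in Figure 3.3 can be sketched as "distinct users per course per week". The code assumes the records have been flattened so that each one carries `course_id`, `username` and an ISO 8601 `time`. It also assumes that weeks are counted from a per-course start date. The start date of 12 August 2015 used below is hypothetical, chosen so that week 2 spans 19/08-25/08. Timestamps are bucketed by their own calendar date, so UTC logs may place late-evening IST activity on the previous day.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple


def weekly_active_users(records: Iterable[dict],
                        course_start: Dict[str, date]) -> Dict[Tuple[str, int], int]:
    """Count distinct users per (course, week number); week 1 starts on the course start date."""
    active = defaultdict(set)
    for rec in records:
        course = rec["course_id"]
        if course not in course_start:
            continue
        day = datetime.fromisoformat(rec["time"]).date()
        offset = (day - course_start[course]).days
        if offset < 0:  # activity before the course started
            continue
        active[(course, offset // 7 + 1)].add(rec["username"])
    return {key: len(users) for key, users in sorted(active.items())}


course_start = {"CS101.1xA15": date(2015, 8, 12)}  # hypothetical start date
records = [
    {"course_id": "CS101.1xA15", "username": "asha", "time": "2015-08-13T10:00:00+00:00"},
    {"course_id": "CS101.1xA15", "username": "asha", "time": "2015-08-19T10:00:00+00:00"},
    {"course_id": "CS101.1xA15", "username": "ravi", "time": "2015-08-20T09:00:00+00:00"},
    {"course_id": "CS101.1xA15", "username": "ravi", "time": "2015-08-20T18:00:00+00:00"},
    {"course_id": "CS101.1xA15", "username": "meena", "time": "2015-08-25T23:00:00+00:00"},
    {"course_id": "CS101.1xA15", "username": "meena", "time": "2015-08-26T08:00:00+00:00"},
]
counts = weekly_active_users(records, course_start)
print(counts)
# {('CS101.1xA15', 1): 1, ('CS101.1xA15', 2): 3, ('CS101.1xA15', 3): 1}
```

Summing the weekly counts gives the total number of user-weeks. That total is one possible discrete reading of the "area under the curve" mentioned above.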


Chapter 4

Analytic Model

4.1 Understanding events and their use

4.2 Timeline of students

4.3 Categorization of students

4.4 Other results


Chapter 5

Observations

5.1 Some Observations

A large volume of IITBombayX tracking log data has been processed for the analysis, using records from June to September. A summary is given below.

• Summary of processed logs: Around 20 million IITBombayX tracking logs were processed. JSON objects that were not properly formatted were recorded, as were some events that are not documented. For now, these undocumented events are classified based on their behavior and properties. They are also recorded for further analysis and will be reported.

• Data in MySQL: A backup of the MySQL database was taken as an SQL dump and given to various IITBombayX teams for experimentation.

Preprocessing the tracking logs also reveals some interesting facts about them. A few observations from the log preprocessing are:

• A few logs are not properly structured and are rejected by the JSON parser. This is caused by minor bugs in some of the code modules that generate these logs. Since the logs contain all the information needed to find the exact location of such an error, they can be used to fix these coding issues. On average, there are around 2 such cases per 2500 logs. One such log is shown in Figure 5.1.


Figure 5.1: Invalidated JSON object

• Some logs contain events that are not defined. The documentation lists event names and their fields, but these event and field names appear nowhere in it. These results can also help identify why the code generates such events, so that the code can be modified accordingly or the newly found events can be documented. A few such events are listed below. To understand them and store them in tables, they are classified into existing event categories based on their nature.

– These events are navigational:

goto position, dashboard, jsi18n, i18n.js, jump to discussion, progress, view courses, logout, how it works, calculate, jump to vertical, etc.

– These events relate to video interaction:

save user state, transcript translation, transcript download, /transcript/translation, /transcript/download, etc.

– These events relate to the discussion forum:

users, reply, upvote, flagAbuse, follow, unfollow, upload, etc.
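
The classification of undocumented events "based on their nature" can be sketched as a few name rules, with the three groups listed above used as the training examples. The sketch assumes that the names appear in the logs in underscore form (for example `goto_position` and `save_user_state`), and that URL-style event types are judged by their last path segment. The rules, the function name and the sample forum path are hypothetical illustrations, not the project's actual classifier.

```python
# Last path segments that the list above gives as discussion-forum actions.
FORUM_ACTIONS = {"users", "reply", "upvote", "flagAbuse", "follow", "unfollow", "upload"}


def classify_undocumented(event_type: str) -> str:
    """Assign an undocumented event to an existing category by simple name rules."""
    segments = [s for s in event_type.split("/") if s]
    last = segments[-1] if segments else ""
    if "transcript" in event_type or last == "save_user_state":
        return "video"
    if last in FORUM_ACTIONS:
        return "forum"
    return "navigation"  # default, matching the navigational examples listed


for name in ["goto_position", "jsi18n", "/transcript/download",
             "save_user_state", "/courses/x/discussion/threads/t1/reply"]:
    print(name, "->", classify_undocumented(name))
```

Any event that falls through to the default category should still be logged for the manual review described above, since name rules of this kind are easy to fool.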


5.2 Reporting Issues

We propose to examine these issues in the context of the new Open edX Cypress release, in which some of them may already have been addressed. The final Cypress documentation related to Insights is expected to be released soon. Points found to be relevant will be addressed in this project and communicated to Open edX.


Chapter 6

Future Work and Conclusion

6.1 Future Work

• One pending task in tracking log cleaning is to complete the preprocessing of discussion forum events, because the discussion forum can yield many more interesting results. Generally, a student’s interaction with the discussion forum indicates a strong interest in learning. If a student correctly answers forum questions about a topic, it shows a good command of that topic. Once this task is done, the preprocessing of the tracking logs will be complete. Using these logs and the other data modules, a better analysis of student performance can then be carried out, and reports based on this analysis will be generated for teachers. These reports will then be used to make blended learning more effective.

• Before proceeding with any analysis, the characteristics and measures to be used in it must be studied. As discussed in section 1.3, these measures will be identified. Some points that will be considered in stage 2 are:

– How can students’ timelines be used to learn about their learning styles?

– How can a student’s understanding of a particular module be identified from their interaction with study material and quizzes and from the marks obtained?

– What kind of reports need to be generated for teachers?

– Can we give some suggestions to students for improving their learning?


6.2 Conclusion

It has been shown that preprocessing the tracking log data is necessary for better performance analysis results. Without preprocessing, the logs cannot be processed on the fly: they are very large, and making use of them requires that they be preprocessed first. Tracking log data, combined with other data models such as students’ regular data and discussion forum data, can give a clear picture of students’ learning. These performance reports can also help students in their self-learning. Constant feedback on students’ progress will also help teachers learn more about their students’ learning and performance, so that they can provide a better learning experience.


Bibliography

[1] Thanasis Daradoumis, Roxana Bassi, Fatos Xhafa, and Santi Caballe. A review on massive e-learning (MOOC) design, delivery and assessment. In P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on, pages 208–213. IEEE, 2013.

[2] Deepak B Phatak. Adopting MOOCs for quality engineering education in India. In Proceedings of the International Conference on Transformations in Engineering Education, pages 11–23. Springer, 2015.

[3] Alfred P Rovai and Hope Jordan. Blended learning and sense of community: A comparative analysis with traditional and fully online graduate courses. The International Review of Research in Open and Distributed Learning, 5(2), 2004.

[4] Consultation conclave on blended MOOCs. URL http://www.it.iitb.ac.in/frg/wiki/images/9/9a/Minutes_Consultation_Conclave_on_Blended_MOOCs.pdf. Retrieved on 2015-10-11.

[5] Open edX architecture. URL https://open.edx.org/contributing-to-edx/architecture. Retrieved on 2015-10-11.

[6] Events in the tracking logs. URL http://edx.readthedocs.org/projects/devdata/en/latest/internal_data_formats/tracking_logs.html. Retrieved on 2015-10-11.

[7] Joseph A Konstan, JD Walker, D Christopher Brooks, Keith Brown, and Michael D Ekstrand. Teaching recommender systems at large scale: Evaluation and lessons learned from a hybrid MOOC. ACM Transactions on Computer-Human Interaction (TOCHI), 22(2):10, 2015.

[8] Using edX Insights. URL http://edx-insights.readthedocs.org/en/latest/index.html. Retrieved on 2015-10-11.

[9] Cristobal Romero and Sebastian Ventura. Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1):135–146, 2007.

[10] edX analytics dashboard. URL https://github.com/edx/edx-analytics-dashboard. Retrieved on 2015-10-11.