Data Synchronization for edX Platform M.Tech Thesis Submitted in partial fulfillment of the requirements of the degree of Master of Technology by Alpesh Rathore Roll Number: 113050057 Under the guidance of: Prof. D.B. Phatak Department of Computer Science and Engineering Indian Institute of Technology, Bombay Mumbai 2015


Data Synchronization for edX Platform

M.Tech Thesis

Submitted in partial fulfillment of the requirements

of the degree of

Master of Technology

by

Alpesh Rathore

Roll Number: 113050057

Under the guidance of:

Prof. D.B. Phatak

Department of Computer Science and Engineering

Indian Institute of Technology, Bombay

Mumbai

2015


Abstract

EDX Platform is an open source platform for Massive Open Online Courses (MOOCs). It lets faculty run courses online and lets students from across the globe register for those courses. Faculty can upload course content and provide videos through YouTube or as uploaded files. They can also conduct online tests and quizzes. After registering for a course, students get access to all the material provided by the faculty, including videos. They can upload assignments, which can be graded through self assessment, peer assessment or example-based Artificial Intelligence assessment. There are also forums where students can discuss the course and interact with other registered students.

EDX platform is easy to set up and run using Vagrant. Once the server is up, users can start registering for courses and start learning. In this setup there is one central EDX server where all the data is kept and which processes every request sent by users. However, there is no process by which a professor or a student from another college can upload material that he/she thinks may be useful to other members of the course. To provide an open place where students and professors across the globe can upload content, there is a need for a tool that authenticates the user and uploads their content to a particular location in the edX platform.

Here we implement a tool that allows an administrator to manage colleges and servers across the globe so that they can be paired up for synchronizing files or data. Once the administrator is authorized, they can create colleges, which contain servers along with their respective IP addresses, and then create directory pairs which can be synched later on. Professors and students can therefore share their content with the college administrator, who uploads it to the college server, from where it is synched to the central server and becomes visible to other members.

This tool is a web application which can be used for any kind of synchronization between two or more servers, with extra facilities such as periodic synchronization and notification of failure.


Contents

1 Introduction to File Synchronization and EDX Platform
  1.1 File Synchronization Techniques [9, 8]
    1.1.1 Tools for File Synchronization
  1.2 EDX Platform [7]

2 Related Work And Background
  2.1 Utilities over RSync
  2.2 Background
    2.2.1 RSync Algorithm [9, 8]
  2.3 Problem Statement

3 Architecture and Design
  3.1 File Synchronization Admin Utility
    3.1.1 Database for File Synchronization Utility Server End
    3.1.2 File Synchronization Utility Architecture
    3.1.3 Android Application Architecture
  3.2 Use Case
    3.2.1 Setup
    3.2.2 Use

4 Implementation
  4.1 Modular Approach for Implementation
    4.1.1 Web Application's Modules
  4.2 Modules Detailed Implementation
    4.2.1 Authentication Module
    4.2.2 Synchronization Module
    4.2.3 Sync History Module
    4.2.4 SSH Checker Module
    4.2.5 Reverse SSH Checker Module
    4.2.6 Schedule Handler Module
    4.2.7 Sync Now Module
    4.2.8 Check Availability Module
    4.2.9 Rsync command status Module
    4.2.10 Notifications Module
    4.2.11 Map Module
  4.3 Android Application
    4.3.1 Android Sync Notification Module
  4.4 Technologies Used
    4.4.1 Back End
    4.4.2 Front End
  4.5 Setting up EDX Platform [4]

5 Conclusion and Future Work
  5.1 Future Work


List of Figures

1.1 EDX Architecture diagram Source: ([5])
1.2 ORA Architecture diagram Source: ([10])

2.1 Diagram showing RSync Algorithm Working

3.1 File Synchronization Between Colleges and Servers
3.2 Use case Diagram for File Synchronization Module
3.3 ER Diagram for File Synchronization Utility
3.4 Authentication Module Architecture

4.1 Authentication Screenshot
4.2 Authentication Module Architecture
4.3 Synchronization Module Architecture
4.4 View History Screenshot
4.5 SSH Checker Screenshot
4.6 Reverse SSH Checker Screenshot
4.7 Reverse SSH Checker Module Architecture
4.8 Schedule Handler Screenshot
4.9 Check Availability Screenshot
4.10 Check Availability Module Architecture
4.11 Map View Screenshot


Chapter 1

Introduction to File Synchronization and EDX Platform

File synchronization is about keeping two different locations, on the same or different storage devices, synchronized with each other. There are two types of file synchronization:

1. Mirroring: One location is kept in sync with another so that changes made at the first location are reflected at the other end, but NOT vice versa. In other words, if files are added or changed at the source location, these changes are propagated to the other location. Such synchronization is commonly known as "mirroring". Mirroring is mostly used when one needs to keep a backup of files on a remote server, so that if the local machine crashes or loses important work, the work is still backed up on the server.

2. Two Way: Two locations are synchronized in both directions, so that both are identical whenever they are inspected. If a file is added, deleted or modified at one place, the change is reflected at the other location, and vice versa. Such synchronization is not commonly used for backup, because if somebody tampers with the files at the backup location, the files at the other location end up tampered with as well.

A further variation applies to both techniques and is needed only in some cases: if a file is deleted at one location, must it also be deleted at the other? Since deletion is a more destructive operation, propagating deletions to the other location might not be required all the time.
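The difference between the two types, and the optional propagation of deletions, can be illustrated with a minimal Python sketch. It is not part of the tool described in this thesis: the function names, the action labels and the use of a simple version stamp (for example a modification time) per file are assumptions made purely for illustration.

```python
def plan_mirror(source, target, propagate_deletes=False):
    """Plan a one-way sync. Both arguments map path -> version stamp."""
    actions = []
    for path, stamp in sorted(source.items()):
        # Copy anything that is missing or different at the target.
        if target.get(path) != stamp:
            actions.append(("copy", path))
    if propagate_deletes:
        # Optional and destructive: remove files that no longer exist at the source.
        for path in sorted(set(target) - set(source)):
            actions.append(("delete", path))
    return actions


def plan_two_way(a, b):
    """Plan a two-way sync where the newer copy wins and nothing is deleted."""
    actions = []
    for path in sorted(set(a) | set(b)):
        stamp_a, stamp_b = a.get(path), b.get(path)
        if stamp_b is None or (stamp_a is not None and stamp_a > stamp_b):
            actions.append(("a_to_b", path))
        elif stamp_a is None or stamp_b > stamp_a:
            actions.append(("b_to_a", path))
    return actions


local = {"a.txt": 2, "b.txt": 1}
remote = {"a.txt": 1, "old.txt": 1}
print(plan_mirror(local, remote))
print(plan_mirror(local, remote, propagate_deletes=True))
print(plan_two_way(local, remote))
```

In the mirror plan, `old.txt` is left alone unless deletions are propagated; in the two-way plan it is copied back to the first location instead.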


1.1 File Synchronization Techniques [9, 8]

Although file synchronization may sound simple, it has its own complexities and tricks. This section explains some techniques that can improve performance considerably. The following concerns are generally addressed when discussing file synchronization as a whole:

1. Delta Differencing: Delta differencing is what makes file synchronization much more than doing FTP between two remote machines. If a file changes at one location, only the change travels over the network, and the other copy is brought up to date from the change alone. Sending the whole file for a change of one or two characters would waste a great deal of network bandwidth.

2. Security and encryption: Security is an important issue when synchronizing files between two separately located machines that are connected through at least one router linking them to the Internet.

3. Compression: If the underlying synchronization utility performs delta differencing, it is already saving a lot of network bandwidth. But when big files are added and large changes are made frequently, compressing the data before sending it across the network saves further bandwidth.

4. Multiple locations syncing: There may be situations in which one location needs to be replicated to more than one other location. One way to achieve this is to perform pair-wise synchronization between every possible combination of machines, but this may lead to inconsistencies among different pairs. Another way, adopted by tools like Unison, is to follow a star schema, in which there is one central machine with which every other machine synchronizes. This solves the inconsistency problem.

The points listed above are critical in deciding on a file synchronization technique. In addition, there are points that are less critical and relate more to the usability of the synchronization tool, such as the tool being able to show its progress, failures, successes, performance and bandwidth utilization.
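As a small illustration of the compression point, the following Python sketch compresses a payload before it would be sent and restores it on the receiving side. It uses the standard `zlib` module; the helper names `pack` and `unpack` and the sample payload are hypothetical, and the actual saving depends entirely on how compressible the data is.

```python
import zlib


def pack(payload: bytes, level: int = 6) -> bytes:
    """Compress a payload before it is sent over the network."""
    return zlib.compress(payload, level)


def unpack(blob: bytes) -> bytes:
    """Restore the original payload on the receiving side."""
    return zlib.decompress(blob)


payload = b"lecture notes, week 1\n" * 200  # highly repetitive sample data
blob = pack(payload)
print(len(payload), "bytes before,", len(blob), "bytes after compression")
assert unpack(blob) == payload
```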

1.1.1 Tools for File Synchronization

Several file synchronization tools are available. Some of them are:

(a) Unison: Unison is open source (under a GNU licence) and works on both Windows and Linux platforms. It supports two-way synchronization and offers key features such as a star schema for synchronizing multiple machines.

(b) RSync: RSync is a more commonly used file synchronization tool, which is also available with the IBM AIX distribution. RSync was initially released for the Linux platform but has since been ported to Windows as well. It is a very flexibly configurable utility with an array of options for tailoring it to one's requirements. Some tools have been built on top of RSync to provide a GUI for it, but most of them are not stable.


1.2 EDX Platform [7]

EDX Platform is an open source platform on which students can register for courses offered by universities across the world. Each course has a series of weekly videos, along with submissions, assessments, feedback, comments, notifications, and various analytics over the data and logs of the EDX platform. Assessments can be of the following types:

(a) Self Assessment: Students assess and grade their own submissions.

(b) Peer Assessment: Students assess and grade the submissions of other students.

(c) AI Assessment: Example-based artificial intelligence assessment.

The EDX platform is highly modular, and each module has well defined and loosely coupled functionality, as shown in figure 1.1. The architecture can be summarized as follows:

Figure 1.1: EDX Architecture diagram Source: ([5])

(a) CMS: CMS is the Content Management System, also known as EDX Studio. It helps in managing the content of a course and provides facilities to add, delete or edit courseware.

(b) LMS: LMS is the Learning Management System. Students interact with EDX through the LMS, which provides facilities like quizzes, comments, feedback, submissions and other interactions. It is also responsible for displaying content on the web page.

(c) Configuration: This module provides configuration management when setting up a new EDX platform. It uses Ansible, an open source platform for configuring and managing machines.

(d) XBlock: The XBlock module is used to create courseware for a course. Courseware follows a hierarchical structure: it is composed of various components, where each component may be as simple as a paragraph, input form or video, or as complex as a section, chapter or complete course.

(e) edx-ora2: The edx-ora2 (Open Response Assessment) module handles all assessment-related activity. It lets faculty ask open-ended questions and assess submissions made by students. Assessments may be self assessments, peer assessments or example-based Artificial Intelligence assessments. It also lets faculty train students on how to assess a problem. This is shown in figure 1.2.

Figure 1.2: ORA Architecture diagram Source: ([10])

(f) CS Comments Service: The CS Comments Service module facilitates nested comments and voting.

(g) XQueue: The XQueue module provides a checking interface for the LMS. When a student makes a submission through the LMS, the submission goes to XQueue, which has it assessed and graded by an external service and sends the assessment back to the LMS.

(h) XServer: XServer takes code submissions received by the LMS and runs the code using courseware checkers.

(i) notifier: This module sends daily digests from the forums to students registered on them.

(j) Analytics Dashboard: The Analytics Dashboard displays metadata about course activity, such as enrollments and student performance.

(k) Analytics Pipeline: This module analyzes data from tracking logs and EDX databases and exposes the analyzed information to the outside world through edx-analytics-data-api.


Chapter 2

Related Work And Background

File synchronization is used in an array of scenarios, and a number of tools are available for it. One such tool is RSync, which ships as a command in Linux distributions and is one of the most widely used file synchronization tools. However, since it is a command-line tool, many utilities have been built on top of RSync to give it a GUI. These utilities provide an interface for adding directories to be synched and offer additional features such as periodic synchronization.

2.1 Utilities over RSync

This section lists utilities that are built on top of RSync to provide a better GUI and additional features:

(a) Lucky Backup [11]: 'Lucky Backup' is a free desktop application which runs on top of RSync and provides the following features [1]:

i. Backup: It keeps a backup of the data on a remote machine, so that whenever files are added, deleted or modified, the files and changes are backed up on the remote machine.

ii. Snapshots: The user can take a snapshot of the directory being backed up or synched and store it. The snapshot can later be restored, returning the directory to the state it was in when the snapshot was taken.

iii. Sync: The user can sync multiple pairs of directories, so that whenever a synched file changes, the change is reflected at the other locations.

iv. Exclude Option: Excludes one or more files based on names or name patterns.

v. Simulation: This is a very powerful feature. If the user is not sure of the outcome of running an RSync command, they can run a simulation of the command they are about to execute. The utility produces the same report as the real RSync command, but no changes are made at either end of the synchronization. Once the user is sure, they can go ahead and execute the command.

vi. Scheduling: The user can schedule syncs using cron jobs.

(b) FlyBack [12]: FlyBack was initially built on top of RSync but has since been rewritten from scratch. It is most useful where incremental changes to files need to be maintained: it backs up incremental changes, which can later be retrieved.

(c) Grsync [6]: Grsync is an open source utility under the GPL licence. It is built on top of the RSync tool and synchronizes directories locally or over the network. Grsync is also a desktop application and has no web interface. Almost all of the features provided by 'Lucky Backup' are supported by Grsync as well. It supports Mac OS, and a version is also available for Windows.

(d) Gadmin: Gadmin is another tool with almost the same facilities as Grsync, but it does not support operating systems other than Linux.

Although much software is available for file synchronization, it consists mostly of desktop applications. In our work, we develop a web-based application which can be configured easily and provides the functionality needed for EDX platform synchronization. The benefit of a web-based application is that it can expose web service endpoints for viewing information about a pair of servers, such as the difference between directories or the status of synchronization. If a desktop application were used for file synchronization, a separate web application would have to be created to provide this functionality as web services.

2.2 Background

Our application for file synchronization will use the RSync tool for basic synchronization and add other functionality on top. The whole application can be broadly viewed as two parts: the File Synchronization utility, and support for the EDX platform built on that utility. This section mostly covers the former.

Most file synchronization utilities are built on top of the RSync tool, which uses the RSync algorithm to synchronize files between two distinct directories. It is a very simple but network-efficient algorithm. For synchronization over a network to remote sites, RSync can be configured with SSH public keys so that communication between the two nodes can be set up without authenticating interactively every time.
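To make this concrete, the sketch below assembles an rsync command line that runs over SSH with a key file. It only builds the argument list and does not execute anything. The function name, host, paths and key location are hypothetical; the flags used (`-a`, `-z`, `--dry-run`, `--delete`, `-e`) are standard rsync options, and `-i` is the standard ssh option for selecting an identity file. This is an illustration only, not the command used by the tool described in this thesis.

```python
import shlex


def build_rsync_command(source_dir, dest_host, dest_dir,
                        ssh_key=None, dry_run=False, delete=False):
    """Return an rsync argument list for pushing source_dir to dest_host."""
    cmd = ["rsync", "-a", "-z"]      # archive mode, compress data in transit
    if dry_run:
        cmd.append("--dry-run")      # report what would change, change nothing
    if delete:
        cmd.append("--delete")       # also remove files that vanished at the source
    if ssh_key:
        # Use SSH with a specific private key so no password prompt is needed.
        cmd += ["-e", "ssh -i " + shlex.quote(ssh_key)]
    # A trailing slash on the source copies the directory contents.
    cmd += [source_dir.rstrip("/") + "/", dest_host + ":" + dest_dir]
    return cmd


cmd = build_rsync_command("/srv/college/uploads", "sync@central.example",
                          "/srv/central/uploads",
                          ssh_key="/home/sync/.ssh/id_rsa", dry_run=True)
print(" ".join(shlex.quote(part) for part in cmd))
```

Running the command first with `dry_run=True` corresponds to the "simulation" feature offered by the GUI utilities listed in section 2.1.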

2.2.1 RSync Algorithm [9, 8]

The RSync algorithm was proposed by Andrew Tridgell and Paul Mackerras in June 1996. The problem it addresses is the following: given two versions of a file, X and Y, on two different nodes A and B (respectively) that are connected over a (slow) network, what is the most efficient way of bringing the two versions to an identical state (say, Y is to be updated to equal X)? The brute-force technique is to copy file X over the network to B and replace Y. But if the network is slow and unreliable, copying the whole file at once may not be efficient; in the worst case the network link becomes unavailable after 99% of the file has been copied. One solution is to split the file into chunks and send only the chunks that have not yet been copied, but this still copies the whole file.

The RSync algorithm given by Tridgell and Mackerras is very efficient in network bandwidth since it transfers only those parts of a file which have been updated, not the complete file. The major difficulty with such a technique is that finding the difference between two files ordinarily requires both to be present on the same machine; and if both are already on the same machine, the file has already been copied from the remote server, so the purpose is lost. To get around this problem, the RSync algorithm uses the following technique.

Setup:

(a) File Y on node B has to be made identical to file X on node A.

(b) Nodes A and B are connected over a (slow) network.

Algorithm: RSync uses delta differencing, as explained in the 'Introduction' chapter. It divides a file into blocks and calculates a checksum for each block. It then determines which blocks must be transferred by exchanging these checksums, rather than the blocks themselves, with the other machine. The overall algorithm is as follows:

(a) B splits the file Y into a series of non-overlapping fixed-size chunks of size s. The last chunk may be shorter than s.

(b) B computes two checksums for each chunk: a weak, rolling 32-bit checksum and a strong 128-bit MD5 checksum.

(c) B sends these checksums to A.

(d) A computes the weak checksum of every chunk of size s in file X, at every possible offset, and compares it with the list of checksums sent by B. When a chunk of X has the same weak checksum as a chunk sent by B, its strong checksum is compared with that of the same chunk. If both the weak and strong checksums match, the chunks are considered identical. Matching checksums do not strictly guarantee that the blocks are the same, but the chance that two different chunks share both weak and strong checksums is negligible.

(e) Based on these comparisons, A sends B the offsets of the chunks which matched and the positions at which those chunks fit in the reconstructed file. In addition to the matched blocks, A also sends "literal" data, which is the data that did not match any block sent by B.

(f) B, upon receiving the chunk references with their corresponding offsets (where each chunk is to be inserted) and the literal data with its corresponding offsets, reconstructs the file Y, which is now the same as X on A.

These steps are summarized in figure 2.1.

Figure 2.1: Diagram showing RSync Algorithm Working
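The steps above can be sketched in a few lines of Python. This is a simplified illustration and not the rsync implementation: `zlib.adler32` is used as a stand-in for the weak checksum and is recomputed at every offset instead of being rolled, both "nodes" live in one process, and all function names are hypothetical.

```python
import hashlib
import zlib


def signatures(y: bytes, s: int) -> dict:
    """Steps (a)-(c), on B: checksum every fixed-size chunk of Y."""
    table = {}
    for start in range(0, len(y), s):
        chunk = y[start:start + s]
        weak = zlib.adler32(chunk)               # stand-in weak checksum
        strong = hashlib.md5(chunk).digest()     # strong 128-bit checksum
        table.setdefault(weak, []).append((strong, start // s))
    return table


def delta(x: bytes, table: dict, s: int) -> list:
    """Steps (d)-(e), on A: emit ("copy", chunk_index) or ("literal", bytes)."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(x):
        window = x[pos:pos + s]
        match = None
        for strong, index in table.get(zlib.adler32(window), []):
            if hashlib.md5(window).digest() == strong:   # confirm weak match
                match = index
                break
        if match is None:
            literal.append(x[pos])   # no chunk of Y matches here
            pos += 1
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += len(window)
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(y: bytes, ops: list, s: int) -> bytes:
    """Step (f), on B: rebuild X from chunks of the old Y plus literal data."""
    out = bytearray()
    for kind, value in ops:
        out += y[value * s:(value + 1) * s] if kind == "copy" else value
    return bytes(out)


old_y = b"the quick brown fox jumps over the lazy dog"
new_x = b"the quick red fox jumps over the lazy dog!"
ops = delta(new_x, signatures(old_y, 4), 4)
assert patch(old_y, ops, 4) == new_x
```

Only the `("literal", ...)` entries carry file data across the network; each `("copy", ...)` entry is a small reference to a chunk that B already holds.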

Rolling Checksum: The most time-consuming step above is comparing the checksums sent by B against every possible chunk of size s on A. If the rolling checksum used in the algorithm has the following property, this comparison reduces to a single pass over file X: given the checksum of the buffer X(1)...X(n) and the values X(1) and X(n+1), calculating the checksum of the buffer X(2)...X(n+1) is a cheap operation.
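The rolling property can be demonstrated with a short Python sketch of a two-part weak checksum of the kind described by Tridgell and Mackerras: `a` is the sum of the bytes in the window and `b` is a position-weighted sum, both taken modulo 2^16. The function names are hypothetical, and the sketch keeps `a` and `b` as a pair rather than packing them into a single 32-bit value.

```python
M = 1 << 16  # both halves of the checksum are kept modulo 2^16


def weak_checksum(block: bytes):
    """Compute the checksum of a whole window from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte has weight n, the last byte has weight 1.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int):
    """Slide a window of length n one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


data = bytes(range(50)) * 3
n = 8
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
print("rolled checksum matches recomputation at every offset")
```

Each call to `roll` costs a handful of arithmetic operations regardless of the window size, which is what allows A to examine every offset of X in one pass.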

2.3 Problem Statement

Design a web-application-based solution for full-fledged file synchronization over a distributed set of servers in various colleges, so that faculty, students or other members of an edX course can contribute course content. This content can be verified and synched to a central server, from where it becomes available to members of the course across the globe. The solution should also provide additional facilities such as viewing history, periodic synchronization and notifications.


Chapter 3

Architecture and Design

This chapter focuses on the architecture that will be followed to provide a full-fledged file synchronization utility that can distribute and synchronize data across various college campuses. The overall architecture is shown in figure 3.1.

Figure 3.1: File Synchronization Between Colleges and Servers

As the figure shows, there are various pairs of servers spread across college campuses around the globe. Each server has directories that are paired with directories on other servers. Once files are placed in these paired directories, they can be synched manually by the administrator, or the directories can be scheduled to sync automatically at regular intervals.

3.1 File Synchronization Admin Utility

File Synchronization Admin Utility is a standalone web-based application which provides a web interface for managing servers and the synchronization among them.

The admin should be able to perform the following operations using this utility:

(a) Login: The admin should be able to log in to the web interface using a username and password.

(b) Manage Colleges: The admin should be able to add or remove colleges, each of which contains one or more servers.

(c) Manage Servers: The admin should be able to add or remove servers with which the hosting server wants to sync.

(d) Manage Directories: The admin should be able to manage the directory pairs which need to be synched.

(e) Manage Periodic Synchronization: The admin should be able to schedule periodic synchronization at regular intervals, for example two directories that need to be synched every hour, every n hours or every m days. The admin should also be able to trigger a sync instantly.

(f) View Status: The admin should be able to view the status of servers and directories, for example which directories are synched and which still need to be synched. The admin should also see how much time remains before each directory is synched automatically, and the current progress of any synchronization that is under way.
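The periodic schedule in (e) and the "time remaining" shown in (f) amount to simple date arithmetic. The Python sketch below assumes a schedule is stored as a unit name plus an interval count; the unit names "Hourly" and "Daily" and the function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical schedule units; any other value means "no automatic sync".
UNITS = {"Hourly": timedelta(hours=1), "Daily": timedelta(days=1)}


def next_sync(last_sync, schedule_type, schedule_interval):
    """Return when the next automatic sync is due, or None if unscheduled."""
    unit = UNITS.get(schedule_type)
    if unit is None:
        return None
    return last_sync + unit * schedule_interval


def time_remaining(now, last_sync, schedule_type, schedule_interval):
    """Return the time left until the next sync; zero if it is overdue."""
    due = next_sync(last_sync, schedule_type, schedule_interval)
    if due is None:
        return None
    return max(due - now, timedelta(0))


last = datetime(2015, 1, 1, 12, 0)
print(next_sync(last, "Hourly", 3))                                    # every 3 hours
print(time_remaining(datetime(2015, 1, 1, 14, 0), last, "Hourly", 3))
```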

To handle the above functionality, the file synchronization utility will connect to a MySQL database that stores information such as the admin username and password, servers, directories to sync, and details of each directory pair such as last synched time, last status and last failure.

Figure 3.2: Use case Diagram for File Synchronization Module

3.1.1 Database for File Synchronization Utility Server End

The ER diagram for the database is shown in figure 3.3.

Figure 3.3: ER Diagram for File Synchronization Utility

The following schemas can be used for the database:

(a) userinfo (
      loginId varchar(11) NOT NULL,
      loginPassword varchar(30) NOT NULL DEFAULT 'password',
      fullName varchar(40) DEFAULT 'User Name',
      userRole varchar(20) NOT NULL DEFAULT 'user',
      PRIMARY KEY (loginId) )

(b) collegedata (
      collegeid int(11) NOT NULL AUTO_INCREMENT,
      collegename varchar(50) NOT NULL DEFAULT 'College Name',
      collegeaddress varchar(60) NOT NULL DEFAULT 'College Address',
      collegelat double DEFAULT NULL,
      collegelon double DEFAULT NULL,
      PRIMARY KEY (collegeid) )

(c) serverdata (
      serverid int(11) NOT NULL AUTO_INCREMENT,
      servername varchar(30) NOT NULL DEFAULT 'Server Name',
      servercollege int(11) NOT NULL,
      serverlat double NOT NULL,
      serverlon double NOT NULL,
      serverip varchar(30) NOT NULL DEFAULT 'localhost',
      serverappport int(11) NOT NULL DEFAULT '8181',
      PRIMARY KEY (serverid),
      UNIQUE KEY serverip (serverip) )

(d) directoriesinfo (
      directoryid int(11) NOT NULL AUTO_INCREMENT,
      directoryPath varchar(70) NOT NULL,
      directoryServer int(11) NOT NULL,
      PRIMARY KEY (directoryid) )

(e) directoriesrelation (
      dirrelid int(11) NOT NULL AUTO_INCREMENT,
      sourcedirectoryid int(11) NOT NULL,
      destinationdirectoryid int(11) NOT NULL,
      scheduletype varchar(30) NOT NULL DEFAULT 'None',
      scheduleinterval int(11) NOT NULL DEFAULT '1',
      isactive tinyint(1) NOT NULL DEFAULT '1',
      PRIMARY KEY (dirrelid) )

(f) synchistory (
      syncid int(11) NOT NULL AUTO_INCREMENT,
      directoriesrelationid int(11) NOT NULL,
      syncstarttime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
      syncendtime datetime DEFAULT NULL,
      synclogs longtext,
      lastUpdated datetime DEFAULT NULL,
      iskilled tinyint(1) NOT NULL DEFAULT '0',
      processstatus varchar(30) DEFAULT NULL,
      PRIMARY KEY (syncid) )

(g) notificationinfo (
      notificationid int(11) NOT NULL AUTO_INCREMENT,
      synchistoryid int(11) NOT NULL,
      notificationtext varchar(300) DEFAULT NULL,
      notificationcreated datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
      isread tinyint(1) NOT NULL DEFAULT '0',
      notificationcode int(11) NOT NULL DEFAULT '1',
      PRIMARY KEY (notificationid) )

3.1.2 File Synchronization Utility Architecture

As shown in figure 3.4, the front end of the utility runs in the browser and displays the controls through which the user triggers actions that are performed on the server. All communication between the front end and the server goes through JSP/Servlets and RESTful services. Additionally, the server may communicate with another server through the HttpClient library in order to get information such as:

(a) Get Availability: This checks the availability of a paired directory on a remote server. The web application’s server calls a RESTful service on the remote server, which returns whether the specified directory is present on that remote server.

(b) Reverse SSH Reachable: This checks connectivity from a remote server to the web application’s server through SSH. A RESTful service is called on the remote server, which in turn tries to set up an SSH connection with the web application’s server and returns true or false depending on whether the connection was set up successfully.


Figure 3.4: Authentication Module Architecture

3.1.3 Android Application Architecture

This section describes the architecture of the Android application.

Android Application Overview

The Android application is an extra utility that can be configured to connect to the server in order to control sync processes, view the history of previous sync runs, and receive notifications for failed sync processes on the fly. This gives the admin quick alerts as well as flexibility, in the sense that the admin can start a sync process from a mobile device.

Android Application Architecture

As shown in figure 3.4, the Android application connects to the server through the HttpClient library. Once the connection is established, it authenticates the user through Moodle or through the web application’s local database, depending on what the user selects. All data communication between the Android application and the server goes through RESTful service calls, where the Android application is the client that requests data and the server returns the appropriate response.

3.2 Use Case

This section gives an overall picture of the sections and subsections above. The steps are as follows:

3.2.1 Setup

(a) Admin logs into the web application through the local database or through Moodle.

(b) Admin creates a list of colleges, and servers within those colleges, along with their corresponding ip-addresses.

(c) Admin selects each server one by one and creates the directory pairs that need to be synched.

(d) For each directory pair, admin may select a period for regular sync or may leave it as ’None’.

3.2.2 Use

(a) Admin logs into the web application through the local database or through Moodle.

(b) Admin selects a college to work on.

(c) Admin selects a server within the college to work on.

(d) Admin sees the list of local source and local destination directories.

(e) Admin selects a local source or local destination directory to work on.

(f) Admin selects a pair directory for the selected local directory.

(g) Admin can perform the following actions:

• Select “View History”: This shows a list of timestamps at which sync processes took place. Admin selects one of the timestamps to see the logs of the corresponding process.

• Select “Sync Now”: When admin clicks the “Sync Now” button, a RESTful service is called which starts a sync process on the selected pair of directories, but only if no other sync process is in progress for that pair.

• Select “Delete Pair”: When admin clicks “Delete Pair”, a popup asks admin to confirm the deletion. If admin selects “Cancel”, nothing happens; if admin selects “Ok”, a RESTful service is called which deletes the selected pair of directories.


(h) Admin selects “Map View”: Admin is taken to another page showing the servers on a map, with a blue stroke between servers where a sync is in progress. This gives the admin a better picture of which servers are currently synching.

(i) Admin selects “Notifications Icon”: A popup opens which shows failed sync processes along with the servers and directories for which the process failed.

(j) Admin changes server settings, such as the admin’s name, the admin’s password, the web application’s port, etc.


Chapter 4

Implementation

In this chapter, we look at the techniques and technologies used for implementing the File Synchronization Admin Utility web application.

4.1 Modular Approach for Implementation

The whole application is divided into modules which perform different tasks and coordinate with each other in order to serve the purpose as a whole.

4.1.1 Web Application’s Modules

The following is the list of all modules of the application along with their responsibilities.

• Authentication Module: This module contains two parts:

(a) Local MySQL Database Authentication: User is authenticated from the local MySQL database.

(b) Moodle Authentication: User is authenticated from Moodle.

• Synchronization Module: This is the core module, which takes care of synching two directory paths and of book-keeping for every sync process.

• Sync History Module: This module maintains the synchronization history of the directory pairs and exposes it to the web application’s front end as a RESTful service.

• SSH Checker Module: This module checks whether a remote server can be reached through SSH from the web application’s server, and exposes an end point for the web application’s front end as a RESTful service.


• Reverse SSH Checker Module: This module checks whether the web application’s server can be reached from a remote server through SSH, and exposes an end point for the web application’s front end as a RESTful service.

• Schedule Handler Module: This module regularly checks all directory pairs for a configured periodic synchronization. If more time has elapsed since the last sync than the period configured for a directory pair, it starts a new sync process for that pair. It does not start a sync process for a pair whose sync is already in progress.

• Sync Now Module: This module lets the admin select a pair of directories and start a sync process for that pair straight away. It does not start a sync process for a pair whose sync is already in progress.

• Check Availability Module: This module checks whether a directory is available, either locally or remotely.

• Rsync command status Module: This module constantly checks the status of every sync process in progress. If a process has been running for too long without generating any logs, the module kills it, so that dangling processes (for example, ones waiting for user input or simply not responding) do not accumulate on the server.

• Notifications Module: This module creates notifications in the database, which are then fetched by the web application’s front end and displayed to the admin in the notifications tray, or shown as notifications in the Android application.

• Map Module: This module takes latitude-longitude information from the database and passes it to the front end, where a map is shown to the admin with a marker for each server. It also sends information about sync processes currently in progress, which are displayed as blue strokes on the map.

4.2 Modules Detailed Implementation

This section explains in detail the implementation of the modules mentioned in the previous section. For each module, it covers the database, the back end logic, the front end logic and the integration of these three components.

4.2.1 Authentication Module

Purpose

User Authentication. This is described in Figure 4.1.


Figure 4.1: Authentication Screenshot

Implementation

As discussed above, this module has two authentication methods, as described in figure 4.2:

• Local MySQL Authentication: This method uses the following database table:
CREATE TABLE IF NOT EXISTS `userinfo` (
  `loginId` varchar(11) NOT NULL,
  `loginPassword` varchar(30) NOT NULL DEFAULT 'password',
  `fullName` varchar(40) DEFAULT 'User Name',
  `userRole` varchar(20) NOT NULL DEFAULT 'user',
  PRIMARY KEY (`loginId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The application contains a Spring servlet mapping ’/LoginValidator’ which connects to the database and checks whether the supplied username and password exist in it. Based on the result, it redirects the page to either ’/home’ or ’/index’. The table also contains a ’userRole’ column, which leaves room for granting different kinds of access to different user roles in the future.


Figure 4.2: Authentication Module Architecture

• Moodle Authentication: This method uses Moodle’s external web service to authenticate a user. The following steps were followed to add authentication through Moodle:

– Installed Moodle on a virtual machine by following the steps from [3]. In brief:

* Update the apt-get package lists. This step is necessary in order to get the latest versions of all components:
sudo apt-get update

* Install the Apache server:
sudo apt-get install apache2

* Install the MySQL client and server:
sudo apt-get install mysql-client mysql-server

* Install php5:
sudo apt-get install php5

* Install additional software:
sudo apt-get install graphviz aspell php5-pspell php5-curl php5-gd php5-intl php5-mysql php5-xmlrpc php5-ldap


* Restart the Apache service:
sudo service apache2 restart

* Install git. This is required so that the git clone command can be used to get the Moodle directory from the git repository:
sudo apt-get install git-core

* Download Moodle:

· cd /opt

· sudo git clone git://git.moodle.org/moodle.git

· cd moodle

· sudo git branch -a

· sudo git branch --track MOODLE_26_STABLE origin/MOODLE_26_STABLE

· sudo git checkout MOODLE_26_STABLE

* Copy local repository to /var/www/html/

· sudo cp -R /opt/moodle /var/www/html/

· sudo mkdir /var/moodledata

· sudo chown -R www-data /var/moodledata

· sudo chmod -R 777 /var/moodledata

· sudo chmod -R 0755 /var/www/html/moodle

* Setup MySQL Server:

· sudo vi /etc/mysql/my.cnf
Inside this file, add ’default-storage-engine = innodb’ to the [mysqld] section; create the section if it is not already present.

· sudo service mysql restart

· mysql -u root -p

· Inside the MySQL shell, type the following commands:
CREATE DATABASE moodle DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
GRANT SELECT,INSERT,UPDATE,DELETE,CREATE,CREATE TEMPORARY TABLES,DROP,INDEX,ALTER ON moodle.* TO moodledude@localhost IDENTIFIED BY 'passwordformoodledude';
quit

* Complete Setup:
sudo chmod -R 777 /var/www/moodle
sudo chmod -R 0755 /var/www/moodle

* Test Setup: Type the following in a browser:
http://IP.ADDRESS.OF.SERVER/moodle


– Added a user “Student1” in Moodle.

– Added an external web service named “AuthenticatorService”.

– Assigned “Student1” and “admin” to the privileged users of “AuthenticatorService”.

– Enabled “webservices” in Site Administration > Plugins > Webservices.
NOTE: Some changes were needed in the “mdl_external_services” table of Moodle’s database. When an external web service is created through Moodle’s web UI, the UI does not ask for the “shortname” of the service, but the shortname (initially NULL in the database) is required when the web service is accessed. The following steps were followed:

* Identified the id of the newly created external web service through the following query:
SELECT * FROM mdl_external_services;
For “AuthenticatorService” this query returned id=2.

* Updated the “shortname” for “AuthenticatorService”:
UPDATE mdl_external_services SET shortname='authenticateservice' WHERE id=2;

– After setting up Moodle as described above, SyncApp’s “loginValidator” was modified in the following way to authenticate users:

* It takes an extra parameter “isMoodle” to specify whether to authenticate through Moodle or through the local MySQL database.

* If “isMoodle” is false, the user is authenticated from the MySQL database as before.

* If “isMoodle” is true, HttpClient is used to fire the following REST call:
http://localhost/moodle/login/token.php?username=<username>&password=<userpasswservice=authenticateservice

* The response of this REST service is as follows:

· If “username” and “userpassword” are registered in Moodle, it returns:
token: <token>

· If “username” and/or “userpassword” is wrong, it returns:
"error":"The username was not found in the database","stacktrace":null,"debuginfo":null,"reproductionlink":null

* By parsing the response of “AuthenticatorService”, the “loginValidator” web service decides whether the user is a valid registered Moodle user and returns true or false correspondingly.
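The last step above, deciding validity from the body returned by login/token.php, might be sketched as follows. This is only an illustration: it assumes the body is a small JSON object carrying either a "token" field or an "error" field, as the two cases above suggest. The class and method names are hypothetical, and the regular expression stands in for a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: interprets the body returned by Moodle's login/token.php. */
final class MoodleTokenResponse {
    // Matches "token":"<non-empty value>"; a simplification, not a full JSON parser.
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"([^\"]+)\"");

    private MoodleTokenResponse() {}

    /** Returns the token if the response carries one, or null otherwise. */
    static String extractToken(String body) {
        if (body == null) {
            return null;
        }
        Matcher m = TOKEN.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    /** True when the response has a token and no "error" field. */
    static boolean isAuthenticated(String body) {
        return body != null
                && !body.contains("\"error\"")
                && extractToken(body) != null;
    }
}
```

A validator along these lines would return the value of isAuthenticated(body) when “isMoodle” is true, and fall back to the local database check otherwise.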


4.2.2 Synchronization Module

Purpose

Core module which starts a sync process and does book-keeping for the process.

Implementation

The module that spawns a new process for synchronizing two directories has been implemented for basic synchronization. Java creates the new process using Runtime.getRuntime().exec(). The Java method that makes this call takes two arguments: the source directory path and the destination directory path. With these two parameters it spawns a process that runs a batch file, which takes the same two directory paths as arguments and calls the “rsync” command on them to start the sync process. Once the process has started, the Java module obtains the process’s output through Process.getInputStream(), so that whatever the process writes is streamed into a Java InputStream variable. A BufferedReader then collects the stream’s data one line at a time. The advantage of using BufferedReader is that we can operate on each line of the process’s output instead of each character; since the output does not need to be read in real time, buffering does not hamper performance or functionality.

Additionally, we create a new thread for every synchronization request. Synchronizing a big directory takes a long time, and if the process were run directly inside an AJAX request from the browser, that request would stay open until synchronization completed. Such long-running requests are not good practice, and they may hit request timeouts, which affects user experience. Therefore, the main controller servlet takes the request and spawns a new thread that launches the synchronization process and takes care of logging, etc. After spawning the thread, the main servlet immediately returns a response to the user, which is the id of the new entry in the “synchistory” table. The overall architecture is shown in figure 4.3.
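The flow described above can be sketched roughly as follows. This is a minimal illustration rather than the application’s actual code: the class name, the method names and the line callback are hypothetical, and ProcessBuilder is used here in place of Runtime.getRuntime().exec() because it allows standard error to be merged into the stream being read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical sketch of launching a sync script and collecting its output. */
final class SyncProcessRunner {
    private SyncProcessRunner() {}

    /** Reads the stream line by line, passes each line to onLine, returns the line count. */
    static int pumpLines(InputStream processOutput, Consumer<String> onLine)
            throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(processOutput, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                onLine.accept(line); // e.g. append to synclogs, refresh lastUpdated
                count++;
            }
        }
        return count;
    }

    /** Starts the script on a worker thread so the calling request can return at once. */
    static Thread startSync(String script, String source, String destination,
                            Consumer<String> onLine) {
        Thread worker = new Thread(() -> {
            try {
                Process p = new ProcessBuilder(script, source, destination)
                        .redirectErrorStream(true) // merge stderr into the stream we read
                        .start();
                pumpLines(p.getInputStream(), onLine);
                onLine.accept("exit value: " + p.waitFor());
            } catch (IOException e) {
                onLine.accept("failed: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sync-worker");
        worker.start();
        return worker;
    }
}
```

The servlet would call startSync(...) and return the new “synchistory” id without waiting for the returned thread to finish.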


Figure 4.3: Synchronization Module Architecture

4.2.3 Sync History Module

Purpose

It maintains history of all sync processes. This is described in Figure 4.4.

Implementation

The database table for this module is as follows:
CREATE TABLE IF NOT EXISTS `synchistory` (
  `syncid` int(11) NOT NULL AUTO_INCREMENT,
  `directoriesrelationid` int(11) NOT NULL,
  `syncstarttime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `syncendtime` datetime DEFAULT NULL,
  `synclogs` longtext,
  `lastUpdated` datetime DEFAULT NULL,
  `iskilled` tinyint(1) NOT NULL DEFAULT '0',
  `processstatus` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`syncid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Figure 4.4: View History Screenshot

This table is used by the Synchronization Status Module to see which processes are in progress and which have not updated their logs for a long time (through the value of the ’lastUpdated’ column). The ’iskilled’ column records that the Synchronization Status Module killed the corresponding process because it had not updated its logs for a long time; for such a process the ’syncendtime’ entry is still NULL, which means it never finished.


Figure 4.5: SSH Checker Screenshot

4.2.4 SSH Checker Module

Purpose

It checks if a remote server is reachable through SSH from the web application’s server. This is described in Figure 4.5.

Implementation

This module is written as a Spring controller exposing a RESTful service that tells the client whether a server (ip-address) is ’ssh-reachable’ or not.
SSH-Reachable: A server is called ’ssh-reachable’ if the current server can communicate with the remote server through ssh. This involves three conditions:

(a) Current server should have ssh enabled.

(b) Remote server should have ssh enabled.


(c) Remote server’s public ssh key should be included in the “~/.ssh/authorized_keys” file on the current server.

ssh is required because “rsync” works over ssh to communicate between master and slave servers. Point (c) is essential: if the remote server’s public ssh key is not included in the above file, the slave will not be able to log in through ssh without the master’s password, whereas if the key is authorized, the slave can log into the master through ssh without entering a password. Therefore, in order for “rsync” to communicate over ssh, the remote server’s public key must be included in the list of authorized keys of the current server. Multiple keys can be entered in the “~/.ssh/authorized_keys” file, one key per line. Hence the job of “ssh-checker” is to take the ip-address (or hostname) of a server, try to connect to that server through ssh, and return 0 or 1 depending on whether it was able to connect. This involves some complexity. There is latency in connecting to a server through ssh, so when no output comes from the “ssh” command we cannot tell whether the server is unreachable or merely slow. To solve this, the following strategy is used:

• When a request for “ssh-checker” comes, “ssh-checker” spawns a new thread “SshChecker”.

• The “SshChecker” thread launches a new process (through Runtime.getRuntime()) and establishes IPC with this process through InputStream and OutputStream.

• After spawning the new thread, the main thread of “ssh-checker” calls the join(long millis) method on it. join() makes the main thread wait for the child thread to finish its job, but when the “millis” argument is specified, the main thread waits for at most that many milliseconds. By default the main thread waits 10 seconds (this can be configured) for the child thread to finish.

• The child thread reads the InputStream of the spawned process, which in turn calls “ssh” on the specified server. The “ssh” command behaves as follows:

(a) If the public ssh key of the local server is included in the authorized keys of the remote server, ssh writes some output to its output stream (the output that is normally displayed on the terminal when ssh is called from a terminal).

(b) The Java thread that spawned this process waits for at least 2 lines to appear in the process’s input stream. If the lines appear, it can report that an ssh connection to the remote server is possible.

(c) If the ssh key of the local server is not included in the authorized keys of the remote server, ssh waits for the user to enter the password of the remote server. In that case there is no output on the output stream of the process, so the child thread keeps waiting for input on the process’s input stream.


(d) After spawning the child thread, the main thread waits up to 10 seconds (configurable) for it to return. If ssh is not able to connect, the child thread cannot return; after 10 seconds the main thread calls the interrupt() method on the child thread and returns 0, which means the remote server cannot be reached through ssh.

• Based on the result of the “ssh-checker” service call (0/1), the web UI renders the server as green (ssh connection successful) or red (ssh connection unsuccessful).
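The join-with-timeout strategy above can be illustrated with the following sketch. It is written under assumptions: the class name and the Probe interface are hypothetical, and the probe stands in for the code that starts “ssh” and waits for its first lines of output.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a check that gives up after a fixed waiting time. */
final class TimedCheck {
    private TimedCheck() {}

    /** A probe that may block indefinitely, e.g. reading the output of an ssh process. */
    interface Probe {
        boolean run() throws Exception;
    }

    /**
     * Returns 1 if the probe reports success within timeoutMillis, otherwise 0.
     * timeoutMillis must be positive, because join(0) would wait forever.
     */
    static int check(Probe probe, long timeoutMillis) {
        AtomicBoolean reachable = new AtomicBoolean(false);
        Thread child = new Thread(() -> {
            try {
                reachable.set(probe.run());
            } catch (Exception e) {
                // Interrupted or failed: the result stays "not reachable".
            }
        }, "SshChecker");
        child.setDaemon(true);
        child.start();
        try {
            child.join(timeoutMillis); // wait at most timeoutMillis for the child
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (child.isAlive()) {
            child.interrupt(); // give up on a probe that is still blocked
            return 0;
        }
        return reachable.get() ? 1 : 0;
    }
}
```

With the default described above, the service would call check(probe, 10000) and pass the resulting 0 or 1 back to the web UI. Note that interrupt() only unblocks a probe that is waiting in an interruptible call; a probe blocked on a stream read would additionally need its process destroyed.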

4.2.5 Reverse SSH Checker Module

Figure 4.6: Reverse SSH Checker Screenshot

Purpose

This module checks whether the remote server can communicate with the web application’s server through SSH. This is described in Figure 4.6.


Implementation

This module checks whether reverse ssh works from the remote server to the local server. For this, I have created a RESTful service “reverseSshChecker” which accepts an IP address and a port. For this address and port, the service uses the Apache HttpClient library to call “sshChecker” on the “Synchronization” application running on the remote server. So “reverseSshChecker” in turn calls the “sshChecker” of the remote server and forwards the result as it is to the browser. However, “sshChecker” itself needs the IP address of the server with which the ssh connection is to be checked. For this, “reverseSshChecker” takes the local IP address and sends it to the “sshChecker” of the remote server. The local address can be obtained via the NetworkInterface.getNetworkInterfaces() method, by applying isSiteLocalAddress() to the addresses of the returned interfaces. The architecture is shown in figure 4.7.
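Looking up the local address as described might be done along the following lines. This is a sketch with a hypothetical class name; restricting the search to IPv4 and taking the first match are assumptions made here for simplicity, and a host with several private interfaces may need a more careful choice.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

/** Hypothetical sketch: finds an address of this host that lies in a private range. */
final class LocalAddressFinder {
    private LocalAddressFinder() {}

    /** Returns the first site-local IPv4 address of this host, or null if none is found. */
    static String firstSiteLocalAddress() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return null; // no interfaces reported on this platform
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    // Assumption: only IPv4 addresses are of interest here.
                    if (addr instanceof Inet4Address && addr.isSiteLocalAddress()) {
                        return addr.getHostAddress();
                    }
                }
            }
        } catch (SocketException e) {
            // Interfaces could not be enumerated; treat as "no address found".
        }
        return null;
    }
}
```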

Figure 4.7: Reverse SSH Checker Module Architecture


Figure 4.8: Schedule Handler Screenshot

4.2.6 Schedule Handler Module

Purpose

This module handles scheduling of sync processes. This is described in Figure 4.8.

Implementation

• It scans all the directory pairs which need to be synched. After getting the ids of all relations, it looks up the latest entry for each id in the “synchistory” table.

• For each such entry in “synchistory”, it checks whether the end time of the sync is NULL. If it is NULL, the sync is still in progress.

• If the end time is not NULL, it retrieves the end time from “synchistory”, compares it against the “scheduleinterval” of the current pair, and determines whether the current time has passed the point at which the pair was due to be synched again. If yes, it launches a new thread which starts synching the two directories; if not, it skips the current pair and continues with the other pairs of directories.

• It performs these steps at an interval of 1 minute. This is accomplished by calling Thread.sleep(60000) on the “ScheduleHandler” thread.
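The decision made for each directory pair can be sketched as a small function. The names are hypothetical, the interval is passed as a Duration because the unit of the stored interval is not stated here, and treating a pair with no history as immediately due is an assumption.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical sketch of the per-pair check performed by the schedule handler. */
final class ScheduleCheck {
    private ScheduleCheck() {}

    /**
     * Decides whether a directory pair should be synched now.
     *
     * @param hasHistory false when the pair has no row in synchistory yet
     * @param lastEnd    syncendtime of the latest row; null while that sync is running
     * @param interval   configured period between syncs
     * @param now        current time
     */
    static boolean isDue(boolean hasHistory, LocalDateTime lastEnd,
                         Duration interval, LocalDateTime now) {
        if (!hasHistory) {
            return true; // assumption: a pair that never ran is due immediately
        }
        if (lastEnd == null) {
            return false; // latest sync is still in progress
        }
        return !now.isBefore(lastEnd.plus(interval));
    }
}
```

The handler thread would evaluate isDue(...) for every active pair once per minute and start a sync thread for each pair that returns true.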

4.2.7 Sync Now Module

Purpose

This module lets the admin start a sync process for a directory pair straight away.

Implementation

The “Sync Now” facility first checks whether an SSH connection to the remote server is possible. If the connection is not possible, it does not try to sync the two directories. If the connection is possible, it starts the sync process as normal through the Synchronization Module.

4.2.8 Check Availability Module

Purpose

This module checks if a directory is present locally or remotely. This is described in Figure 4.9.

Implementation

As shown in figure 4.10, this module checks the availability of local and/or remote directories. The admin has rights to create directory pairs which need to be synched and which may be present locally or remotely. Once a directory pair is created there are two possibilities:

• Selected directory is present locally: In this case, while sending the list of locally present directories, the module checks whether the path exists and whether it is a directory and not a file.

• Selected directory is present remotely: In this case, there is a “Check Availability” button on the UI which, when clicked, calls the service “/isRemoteDirectoryExist” for every such directory and passes the corresponding directory path. This service takes the directory path, composes a new HttpClient request with it and calls the “/isLocalDirectoryExist” service on the remote server. That service checks whether the given path exists and is a directory, and returns the result.
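The local check that both cases end in (the path exists and is a directory rather than a file) might look like the following sketch. The class and method names are hypothetical; this is the kind of test a service such as “/isLocalDirectoryExist” could delegate to.

```java
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

/** Hypothetical sketch of the local directory availability check. */
final class DirectoryCheck {
    private DirectoryCheck() {}

    /** True only if the path exists on this machine and is a directory. */
    static boolean isExistingDirectory(String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        try {
            // Files.isDirectory returns false for regular files and missing paths.
            return Files.isDirectory(Paths.get(path));
        } catch (InvalidPathException e) {
            return false; // malformed path string
        }
    }
}
```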


Figure 4.9: Check Availability Screenshot

4.2.9 Rsync command status Module

Purpose

Monitor all the sync processes and kill any that have been stuck for a long time without any activity.

Implementation

This module checks whether an rsync process started by the application is still running or whether something went wrong with it. The following strategy is applied:

• Whenever a new process is initiated (for rsync), a new entry is added to the “synchistory” table with a new id, “syncstarttime” set to the current time and the other entries null.

• For every process that is initiated (for rsync), a new thread (the “ProcessStatusChecker” thread) is launched.


Figure 4.10: Check Availability Module Architecture

• The “ProcessStatusChecker” thread loops and, on every iteration (with a delay of 10 seconds), checks whether the process’s “syncendtime” entry exists in the “synchistory” table. If it does, the process has completed and the thread simply terminates.

• If the “syncendtime” entry does not exist, the process is still running.

• The rsync process launcher module keeps reading the output generated by the process and appends each new line to the “synclogs” column of that process’s row in the “synchistory” table. Along with “synclogs”, it also sets the “lastUpdated” column to the current timestamp, so that we keep track of when output was last available from the corresponding process. The rsync command keeps writing to standard output roughly every second, reporting which file is being transferred and what percentage of it has been transferred. So, if a process is running successfully, its “lastUpdated” value never lags the current time by more than a second or so.

• If the “ProcessStatusChecker” thread finds the “syncendtime” column null, it looks at the “lastUpdated” column. If “lastUpdated” is null, the process has not produced any output yet. In that case the thread calculates the difference between the current time and “syncstarttime”; if it exceeds 120 seconds, it kills the process, sets the “iskilled” column to true in that process’s “synchistory” row, and terminates.

• Similarly, if the thread finds “syncendtime” null and “lastUpdated” not null, it takes the difference between the current time and “lastUpdated” and proceeds as in the previous step.

• At the end of the process, p.exitValue() is checked; it is 0 if the process was successful and non-zero if there was an error. The “Process Launcher” module sets the “processstatus” column of the corresponding “synchistory” row to “Successful” if exitValue() is 0, and to “Failed: <exitValue>” otherwise.
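The rules above reduce to two small decisions, sketched below with hypothetical names. The 120-second limit and the status strings are taken from the description; the sketch leaves out the database access and the actual killing of the process.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical sketch of the decisions made while monitoring a sync process. */
final class StallCheck {
    /** Maximum silence tolerated before a process is considered stuck. */
    static final Duration LIMIT = Duration.ofSeconds(120);

    private StallCheck() {}

    /**
     * True if a process should be killed: it has not ended, and more than
     * LIMIT has passed since its last output (or since its start, if it
     * has produced no output yet).
     */
    static boolean shouldKill(LocalDateTime start, LocalDateTime lastUpdated,
                              LocalDateTime end, LocalDateTime now) {
        if (end != null) {
            return false; // syncendtime is set: the process already finished
        }
        LocalDateTime reference = (lastUpdated != null) ? lastUpdated : start;
        return Duration.between(reference, now).compareTo(LIMIT) > 0;
    }

    /** Value stored in processstatus for a given exit value. */
    static String statusFor(int exitValue) {
        return exitValue == 0 ? "Successful" : "Failed: " + exitValue;
    }
}
```

The checker thread would call shouldKill(...) every 10 seconds with the values read from the process’s “synchistory” row.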

4.2.10 Notifications Module

Pupose

This module is responsible for managing database for various notifications thatneed to be generated for failed sync processes.

Implementation

The database table for this module is as follows:

CREATE TABLE IF NOT EXISTS `notificationinfo` (
  `notificationid` int(11) NOT NULL AUTO_INCREMENT,
  `synchistoryid` int(11) NOT NULL,
  `notificationtext` varchar(300) DEFAULT NULL,
  `notificationcreated` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `isread` tinyint(1) NOT NULL DEFAULT '0',
  `notificationcode` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`notificationid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Whenever a process fails (the rsync command fails and returns a non-zero exitValue()), an entry is made in the "notificationinfo" table in the database. "synchistoryid" is a foreign key to the primary key of the "synchistory" table. This column links each notification to the process it corresponds to. "notificationtext" is the text to be displayed in the "Notifications Box" on the UI. It contains the source and destination directory paths along with the remote server's name and the date on which the notification was created.
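A minimal JDBC sketch of writing such a row is shown below. The class NotificationWriter, its method names and the exact wording of the notification text are assumptions made for illustration; only the table and column names come from the definition above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;

/** Hypothetical writer for the "notificationinfo" table. */
final class NotificationWriter {
    static final String INSERT_SQL =
        "INSERT INTO notificationinfo (synchistoryid, notificationtext) VALUES (?, ?)";

    /** Builds the display text; the wording and layout here are illustrative only. */
    static String buildText(String sourceDir, String destDir,
                            String serverName, LocalDate created) {
        String text = "Sync failed: " + sourceDir + " -> " + destDir
                + " (server " + serverName + ") on " + created;
        // notificationtext is varchar(300), so truncate defensively.
        return text.length() <= 300 ? text : text.substring(0, 300);
    }

    /** Inserts one row; the remaining columns fall back to their table defaults. */
    static void insert(Connection conn, int syncHistoryId, String text)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, syncHistoryId);
            ps.setString(2, text);
            ps.executeUpdate();
        }
    }
}
```

Leaving "notificationcreated", "isread" and "notificationcode" out of the INSERT lets MySQL fill in the defaults declared in the table definition.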


Figure 4.11: Map View Screenshot

4.2.11 Map Module

Purpose

This module handles the Map View on the front end, shown in Figure 4.11.

Implementation

This module has the following implementation details:

(a) When the user clicks the "Map View" button on the UI, manageMap.jsp is opened.

(b) On this page the user can see a marker for each server configured in the database.

(c) For every directory pair with an ongoing sync, a blue stroke is shown between the servers where the directories exist. This gives the admin an idea of the ongoing sync processes.


4.3 Android Application

The Android application is an additional utility that lets the admin start a new sync process from an Android device. The admin can also view the history of sync processes. Additionally, the Android application generates a notification for every failed sync process so that the admin is informed of any failure right away.
An Android app has been created which does the following:

• Authenticates the user on the first screen (through the web application's local MySQL database or through Moodle).

• After authentication, it shows in a ListView all the colleges that the admin has configured in the database, fetched through a Spring RESTful service call made with Java's HttpClient library.

• When the user selects a college, the same activity shows all the servers that the admin has configured for that college in the database, again fetched through a Spring RESTful service call made with Java's HttpClient library.

• After selecting a server, the user sees all the local directories for that server in a ListView.

• Once the user selects one of the local directories, another ListView below is populated with the directories paired with the selected directory.

• After the user selects a remote directory, the next activity is shown, with two buttons: "View History" and "Sync Now".

• If the user selects the "View History" button, a ListView is populated below the buttons, listing all previous sync processes by their start time.

• When the user selects one of the history times from the ListView, the logs for that sync process are shown below the ListView.

• If the user selects the "Sync Now" button, a new sync process is started for the selected directory pair.

4.3.1 Android Sync Notification Module

Purpose

This module generates notifications for any failed sync processes.

Implementation

Along with the above functionality, the Android app has an additional feature wherein a background service keeps running and sends an HttpClient request to one of the RESTful services on the server. This RESTful service returns all the failed sync history items. Upon receiving these items, the Notification Module generates a notification in the Notifications Pane of the Android device. This serves the purpose of notifying the admin if a sync process fails.
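Because the background service polls repeatedly and each response contains all failed items, the app presumably needs some way to avoid raising the same notification twice. The text does not say how this is handled, so the following is only one possible approach, sketched under that assumption; the class FailedSyncTracker and the use of integer history ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side filter that remembers which failures were already shown. */
final class FailedSyncTracker {
    private final Set<Integer> alreadyNotified = new HashSet<>();

    /** Returns the ids from the latest poll that have not been notified before. */
    synchronized List<Integer> newFailures(List<Integer> failedIdsFromServer) {
        List<Integer> fresh = new ArrayList<>();
        for (Integer id : failedIdsFromServer) {
            // Set.add returns false when the id was already present.
            if (alreadyNotified.add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }
}
```

The background service would call newFailures with the ids returned by each poll and raise a notification only for the ids it gets back. An alternative would be to filter on the server side, for example using the "isread" column of the "notificationinfo" table.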

4.4 Technologies Used

The syncher is a web application that provides a UI for admins to log in to the system and manage the servers and directories to be synced. The following describes the technologies used for the slave-side implementation of the web application:

4.4.1 Back End

The back end consists of the logic that processes a request, connects to the database and applies business rules to generate a response. For the back end of the syncher web application we use the MVC architecture.

MVC Architecture

The MVC architecture is widely used for managing requests and responses in web applications. It divides the application into three major components:

(a) Model: Responsible for business logic and for connecting to the database.

(b) View: The web pages that are displayed to the user.

(c) Controller: Communicates with the model and decides where control should flow.

Alternatives: Another well-known design pattern, Model-View-Presenter (MVP), is also used to design the back end of web applications. The MVP design has the following components:

(a) Model: Same as the MVC Model. Handles business logic and the database connection.

(b) View: Same as the MVC View. Presents the UI to the user on the basis of values supplied by the Presenter.

(c) Presenter: Connects with the Model to supply values to the View, which then constructs the page presented to the user from those values.

Advantages: The advantages of using the MVC architecture are:

(a) Database connections are easier to manage than in an unstructured web application.


(b) Modular approach.

(c) Unlike the MVP architecture, it does not completely decouple the Presenter and the View, which makes the web application easier to understand for maintainers.

(d) Separates responsibilities very cleanly, i.e., user experience in the views, all business logic and database operations in the model, and flow control in the controllers.

Spring MVC

The Spring framework is an open source application framework for Java. The Spring MVC framework extends it to support web applications, including web pages and web services. Spring provides Dependency Injection, which emphasizes minimal dependency among classes. Although this does not affect user experience, it can be a very powerful tool when it comes to managing and debugging large projects. Spring focuses on keeping classes independent of each other so that they can be tested individually. As an example of dependency injection: if our class needs an object of another class to work with, we do not instantiate that object inside our class; instead, we leave the responsibility of instantiating and configuring it to the user of our class. This way, our class no longer depends on how the other class is constructed.
Alternatives:

(a) EJB (Enterprise Java Beans) was used earlier for implementing the back end of web applications.

(b) Struts Framework: Struts also provides a good framework for building web applications, but it enforces a lot of constraints on the implementation, and the application ends up tied to Struts throughout, with very few other options to explore.

Advantages: [2] There are various advantages of using the Spring MVC framework for developing web applications. Some are as follows:

(a) Dependency Injection: Spring provides support for dependency injection so that components are less dependent on each other.

(b) It provides a very clean distinction between models, views and controllers.

(c) It provides support for controllers as well as interceptors, giving better control over the flow of the application and allowing requests to be intercepted wherever needed.

(d) XSLT, Velocity, etc. can be used instead of JSPs, so although the framework runs on Java it does not depend on JSPs alone.

(e) Unlike Struts, Spring does not enforce extending any class for Controllers.


(f) Provides dependency injection for Models, i.e., business logic.
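The dependency injection idea described in this section can be illustrated with plain Java constructor injection. The names SyncLauncher and SyncService are hypothetical and chosen only for this sketch; in a Spring application the container would normally supply the collaborator, but the principle is the same without any framework.

```java
/** Abstraction that our class depends on (hypothetical). */
interface SyncLauncher {
    int launch(String sourceDir, String destDir);
}

/** Receives its collaborator from outside instead of creating it. */
final class SyncService {
    private final SyncLauncher launcher;

    // Constructor injection: the caller (or a container such as Spring)
    // decides which SyncLauncher implementation is used.
    SyncService(SyncLauncher launcher) {
        this.launcher = launcher;
    }

    String sync(String sourceDir, String destDir) {
        int exit = launcher.launch(sourceDir, destDir);
        return exit == 0 ? "Successful" : "Failed: " + exit;
    }
}
```

Because SyncService never instantiates a launcher itself, a test can pass in a stub such as `new SyncService((src, dst) -> 0)` and exercise the class without starting a real process.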

Apart from the technologies mentioned above, the slave server's public SSH key needs to be added to the 'authorized_keys' file on the master server for rsync to work. Rsync operates over SSH, so the key must be in place before rsync can be called from the Java application. Steps to add the public SSH key:

(a) Run this command:
ssh-keygen -t rsa
This command generates the key pair.

(b) Once the key pair is generated, copy the public key to the master server by appending the contents of the id_rsa.pub file to the ~/.ssh/authorized_keys file on the master server.
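Once the key is in place, rsync can be started from Java without a password prompt. The sketch below shows one way to assemble and start such a command with ProcessBuilder. The class RsyncCommand, the choice of options and the direction of the transfer (local directory pushed to the remote server) are assumptions for illustration and are not taken from the thesis code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds and starts an rsync-over-SSH command. */
final class RsyncCommand {
    /** Builds the argument list for a push from a local directory to a remote one. */
    static List<String> build(String localDir, String remoteUser,
                              String remoteHost, String remoteDir) {
        return Arrays.asList(
            "rsync",
            "-az",          // archive mode, compress data during transfer
            "--progress",   // print per-file progress to standard output
            "-e", "ssh",    // use ssh as the remote shell
            localDir,
            remoteUser + "@" + remoteHost + ":" + remoteDir);
    }

    /** Starts rsync; stderr is merged into stdout so one reader sees all output. */
    static Process start(List<String> command) throws IOException {
        return new ProcessBuilder(command).redirectErrorStream(true).start();
    }
}
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when directory names contain spaces.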

4.4.2 Front End

The front end is a very important component, since it is the part of the application that the user experiences. A good and efficient front end can have a huge impact on the user's experience. For a better front end, we use the following technologies:

JQuery

JQuery is a very powerful library built on top of Javascript. It reduces the number of lines of code to a great extent and gives flexible and robust selector support for searching through elements in the DOM.
Alternatives: There is no proper alternative to Javascript for client-side scripting, but plain Javascript can replace JQuery code completely.
Advantages: JQuery has the following advantages for client-side scripting:

(a) Fewer lines of code, so the code is easier to manage.

(b) Built-in functions to ease coding effort.

(c) Very powerful selector-based selection of elements in the DOM.

(d) Fits well with CSS.

JQuery Widgets

JQuery widgets are Javascript plugins that make it possible to create components that can be placed anywhere in the web page DOM and rendered separately for each instance, with separate values each time one is instantiated.
Alternatives: Widgets provide a modular structure to web pages. Such modular structure is also supported at a higher level by frameworks like Backbone.js.
Advantages:


(a) Easy to implement; they are not as heavy a library as Backbone.js.

(b) Provide a modular approach to building components in the web page.

(c) Have a well defined life cycle, including create, init and destroy, and options for configuring values inside the widget.

CSS

CSS is a purely client-side styling utility. It supports separating style from the main web page, making the web page more readable.
Alternatives: CSS has no proper alternative, but styling can be embedded directly in the HTML (or JSP) page to do away with separate style sheets.
Advantages:

(a) It separates styling from the web page, making it more readable and manageable.

(b) Fits well with JQuery.

(c) Provides readily available animations that can be added to the page for better design.

Bootstrap

Bootstrap is a free collection of tools that provide a better UI for HTML pages. It has various well-styled control elements available out of the box (the styling is included in the CSS part of Bootstrap), including buttons, progress bars, tabs, etc. The most important thing about Bootstrap is that it supports responsive designs. Thus, a Bootstrap-structured page will behave normally on different screen sizes (if designed properly).
Advantages: The advantages of using Bootstrap are as follows:

(a) Provides out-of-the-box elements that can be used readily to build a web page.

(b) Divides the page into a grid-like structure, making it more manageable.

(c) Provides media queries to make the web page responsive and make it behave properly on different devices and screen sizes.

4.5 Setting up EDX Platform [4]

This section shows how to set up the EDX platform for testing the application. The EDX platform can be set up very quickly through the available Vagrant box. The following applications are required:


(a) Virtual machine: VMware provides a virtual machine product that can easily be downloaded from https://my.vmware.com/web/vmware/downloads

(b) Vagrant: Install Vagrant in the virtual machine. It can be downloaded from http://www.vagrantup.com/downloads. The downloaded file can be opened in the software management utility and installed.

After installing Vagrant, we need to run the 'vagrant up' command. This downloads the EDX platform onto the machine.


Chapter 5

Conclusion and Future Work

EDX Platform is an open source platform where universities can run online courses and students from across the globe can register for those courses. It lets users create course content such as videos and other files, quizzes, assignments, submissions, forums, etc.

We can provide a utility that allows faculty or students in various colleges to share content that can be viewed by other faculty or students enrolled in the course. This utility allows the admin to put authenticated content on a college campus server and then sync that directory to another server, from where the content can be viewed by other students or faculty enrolled in the course. The web application designed here is a full-fledged file synchronization utility for data distributed across servers spread over various campuses across the country.

Such functionality needs a complete file synchronization application that efficiently synchronizes the various servers with the central edX server. For efficient use of bandwidth, the rsync tool available in Linux distributions is used.

The following are some basic facilities provided by the utility:

• Add/delete/edit colleges.

• Add/delete/edit servers within different colleges.

• Add/delete/edit directories on different servers.

• Add/delete/edit directory pairs.

Along with basic synchronization, the application also supports other facilities, such as the following:

• View history by timestamp, to know at what times a sync process failed.

• View logs of sync processes.

• Schedule periodic synchronization of directory pairs.

• Check whether an SSH connection is possible between the local and remote servers.


• Map view for overall visibility of in-progress sync processes.

• Android application for viewing the logs of sync processes or starting a new sync process.

• Notification on the Android device for any failed sync process.

• Check availability of directories on the local and/or remote server. This gives the administrator an idea of which directories might have been entered wrongly on the UI.

5.1 Future Work

The following points can make the application more useful and user-friendly:

(a) Allow different roles for users, so that students, faculty and admins have different privileges on the web application.

(b) Allow students/faculty to upload videos/content to a specific directory on the web application's server directly from the web application or from an Android device.

(c) Allow the admin to add or edit servers from the Map View.

(d) Give better notifications on the web UI, so that the administrator can look at the exact logs of a failed sync process by clicking directly on the notification.

(e) Authenticate users, faculty and administrators differently from Moodle using 'Roles' in Moodle. Currently authentication only checks whether the given user is enrolled in Moodle or not.

(f) Allow the administrator to add/delete colleges, servers and directory pairs from the Android application.

(g) Sync Process Analytics: Allow the administrator to view the time taken by different sync processes and analytics on the data, such as the average time taken by the last 10 sync processes, the top 10 most time-consuming sync processes, etc. This will help the administrator monitor which directories hold too much data, which directory pair fails most often in the sync process, etc.

(h) Allow database configuration of the values that are currently hardcoded in the Android application and the web application.

(i) Allow the administrator to keep snapshots of directories which can then be loaded back in future. This goes towards versioning of directories, where tools such as SVN, Git, etc. can be used.

(j) Map view in the Android application.


Bibliography

[1] Loukas Avgeriou. http://luckybackup.sourceforge.net/features.html.

[2] Lijin. http://orangeslate.com/2006/11/10/12-benefits-of-spring-mvc-over-struts/, 10-November, 2006.

[3] Moodle Pty Ltd. https://docs.moodle.org/26/en/Step-by-step_Installation_Guide_for_Ubuntu.

[4] MIT. http://people.csail.mit.edu/ichuang/edx/.

[5] Don Mitchell. https://github.com/edx/edx-platform/wiki/Split-mongo-architecture-and-rollout-options, 27-February, 2014.

[6] opbyte. http://www.opbyte.it/grsync/#features.

[7] EDX Platform. http://code.edx.org/, 2013-2014.

[8] Andrew Tridgell and Paul Mackerras. The rsync algorithm. 1996.

[9] Svilen Mihaylov, Torsten Suel, Utku Irmak. Improved Single-Round Protocols for Remote File Synchronization. pages 156–160. IEEE, 2008.

[10] VikParuchuri. https://github.com/edx/edx-ora/blob/master/docs/images/ora_diagram.png, 12-July, 2013.

[11] Wikipedia. http://en.wikipedia.org/wiki/LuckyBackup.

[12] Wikipedia. http://en.wikipedia.org/wiki/FlyBack.
