A Utility Mining Approach for Building a Knowledge-based Recommender for Educational Decision Support


    www.ijsret.org

    International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN 2278  –  0882

    Volume 4, Issue 10, October 2015

    A Utility Mining Approach for Building a Knowledge-based

    Recommender for Educational Decision Support 

    Deborah Evelyn. S, Department of Computer Science and Engineering, University College of Engineering, Kanchipuram

    ABSTRACT

    With the boom in the number of streamlined career options available today, students enrolled in a broad course of study need strategic guidance. Helping them discover their domain of expertise smooths the path towards a successful career. This is the key notion of this paper: to help students connect the dots. For this we can employ data mining technologies, which provide a collection of methodologies for discovering vital patterns and relationships among data within large data sets. The database can be based on the curriculum and the evaluation process, and the corresponding patterns then indicate students' expertise. Through the development of a knowledge-based recommender, this information can be conveyed to the student community. This system has proved to be synergistic in educational decision support. Alongside this concept, a psychological proposition called the First Letter Hypothesis has been put forth through research on the databases.

    Keywords: Association rules, Confidence, Knowledge base, Recommendation, Support, Utility mining.

    I.  INTRODUCTION 

    1.1. SIGNIFICANCE OF UTILITY MINING

    The vastness and accessibility of data have motivated the formulation of various strategies to unravel meaningful knowledge hidden in huge databases through data mining. Of the numerous mining techniques available, frequent itemset pattern mining and utility mining have gained much significance. Key factors in this development are the nature of the databases (the transition from static databases to transactional and incremental ones) and the nature of the attributes and entities they contain.

    More recently, application domains that employ data mining have drifted from the frequent itemset mining approach toward the utility mining approach. Frequent itemset mining implicitly treats the utilities of all item sets as equal and represents their occurrences with binary values. Moreover, in frequent itemset mining the value of an item set increases only with its frequency. These limitations led to the development of a better strategy, the utility mining technique.

    The utility mining approach was formulated to identify item sets of high utility (e.g. profit margin, user preference) and to allow users to set a utility threshold for the item sets in a database. It is an improved version of the frequent itemset pattern mining strategy and is the most state-of-the-art approach that can be adopted.
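To make the contrast with frequency counting concrete, the following is a minimal sketch of utility-based selection. It assumes the common formulation in which an item's utility in a transaction is its quantity multiplied by a per-unit value such as profit; the function names, the toy transactions and the threshold are illustrative and are not taken from this study. It enumerates item sets by brute force, which is only practical for very small examples.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, unit_utility):
    """Sum the utility of `itemset` over transactions containing all its items."""
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * unit_utility[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_utility, min_utility):
    """Brute-force enumeration: keep item sets whose utility meets the threshold."""
    items = sorted(unit_utility)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_utility)
            if utility >= min_utility:
                result[itemset] = utility
    return result


# Hypothetical toy data: item -> quantity in each transaction, and value per unit.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
unit_utility = {"a": 5, "b": 2, "c": 1}

selected = high_utility_itemsets(transactions, unit_utility, min_utility=10)
print(selected)
```

In this toy data every item occurs in exactly two transactions, so a frequency count cannot separate them, while the utility threshold retains only the item sets that involve the high-value item.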

    1.2. RECOMMENDER SYSTEMS

    Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or preference that a user would give to an item. There are four types of recommender systems, as given below.

    Content-based: An approach that focuses on the content, i.e. the type of file or format of information mined from the past activities of users.

    Collaborative: An approach that works with a predictive model trained on the logs of the past activities of users.

    Hybrid: A combination of the above types.

    Knowledge-based: A recommender system that depends on a knowledge base built from the association rules mined in the process of data mining.

    The last type is the best suited to this project because the database that was mined was a static database. The other types of recommender systems are designed to work with transactional and incremental databases.

    1.3. ROLE OF DATA MINING IN THE RECOMMENDER SYSTEM

    The association rules generated during the data mining process were used to formulate the knowledge base of the recommender system. Associations among the attributes of the database in hand were generated using the Apriori and PredictiveApriori algorithms. The knowledge base so obtained provided meaningful insight into the students' approach to their respective curriculum and also indicated that the First Letter Hypothesis holds true.


    1.4. OTHER RELATED KEYWORDS AND DEFINITIONS

    1.4.1. DATA MINING

    Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses or other information repositories. Based on this view, the architecture of a typical system has the following major components.

    1.4.2. ASSOCIATION RULES, SUPPORT AND CONFIDENCE

    Association rules are used to show the relationships between data items. Mining association rules allows finding rules of the form X -> Y, where X and Y are item sets drawn from some data set U. Support and confidence are common measures of the quality of an association rule. The support of the rule X -> Y is the percentage of transactions in the database that contain X ∪ Y. The confidence of the rule X -> Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X.

    Figure 1.1: Association Rules
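The two measures defined above can be computed directly from a list of transactions. The sketch below is illustrative only: the attribute=value strings mimic the style of the questionnaire data, but the four transactions are made up for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so its confidence is reported as 0
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Made-up transactions, one per student, for illustration only.
transactions = [
    {"Understanding=Profound", "Marks=Good"},
    {"Understanding=Profound", "Marks=Good"},
    {"Understanding=Profound", "Marks=Average"},
    {"Understanding=Sound", "Marks=Average"},
]

X = {"Understanding=Profound"}
Y = {"Marks=Good"}
print(support(X | Y, transactions))    # share of students matching both X and Y
print(confidence(X, Y, transactions))  # share of X-students who also match Y
```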

    1.4.3. FIRST LETTER HYPOTHESIS

    This hypothesis states that individuals whose names begin with the last ten letters of the alphabet are better achievers and competitors than those whose names begin with the first ten letters of the alphabet.

    II.  THE ARCHITECTURE OF THE EDUCATIONAL RECOMMENDER

    The following diagram shows a simple schematic of the proposed architecture of this project. The project is divided into three layers for ease of implementation. The application layer deals purely with the creation and preprocessing of the databases. The data mining layer consists of the activities aimed at efficient extraction of association rules (the meaningful patterns of this project) and the formulation of the knowledge base from the most realistic association rules mined during this process. Special focus has been given to the formulation of the knowledge base; its details are explained in the remainder of this chapter, and the association rules generated through the mining process are discussed in chapter 4.

    Figure 2.1: Educational Recommender Architecture

    2.1. APPLICATION LAYER

    The database containing the student information is the application database for this project. Since no such database existed, it was created through questionnaires. The following steps were involved in this process.

    2.1.1. DATA COLLECTION AND PREPROCESSING

    The databases were created with the data submitted by the students of UCEK through a carefully completed questionnaire. It consisted of the following sections, to be completed for every mainstream subject in the Computer Science and Engineering curriculum at the levels of undergraduate study.

    Understanding (rating range: 1-3)

    Marks Scored (rating range: 1-3): 1 - E and below, 2 - C or D, 3 - B and above

    Confidence (rating range: 1-3)

    First Attempt (P, PF, F): P - cleared the finals, PF - cleared the finals but not in the first attempt, F - still a backlog

    The details furnished were then preprocessed by smoothing and transformation for efficient and compatible association mining. The following transformations were carried out to make the database compatible with the mining tool.

    The range 1-3 for understanding was transformed into Nominal (1), Sound (2) and Profound (3). The range 1-3 for marks scored was transformed into Low (1), Average (2) and Good (3). The range 1-3 for confidence rating was transformed into Doubtful (1), Secure (2) and Confident (3). No transformation was required on the first attempt column. The unsupervised transformation function NumericToNominal was applied to numeric attributes.

    Missing values were found for elective subjects; these were smoothed by applying the minimum threshold value of the entity in the respective column. This smoothing was necessary because the data mining tool used cannot handle null values.
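The transformations and the smoothing step described above can be sketched as follows. This is an illustration rather than the procedure actually run in the mining tool: it assumes that "minimum threshold value" means the lowest rating observed in the column, and the sample column is made up.

```python
# Label mappings as stated in the text.
UNDERSTANDING = {1: "Nominal", 2: "Sound", 3: "Profound"}
MARKS = {1: "Low", 2: "Average", 3: "Good"}
CONFIDENCE = {1: "Doubtful", 2: "Secure", 3: "Confident"}


def fill_missing_with_minimum(column):
    """Replace None entries with the smallest rating observed in the column.

    Assumption: the "minimum threshold value" is read as the column minimum.
    """
    observed = [value for value in column if value is not None]
    if not observed:
        raise ValueError("column has no observed values to derive a minimum from")
    minimum = min(observed)
    return [minimum if value is None else value for value in column]


def to_nominal(column, labels):
    """Map numeric ratings 1-3 to their nominal labels."""
    return [labels[value] for value in column]


# Made-up ratings for one elective subject; None marks a missing answer.
raw_understanding = [3, None, 2, 1]
filled = fill_missing_with_minimum(raw_understanding)
print(to_nominal(filled, UNDERSTANDING))
```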


    2.1.2. PROCEDURE FOR THE CONSTRUCTION OF THE APPLICATION LAYER

    1. Collect the required data by administering survey questionnaires, or by employing other survey methods, to the population of interest (in this project, the student community).

    2. Apply data cleaning strategies for efficiency during the mining process.

    3. Apply the required data transformation strategies for compatibility of attribute types.

    4. Create the database in a format compatible with the data mining tool that is to be used.
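As an illustration of step 4, the sketch below renders already-cleaned nominal data as ARFF text, the native file format of WEKA. This is an assumption made for the example: the paper reports saving the data in comma-separated form, which WEKA can also load. The relation name, attribute names and the sample row are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Render nominal data as ARFF text.

    attributes: list of (name, [allowed values]) pairs
    rows: list of value lists, one value per attribute, in attribute order
    """
    lines = [f"@relation {relation}", ""]
    for name, values in attributes:
        # Nominal attributes list their allowed values in braces.
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute names for one subject of one category.
attributes = [
    ("OOP_Understanding", ["Nominal", "Sound", "Profound"]),
    ("OOP_Marks", ["Low", "Average", "Good"]),
    ("OOP_FirstAttempt", ["P", "PF", "F"]),
]
rows = [["Sound", "Good", "P"]]

arff_text = to_arff("application_programming", attributes, rows)
print(arff_text)
```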

    2.2. DATA MINING LAYER

    This step involved the application of association algorithms for the formulation of association rules. Hence, the next phase of the project was to mine the preprocessed databases. Attempts to cluster the database through the tool failed because of the large number of attributes considered and the overall size of the database. The subjects were therefore clustered manually into the following categories.

    Table 2.1: Categories and Subjects

    SL. NO   Categories and Subjects

    1        Application Programming
               Fundamentals Of Computing And Programming
               Object Oriented Programming
               Java Programming Paradigms

    2        Critical Programming
               Fundamentals Of Computing And Programming
               Data Structures
               Design Analysis And Algorithms

    3        Hardware Logic
               Electric Circuits And Electron Devices
               Digital Principles Of System Design
               Microprocessors And Microcomputers
               Computer Organization And Architecture
               Advanced Computer Architecture

    4        System Theory
               Operating System
               System Software

    5        Machine Learning
               Artificial Intelligence
               Theory Of Computations
               Principles Of Compiler Design

    6        Network Study
               Computer Networks
               Web Technology

    7        Software Engineering
               Software Engineering
               Object Oriented Analysis And Design

    8        Database Techniques
               Database Management Systems
               Advanced Database Technology

    Each cluster was mined independently using both the Apriori and PredictiveApriori algorithms. The inferences of these procedures were compared to find out which of the two was the more realistic approach; these inferences are discussed in chapter 4.

    2.2.1. ASSOCIATION RULE MINING ALGORITHMS

    The Apriori algorithm is a frequent itemset mining strategy which learns as it operates over a transactional database. An item that is frequently encountered during the mining process has a greater support value. Thus it is an algorithm that highlights the general trend in a particular dataset.

    The pseudo code for the algorithm is given below for a transaction database T and a support threshold of ε. Usual set theoretic notation is employed, though note that T is a multiset. Ck is the candidate set for level k. At each step, the algorithm is assumed to generate the candidate sets from the large item sets of the preceding level, heeding the downward closure lemma. Count[c] accesses a field of the data structure that represents candidate set c, which is initially assumed to be zero. Many details are omitted below; usually the most important part of the implementation is the data structure used for storing the candidate sets and counting their frequencies.
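The level-wise procedure described above can be sketched in Python as follows. This is an illustrative, unoptimised rendering and not the WEKA implementation used in this project: candidates are stored in plain sets and counted with a full scan per level, the threshold is given as an absolute support count, and the sample transactions are made up.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {itemset: count} for every itemset meeting the support threshold."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support_count}
    result = dict(frequent)

    k = 2
    while frequent:
        previous = set(frequent)
        # Candidate generation: join large (k-1)-itemsets, then prune any
        # candidate with an infrequent (k-1)-subset (downward closure).
        candidates = set()
        for a in previous:
            for b in previous:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in previous for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # One scan of the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support_count}
        result.update(frequent)
        k += 1
    return result


# Made-up transactions for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(transactions, min_support_count=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this example the three-item candidate is never counted, because one of its two-item subsets falls below the threshold and the downward closure check prunes it.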

    Apriori, while historically significant, suffers from a number of inefficiencies or trade-offs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all of its proper subsets.


    The PredictiveApriori algorithm overcomes this by considering a threshold that varies along the process and by continually predicting the course of the following steps or associations.

    Thus the knowledge base of the recommender system can be formed from the most realistic association rules formulated by both algorithms. This knowledge base directs the recommender service provided to the user through the front end. The following schematic shows the architecture of the recommender.

    2.2.2. PROCEDURE FOR THE CONSTRUCTION OF THE KNOWLEDGE BASE

    1. Load the database into the data mining environment.

    2. Apply the necessary filters and transforms for attribute type compatibility. This step is necessary because neither the Apriori nor the PredictiveApriori algorithm can handle varying data types.

    3. Compare the association rules generated.

    4. Evaluate them based on accuracy rate and support count.

    5. Select realistic associations to build the knowledge base.
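Steps 3 to 5 can be sketched as a simple filter over the rules reported by the two algorithms. The paper does not state how "realistic" associations were chosen, so the two thresholds and the tie-breaking below are assumptions made for illustration, and the rules themselves are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support_count: int
    accuracy: float
    algorithm: str


def build_knowledge_base(rules, min_support_count, min_accuracy):
    """Keep rules that clear both thresholds (assumed selection criterion).

    When both algorithms report the same rule, the more accurate copy is kept.
    """
    best = {}
    for rule in rules:
        if rule.support_count < min_support_count or rule.accuracy < min_accuracy:
            continue
        key = (rule.antecedent, rule.consequent)
        if key not in best or rule.accuracy > best[key].accuracy:
            best[key] = rule
    # Most accurate rules first, ties broken by higher support count.
    return sorted(best.values(), key=lambda r: (-r.accuracy, -r.support_count))


# Hypothetical mined rules, not results from this study.
x = frozenset({"DataStructures_Understanding=Profound"})
y = frozenset({"DataStructures_FirstAttempt=P"})
z = frozenset({"DataStructures_Marks=Good"})
mined = [
    Rule(x, y, 120, 0.90, "Apriori"),
    Rule(x, y, 120, 0.95, "PredictiveApriori"),
    Rule(x, z, 10, 0.99, "Apriori"),  # accurate, but too little support
]
knowledge_base = build_knowledge_base(mined, min_support_count=50, min_accuracy=0.8)
print(knowledge_base)
```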

    2.3. RECOMMENDATION LAYER

    The knowledge base programmed into the recommender system's source code is the cornerstone of the recommender service; it serves as the recommender algorithm. In this project, the knowledge base formulated in the previous stage is programmed into it. In a knowledge-based recommender, interfacing application software is not required.
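A knowledge base embedded in source code can be as simple as a list of condition-to-category rules that is matched against a student's profile. The sketch below illustrates that idea only; it is written in Python for brevity although the project's front end was built in Java, and the rules and the profile are hypothetical rather than the rules mined in this study.

```python
# Hypothetical rules: (conditions that must all hold, recommended category).
KNOWLEDGE_BASE = [
    (
        {"Data Structures: Marks": "Good", "Data Structures: Confidence": "Confident"},
        "Critical Programming",
    ),
    ({"Computer Networks: Understanding": "Profound"}, "Network Study"),
    ({"Database Management Systems: Marks": "Good"}, "Database Techniques"),
]


def recommend(profile, knowledge_base):
    """Return the categories whose conditions the profile satisfies, in rule order."""
    recommendations = []
    for conditions, category in knowledge_base:
        matches = all(profile.get(attr) == value for attr, value in conditions.items())
        if matches and category not in recommendations:
            recommendations.append(category)
    return recommendations


# Hypothetical student profile after preprocessing.
profile = {
    "Data Structures: Marks": "Good",
    "Data Structures: Confidence": "Confident",
    "Computer Networks: Understanding": "Sound",
    "Database Management Systems: Marks": "Good",
}
print(recommend(profile, KNOWLEDGE_BASE))
```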

    III.  IMPLEMENTATION DETAILS

    This chapter provides a brief look into how the architecture was implemented using the various components mentioned in chapter 2.

    3.1. DATABASES FOR RESEARCH

    The databases were created as per the procedure given in chapter 2. Initially the data from the flat files were fed into MS Excel and saved in comma-separated format. The various anomalies were corrected as mentioned in chapter 2. Each category had a corresponding database. All the databases had the same number of instances (200), and each database had a varying number of attributes corresponding to the subjects covered under it.

    3.2. ASSOCIATION RULE MINING WITH WEKA

    The association rules were generated using the Apriori and PredictiveApriori algorithms. Each run was executed with 10 cycles of cross validation and was set to generate the 10 best association rules. The comparison based on the performance evaluation of the two algorithms is discussed in the following chapter.

    Figure 3.3: All attributes of application programming after preprocessing

    3.3. RECOMMENDATION USER INTERFACE

    The front end was developed in the Java programming language using the NetBeans IDE 6.9.1. Swing components and their associated event handling mechanisms were implemented. Each category of the core stream subjects is introduced and explained in a separate frame. Each category frame displays the subjects under it, its applications, and the scope or job titles that it involves.

    3.4. FLOW DIAGRAM OF IMPLEMENTATION METHODOLOGY

    Figure 3.4: Flow diagram of Implementation Methodology

    Thus the various implementation strategies are explained. The results are discussed in the following chapter.


    IV.  RESULTS AND DISCUSSIONS

    4.1. ASSOCIATION RULE MINING

    The association rule mining processes yielded two types of inferences: the application-specific inferences regarding the patterns and rules generated from the application databases, and the domain-specific inferences, i.e. technical inferences on the comparison between the Apriori and PredictiveApriori association rule mining algorithms. The following association rules were generated using the Apriori and PredictiveApriori algorithms.

    Sl. No   Association Rules          Algorithm Used

    1.1      Application Programming    Apriori
    1.2      Application Programming    PredictiveApriori
    2.1      Critical Programming       Apriori
    2.2      Critical Programming       PredictiveApriori
    3.1      Hardware Logic             Apriori
    3.2      Hardware Logic             PredictiveApriori
    4.1      System Theory              Apriori
    4.2      System Theory              PredictiveApriori
    5.1      Machine Learning           Apriori
    5.2      Machine Learning           PredictiveApriori
    6.1      Network Study              Apriori
    6.2      Network Study              PredictiveApriori
    7.1      Software Engineering       Apriori
    7.2      Software Engineering       PredictiveApriori
    8.1      Database Techniques        Apriori
    8.2      Database Techniques        PredictiveApriori

    Table 4.1. Tabulation of association rules

    4.2. COMPARISON BASED ON SUPPORT AND CONFIDENCE

    The following table shows the Support and Confidence values of the 10 best association rules generated by the two algorithms discussed. All the results of the Apriori algorithm show that the support value of the associations increases gradually over the course of execution. This clearly shows that the Apriori algorithm is a frequent itemset mining strategy. The relationships are established among attributes with varying support counts. Conversely, in the PredictiveApriori strategy, relationships are built among attributes with the same support count first. The following graph shows the difference in support count along execution in the two algorithms.

     

    Figure 4.1: Comparison of Support Count

    Figure 4.2: Comparison of Confidence

    In the following graph, the confidence, or accuracy, of the same association rules is compared. This graph shows that the PredictiveApriori algorithm has a higher accuracy level than the Apriori algorithm. The fall in the accuracy rate of the PredictiveApriori algorithm is very small in comparison with the other; its slope is smaller. Higher accuracy correlates with higher utility. The following graph shows the accuracy plot of the two algorithms discussed.


    4.3. RESULTANT GRAPHS OF FIRST LETTER HYPOTHESIS

    The following graph shows the plot of performance in the finals against the first letters of candidates' names.

    Figure 4.3: First Letter Hypothesis Plot

    From the graph it is clear that the statement of the First Letter Hypothesis is true. The following chapter holds the inference of the same.
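One way such a comparison could be tabulated is to group candidates by the first letter of their names and compute a performance measure per group. The sketch below is an assumption about the analysis, not the method used to produce Figure 4.3: it uses the share of candidates who cleared the finals in the first attempt (P) as the measure, and the names and outcomes are made up.

```python
import string

FIRST_TEN = set(string.ascii_uppercase[:10])   # A to J
LAST_TEN = set(string.ascii_uppercase[-10:])   # Q to Z


def letter_group(name):
    """Classify a name by its initial: first ten, last ten, or middle letters."""
    initial = name.strip()[0].upper()
    if initial in FIRST_TEN:
        return "first_ten"
    if initial in LAST_TEN:
        return "last_ten"
    return "middle"


def first_attempt_pass_rate(records):
    """records: iterable of (name, outcome) with outcome in {"P", "PF", "F"}."""
    totals, passes = {}, {}
    for name, outcome in records:
        group = letter_group(name)
        totals[group] = totals.get(group, 0) + 1
        passes[group] = passes.get(group, 0) + (outcome == "P")
    return {group: passes[group] / totals[group] for group in totals}


# Made-up records for illustration; not data from this study.
records = [
    ("Anita", "P"),
    ("Bala", "F"),
    ("Uma", "P"),
    ("Vijay", "PF"),
    ("Mani", "P"),
]
rates = first_attempt_pass_rate(records)
print(rates)
```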

    V. CONCLUSION AND FUTURE WORK

    5.1. CONCLUSION

    Thus the implementation of a recommender system was completed successfully, and the comparison drawn between the two algorithms indicates that PredictiveApriori performs better based on the predictive accuracy and the various statistical measures that were considered. The following inferences were observed in this educational research. Candidates who have a good understanding of the basics do well in the successive levels. A secure level of understanding and confidence produces a greater possibility of success in the finals. Basic course titles are very easy to succeed in at the finals. A good understanding may not always lead to an equivalent level of success.

    The two-dimensional graphs generated as a graphical result of the association rule mining process showed clearly that the psychological relationship between performance or competitiveness and the first letter of a candidate's name, as defined by the First Letter Hypothesis, was true. This is because individuals with names beginning with the first ten letters of the alphabet are always first in line in sorting and hence do not have the urge to fight in order to move ahead; the converse holds true for individuals with names beginning with the last ten letters of the alphabet.

    REFERENCES

    Journal Papers:

    [1] Sunita B Aher, Lobo. L. M. R. J. (2012), "Data Preparation Strategy in E-Learning System using Association Rule Mining Algorithm", International Journal of Computer Applications, Volume 41, pages 35-40.

    [2] Sunita B Aher, Lobo. L. M. R. J. (2012), "A Comparative Study for Selecting the Best Unsupervised Learning Algorithm in E-learning Systems", International Journal of Computer Applications, Volume 41, pages 27-34.

    [3] Sunita B Aher, Lobo. L. M. R. J. (2011), "Data Mining in Educational System in WEKA", International Journal of Computer Applications, International Conference on Emerging Technology Trends.

    [4] Sunita B Aher, Lobo. L. M. R. J. (2011), "A Framework for Recommendation of Courses in E-learning System", International Journal of Computer Applications, Volume 35, pages 21-28.

    [5] Sunita B Aher, Lobo. L. M. R. J. (2012), "A Comparative Study of Association Rule Algorithms for Course Recommender System in E-learning", International Journal of Computer Applications, Volume 39, pages 48-52.

    [6] Mukesh Sharma, Jyothi Choudhary, Gunjan Sharma (2013), "Evaluating the performance of apriori and predictive apriori algorithm to find new association rules based on the statistical measures of datasets", International Journal of Engineering Research and Technology, Volume 6.

    [7] Shwetha, Kanwal Garg (2013), "Mining Efficient Association Rules Through Apriori Algorithm Using Attributes and Comparative Analysis of Various Association Rule Algorithms", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3.

    Web Source:

    [8] Wikipedia, Apriori algorithm, https://en.wikipedia.org/wiki/Apriori_algorithm