
The Scientific World Journal, Volume 2014, Article ID 973750, 11 pages
http://dx.doi.org/10.1155/2014/973750

Research Article

Negative and Positive Association Rules Mining from Text Using Frequent and Infrequent Itemsets

Sajid Mahmood,1,2 Muhammad Shahbaz,1 and Aziz Guergachi3

1 Department of Computer Science & Engineering, University of Engineering & Technology, Lahore, Pakistan
2 Al-Khawarizmi Institute of Computer Sciences, UET, Lahore, Pakistan
3 Ted Rogers School of Information Technology Management, Ryerson University, Toronto, Canada

Correspondence should be addressed to Sajid Mahmood; sajid.mahmood@kics.edu.pk

Received 17 February 2014; Revised 27 March 2014; Accepted 1 April 2014; Published 18 May 2014

Academic Editor: Daniel D. Sánchez

Copyright © 2014 Sajid Mahmood et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Association rule mining research typically focuses on positive association rules (PARs), generated from frequently occurring itemsets. In recent years, however, there has been significant research on finding interesting infrequent itemsets, leading to the discovery of negative association rules (NARs). The discovery of infrequent itemsets is far more difficult than that of their counterparts, the frequent itemsets: the problems include the discovery of the infrequent itemsets themselves, the generation of accurate NARs, and the huge number of NARs compared with positive association rules. In medical science, for example, one is interested in factors that can either adjudicate the presence of a disease or write off its possibility. The vivid positive symptoms are often obvious; negative symptoms, however, are subtler and more difficult to recognize and diagnose. In this paper, we propose an algorithm for discovering positive and negative association rules among frequent and infrequent itemsets. We identify associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology.

1. Introduction

Association rules (ARs), a branch of data mining, have been studied successfully and extensively in many application domains, including market basket analysis, intrusion detection, diagnosis decision support, and telecommunications. However, the discovery of associations in an efficient way has been a major focus of the data mining research community [1–6].

Traditionally, association rule mining algorithms target the extraction of frequent features (itemsets), that is, features boasting high frequency in a transactional database. However, many important itemsets with low support (i.e., infrequent ones) are ignored by these algorithms. These infrequent itemsets, despite their low support, can produce potentially important negative association rules (NARs) with high confidences, which are not observable among frequent data items. Therefore, the discovery of potential negative association rules is important for building a reliable decision support system. The research in this paper extends the discovery of positive as well as negative association rules of the forms A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B, and so forth.

The number of people discussing their health in blogs and other online discussion forums is growing rapidly [7, 8]. Patient-authored blogs have now become an important component of modern-day healthcare, and they can be effectively used for decision support and quality assurance. Patient-authored blogs, where patients give an account of their personal experiences, offer near accurate and complete problem lists with symptoms and ongoing treatments [9]. In this paper, we have investigated an efficient mechanism for identifying positive and negative associations among medications, symptoms, and laboratory results using state-of-the-art data mining technology. Rules of the form A ⇒ B or A ⇒ ¬B can help explain the presence or absence of different factors/variables. Such associations can be useful for building decision support systems in the healthcare sector.

We target three major problems in association rule mining: (a) effectively extracting positive and negative association rules from text datasets, (b) extracting negative association rules from the frequent itemsets, and (c) extracting positive association rules from infrequent itemsets.

The rest of this paper is organized as follows. In the next section, we present a brief introduction to data mining terminology and background. Section 3 reviews related work on association rule mining. In Section 4, we describe the methodology for identifying both frequent and infrequent itemsets of interest and the generation of association rules based on these itemsets. The proposed model for extracting positive and negative association rules is presented in Section 5. Section 6 presents the experimental results and comparisons, and the conclusion and future directions are given in Section 7.

2. Terminology and Background

Let us consider I = {i_1, i_2, ..., i_N} as a set of N distinct literals/terms, called items, and let D be a database of transactions (documents/blogs, etc.), where each transaction T is a set of items/terms such that T is a subset of I. Each transaction is associated with a unique identifier called TID. Let A, B be sets of items; an association rule is a derivation of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = Ø. "A" is called the antecedent of the rule and "B" is called the consequent of the rule. An association rule A ⇒ B can have different measures denoting its significance and quality. In our approach, we have employed (i) support, denoted supp, which is the percentage of transactions in database D containing both A and B; (ii) confidence, denoted conf, which is the percentage of transactions in D containing A that also contain B, expressed in probability terms as P(B | A); and (iii) lift, denoted lift, characterizing the direction of the relationship between the antecedent and consequent of the association rule. Rules having a support value greater than a user-defined minimum support, minsupp (i.e., the itemset must be present in at least a threshold number of transactions), and a confidence greater than a user-defined minimum confidence, minconf, are called valid association rules. The lift indicates whether the association is positive or negative: a lift value greater than 1 indicates a positive relationship between the itemsets, a value less than 1 indicates a negative relationship, and a value equal to 1 means the itemsets are independent and no relationship exists between them.

Some of the above and derived definitions can be represented with the following mathematical equations (the /* ... */ comments describe each measure):

\begin{align}
&\operatorname{supp}(A) \quad \text{/* number/percentage of transactions containing } A \text{ */} \tag{1}\\
&\operatorname{supp}(A \Rightarrow B) = P(AB) \quad \text{/* number/percentage of transactions where } A \text{ and } B \text{ coexist */} \tag{2}\\
&\operatorname{conf}(A \Rightarrow B) = \frac{P(AB)}{P(A)} \quad \text{/* confidence that whenever } A \text{ occurs, } B \text{ also occurs */} \tag{3}\\
&\operatorname{lift}(A \Rightarrow B) = \frac{P(AB)}{P(A)\,P(B)} = 1 + \frac{P(AB) - P(A)\,P(B)}{P(A)\,P(B)} \quad \text{/* strength of the relationship between } A \text{ and } B \text{ */} \tag{4}\\
&\operatorname{supp}(\neg A) = 1 - \operatorname{supp}(A) \tag{5}\\
&\operatorname{supp}(A \cup \neg B) = \operatorname{supp}(A) - \operatorname{supp}(A \cup B) \tag{6}\\
&\operatorname{conf}(A \Rightarrow \neg B) = 1 - \operatorname{conf}(A \Rightarrow B) = \frac{P(A \neg B)}{P(A)} \tag{7}\\
&\operatorname{supp}(\neg A \cup B) = \operatorname{supp}(B) - \operatorname{supp}(B \cup A) \tag{8}\\
&\operatorname{conf}(\neg A \Rightarrow B) = \frac{\operatorname{supp}(B)\,(1 - \operatorname{conf}(B \Rightarrow A))}{1 - P(A)} = \frac{\operatorname{supp}(\neg A \cup B)}{\operatorname{supp}(\neg A)} \tag{9}\\
&\operatorname{supp}(\neg A \cup \neg B) = 1 - \operatorname{supp}(A) - \operatorname{supp}(B) + \operatorname{supp}(A \cup B) \tag{10}\\
&\operatorname{conf}(\neg A \Rightarrow \neg B) = \frac{1 - \operatorname{supp}(A) - \operatorname{supp}(B) + \operatorname{supp}(A \cup B)}{1 - P(A)} = \frac{\operatorname{supp}(\neg A \cup \neg B)}{\operatorname{supp}(\neg A)} \tag{11}
\end{align}
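To make these measures concrete, here is a minimal sketch (our own illustration, not code from the paper; the class and method names are hypothetical) that computes support, confidence, and lift over a list of transactions and derives a negative-rule confidence via equations (6) and (7):

```java
import java.util.*;

public class RuleMeasures {
    // supp(X): fraction of transactions containing every item of X, eqs. (1)-(2).
    static double supp(List<Set<String>> db, Set<String> itemset) {
        long hits = db.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / db.size();
    }

    // conf(A => B) = supp(A u B) / supp(A), eq. (3).
    static double conf(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a); ab.addAll(b);
        return supp(db, ab) / supp(db, a);
    }

    // lift(A => B) = supp(A u B) / (supp(A) * supp(B)), eq. (4).
    static double lift(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a); ab.addAll(b);
        return supp(db, ab) / (supp(db, a) * supp(db, b));
    }

    // conf(A => ¬B) via eqs. (6)-(7): supp(A u ¬B) = supp(A) - supp(A u B).
    static double confNeg(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a); ab.addAll(b);
        return (supp(db, a) - supp(db, ab)) / supp(db, a);
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
                Set.of("mole", "biopsy"), Set.of("cancer", "chemo"),
                Set.of("mole", "checkup"), Set.of("cancer", "radiation"),
                Set.of("cancer", "mole"));
        Set<String> a = Set.of("mole"), b = Set.of("cancer");
        System.out.printf("conf(mole => cancer)  = %.3f%n", conf(db, a, b));
        System.out.printf("conf(mole => ¬cancer) = %.3f%n", confNeg(db, a, b));
        System.out.printf("lift(mole => cancer)  = %.3f%n", lift(db, a, b));
    }
}
```

This is the practical value of equations (5)–(11): every negative-rule measure is expressed in terms of supports of ordinary (positive) itemsets, so no additional database scan over negated items is required.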

The huge number of infrequent items generates an even higher number of negative rules in comparison to the positive association rules. The problem is compounded when dealing with text, where words/terms are items and the documents are transactions. It is also difficult to set a threshold for the minimum support as a measure for text because of the huge number of unique and sporadic items (words) in a textual dataset. Indexing (assigning weights to the terms) of the text documents is very important for them to be used as transactions for extracting association rules. Indexing techniques from the information retrieval field [10] can greatly benefit in this regard. Index terms, the words whose semantics help in identifying the document's main subject matter [11], help describe a document. They possess different relevance to a given document in the collection and are therefore assigned different numerical weights. Text mining aims to retrieve information from unstructured text and to present the extracted knowledge in a compact form to the users [12]. The primary goal is to provide the users with knowledge for research and educational purposes.

Table 1: IDF scores of sample keywords of the corpus.

Selected keywords   IDF      Discarded keywords   IDF
Chemo               2.2693   Threat               0.0158
Radiation           2.2535   Disease              0.0157
Tumor               2.2316   Severe               0.01426
Surgery             2.2135   Produce              0.01392
Cancer              2.2013   Result               0.01194
Temodar             1.9830   Need                 0.00639
CT scan             1.9812   Analysis             0.00136
Glioblastoma        1.9609   Type                 0.00940
Skull               1.8906   New                  0.00843
Cause               1.8902   Level                0.00694
...                 ...      ...                  ...

It is therefore imperative to employ some weight assignment mechanism. We have used inverse document frequency (IDF), which denotes the importance of a term in a corpus. Selecting features on the basis of IDF values needs careful handling, that is, deciding which range of IDF values to include. This can greatly affect the results: choosing a very low value of N risks nullifying the impact of IDF, while choosing a very high value may result in losing important terms/features of the dataset. We have proposed a user-tuned parameter, top N, where the value of N is chosen by the user depending upon the characteristics of the data.

2.1. IDF. The IDF weighting scheme is based on the intuition of term occurrence in a corpus. It surmises that the fewer the documents containing a term, the more discerning the term becomes, and hence an important artefact. IDF helps in identifying the terms that carry special significance in a certain type of document corpus by assigning high weight to terms that occur rarely in the corpus. In a medical corpus, for instance, the word "disease" is not likely to carry a significant meaning. Instead, a disease name, for example, "cholera" or "cancer", would carry significant meaning in characterizing the document.

Keeping this in view, a higher weight should be assigned to the words that appear in documents in close connection with a certain topic, while a lower weight should be assigned to those words that show up without any contextual background. IDF weighting is a broadly used method for text analysis [10, 13]. We can mathematically represent IDF as
$$\mathrm{IDF}_t = \log\left(\frac{D}{df_t}\right),$$
where $t$ is the term whose weight is to be calculated, $df_t$ represents the number of documents in which the term is present, and $D$ symbolizes the size of the document corpus.

Table 1 shows an example set of words and their respective IDF scores in a document corpus, with a top-N threshold of 60%. The number of words in the documents before IDF score calculation and top-60% selection is 280254; after selecting the top 60% based on IDF scores, it goes down to 81733. The right-hand part of the table shows a sample of the eradicated words, which have IDF scores below the threshold value.
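As an illustration of this weighting step, here is a brief sketch (our own, with hypothetical names; it assumes each document has already been reduced to its set of distinct terms) that computes IDF_t = log(D/df_t) and keeps the top N percent of terms:

```java
import java.util.*;
import java.util.stream.*;

public class IdfFilter {
    // idf(t) = log(D / df_t): D = number of documents, df_t = documents containing t.
    static Map<String, Double> idfScores(List<Set<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs)
            for (String term : doc) df.merge(term, 1, Integer::sum);
        int d = docs.size();
        return df.entrySet().stream().collect(Collectors.toMap(
                Map.Entry::getKey, e -> Math.log((double) d / e.getValue())));
    }

    // Keep the top N percent of terms by IDF (N is the user-tuned parameter;
    // the paper uses N = 60).
    static List<String> topNPercent(Map<String, Double> idf, double nPercent) {
        int keep = (int) Math.ceil(idf.size() * nPercent / 100.0);
        return idf.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(keep).map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```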

3. Literature Review

Association rule mining is aimed at the discovery of associations among itemsets of a transactional database. Researchers have extensively studied association rules since their introduction by [14]. Apriori is the most well-known association rule mining algorithm. The algorithm has a two-step process: (1) frequent itemset generation (doing multiple scans of the database) and (2) association rule generation. The major advantage of Apriori over other association rule mining algorithms is its simple and easy implementation. However, the multiple scans over the database needed to find rules make the Apriori algorithm's convergence slower for large databases.

The other popular association rule mining algorithm, frequent pattern-growth (FP-Growth), proposed by Han et al. [3], compresses the data into an FP-Tree for identifying frequent itemsets. The FP-Growth algorithm makes fewer scans of the database, making it practically usable for large databases like text.

However, there has been very little research on finding negative association rules among infrequent itemsets. Association rule mining algorithms are seldom designed for mining negative association rules, and most of the existing algorithms can rarely be applied in their current capacity to negative association rule mining. The recent past, however, has witnessed a shift in the focus of the association rule mining community, which is now concentrating more on negative association rule extraction [1, 15–19]. Delgado et al. have proposed a framework for fuzzy rules that extends the interestingness measures for their validation from the crisp to the fuzzy case [20]. A fuzzy approach for mining association rules using the crisp methodology that involves the absent items is presented in [21].

Savasere et al. [22] presented an algorithm for extracting strong negative association rules. They combine frequent itemsets and domain knowledge to form a taxonomy for mining negative association rules. Their approach, however, requires users to set a predefined, domain-dependent hierarchical classification structure, which makes it difficult and hard to generalize. Morzy presented the DI-Apriori algorithm for extracting dissociation rules among negatively associated itemsets. The algorithm keeps the number of generated patterns low; however, the proposed method does not capture all types of generalized association rules [23].

Wu et al. [16] introduced an interest value for generating both positive and negative candidate items; the algorithm is Apriori-like for mining positive and negative association rules. Antonie and Zaïane [15] presented a support, confidence, and correlation coefficient based algorithm for mining positive and negative association rules; the coefficients need to be continuously updated while running the algorithm, and the generation of all negative association rules is not guaranteed.

Wu et al. proposed a negative and positive rule mining framework based on the Apriori algorithm. Their work does not focus on itemset dependency; rather, it focuses on rule dependency measures. More specifically, they employed the Piatetsky-Shapiro argument [24] about association rules, that is, a rule X ⇒ Y is interesting only if supp(X ⇒ Y) ≠ supp(X) supp(Y). This can be further explained as follows.

A rule A ⇒ ¬B is valid if and only if:

(1) lift(A, ¬B) = |supp(A ∪ ¬B) − supp(A) supp(¬B)| exceeds its threshold, indicating a positive relationship between X and ¬Y;
(2) the other condition for a rule to be valid is that X, Y ∈ frequent itemsets.

The second condition can greatly increase the efficiency of the rule mining process, because all antecedent or consequent parts of ARs can be ignored where X, Y ∈ infrequent itemsets. This is needed because, when an itemset is infrequent, Apriori does not guarantee its subsets to be frequent.

Association rule mining research mostly concentrates on positive association rules. The negative association rule mining methods reported in the literature generally target market basket data or other numeric or structured data. The complexity of generating negative association rules from text data thus becomes twofold: dealing with the text, and generating negative association rules as well as positive rules. As we demonstrate in the later sections, negative association rule mining from text is different from discovering association rules in numeric databases, and identifying negative associations raises new problems, such as dealing with frequent itemsets of interest and the number of involved infrequent itemsets. This necessitates the exploration of specific and efficient mining models for discovering positive and negative association rules from text databases.

4. Extraction of Association Rules

We present an algorithm for mining both positive and negative association rules from frequent and infrequent itemsets. The algorithm discovers a complete set of positive and negative association rules simultaneously. There are few algorithms for mining association rules for both frequent and infrequent itemsets from textual datasets.

We can divide association rule mining into the following steps:

(1) finding the interesting frequent and infrequent itemsets in the database D;
(2) finding positive and negative association rules from the frequent and infrequent itemsets obtained in the first step.

The mining of association rules appears to be the core issue; however, the generation and selection of interesting frequent and infrequent itemsets is equally important. We discuss the details of both below.

4.1. Identifying Frequent and Infrequent Itemsets. As mentioned above, the number of extracted items (both frequent and infrequent) from text datasets can be very large, with only a fraction of them being important enough for generating the interesting association rules. Selection of the useful itemsets, therefore, is challenging. The support of an item is a relative measure with respect to the database/corpus size. Let us suppose that the support of an itemset X is 0.4 in 100 transactions, that is, 40% of transactions contain the itemset. Now if 100 more transactions are added to the dataset and only 10 of the added 100 transactions contain the itemset X, the overall support of the itemset X becomes 0.25, that is, 25% of the transactions now contain itemset X. Therefore, the support of an itemset is a relative measure, and hence we cannot rely only on the support measure for the selection of important/frequent itemsets.

The challenge of handling a large number of itemsets, which is the case when dealing with textual datasets, is more evident when dealing with infrequent itemsets, because the number of infrequent itemsets rises exponentially [1]. Therefore, we only select those terms/items from the collection of documents which have importance in the corpus. This is done using the IDF weight-assigning method: we filter out words either not occurring frequently enough or having a near constant distribution among the different documents. We use the top-N percentage (for a user-specified N; we used 60%) as the final set of keywords to be used in the text mining phase [25]. The keywords are sorted in descending order by the algorithm based on their IDF scores.

4.2. Identifying Valid Positive and Negative Association Rules. By a valid association rule, we mean any expression of the form A ⇒ B, where A ∈ {X, ¬X}, B ∈ {Y, ¬Y}, X, Y ⊂ I, and X ∩ Y = Ø, such that:

(i) supp(A ⇒ B) ≥ ms;
(ii) supp(X) ≥ ms, supp(Y) ≥ ms;
(iii) conf(A ⇒ B) ≥ mc;
(iv) lift(A ⇒ B) > 1.

Let us consider an example from our medical "cancer" blogs text dataset, in which we analyze people's statements about cancer and mole. We have:

(i) supp(mole) = 0.4, supp(¬mole) = 0.6;
(ii) supp(cancer) = 0.6, supp(¬cancer) = 0.4;
(iii) supp(cancer ∪ mole) = 0.05;
(iv) minsupp = 0.2;
(v) minconf = 0.6.

From the above, we can see the following:

(i) supp(cancer ∪ mole) = 0.05 < minsupp; therefore mole ∪ cancer is an infrequent itemset (inFIS);
(ii) conf(mole ⇒ cancer) = 0.125 < minconf, from (3); therefore (mole ⇒ cancer) cannot be generated as a valid rule using the support-confidence framework.

Now we try to generate a negative rule from this example:

(i) supp(mole ∪ ¬cancer) = supp(mole) − supp(cancer ∪ mole) = 0.4 − 0.05 = 0.35 > minsupp, from (6);
(ii) conf(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer)/supp(mole) = 0.35/0.4 = 0.875 > minconf, from (7); therefore we can generate (mole ⇒ ¬cancer) as a negative rule;
(iii) lift(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer)/(supp(mole) supp(¬cancer)) = 0.35/(0.4 × 0.4) = 2.1875, which is much greater than 1, showing a strong relation between the presence of mole and the absence of cancer; therefore we can generate (mole ⇒ ¬cancer) as a valid negative rule with 87.5% confidence and a strong association between the presence of mole and the absence of cancer.

The above example clearly shows the importance of infrequent itemsets and the generated negative association rules, and their capability to track important implications/associations which would have been missed when mining only positive association rules.
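The same arithmetic can be checked mechanically. Below is a tiny sketch (our own illustration, not part of the paper; the class name is hypothetical) that plugs the example's supports into equations (5)–(7) and the lift definition (4):

```java
public class MoleCancerCheck {
    public static void main(String[] args) {
        // Supports taken from the worked example above.
        double suppMole = 0.4, suppCancer = 0.6, suppBoth = 0.05;
        double suppNotCancer = 1 - suppCancer;                        // eq. (5): 0.40
        double suppMoleNotCancer = suppMole - suppBoth;               // eq. (6): 0.35 >= minsupp (0.2)
        double conf = suppMoleNotCancer / suppMole;                   // eq. (7): 0.875 >= minconf (0.6)
        double lift = suppMoleNotCancer / (suppMole * suppNotCancer); // eq. (4): 2.1875 > 1
        System.out.printf("supp = %.2f, conf = %.3f, lift = %.4f%n",
                suppMoleNotCancer, conf, lift);
    }
}
```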

4.3. Proposed Algorithm: Apriori FISinFIS (See Algorithm 1). The Apriori FISinFIS procedure generates all frequent and infrequent itemsets of interest in a given database D, where FIS is the set of all frequent itemsets of interest in D and inFIS is the set of all infrequent itemsets in D. FIS and inFIS contain only frequent and infrequent itemsets of interest, respectively.

The initialization is done in Step (1). Step (2) generates temp_1, all itemsets of size 1; in step (2.1) we generate FIS_1, all frequent itemsets of size 1, while in step (2.2) all infrequent itemsets of size 1 in database D are generated in inFIS_1, in the first pass of D.

Step (3) generates FIS_k and inFIS_k for k ≥ 2 by a loop, where FIS_k is the set of all frequent k-itemsets whose support exceeds the user-defined minimum threshold in the kth pass of D, and inFIS_k is the set of all infrequent k-itemsets whose support is below the user-defined minimum threshold. The loop terminates when all the temporary itemsets have been tried, that is, temp_{k−1} = Ø. For each pass of the database in Step (3), say pass k, there are five substeps, as follows.

Step (3.1) generates candidate itemsets C_k of all k-itemsets in D, where each k-itemset in C_k is generated from two frequent itemsets in temp_{k−1}. The itemsets in C_k are counted in D using a loop in Step (3.2). Step (3.3) calculates the support of each itemset in C_k, and Step (3.4) stores the generated itemsets in a temporary data structure; we have used an implementation of "HashMap" in our experimentation as the temporary data structure.

Then FIS_k and inFIS_k are generated in Steps (4) and (5), respectively. FIS_k is the set of all potentially useful frequent k-itemsets in temp_k whose support is greater than minsupp; inFIS_k is the set of all infrequent k-itemsets in temp_k whose support is less than minsupp. FIS_k and inFIS_k are added to FIS and inFIS in Steps (6) and (7). Step (8) increments the itemset size. The procedure ends in Step (9), which outputs the frequent and infrequent itemsets in FIS and inFIS, respectively.

4.4. Algorithm: FISinFIS Based Positive and Negative Association Rule Mining. Algorithm 2 generates positive and negative association rules from both the frequent itemsets (FIS) and infrequent itemsets (inFIS). Step (1) initializes the positive and negative association rule sets as empty. Step (2) generates association rules from FIS: in step (2.1), positive association rules of the form A ⇒ B or B ⇒ A which have greater confidence than the user-defined threshold and lift greater than 1 are extracted as valid positive association rules; step (2.2) generates negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B, and so forth, which have greater confidence than the user-defined threshold and lift greater than 1, and these are extracted as valid negative association rules (Figure 2).

Step (3) generates association rules from inFIS: in step (3.1), a positive association rule of the form A ⇒ B or B ⇒ A which has greater confidence than the user-defined threshold and lift greater than 1 is extracted as a valid positive association rule; step (3.2) generates a negative association rule of the form A ⇒ ¬B, ¬A ⇒ B, or ¬A ⇒ ¬B which has greater confidence than the user-defined threshold and lift greater than 1, and this is extracted as a valid negative association rule.

5. Discovering Association Rules among Frequent and Infrequent Items

Mining positive association rules from frequent itemsets is a relatively trivial issue and has been extensively studied in the literature. Mining negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, B ⇒ ¬A, or ¬B ⇒ A, and so forth, from textual datasets, however, is a difficult task where A ∪ B is a nonfrequent itemset. The database (corpus) D has an exponential number of nonfrequent itemsets; therefore negative association rule mining stipulates the examination of a much larger search space than positive association rules.

We propose in this work that, for the itemsets occurring frequently given the user-defined min-sup, their subitems can be negatively correlated, leading to the discovery of negative association rules. Similarly, the infrequent itemsets may have subitems with a strong positive correlation, leading to the discovery of positive association rules.

Let I be the set of items in database D such that I = A ∪ B, A ∩ B = φ. Thresholds minsup and minconf are given by the user.

5.1. Generating Association Rules among Frequent Itemsets. See Algorithm 3.


Input: TD: transaction (document) database; ms: minimum support threshold.
Output: FIS: frequent itemsets; inFIS: infrequent itemsets.

(1)  initialize FIS = Φ; inFIS = Φ
(2)  temp_1 = {A | A ∈ top-N IDF terms}              /* all 1-itemsets drawn from the top-N IDF items */
(2.1) FIS_1 = {A | A ∈ temp_1 and supp(A) ≥ ms}      /* frequent 1-itemsets */
(2.2) inFIS_1 = temp_1 − FIS_1                        /* all infrequent 1-itemsets */
      k = 2                                           /* initialize for itemsets of size greater than 1 */
(3)  while (temp_{k−1} ≠ Φ) do begin
(3.1)   C_k = generate(temp_{k−1}, ms)                /* candidate k-itemsets */
(3.2)   for each transaction t ∈ TD do begin          /* scan database TD */
            C_t = subset(C_k, t)                      /* candidates contained in transaction t */
            for each candidate c ∈ C_t
                c.count++                             /* increase count of itemsets present in the transaction */
        end
(3.3)   c.support = c.count / |TD|                    /* calculate support of each candidate k-itemset */
(3.4)   temp_k = {c | c ∈ C_k and c.support > 0}      /* retain all counted candidates, so that both frequent and infrequent k-itemsets can be split out below */
(4)     FIS_k = {A | A ∈ temp_k and supp(A) ≥ ms}     /* frequent k-itemsets: support at least minsupp */
(5)     inFIS_k = temp_k − FIS_k                      /* infrequent k-itemsets: support below minsupp */
(6)     FIS = FIS ∪ FIS_k                             /* add generated frequent k-itemsets to FIS */
(7)     inFIS = inFIS ∪ inFIS_k                       /* add generated infrequent k-itemsets to inFIS */
(8)     k++                                           /* increment itemset size by 1 */
     end
(9)  return FIS and inFIS

Algorithm 1
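For readers who prefer running code, the following is a compact, simplified sketch of this level-wise search (our own illustration, not the authors' Java implementation; all names are hypothetical). Each level is split into frequent and infrequent itemsets, and candidates for the next level are joined only from the frequent ones, which mirrors the requirement that the subsets of a retained infrequent itemset be frequent:

```java
import java.util.*;

public class AprioriFisInFis {
    // supp(X): fraction of transactions containing every item of X.
    static double supp(List<Set<String>> td, Set<String> itemset) {
        long hits = td.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / td.size();
    }

    // Returns [FIS, inFIS] mined from transaction database 'td', seeded with
    // the top-N IDF terms, using minimum support 'ms'.
    static List<Set<Set<String>>> mine(List<Set<String>> td,
                                       Set<String> topIdfTerms, double ms) {
        Set<Set<String>> fis = new HashSet<>(), inFis = new HashSet<>();
        List<Set<String>> temp = new ArrayList<>();
        for (String term : topIdfTerms) temp.add(Set.of(term)); // step (2): 1-itemsets
        int k = 1;
        while (!temp.isEmpty()) {
            List<Set<String>> frequentK = new ArrayList<>();
            for (Set<String> c : temp) {
                double s = supp(td, c);
                if (s >= ms) { fis.add(c); frequentK.add(c); }   // steps (4), (6)
                else if (s > 0) inFis.add(c);                     // steps (5), (7)
            }
            // Step (3.1): join pairs of frequent k-itemsets into (k+1)-candidates.
            Set<Set<String>> candidates = new LinkedHashSet<>();
            for (int i = 0; i < frequentK.size(); i++)
                for (int j = i + 1; j < frequentK.size(); j++) {
                    Set<String> joined = new HashSet<>(frequentK.get(i));
                    joined.addAll(frequentK.get(j));
                    if (joined.size() == k + 1) candidates.add(joined);
                }
            temp = new ArrayList<>(candidates);
            k++;                                                  // step (8)
        }
        return List.of(fis, inFis);                               // step (9)
    }
}
```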

5.2. Generating Association Rules among Infrequent Itemsets. For brevity, we consider only one form of association from each of the positive and negative sides, that is, A ⇒ B and A ⇒ ¬B; the other forms can be extracted similarly.

In the description of association rules shown in Algorithm 4, supp(A ∪ B) ≥ minsupp guarantees that the association rule describes the relationship among items of a frequent itemset, whereas supp(A ∪ B) < minsupp guarantees that the association rule describes the relationship among items of an infrequent itemset; the subitems of the itemset, however, need to be frequent, as enforced by the conditions supp(A) ≥ minsupp and supp(B) ≥ minsupp. The interestingness measure lift has to be greater than 1, articulating a positive dependency among the itemsets; a lift value less than 1 articulates a negative relationship among the itemsets.

The algorithm generates a complete set of positive and negative association rules from both frequent and infrequent itemsets. The frequent itemsets have traditionally been used to generate positive association rules; however, we argue that items in frequent itemsets can be negatively correlated. This can be illustrated using the following example:

minsupp = 0.2, minconf = 0.6;
supp(A) = 0.7, supp(B) = 0.5, supp(A ∪ B) = 0.25;
conf(A ⇒ B) = 0.3571, lift(A ⇒ B) = 0.7143;
conf(B ⇒ A) = 0.5, lift(B ⇒ A) = 0.7143;
supp(A ∪ ¬B) = 0.7 − 0.25 = 0.45, conf(A ⇒ ¬B) = 0.45/0.7 = 0.6429;
lift(A ⇒ ¬B) = 0.45/(0.7 × 0.5) = 1.2857.

The above example clearly shows that itemsets, despite being frequent, can have negative relationships among their item subsets. Therefore, in Step (2.2) of Algorithm 2, we try to generate negative association rules using frequent itemsets.

The infrequent itemsets, on the other hand, have either been totally ignored while generating associations or mostly used to generate only negative association rules. However, infrequent itemsets can have potentially valid and important positive association rules among them, having high confidence and a strongly positive correlation. This can be illustrated using the following example:

minsupp = 0.3, minconf = 0.7;
supp(A) = 0.8, supp(B) = 0.2, supp(A ∪ B) = 0.2;
conf(A ⇒ B) = 0.25, lift(A ⇒ B) = 1.25;
conf(B ⇒ A) = 1, lift(B ⇒ A) = 1.25.

We can see from the example that (A ∪ B), in spite of being an infrequent itemset, has a very strong positive association B ⇒ A, with 100% confidence and a positive correlation. Our proposed algorithm covers the generation of such rules, as explained in Step (3.1).

6. Experimental Results and Discussion

We performed our experiments on medical blog datasets, mostly authored by patients writing about their problems and experiences. The datasets have been collected from different medical blog sites:

(i) Cancer Survivors Network [http://csn.cancer.org];


Inputs: min sup: minimum support; min conf: minimum confidence; FIS (frequent itemsets); inFIS (infrequent itemsets).
Output: PAR: set of [+ve] ARs; NAR: set of [−ve] ARs.

(1)  PAR = φ; NAR = φ
(2)  /* generate all association rules from FIS (frequent itemsets) */
     for each itemset I in FIS do begin
         for each itemset A ∪ B = I, A ∩ B = φ do begin
(2.1)        /* generate rules of the form A ⇒ B */
             if conf(A ⇒ B) ≥ min conf && lift(A ⇒ B) ≥ 1
                 then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
             else
(2.2)        /* generate rules of the forms (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
                 if conf(A ⇒ ¬B) ≥ min conf && lift(A ⇒ ¬B) ≥ 1
                     output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
                 if conf(¬A ⇒ B) ≥ min conf && lift(¬A ⇒ B) ≥ 1
                     output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
                 if conf(¬A ⇒ ¬B) ≥ min conf && lift(¬A ⇒ ¬B) ≥ 1
                     output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
         end for
     end for
(3)  /* generate all association rules from inFIS */
     for each itemset I in inFIS do begin
         for each itemset A ∪ B = I, A ∩ B = φ, supp(A) ≥ minsupp and supp(B) ≥ minsupp do begin
(3.1)        /* generate rules of the form A ⇒ B */
             if conf(A ⇒ B) ≥ min conf && lift(A ⇒ B) ≥ 1
                 then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
             else
(3.2)        /* generate rules of the forms (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
                 if conf(A ⇒ ¬B) ≥ min conf && lift(A ⇒ ¬B) ≥ 1
                     output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
                 if conf(¬A ⇒ B) ≥ min conf && lift(¬A ⇒ B) ≥ 1
                     output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
                 if conf(¬A ⇒ ¬B) ≥ min conf && lift(¬A ⇒ ¬B) ≥ 1
                     output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
         end for
     end for
(4)  return PAR and NAR

Algorithm 2
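As a sketch of Steps (2.1)–(2.2) for a single itemset {A, B} (our own illustration; the method is hypothetical and works directly on precomputed supports), using the identities (5)–(11) so that negated supports never have to be counted from the database:

```java
public class RuleGenerator {
    // Evaluates the four rule forms of Algorithm 2 for one pair (A, B).
    static void rulesForPair(double suppA, double suppB, double suppAB, double minConf) {
        double confPos = suppAB / suppA;                              // A => B
        double liftPos = suppAB / (suppA * suppB);
        double confANotB = (suppA - suppAB) / suppA;                  // A => ¬B, eqs. (6)-(7)
        double liftANotB = (suppA - suppAB) / (suppA * (1 - suppB));
        double confNotAB = (suppB - suppAB) / (1 - suppA);            // ¬A => B, eqs. (8)-(9)
        double liftNotAB = (suppB - suppAB) / ((1 - suppA) * suppB);
        double suppNotBoth = 1 - suppA - suppB + suppAB;              // eq. (10)
        double confNotANotB = suppNotBoth / (1 - suppA);              // ¬A => ¬B, eq. (11)
        double liftNotANotB = suppNotBoth / ((1 - suppA) * (1 - suppB));

        if (confPos >= minConf && liftPos >= 1) {
            System.out.println("A => B");                             // step (2.1)
        } else {                                                      // step (2.2)
            if (confANotB >= minConf && liftANotB >= 1)       System.out.println("A => ¬B");
            if (confNotAB >= minConf && liftNotAB >= 1)       System.out.println("¬A => B");
            if (confNotANotB >= minConf && liftNotANotB >= 1) System.out.println("¬A => ¬B");
        }
    }

    public static void main(String[] args) {
        // Frequent-itemset example from Section 5.2; prints "A => ¬B" and "¬A => B".
        rulesForPair(0.7, 0.5, 0.25, 0.6);
    }
}
```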

Given supp(A ∪ B) ≥ minsupp:

If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
    then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ B) < minconf and lift(A ⇒ B) < 1,
    then A ⇒ B is not a valid positive rule, having lower than the required minimum confidence and a negative correlation between rule items A and B. Therefore, we try to generate a negative association rule from itemset I:
    If conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
        then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 3


Given supp(A ∪ B) < minsupp and supp(A ∪ B) ≠ 0,
      supp(A) ≥ minsupp and supp(B) ≥ minsupp:

If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
    then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
    then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 4

Table 2: Total generated frequent and infrequent itemsets using different support values.

Support   Frequent itemsets   Infrequent itemsets
0.1       146474              268879
0.15      105397              329766
0.2       94467               447614
0.25      79871               492108
0.3       57954               504320

(ii) Care Pages [http://www.carepages.com/forums/cancer].

The blog text was preprocessed before the experimentation, that is, stop-word removal, stemming/lemmatization, nonmedical word removal, and so forth. After preprocessing, we assigned weights to terms/items using the IDF scheme to select only the important and relevant terms/items in the dataset. The main parameters of the databases are as follows:

(i) the total number of blogs (i.e., transactions) used in this experimentation was 1926;
(ii) the average number of words (attributes) per blog (transaction) was 145;
(iii) the smallest blog contained 79 words;
(iv) the largest blog contained 376 words;
(v) the total number of words (i.e., attributes) was 280254 without stop-word removal;
(vi) the total number of words (i.e., attributes) was 192738 after stop-word removal;
(vii) the total number of words selected using the top-N percentage of IDF words was 81733;
(viii) the algorithm is implemented in Java.

Table 2 summarizes the number of itemsets generated with varying minsup values. We can see that the number of frequent itemsets decreases as we increase the minsup value; however, a sharp increase in the number of infrequent itemsets can be observed. This can also be visualized in Figure 1.

Figure 1: Frequent and infrequent itemsets generated with varying minimum support values.

Table 3 gives an account of the experimental results for different values of minimum support and minimum confidence. The lift value has to be greater than 1 for a positive relationship between the itemsets; the resulting rule, however, may itself be positive or negative. The total numbers of positive and negative rules generated from both frequent and infrequent itemsets are given.

Although the experimental results greatly depend on the datasets used, they still demonstrate the importance of the IDF factor in selecting the frequent itemsets, along with the generation of negative rules from frequent itemsets and the extraction of positive rules from infrequent itemsets. The number of negative rules generated greatly outnumbers the positive rules, not only because there are many more infrequent itemsets than frequent itemsets but also because the proposed approach finds negative correlations between the frequent itemsets, leading to the generation of additional negative association rules.

The frequent and infrequent itemset generation using the Apriori algorithm takes only a little extra time compared to traditional frequent itemset finding using the Apriori algorithm.


Table 3: Positive and negative association rules using varying support and confidence values.

Support   Confidence   PARs from FIs   PARs from IIs   NARs from FIs   NARs from IIs
0.05      0.95         3993            254             27032           836
0.05      0.9          4340            228             29348           1544
0.1       0.9          3731            196             30714           1279
0.1       0.85         3867            246             37832           1170
0.15      0.8          3340            330             26917           1121

Table 4: Positive and negative rules from both frequent and infrequent itemsets.

(a) Positive rules from frequent itemsets

Rule                                   Support   Confidence   Lift
chemo, radiation, treatment → tumor    0.2       1            2.0
surgery, tumor → radiation             0.25      1            1.4286
radiation, surgery → tumor             0.25      0.8333       1.6667
treatment → radiation                  0.45      0.9          1.2857
treatment, tumor → radiation           0.3       1            1.4286
tumor → radiation                      0.5       1            1.4286

(b) Negative rules from frequent itemsets

Rule                                   Support   Confidence   Lift
radiation → ¬cancer                    0.5       0.7143       1.0204
¬temodar → radiation                   0.4825    0.8333       1.6667
¬radiation → glioblastoma              0.65      1            1.4285

(c) Positive rules from infrequent itemsets

Rule                                   Support   Confidence   Lift
cancer, treatment → tumor              0.15      1            2
doctor, temodar → chemo                0.05      1            2.8571
brain, chemo → doctor                  0.15      1            2.2222
mri, tumor → brain                     0.1       1            2.5

(d) Negative rules from infrequent itemsets

Rule                                   Support   Confidence   Lift
chemo → ¬surgery                       0.35      1            1.5385
glioblastoma → ¬treatment              0.3       1            2.13
chemo → ¬glioblastoma                  0.35      0.95         1.4286

This is because each item's support is calculated for checking against the threshold support value to be classified as frequent or infrequent; therefore, we get the infrequent items in the same pass in which we get the frequent items. However, the processing of frequent and infrequent itemsets for the generation of association rules is different. Frequent itemsets generated through the Apriori algorithm have the inherent property that their subsets are also frequent; we cannot guarantee that for the infrequent itemsets. Thus, when generating association rules from infrequent itemsets, we impose the additional check that their subsets are frequent, as sketched below.
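A minimal sketch of this additional check (our own illustration; the names are hypothetical), relying on the Apriori property that if every (k−1)-subset is frequent, then so are all smaller subsets:

```java
import java.util.*;

public class SubsetCheck {
    // For an infrequent itemset, Apriori gives no guarantee about its subsets,
    // so before generating rules from it we verify that every (k-1)-subset
    // belongs to the set of frequent itemsets 'fis'.
    static boolean allSubsetsFrequent(Set<String> itemset, Set<Set<String>> fis) {
        for (String item : itemset) {
            Set<String> subset = new HashSet<>(itemset);
            subset.remove(item);
            if (!subset.isEmpty() && !fis.contains(subset)) return false;
        }
        return true;
    }
}
```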

Studies on mining association rules among both frequent and infrequent itemsets have been few and far between, especially for textual datasets. We have proposed an algorithm which can extract both types of association rules, that is, positive and negative, among both frequent and infrequent itemsets. We give a sample of all four types of association rules extracted using the algorithm.

Table 4 gives a summary of the generated association rules; all four types are illustrated. Table 4(a) shows a sample of positive association rules generated from the frequent itemsets. Table 4(b) shows negative association rules generated from the frequent itemsets, a direction that has not yet been explored by the research community. A sample of positive association rules generated from the infrequent itemsets is demonstrated in Table 4(c); this type of association rule is potentially useful and of interest to researchers, and no research had been done on extracting positive association rules from infrequent itemsets in textual data before this work. Table 4(d) shows the results of negative association rules from the infrequent itemsets.

Figure 2: Positive and negative rules generated with varying minimum support and confidence values.

7. Concluding Remarks and Future Work

Identification of associations among symptoms and diseases is important in diagnosis. The field of negative association rule (NAR) mining holds enormous potential to help medical practitioners in this regard, and both positive and negative association rule mining (PNARM) can hugely benefit the medical domain. Positive and negative associations among diseases, symptoms, and laboratory test results can help a medical practitioner reach a conclusion about the presence or absence of a possible disease. There is a need to minimize the errors in diagnosis and maximize the possibility of early disease identification by developing a decision support system that takes advantage of NARs. A positive association rule such as Flu ⇒ Headache can tell us that headache is experienced by a person who is suffering from flu. On the contrary, a negative association rule such as ¬Throbbing-Headache ⇒ ¬Migraine tells us that if the headache experienced by a person is not throbbing, then he may not have migraine, with a certain degree of confidence. The applications of this work include the development of medical decision support systems, among others, by finding associations and dissociations among diseases, symptoms, and other health-related terminologies. The current algorithm does not account for the context and semantics of the terms/items in the textual data. In future work, we plan to assimilate the context of the features in order to improve the quality and efficacy of the generated association rules.

In this paper, contributions to NARM research were made by proposing an algorithm for efficiently generating negative association rules along with the positive association rules. We have proposed a novel method that captures the negative associations among frequent itemsets and also extracts the positive associations among the infrequent itemsets, whereas traditional association rule mining algorithms have focused on frequent items for generating positive association rules and have used the infrequent items only for the generation of negative association rules. The experimental results have demonstrated that the proposed approach is effective, efficient, and promising.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.
[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.
[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.
[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory—ICDT '99, pp. 398–416, Springer, 1999.
[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134–145, 1996.
[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236–245, August 2003.
[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671–692, 2001.
[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497–510, 2004.
[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407–413, 2013.
[10] M. Rajman and R. Besançon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50–64, 1997.
[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227–230, 1997.
[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.
[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237–248, Ubooks, 2003.
[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 1, pp. 207–216, 1993.
[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27–38, Springer, 2004.
[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658–665, 2002.
[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100–109, Springer, 2006.
[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716–1721, August 2005.
[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620–629, Springer, 2005.
[20] M. Delgado, M. D. Ruiz, D. Sánchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194–5213, 2011.
[21] M. Delgado, M. D. Ruiz, D. Sánchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275–282, Atlantis Press, 2011.
[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442–449, December 2002.
[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228–237, Springer, 2006.
[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229–238, AAAI Press, 1991.
[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining N-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59–67, Springer, 2000.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

2 The Scientific World Journal

rules from the frequent itemsets and (c) the extraction ofpositive association rules from infrequent itemsets

The rest of this paper is organized as follows In thenext section we present brief introduction to data miningterminology and background Section 3 reviews related workon association rule mining In Section 4 we describe themethodology for identifying both frequent and infrequentitemsets of interest and generation of association rules basedon these itemsetsThe proposedmodel for extracting positiveand negative association rules is presented in Section 5 andthe final section of this paper gives the details of the exper-imental results comparisons and conclusions Section 6 pre-sents the experimental results and the conclusion and futuredirections are given in Section 7

2 Terminology and Background

Let us consider I = {i_1, i_2, ..., i_N} as a set of N distinct literals/terms called items, and let D be a database of transactions (documents/blogs, etc.), where each transaction T is a set of items/terms such that T ⊆ I. Each transaction is associated with a unique identifier called TID. Let A, B be sets of items; an association rule is a derivation of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. "A" is called the antecedent of the rule and "B" is called the consequent of the rule. An association rule A ⇒ B can have different measures denoting its significance and quality. In our approach we have employed (i) support, denoted by supp, which is the percentage of transactions in database D containing both A and B; (ii) confidence, denoted by conf, which represents the percentage of transactions in D containing A that also contain B and can be denoted in probability terms by P(B | A); and (iii) lift, denoted by lift, characterizing the direction of the relationship between the antecedent and consequent of the association rule. Rules having a support value greater than a user-defined minimum support (minsupp), that is, whose itemset is present in at least the threshold number of transactions, and a confidence greater than a user-defined minimum confidence (minconf) are called valid association rules. The lift indicates whether the association is positive or negative: a lift value greater than 1 indicates a positive relationship between the itemsets, a value less than 1 indicates a negative relationship, and where the lift equals 1 the itemsets are independent and there exists no relationship between them.

Some of the above and derived definitions can be represented with the following mathematical equations:

supp(A) = percentage of transactions containing A,    (1)

supp(A ⇒ B) = P(A ∪ B) = percentage of transactions where A and B coexist,    (2)

conf(A ⇒ B) = P(A ∪ B) / P(A), the confidence that whenever A occurs, B also occurs in the transaction,    (3)

lift(A ⇒ B) = P(A ∪ B) / (P(A) P(B)) = 1 + [P(A ∪ B) − P(A) P(B)] / (P(A) P(B)), the strength of the relationship between A and B,    (4)

supp(¬A) = 1 − supp(A),    (5)

supp(A ∪ ¬B) = supp(A) − supp(A ∪ B),    (6)

conf(A ⇒ ¬B) = 1 − conf(A ⇒ B) = P(A ∪ ¬B) / P(A),    (7)

supp(¬A ∪ B) = supp(B) − supp(A ∪ B),    (8)

conf(¬A ⇒ B) = supp(B)(1 − conf(B ⇒ A)) / (1 − P(A)) = supp(¬A ∪ B) / supp(¬A),    (9)

supp(¬A ∪ ¬B) = 1 − supp(A) − supp(B) + supp(A ∪ B),    (10)

conf(¬A ⇒ ¬B) = [1 − supp(A) − supp(B) + supp(A ∪ B)] / (1 − P(A)) = supp(¬A ∪ ¬B) / supp(¬A).    (11)
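As a concrete illustration of equations (1)-(11), the following is a minimal sketch (our own, not part of the paper; the class and method names are hypothetical) showing how the measures for both positive and negative rules can be derived from plain co-occurrence counts:

// Illustrative helper for equations (1)-(11); assumes 0 < supp(A), supp(B) < 1.
public final class RuleMeasures {
    private final double suppA;   // supp(A)
    private final double suppB;   // supp(B)
    private final double suppAB;  // supp(A ∪ B): fraction of transactions containing both

    public RuleMeasures(int countA, int countB, int countAB, int totalTransactions) {
        this.suppA  = (double) countA  / totalTransactions;  // eq. (1)
        this.suppB  = (double) countB  / totalTransactions;
        this.suppAB = (double) countAB / totalTransactions;  // eq. (2)
    }

    public double confAtoB()       { return suppAB / suppA; }              // eq. (3)
    public double liftAtoB()       { return suppAB / (suppA * suppB); }    // eq. (4)
    public double suppNotA()       { return 1 - suppA; }                   // eq. (5)
    public double suppAnotB()      { return suppA - suppAB; }              // eq. (6)
    public double confAtoNotB()    { return 1 - confAtoB(); }              // eq. (7)
    public double suppNotAB()      { return suppB - suppAB; }              // eq. (8)
    public double confNotAtoB()    { return suppNotAB() / suppNotA(); }    // eq. (9)
    public double suppNotAnotB()   { return 1 - suppA - suppB + suppAB; }  // eq. (10)
    public double confNotAtoNotB() { return suppNotAnotB() / suppNotA(); } // eq. (11)
}

Note that only the counts of A, of B, and of their co-occurrence need to be gathered from the data; every negative-rule measure then follows arithmetically.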

The huge number of infrequent items generates an even higher number of negative rules in comparison to the positive association rules. The problem is compounded when dealing with text, where words/terms are the items and the documents are the transactions. It is also difficult to set a minimum support threshold for text because of the huge number of unique and sporadic items (words) in a textual dataset. Indexing (assigning weights to the terms) of the text documents is very important if they are to be used as transactions for extracting association rules. Indexing techniques from the information retrieval field [10] can greatly help in this regard. Index terms, the words whose semantics help in identifying the document's main subject matter [11], help describe a document. They possess different relevance to a given document in the collection and are therefore assigned different numerical weights. Text mining aims to retrieve information from unstructured text and to present the extracted knowledge in a compact form to the users [12]. The primary goal is to provide the users with knowledge for research and educational purposes.

Table 1: IDF scores of sample keywords of the corpus.

Selected keywords | IDF    | Discarded keywords | IDF
Chemo             | 2.2693 | Threat             | 0.0158
Radiation         | 2.2535 | Disease            | 0.0157
Tumor             | 2.2316 | Severe             | 0.01426
Surgery           | 2.2135 | Produce            | 0.01392
Cancer            | 2.2013 | Result             | 0.01194
Temodar           | 1.9830 | Need               | 0.00639
CT scan           | 1.9812 | Analysis           | 0.00136
Glioblastoma      | 1.9609 | Type               | 0.00940
Skull             | 1.8906 | New                | 0.00843
Cause             | 1.8902 | Level              | 0.00694
...               | ...    | ...                | ...

It is therefore imperative to employ some weight-assignment mechanism. We have used inverse document frequency (IDF), which denotes the importance of a term in a corpus. Selection of features on the basis of IDF values needs careful consideration, that is, which range of IDF values is included. This can greatly affect the results, as choosing a very low value of "N" is feared to nullify the impact of IDF, and choosing a very high value may result in losing important terms/features of the dataset. We have proposed a user-tuned parameter, top-N, where the value of "N" is chosen by the user depending upon the characteristics of the data.

2.1. IDF. The IDF weighting scheme is based on the intuition of term occurrence in a corpus. It surmises that the fewer the documents containing a term, the more discriminating the term becomes, and hence the more important an artefact it is. IDF helps in identifying the terms that carry special significance in a certain type of document corpus. It assigns a high weight to terms that occur rarely in a document corpus. In a medical corpus, for instance, the word "disease" is not likely to carry a significant meaning. Instead, a disease name, for example, "cholera" or "cancer", would carry significant meaning in characterizing the document.

Keeping this in view, a higher weight should be assigned to the words that appear in documents in close connection with a certain topic, while a lower weight should be assigned to those words that show up without any contextual background. IDF weighting is a broadly used method for text analysis [10, 13]. We can mathematically represent IDF as IDF_t = log(|D| / df_t), where t is the term whose weight is to be calculated, df_t represents the number of documents in which the term is present, and D symbolizes the document corpus.

Table 1 shows an example set of words and their respective IDF scores in a document corpus, with a top-60% selection threshold.

The number of words in the documents before IDF score calculation and top-60% selection is 280,254; after selecting the top 60% based on IDF scores, it goes down to 81,733. The discarded-keywords columns of Table 1 show a sample of the eliminated words, whose IDF scores lie below the threshold value.
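The weighting and selection step can be sketched as follows (a minimal illustration under our own naming; the paper reports a Java implementation but does not list its code, and the logarithm base only affects the scale of the scores, not the ranking):

import java.util.*;
import java.util.stream.*;

// Illustrative IDF weighting with top-N% keyword selection.
public class IdfSelector {
    // docs: each document is represented as the set of distinct terms it contains.
    public static List<String> topNPercent(List<Set<String>> docs, double nPercent) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs)
            for (String term : doc)
                df.merge(term, 1, Integer::sum);           // document frequency df_t

        int d = docs.size();
        Map<String, Double> idf = new HashMap<>();
        df.forEach((term, f) -> idf.put(term, Math.log((double) d / f))); // IDF_t = log(|D|/df_t)

        int keep = (int) Math.ceil(idf.size() * nPercent / 100.0);
        return idf.entrySet().stream()
                  .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                  .limit(keep)                             // keep the top-N% by IDF score
                  .map(Map.Entry::getKey)
                  .collect(Collectors.toList());
    }
}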

3. Literature Review

Association rule mining is aimed at the discovery of associations among itemsets of a transactional database. Researchers have extensively studied association rules since their introduction by [14]. Apriori is the most well-known association rule mining algorithm. The algorithm is a two-step process: (1) frequent itemset generation (requiring multiple scans of the database) and (2) association rule generation. The major advantage of Apriori over other association rule mining algorithms is its simple and easy implementation. However, the multiple scans over the database make the Apriori algorithm's convergence slower for large databases.

The other popular association rule mining algorithm, frequent pattern growth (FP-Growth), proposed by Han et al. [3], compresses the data into an FP-Tree for identifying frequent itemsets. The FP-Growth algorithm makes fewer scans of the database, making it practically usable for large databases like text.

However, there has been very little research on finding negative association rules among infrequent itemsets. Association rule mining algorithms are seldom designed for mining negative association rules, and most of the existing algorithms can rarely be applied in their current form in the context of negative association rule mining. The recent past, however, has witnessed a shift in the focus of the association rule mining community, which is now concentrating more on negative association rule extraction [1, 15-19]. Delgado et al. have proposed a framework for fuzzy rules that extends the interestingness measures for their validation from the crisp to the fuzzy case [20]. A fuzzy approach for mining association rules using the crisp methodology that involves absent items is presented in [21].

Savasere et al. [22] presented an algorithm for extracting strong negative association rules. They combine frequent itemsets and domain knowledge to form a taxonomy for mining negative association rules. Their approach, however, requires users to set a predefined, domain-dependent hierarchical classification structure, which makes it difficult to generalize. Morzy presented the DI-Apriori algorithm for extracting dissociation rules among negatively associated itemsets. The algorithm keeps the number of generated patterns low; however, the proposed method does not capture all types of generalized association rules [23].

Dong et al. [16] introduced an interest value for generating both positive and negative candidate items; the algorithm is Apriori-like for mining positive and negative association rules. Antonie and Zaïane [15] presented a support, confidence, and correlation coefficient based algorithm for mining positive and negative association rules; the coefficients need to be continuously updated while running the algorithm, and the generation of all negative association rules is not guaranteed.

Wu et al. proposed a negative and positive rule mining framework based on the Apriori algorithm. Their work does not focus on itemset dependency; rather, it focuses on rule dependency measures. More specifically, they have employed the Piatetsky-Shapiro argument [24] about association rules, that is, a rule X ⇒ Y is interesting if and only if supp(X ⇒ Y) ≠ supp(X) supp(Y). This can be further explained using the following.

A rule A ⇒ ¬B is valid if and only if:

(1) lift(A, ¬B) > 1, that is, supp(A ∪ ¬B) > supp(A) supp(¬B), indicating a positive relationship between A and ¬B;

(2) X, Y ∈ frequent itemsets.

The second condition can greatly increase the efficiency of the rule mining process, because all rules whose antecedent or consequent parts satisfy X, Y ∈ infrequent itemsets can be ignored. This check is needed because, when an itemset is infrequent, Apriori does not guarantee its subsets to be frequent.

Association rule mining research mostly concentrates on positive association rules. The negative association rule mining methods reported in the literature generally target market basket data or other numeric or structured data. The complexity of generating negative association rules from text data is thus twofold: dealing with the text itself and generating negative association rules as well as positive ones. As we demonstrate in the later sections, negative association rule mining from text is different from discovering association rules in numeric databases, and identifying negative associations raises new problems, such as dealing with frequent itemsets of interest and the number of involved infrequent itemsets. This necessitates the exploration of specific and efficient mining models for discovering positive and negative association rules from text databases.

4. Extraction of Association Rules

We present an algorithm for mining both positive and negative association rules from frequent and infrequent itemsets. The algorithm discovers a complete set of positive and negative association rules simultaneously. There are few algorithms for mining association rules for both frequent and infrequent itemsets from textual datasets.

We can divide association rule mining into the following steps:

(1) finding the interesting frequent and infrequent itemsets in the database D;

(2) finding positive and negative association rules from the frequent and infrequent itemsets obtained in the first step.

The mining of association rules appears to be the core issue; however, the generation and selection of interesting frequent and infrequent itemsets is equally important. We discuss the details of both in the following discussion.

4.1. Identifying Frequent and Infrequent Itemsets. As mentioned above, the number of extracted items (both frequent and infrequent) from text datasets can be very large, with only a fraction of them being important enough for generating interesting association rules. Selection of the useful itemsets, therefore, is challenging. The support of an item is a relative measure with respect to the database/corpus size. Suppose that the support of an itemset X is 0.4 in 100 transactions, that is, 40% of the transactions contain the itemset. Now if 100 more transactions are added to the dataset and only 10 of the added 100 transactions contain the itemset X, the overall support of X drops to 0.25, that is, 25% of the transactions now contain X. The support of an itemset is thus a relative measure; hence we cannot rely only on the support measure for the selection of important/frequent itemsets.

The burden of handling a large number of itemsets, which is the case when dealing with textual datasets, is even more evident when dealing with infrequent itemsets, because the number of infrequent itemsets rises exponentially [1]. Therefore, we select only those terms/items from the collection of documents which have importance in the corpus. This is done using the IDF weight-assignment method. We filter out words either not occurring frequently enough or having a near-constant distribution among the different documents. We use the top-N percentage (for a user-specified N; we used 60%) as the final set of keywords to be used in the text mining phase [25]. The keywords are sorted in descending order by the algorithm based on their IDF scores.

4.2. Identifying Valid Positive and Negative Association Rules. By a valid association rule we mean any expression of the form A ⇒ B, where A ∈ {X, ¬X}, B ∈ {Y, ¬Y}, X, Y ⊂ I, and X ∩ Y = ∅, such that:

(i) supp(A ⇒ B) ≥ ms;
(ii) supp(X) ≥ ms, supp(Y) ≥ ms;
(iii) conf(A ⇒ B) ≥ mc;
(iv) lift(A ⇒ B) > 1.
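These four conditions translate directly into a simple boolean test; the following is a minimal sketch (the class and method names are ours, not the paper's):

public final class RuleValidity {
    // Checks conditions (i)-(iv) for a candidate rule A => B built from itemsets X, Y;
    // ms is the minimum support and mc the minimum confidence.
    public static boolean isValid(double suppRule, double suppX, double suppY,
                                  double conf, double lift, double ms, double mc) {
        return suppRule >= ms     // (i)   supp(A => B) >= ms
            && suppX >= ms        // (ii)  supp(X) >= ms and
            && suppY >= ms        //       supp(Y) >= ms
            && conf >= mc         // (iii) conf(A => B) >= mc
            && lift > 1;          // (iv)  lift(A => B) > 1
    }
}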

Let us consider an example from our medical "cancer" blogs text dataset, where we analyze people's discussions about cancer and mole. We have:

(i) supp(mole) = 0.4, supp(¬mole) = 0.6;
(ii) supp(cancer) = 0.6, supp(¬cancer) = 0.4;
(iii) supp(cancer ∪ mole) = 0.05;
(iv) minsupp = 0.2;
(v) minconf = 0.6.

From the above we can see the following:

(i) supp(cancer ∪ mole) = 0.05 < minsupp; therefore mole ∪ cancer is an infrequent itemset (inFIS);

(ii) conf(mole ⇒ cancer) = 0.125 < minconf, from (3);

(a) therefore (mole ⇒ cancer) cannot be generated as a valid rule using the support-confidence framework.

Now we try to generate a negative rule from this example.

(i) supp(mole ∪ ¬cancer) = supp(mole) − supp(cancer ∪ mole)

(a) = 0.4 − 0.05 = 0.35 > minsupp, from (6);

(ii) conf(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer) / supp(mole), from (7)

(a) = 0.35/0.4 = 0.875 > minconf; therefore we can generate (mole ⇒ ¬cancer) as a negative rule;

(iii) lift(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer) / (supp(mole) supp(¬cancer))

(a) = 0.35/(0.4 × 0.4) = 2.1875, which is much greater than 1, showing a strong relation between the presence of mole and the absence of cancer; therefore we can generate (mole ⇒ ¬cancer) as a valid negative rule with 87.5% confidence and a strong association between the presence of mole and the absence of cancer.

The above example clearly shows the importance of infrequent itemsets and of the generated negative association rules, and their capability to capture important implications/associations which would have been missed when mining only positive association rules.

4.3. Proposed Algorithm: Apriori_FISinFIS (See Algorithm 1). The Apriori_FISinFIS procedure generates all frequent and infrequent itemsets of interest in a given database D, where FIS is the set of all frequent itemsets of interest in D and inFIS is the set of all infrequent itemsets in D. FIS and inFIS contain only frequent and infrequent itemsets of interest, respectively.

The initialization is done in Step (1). Step (2) generates temp1, all itemsets of size 1; in step (2.1) we generate FIS1, all frequent itemsets of size 1, while in step (2.2) all infrequent itemsets of size 1 in database D are generated in inFIS1, in the first pass of D.

Step (3) generates FISk and inFISk for k ≥ 2 by a loop, where FISk is the set of all frequent k-itemsets whose support is greater than the user-defined minimum threshold in the kth pass of D, and inFISk is the set of all infrequent k-itemsets whose support is less than the user-defined minimum threshold. The loop terminates when all the temporary itemsets have been tried, that is, temp(k−1) = ∅. For each pass of the database in Step (3), say pass k, there are five substeps, as follows.

Step (3.1) generates candidate itemsets Ck of all k-itemsets in D, where each k-itemset in Ck is generated from two frequent itemsets in temp(k−1). The itemsets in Ck are counted in D using a loop in Step (3.2). Step (3.3) calculates the support of each itemset in Ck, and step (3.4) stores the generated itemsets in a temporary data structure. We have used an implementation of "HashMap" in our experimentation as the temporary data structure.

Then FISk and inFISk are generated in Steps (4) and (5), respectively. FISk is the set of all potentially useful frequent k-itemsets in tempk whose support value is greater than minsupp; inFISk is the set of all infrequent k-itemsets in tempk whose support value is less than minsupp. FISk and inFISk are added to FIS and inFIS in Steps (6) and (7). Step (8) increments the itemset size. The procedure ends in Step (9), which outputs the frequent and infrequent itemsets in FIS and inFIS, respectively.

4.4. Algorithm: FISinFIS-Based Positive and Negative Association Rule Mining. Algorithm 2 generates positive and negative association rules from both the frequent itemsets (FIS) and the infrequent itemsets (inFIS). Step (1) initializes the positive and negative association rule sets as empty. Step (2) generates association rules from FIS: in step (2.1), positive association rules of the form A ⇒ B or B ⇒ A, which have greater confidence than the user-defined threshold and lift greater than 1, are extracted as valid positive association rules. Step (2.2) generates negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B, and so forth, which have greater confidence than the user-defined threshold and lift greater than 1, as valid negative association rules (Figure 2).

Step (3) generates association rules from inFIS: in step (3.1), a positive association rule of the form A ⇒ B or B ⇒ A, which has greater confidence than the user-defined threshold and lift greater than 1, is extracted as a valid positive association rule. Step (3.2) generates a negative association rule of the form A ⇒ ¬B, ¬A ⇒ B, or ¬A ⇒ ¬B, which has greater confidence than the user-defined threshold and lift greater than 1, as a valid negative association rule.

5. Discovering Association Rules among Frequent and Infrequent Items

Mining positive association rules from frequent itemsets is a relatively trivial issue and has been extensively studied in the literature. Mining negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, B ⇒ ¬A, or ¬B ⇒ A, and so forth, from textual datasets, however, is a difficult task where A ∪ B is a nonfrequent itemset. The database (corpus) D has an exponential number of nonfrequent itemsets; therefore negative association rule mining requires the examination of a much larger search space than positive association rule mining.

In this work, we propose that for itemsets occurring frequently, given the user-defined minsupp, their subitems can be negatively correlated, leading to the discovery of negative association rules. Similarly, the infrequent itemsets may have subitems with a strong positive correlation, leading to the discovery of positive association rules.

Let I be the set of items in database D such that I = A ∪ B, A ∩ B = ∅. The thresholds minsupp and minconf are given by the user.

5.1. Generating Association Rules among Frequent Itemsets. See Algorithm 3.

Input: TD: transaction (document) database; ms: minimum support threshold.
Output: FIS: frequent itemsets; inFIS: infrequent itemsets.
(1) initialize FIS = Φ, inFIS = Φ
(2) temp1 = {A | A ∈ top-N IDF terms}  /* get all 1-itemsets among the top-N IDF items */
(2.1) FIS1 = {A | A ∈ temp1 and support(A) ≥ ms}  /* get frequent 1-itemsets */
(2.2) inFIS1 = temp1 − FIS1  /* all infrequent 1-itemsets */
k = 2  /* initialize for itemsets of size greater than 1 */
(3) while (temp(k−1) ≠ Φ) do begin
(3.1)  Ck = generate(temp(k−1), ms)  /* candidate k-itemsets */
(3.2)  for each transaction t ∈ TD do begin  /* scan database TD */
           Ct = subset(Ck, t)  /* get candidates contained in transaction t */
           for each candidate c ∈ Ct
               c.count++  /* increase count of itemsets present in the transaction */
       end
(3.3)  c.support = c.count / |TD|  /* calculate support of each candidate k-itemset */
(3.4)  tempk = {c | c ∈ Ck and c.support ≥ minsup}  /* add to temp k-itemsets */
(4) FISk = {A | A ∈ tempk and A.support ≥ ms}  /* add to frequent k-itemsets if supp(A) is at least minsupp */
(5) inFISk = tempk − FISk  /* add to infrequent k-itemsets having support less than minsupp */
(6) FIS = ∪k FISk  /* add generated frequent k-itemsets to FIS */
(7) inFIS = ∪k inFISk  /* add generated infrequent k-itemsets to inFIS */
(8) k++  /* increment itemset size by 1 */
end
(9) return FIS and inFIS

Algorithm 1
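A compact Java rendering of the counting-and-splitting pass of Algorithm 1 could look like this (our own sketch; candidate generation is omitted, we treat any candidate with nonzero support below the threshold as infrequent, and the paper's actual HashMap-based implementation may differ):

import java.util.*;

// Sketch of one pass of Apriori_FISinFIS: split candidate k-itemsets
// into frequent (FIS_k) and infrequent (inFIS_k) sets by minimum support.
public class FisInFisPass {
    public static void splitBySupport(Set<Set<String>> candidates,
                                      List<Set<String>> transactions,
                                      double minSupp,
                                      Set<Set<String>> fisK,
                                      Set<Set<String>> inFisK) {
        for (Set<String> candidate : candidates) {
            long count = transactions.stream()
                                     .filter(t -> t.containsAll(candidate))
                                     .count();                  // Steps (3.2)-(3.3)
            double support = (double) count / transactions.size();
            if (support >= minSupp) fisK.add(candidate);        // Step (4)
            else if (count > 0)     inFisK.add(candidate);      // Step (5)
        }
    }
}

Both sets are obtained in the same scan, which is why the paper reports only a small overhead compared to a standard frequent-itemset pass.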

5.2. Generating Association Rules among Infrequent Itemsets. For brevity, we consider only one form of association from each of the positive and negative types, that is, A ⇒ B and A ⇒ ¬B; the other forms can be extracted similarly.

In the description of association rules shown in Algorithms 3 and 4, supp(A ∪ B) ≥ minsupp guarantees that the association rule describes the relationship among items of a frequent itemset, whereas supp(A ∪ B) < minsupp guarantees that the association rule describes the relationship among items of an infrequent itemset; however, the subitems of the itemset need to be frequent, as enforced by the conditions supp(A) ≥ minsupp and supp(B) ≥ minsupp. The interestingness measure lift has to be greater than 1, articulating a positive dependency among the itemsets; a lift value of less than 1 would articulate a negative relationship among the itemsets.

The algorithm generates a complete set of positive and negative association rules from both frequent and infrequent itemsets. The frequent itemsets have traditionally been used to generate positive association rules; however, we argue that items in frequent itemsets can also be negatively correlated. This can be illustrated using the following example:

minsupp = 0.2, minconf = 0.6;
supp(A) = 0.7, supp(B) = 0.5, supp(A ∪ B) = 0.25;
conf(A ⇒ B) = 0.3571, lift(A ⇒ B) = 0.7143;
conf(B ⇒ A) = 0.5, lift(B ⇒ A) = 0.7143;
supp(A ∪ ¬B) = 0.7 − 0.25 = 0.45, conf(A ⇒ ¬B) = 0.45/0.7 = 0.6429;
lift(A ⇒ ¬B) = 0.45/(0.7 × 0.5) = 1.2857.

The above example clearly shows that itemsets, despite being frequent, can have negative relationships among their item subsets. Therefore, in Step (2.2) of Algorithm 2, we try to generate negative association rules using frequent itemsets.

The infrequent itemsets, on the other hand, have either been totally ignored while generating associations or used mostly to generate only negative association rules. However, infrequent itemsets can hold potentially valid and important positive association rules with high confidence and strongly positive correlation. This can be illustrated using the following example:

minsupp = 0.3, minconf = 0.7;
supp(A) = 0.8, supp(B) = 0.2, supp(A ∪ B) = 0.2;
conf(A ⇒ B) = 0.25, lift(A ⇒ B) = 1.25;
conf(B ⇒ A) = 1, lift(B ⇒ A) = 1.25.

We can see from this example that (A ∪ B), in spite of being an infrequent itemset, holds a very strong positive association, B ⇒ A, with 100% confidence and a positive correlation. Our proposed algorithm covers the generation of such rules, as explained in Step (3.1) of the algorithm.

6. Experimental Results and Discussions

We performed our experiments on medical blog datasets, mostly authored by patients writing about their problems and experiences. The datasets have been collected from different medical blog sites:

(i) Cancer Survivors Network [http://csn.cancer.org];

Inputs: minsupp: minimum support; minconf: minimum confidence; FIS (frequent itemsets); inFIS (infrequent itemsets).
Output: PAR: set of positive ARs; NAR: set of negative ARs.
(1) PAR = Φ, NAR = Φ
(2) /* generating all association rules from FIS (frequent itemsets) */
for each itemset I in FIS do begin
    for each itemset pair A, B with A ∪ B = I and A ∩ B = Φ do begin
(2.1)   /* generate rules of the form A ⇒ B */
        if conf(A ⇒ B) ≥ minconf && lift(A ⇒ B) ≥ 1
            then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
        else
(2.2)   /* generate rules of the form (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
        if conf(A ⇒ ¬B) ≥ minconf && lift(A ⇒ ¬B) ≥ 1
            output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
        if conf(¬A ⇒ B) ≥ minconf && lift(¬A ⇒ B) ≥ 1
            output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
        if conf(¬A ⇒ ¬B) ≥ minconf && lift(¬A ⇒ ¬B) ≥ 1
            output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
    end for
end for
(3) /* generating all association rules from inFIS */
for each itemset I in inFIS do begin
    for each itemset pair A, B with A ∪ B = I, A ∩ B = Φ, supp(A) ≥ minsupp, and supp(B) ≥ minsupp do begin
(3.1)   /* generate rules of the form A ⇒ B */
        if conf(A ⇒ B) ≥ minconf && lift(A ⇒ B) ≥ 1
            then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
        else
(3.2)   /* generate rules of the form (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
        if conf(A ⇒ ¬B) ≥ minconf && lift(A ⇒ ¬B) ≥ 1
            output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
        if conf(¬A ⇒ B) ≥ minconf && lift(¬A ⇒ B) ≥ 1
            output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
        if conf(¬A ⇒ ¬B) ≥ minconf && lift(¬A ⇒ ¬B) ≥ 1
            output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
    end for
end for
(4) return PAR and NAR

Algorithm 2
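For a single split (A, B) of an itemset I, the decision logic of Steps (2.1) and (2.2) can be sketched as follows (illustrative only; the supp* parameters come from the itemset-counting phase, the names are ours, and we use the strict lift > 1 test from the prose description):

import java.util.List;

// Sketch of Steps (2.1)-(2.2) of Algorithm 2 for one split (A, B) of an itemset I;
// assumes 0 < suppA, suppB < 1 so the divisions below are well defined.
public class RuleGenerator {
    public static void generate(String a, String b,
                                double suppA, double suppB, double suppAB,
                                double minConf, List<String> par, List<String> nar) {
        double confAB = suppAB / suppA;
        double liftAB = suppAB / (suppA * suppB);
        if (confAB >= minConf && liftAB > 1) {               // Step (2.1): A => B
            par.add(a + " => " + b);
            return;
        }
        // Step (2.2): negative forms, using equations (5)-(11)
        double confAnotB = 1 - confAB;                       // conf(A => ¬B), eq. (7)
        double liftAnotB = (suppA - suppAB) / (suppA * (1 - suppB));
        if (confAnotB >= minConf && liftAnotB > 1) nar.add(a + " => ¬" + b);

        double confNotAB = (suppB - suppAB) / (1 - suppA);   // conf(¬A => B), eq. (9)
        double liftNotAB = (suppB - suppAB) / ((1 - suppA) * suppB);
        if (confNotAB >= minConf && liftNotAB > 1) nar.add("¬" + a + " => " + b);

        double suppNotAnotB = 1 - suppA - suppB + suppAB;    // eq. (10)
        double confNotAnotB = suppNotAnotB / (1 - suppA);    // conf(¬A => ¬B), eq. (11)
        double liftNotAnotB = suppNotAnotB / ((1 - suppA) * (1 - suppB));
        if (confNotAnotB >= minConf && liftNotAnotB > 1) nar.add("¬" + a + " => ¬" + b);
    }
}

Running this for every two-way split of every itemset in FIS and inFIS (with the additional supp(A), supp(B) ≥ minsupp check for inFIS) yields the rule sets PAR and NAR.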

Given: supp(A ∪ B) ≥ minsupp.
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
    then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ B) < minconf and lift(A ⇒ B) < 1,
    then A ⇒ B is not a valid positive rule, having lower than the required minimum confidence, and there is a negative correlation between rule items A and B. Therefore we try to generate a negative association rule from itemset I:
    If conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
        then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 3

Given: supp(A ∪ B) < minsupp and supp(A ∪ B) ≠ 0,
       supp(A) ≥ minsupp and supp(B) ≥ minsupp.
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
    then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
    then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 4

Table 2: Total generated frequent and infrequent itemsets using different support values.

Support | Frequent itemsets | Infrequent itemsets
0.1     | 146474            | 268879
0.15    | 105397            | 329766
0.2     | 94467             | 447614
0.25    | 79871             | 492108
0.3     | 57954             | 504320

(ii) Care Pages [http://www.carepages.com/forums/cancer].

The blog text was preprocessed before the experimentation, that is, stop-word removal, stemming/lemmatization, nonmedical word removal, and so forth. After preprocessing, we assigned weights to terms/items using the IDF scheme, selecting only the important and relevant terms/items in the dataset. The main parameters of the databases are as follows:

(i) the total number of blogs (i.e., transactions) used in this experimentation was 1926;
(ii) the average number of words (attributes) per blog (transaction) was 145;
(iii) the smallest blog contained 79 words;
(iv) the largest blog contained 376 words;
(v) the total number of words (i.e., attributes) was 280254 without stop-word removal;
(vi) the total number of words (i.e., attributes) was 192738 after stop-word removal;
(vii) the total number of words selected using the top-N percentage of IDF words was 81733;
(viii) the algorithm is implemented in Java.

Table 2 summarizes the number of itemsets generated with varying minsupp values. We can see that the number of frequent itemsets decreases as we increase the minsupp value; however, a sharp increase in the number of infrequent itemsets can be observed. This can also be visualized in Figure 1.

Figure 1: Frequent and infrequent itemsets generated with varying minimum support values (the plotted values are those listed in Table 2).

Table 3 gives an account of the experimental results for different values of minimum support and minimum confidence. The lift value has to be greater than 1 for a positive relationship between the itemsets; the resulting rule, however, may itself be positive or negative. The total numbers of positive and negative rules generated from both frequent and infrequent itemsets are given.

Although the experimental results greatly depend on the datasets used, they still demonstrate the importance of the IDF factor in selecting the frequent itemsets, along with the generation of negative rules from frequent itemsets and the extraction of positive rules from infrequent itemsets. The number of negative rules generated greatly outnumbers the positive rules, not only because there are many more infrequent itemsets than frequent itemsets, but also because the proposed approach finds negative correlations between the frequent itemsets, leading to the generation of additional negative association rules.

The frequent and infrequent itemset generation using the Apriori_FISinFIS procedure takes only a little extra time as compared to traditional frequent itemset finding using the Apriori algorithm.

Table 3: Positive and negative association rules using varying support and confidence values.

Support | Confidence | PARs from FIs | PARs from IIs | NARs from FIs | NARs from IIs
0.05    | 0.95       | 3993          | 254           | 27032         | 836
0.05    | 0.9        | 4340          | 228           | 29348         | 1544
0.1     | 0.9        | 3731          | 196           | 30714         | 1279
0.1     | 0.85       | 3867          | 246           | 37832         | 1170
0.15    | 0.8        | 3340          | 330           | 26917         | 1121

Table 4: Positive and negative rules from both frequent and infrequent itemsets.

(a) Positive rules from frequent itemsets

Rule                                | Support | Confidence | Lift
chemo, radiation, treatment → tumor | 0.2     | 1          | 2.0
surgery, tumor → radiation          | 0.25    | 1          | 1.4286
radiation, surgery → tumor          | 0.25    | 0.8333     | 1.6667
treatment → radiation               | 0.45    | 0.9        | 1.2857
treatment, tumor → radiation        | 0.3     | 1          | 1.4286
tumor → radiation                   | 0.5     | 1          | 1.4286

(b) Negative rules from frequent itemsets

Rule                      | Support | Confidence | Lift
radiation → ¬cancer       | 0.5     | 0.7143     | 1.0204
¬temodar → radiation      | 0.4825  | 0.8333     | 1.6667
¬radiation → glioblastoma | 0.65    | 1          | 1.4285

(c) Positive rules from infrequent itemsets

Rule                      | Support | Confidence | Lift
cancer, treatment → tumor | 0.15    | 1          | 2
doctor, temodar → chemo   | 0.05    | 1          | 2.8571
brain, chemo → doctor     | 0.15    | 1          | 2.2222
mri, tumor → brain        | 0.1     | 1          | 2.5

(d) Negative rules from infrequent itemsets

Rule                      | Support | Confidence | Lift
chemo → ¬surgery          | 0.35    | 1          | 1.5385
glioblastoma → ¬treatment | 0.3     | 1          | 2.13
chemo → ¬glioblastoma     | 0.35    | 0.95       | 1.4286

This is because each item's support is calculated for checking against the threshold support value, to be classified as frequent or infrequent; therefore we get the infrequent items in the same pass as the frequent items. However, the processing of frequent and infrequent itemsets for the generation of association rules is different. Frequent itemsets generated through the Apriori algorithm have the inherent property that their subsets are also frequent; however, we cannot guarantee that for the infrequent itemsets. Thus we impose an additional check on the infrequent itemsets, that their subsets must be frequent, when generating association rules among them.
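This additional check can be sketched as follows (our own illustration; checking the one-item-smaller subsets suffices when FIS is complete, since frequency is downward closed):

import java.util.*;

// Sketch: an infrequent itemset qualifies for rule generation only if
// all of its one-item-smaller subsets are present in the frequent set FIS.
public class SubsetCheck {
    public static boolean allSubsetsFrequent(Set<String> itemset, Set<Set<String>> fis) {
        for (String item : itemset) {
            Set<String> subset = new HashSet<>(itemset);
            subset.remove(item);                 // drop one item at a time
            if (!subset.isEmpty() && !fis.contains(subset)) return false;
        }
        return true;
    }
}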

Research on mining association rules among both frequent and infrequent itemsets has been sparse, especially for textual datasets. We have proposed an algorithm which can extract both types of association rules, positive and negative, among both frequent and infrequent itemsets. We give a sample of all four types of association rules extracted using the algorithm.

Table 4 gives a summary of the generated association rules; the four types of generated association rules are illustrated. Table 4(a) shows a sample of positive association rules generated from the frequent itemsets. Table 4(b) shows negative association rules generated from the frequent itemsets; this has not yet been explored by the research community. A sample of positive association rules generated from the infrequent itemsets is presented in Table 4(c); this type of association rule is potentially useful, and researchers are interested in extracting such rules. There has been no research in this domain of extracting positive association rules from infrequent itemsets in textual data before this work.

Figure 2: Positive and negative rules generated with varying minimum support and confidence values.

Table 4(d) shows the results of negative association rules from the infrequent itemsets.

7. Concluding Remarks and Future Work

Identification of associations among symptoms and diseases is important in diagnosis. The field of negative association rule (NAR) mining holds enormous potential to help medical practitioners in this regard. Both positive and negative association rule mining (PNARM) can hugely benefit the medical domain. Positive and negative associations among diseases, symptoms, and laboratory test results can help a medical practitioner reach a conclusion about the presence or absence of a possible disease. There is a need to minimize the errors in diagnosis and maximize the possibility of early disease identification by developing a decision support system that takes advantage of NARs. A positive association rule such as Flu ⇒ Headache can tell us that Headache is experienced by a person who is suffering from Flu. On the contrary, a negative association rule such as ¬Throbbing-Headache ⇒ ¬Migraine tells us that if the Headache experienced by a person is not Throbbing, then he may not have Migraine, with a certain degree of confidence. The applications of this work include the development of a medical decision support system, among others, by finding associations and dissociations among diseases, symptoms, and other health-related terminologies. The current algorithm does not account for the context and semantics of the terms/items in the textual data. In future, we plan to assimilate the context of the features in our work in order to improve the quality and efficacy of the generated association rules.

In this paper, contributions to NARM research were made by proposing an algorithm for efficiently generating negative association rules along with the positive association rules. We have proposed a novel method that captures the negative associations among frequent itemsets and also extracts the positive associations among the infrequent itemsets, whereas traditional association rule mining algorithms have focused on frequent items for generating positive association rules and have used the infrequent items only for the generation of negative association rules. The experimental results have demonstrated that the proposed approach is effective, efficient, and promising.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381-405, 2004.

[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85-93, 1998.

[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1-12, 2000.

[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory—ICDT '99, pp. 398-416, Springer, 1999.

[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134-145, 1996.

[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236-245, USA, August 2003.

[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671-692, 2001.

[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497-510, 2004.

[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407-413, 2013.

[10] M. Rajman and R. Besançon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50-64, 1997.

[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the ACM SIGMOD International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227-230, 1997.

[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.

[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237-248, Ubooks, 2003.

[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 1, pp. 207-216, 1993.

[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27-38, Springer, 2004.

[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658-665, 2002.

[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100-109, Springer, 2006.

[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716-1721, August 2005.

[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620-629, Springer, 2005.

[20] M. Delgado, M. D. Ruiz, D. Sánchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194-5213, 2011.

[21] M. Delgado, M. D. Ruiz, D. Sánchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275-282, Atlantis Press, 2011.

[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442-449, December 2002.

[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228-237, Springer, 2006.

[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229-238, AAAI Press, 1991.

[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining n-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59-67, Springer, 2000.


Page 3: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

The Scientific World Journal 3

Table 1 IDF scores of sample keywords of the corpus

Selected keywords IDF Discarded keywords IDFChemo 22693 Threat 00158Radiation 22535 Disease 00157Tumor 22316 Severe 001426Surgery 22135 Produce 001392Cancer 22013 Result 001194Temodar 19830 Need 000639CT scan 19812 Analysis 000136Glioblastoma 19609 Type 000940Skull 18906 New 000843Cause 18902 Level 000694sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot sdot

knowledge in a compact form to the users [12] The primarygoal is to provide the users with knowledge for the researchand educational purposes

It is therefore imperative to employ some weight assign-mentmechanismWe have used inverse document frequency(IDF) which denotes the importance of a term in a corpusSelection of features on the basis of IDF values needs a carefulreading that is which range of IDF value features is includedThis can greatly affect the results as choosing a very low valueof ldquo119873rdquo is feared to nullify the impact of IDF and choosing avery high value may result in losing important termsfeaturesof the datasetWe have proposed a user tuned parameter thatis top 119873 where the value of ldquo119873rdquo is chosen by the userdepending upon the characteristics of the data

21 IDF IDF weighting scheme is based on the intuition ofterm occurrence in a corpus It surmises that the fewer thedocuments containing a term are the more discerning theybecome and hence an important artefact The IDF helps inunderstanding the terms that carry special significance in acertain type of document corpus It assigns high weight toterms that occur rarely in a document corpus In a medicalcorpus for instance the word ldquodiseaserdquo is not likely to carrya significant meaning Instead a disease name for exampleldquocholerardquo or ldquocancerrdquo would carry significant meanings incharacterizing the document

Keeping this in view a higher weight should be assignedto the words that appear in documents in close connectionwith a certain topic while a lower weight should be assignedto those words that show up without any contextual back-ground IDF weighting is a broadly used method for textanalysis [10 13] We can mathematically represent IDF asIDF119905= log(119863119889119891

119905) where 119905 is the term whose weight is to be

calculated 119889119891119905represents the documents in which the term

is present and119863 symbolizes the document corpusTable 1 shows the example set of words and their respec-

tive IDF scores in a document corpus with a threshold scorevalue of 60

The number of words in the documents before IDFscore calculation and top 60 selection is 280254 and afterselecting the top 60 based on IDF scores goes down to

81733 The bold part of the table shows a sample of theeradicated words which have IDF scores below thresholdvalue

3 Literature Review

The association rule mining is aimed at the discovery ofassociations among itemsets of a transactional databaseResearchers have extensively studied association rules sincetheir introduction by [14] Apriori is the most well-knownassociation rule mining algorithm The algorithm has a two-step process (1) frequent itemset generation (doing multiplescans of the database) and (2) association rule generationThe major advantage of Apriori over other association rulemining algorithms is its simple and easy implementationHowever multiple scans over the database to find rules makeApriori algorithmrsquos convergence slower for large databases

The other popular association rule mining algorithmfrequent pattern-growth (FP-Growth) proposed by Han etal [3] compresses the data into an FP-Tree for identifyingfrequent itemsets FP algorithm makes fewer scans of thedatabase making it practically usable for large databases liketext

However there has been very little research done forfinding negative association rules among infrequent itemsetsThe association rule mining algorithms are seldom designedfor mining negative association rules Most of the existingalgorithms can rarely be applied in their current capacity inthe context of negative association rule mining The recentpast however has witnessed a shift in the focus of theassociation rule mining community which is now focusingmore on negative association rules extraction [1 15ndash19]Delgado et al have proposed a framework for fuzzy rules thatextends the interestingmeasures for their validation from thecrisp to the fuzzy case [20] A fuzzy approach for miningassociation rules using the crisp methodology that involvesthe absent items is presented in [21]

Savasere et al [22] presented an algorithm for extractingstrong negative association rules They combine frequentitemsets and the domain knowledge to form taxonomy formining negative association rules Their approach howeverrequires users to set a predefined domain dependent hier-archical classification structure which makes it difficult andhard to generalizeMorzy presented theDI-Apriori algorithmfor extracting dissociation rules among negatively associateditemsets The algorithm keeps the number of generatedpatterns low however the proposedmethod does not captureall types of generalized association rules [23]

Dong et al [16] introduced an interest value for gen-erating both positive and negative candidate items Thealgorithm is Apriori-like for mining positive and nega-tive association rules Antonie and Zaıane [15] presenteda 119904119906119901119901119900119903119905 119888119900119899119891119894119889119890119899119888119890 119886119899119889 119888119900119903119903119890119897119886119905119894119900119899 coefficient basedalgorithm for mining positive and negative association rulesthe coefficients need to be continuously updated whilerunning the algorithm also the generation of all negativeassociation rules is not guaranteed

Wu et al proposed negative and positive rule miningframework based on the Apriori algorithmTheir work does

4 The Scientific World Journal

not focus on itemset dependency rather it focuses on ruledependency measures More specifically they have employedthe Piatetsky-Shapiro argument [24] about the associationrules that is a rule 119883 rArr 119884 is interesting if and onlyif supp(119860 rArr 119861) = supp(119860) supp(119861) This can be furtherexplained using the following

A rule is as follows 119860 rArr not119861 is valid if and only if

(1) lift(119860 not119861) = | supp(119860cupnot119861)minus supp(119860) supp(not119861)| gt 1

indicates a positive relationship between119883 and not119884(2) the other condition for a rule to be valid implies that

119883 119884 120598 frequent itemsets

The second condition can greatly increase the efficiencyof the rule mining process because all the antecedentor consequent parts of ARs can be ignored where 119883

119884 120598 Infrequent itemset This is needed because the itemsetis infrequent so Apriori does not guarantee the subsets to befrequent

Association rule mining research mostly concentrates onpositive association rules The negative association rule min-ing methods reported in literature generally target marketbasket data or other numeric or structured data Complexityof generating negative association rules from text data thusbecomes twofold that is dealing with the text and generatingnegative association rules as well as positive rules As wedemonstrate in the later sections negative association rulemining from text is different from discovering associationrules in numeric databases and identifying negative asso-ciations raises new problems such as dealing with frequentitemsets of interest and the number of involved infrequentitemsets This necessitates the exploration of specific andefficient mining models for discovering positive and negativeassociation rules from text databases

4 Extraction of Association Rules

We present an algorithm for mining both positive andnegative association rules from frequent and infrequentitemsets The algorithm discovers a complete set of positiveand negative association rules simultaneously There are fewalgorithms formining association rules for both frequent andinfrequent itemsets from textual datasets

We can divide association rule mining into the following

(1) finding the interesting frequent and infrequent item-sets in the database119863

(2) finding positive and negative association rules fromthe frequent and infrequent itemsets which we get inthe first step

The mining of association rules appears to be the coreissue however generation and selection of interesting fre-quent and infrequent itemsets is equally important Wediscuss the details of both in the following discussion

41 Identifying Frequent and Infrequent Itemsets As men-tioned above the number of extracted items (both frequentand infrequent) from the text datasets can be very large

with only a fraction of them being important enough forgenerating the interesting association rules Selection of theuseful itemsets therefore is challenging The support of anitem is a relativemeasure with respect to the databasecorpussize Let us suppose that the support of an itemset 119883 is04 in 100 transactions that is 40 of transactions containthe itemset Now if 100 more transactions are added to thedataset and only 10 of the added 100 transactions containthe itemset 119883 the overall support of the itemset 119883 will be025 that is 25 of the transactions now contain itemset 119883Therefore the support of an itemset is a relative measureHence we cannot rely only on support measure for theselection of importantfrequent itemsets

The handling of large number of itemsets which is thecase when dealingwith textual datasets ismore evident whendealing with infrequent itemsets This is because the numberof infrequent itemsets rises exponentially [1] Thereforewe only select those termsitems from the collection ofdocuments which have importance in the corpus This isdone using the IDF weight assigning method We filter outwords either not occurring frequently enough or having anear constant distribution among the different documentsWe use top-119873 age (for a user specified 119873 we used 60)as the final set of keywords to be used in the text miningphase [25] The keywords are sorted in descending order bythe algorithm based on their IDF scores

42 Identifying Valid Positive and Negative Association RulesBy a valid association rule we mean any expression of theform 119860 rArr 119861 where 119860 isin 119883 not119883 119861 isin 119884 not119884 119883119884 sub 119868 and119883 cap 119884 = 0 st Consider

(i) supp(119860 rArr 119861) ge ms(ii) supp(119883) ge ms supp(119884) ge ms(iii) conf(119860 rArr 119861) ge mc(iv) lift(119860 rArr 119861) gt 1

Let us consider an example from our medical ldquocancerrdquo blogsrsquotext dataset We analyze peoplersquos behavior about the cancerand mole We have

(i) supp(mole) = 04 supp(notmole) = 06(ii) supp(cancer) = 06 supp(notcancer) = 04(iii) supp(cancer cupmole) = 005(iv) min supp = 02(v) min conf = 06

From above we can see the following

(i) supp(cancer cup mole) = 005 lt min supp thereforemole cup cancer is an infrequent itemset (inFIS)

(ii) conf(mole rArr cancer) = 0125 lt min conf from (3)

(a) therefore (mole rArr cancer) cannot be generatedas a valid rule using the support-confidenceframework

Now we try to generate a negative rule from this example

The Scientific World Journal 5

(i) supp(molecupnotcancer) = supp(mole) minus supp(cancercupmole)

(a) = 04 minus 005 = 035 gt min supp from (6)

(ii) conf(mole rArr notcancer) = supp(mole cup notcancer)supp(mole) from (7)

(a) = 03504 = 0875 gt min conf therefore wecan generate (mole rArr notcancer) as a negativerule

(iii) lift(mole rArr notcancer) = supp(mole cup

notcancer) supp(mole) supp(notcancer)

(a) = 035(04 lowast 04) = 21875 gt 1 which ismuch greater than 1 showing a strong relationbetween the presence of mole and absence ofcancer therefore we can generate (mole rArr

notcancer) as a valid negative rule with 875confidence and strong association relationshipbetween the presence and absence of mole andcancer respectively

The above example clearly shows the importance of infrequent itemsets and the negative association rules generated from them: they capture important implications/associations that would have been missed when mining only positive association rules.
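The derivation above relies only on the identity supp(A ∪ ¬B) = supp(A) − supp(A ∪ B); a short Java check (illustrative only, with the supports given above hard-coded) reproduces the numbers:

public class NegativeRuleExample {
    public static void main(String[] args) {
        double suppMole = 0.4, suppCancer = 0.6, suppBoth = 0.05;
        // supp(mole ∪ ¬cancer) = supp(mole) − supp(mole ∪ cancer)
        double suppMoleNotCancer = suppMole - suppBoth;                   // 0.35
        double conf = suppMoleNotCancer / suppMole;                       // 0.875
        double lift = suppMoleNotCancer / (suppMole * (1 - suppCancer));  // 2.1875
        System.out.printf("supp=%.2f conf=%.3f lift=%.4f%n",
                suppMoleNotCancer, conf, lift);
    }
}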

4.3. Proposed Algorithm: Apriori_FISinFIS (See Algorithm 1). The Apriori_FISinFIS procedure generates all frequent and infrequent itemsets of interest in a given database D, where FIS is the set of all frequent itemsets of interest in D and inFIS is the set of all infrequent itemsets in D. FIS and inFIS contain only frequent and infrequent itemsets of interest, respectively.

The initialization is done in Step (1). Step (2) generates temp_1, all itemsets of size 1; in step (2.1) we generate FIS_1, all frequent itemsets of size 1, while in step (2.2) all infrequent itemsets of size 1 in database D are collected in inFIS_1, in the first pass over D.

Step (3) generates FIS_k and inFIS_k for k ≥ 2 by a loop, where FIS_k is the set of all frequent k-itemsets whose support exceeds the user-defined minimum threshold in the kth pass over D, and inFIS_k is the set of all infrequent k-itemsets whose support falls below that threshold. The loop terminates when all the temporary itemsets have been tried, that is, when temp_(k−1) = Φ. For each pass over the database in Step (3), say pass k, there are five substeps, as follows.

Step (3.1) generates the candidate itemsets C_k of all k-itemsets in D, where each k-itemset in C_k is generated from two frequent itemsets in temp_(k−1). The itemsets in C_k are counted in D using a loop in Step (3.2). Step (3.3) calculates the support of each itemset in C_k, and step (3.4) stores the generated itemsets in a temporary data structure; in our experimentation we used an implementation of "HashMap" for this purpose.

FIS_k and inFIS_k are then generated in Steps (4) and (5), respectively. FIS_k is the set of all potentially useful frequent k-itemsets in temp_k whose support is greater than minsupp; inFIS_k is the set of all infrequent k-itemsets in temp_k whose support is less than minsupp. FIS_k and inFIS_k are added to FIS and inFIS in Steps (6) and (7). Step (8) increments the itemset size. The procedure ends in Step (9), which outputs the frequent and infrequent itemsets in FIS and inFIS, respectively.

4.4. Algorithm: FISinFIS-Based Positive and Negative Association Rule Mining. Algorithm 2 generates positive and negative association rules from both the frequent itemsets (FIS) and the infrequent itemsets (inFIS). Step (1) initializes the positive and negative association rule sets as empty. Step (2) generates association rules from FIS: in step (2.1), positive association rules of the form A ⇒ B or B ⇒ A that have confidence greater than the user-defined threshold and lift greater than 1 are extracted as valid positive association rules; step (2.2) generates negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B, and so forth, which are extracted as valid negative association rules when their confidence exceeds the user-defined threshold and their lift is greater than 1 (Figure 2).

Step (3) generates association rules from inFIS: in step (3.1), a positive association rule of the form A ⇒ B or B ⇒ A that has confidence greater than the user-defined threshold and lift greater than 1 is extracted as a valid positive association rule; step (3.2) generates a negative association rule of the form A ⇒ ¬B, ¬A ⇒ B, or ¬A ⇒ ¬B, which is extracted as a valid negative association rule when its confidence exceeds the user-defined threshold and its lift is greater than 1.

5. Discovering Association Rules among Frequent and Infrequent Items

Mining positive association rules from frequent itemsets is a relatively straightforward problem and has been studied extensively in the literature. Mining negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, B ⇒ ¬A, ¬B ⇒ A, and so forth from textual datasets, where A ∪ B is a nonfrequent itemset, is a difficult task, however. A database (corpus) D contains an exponential number of nonfrequent itemsets; negative association rule mining therefore requires examining a much larger search space than positive association rule mining.

In this work we propose that, for itemsets occurring frequently given the user-defined min-sup, the subitems can be negatively correlated, leading to the discovery of negative association rules. Similarly, infrequent itemsets may have subitems with a strong positive correlation, leading to the discovery of positive association rules.

Let I be the set of items in database D such that

I = A ∪ B, A ∩ B = φ.

The thresholds minsup and minconf are given by the user.

5.1. Generating Association Rules among Frequent Itemsets. See Algorithm 3.


Input: TD: transaction (document) database; ms: minimum support threshold
Output: FIS: frequent itemsets; inFIS: infrequent itemsets
(1) initialize FIS = Φ; inFIS = Φ
(2) temp_1 = {A | A ∈ top(N) IDF terms} /* get all 1-itemsets among the top-N IDF items */
(2.1) FIS_1 = {A | A ∈ temp_1 and support(A) ≥ ms} /* get frequent 1-itemsets */
(2.2) inFIS_1 = temp_1 − FIS_1 /* all infrequent 1-itemsets */
k = 2 /* initialize for itemsets of size greater than 1 */
(3) while (temp_(k−1) ≠ Φ) do begin
(3.1) C_k = generate(temp_(k−1), ms) /* candidate k-itemsets, joined from frequent (k−1)-itemsets */
(3.2) for each transaction t ∈ TD do begin /* scan database TD */
        C_t = subset(C_k, t) /* get candidates contained in transaction t */
        for each candidate c ∈ C_t
            c.count++ /* increase count of an itemset if it exists in the transaction */
      end
(3.3) c.support = c.count/|TD| /* calculate support of each candidate k-itemset */
(3.4) temp_k = {c | c ∈ C_k} /* store all counted k-itemsets; Steps (4) and (5) split them */
(4) FIS_k = {A | A ∈ temp_k and A.support ≥ ms} /* frequent k-itemsets: support at least minsupp */
(5) inFIS_k = temp_k − FIS_k /* infrequent k-itemsets: support less than minsupp */
(6) FIS = FIS ∪ FIS_k /* add generated frequent k-itemsets to FIS */
(7) inFIS = inFIS ∪ inFIS_k /* add generated infrequent k-itemsets to inFIS */
(8) k++ /* increment itemset size by 1 */
end
(9) return FIS and inFIS

Algorithm 1
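For readers who prefer running code, the following condensed Java sketch mirrors the main loop of Algorithm 1 under simplifying assumptions of ours (the top-N IDF seeding of frequent 1-itemsets is left to the caller, and transactions are plain string sets); it is an illustration, not the published implementation:

import java.util.*;

public class AprioriFisInFis {

    // One pass per itemset size: count candidates in a single database scan,
    // then split them into frequent (FIS) and infrequent (inFIS) itemsets.
    // Only the frequent ones seed the next pass, matching the Apriori join.
    static void mine(List<Set<String>> db, List<Set<String>> frequent1Itemsets,
                     double minSupp,
                     Set<Set<String>> fis, Set<Set<String>> inFis) {
        List<Set<String>> temp = frequent1Itemsets;
        while (!temp.isEmpty()) {
            List<Set<String>> candidates = generateCandidates(temp);
            Map<Set<String>, Integer> counts = new HashMap<>();
            for (Set<String> t : db)                         // scan database TD
                for (Set<String> c : candidates)
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
            List<Set<String>> next = new ArrayList<>();
            for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
                double supp = (double) e.getValue() / db.size();
                if (supp >= minSupp) { fis.add(e.getKey()); next.add(e.getKey()); }
                else inFis.add(e.getKey());                  // retained, not discarded
            }
            temp = next;
        }
    }

    // Naive Apriori join: union two (k-1)-itemsets that differ in one item.
    static List<Set<String>> generateCandidates(List<Set<String>> prev) {
        Set<Set<String>> out = new LinkedHashSet<>();
        for (int i = 0; i < prev.size(); i++)
            for (int j = i + 1; j < prev.size(); j++) {
                Set<String> u = new TreeSet<>(prev.get(i));
                u.addAll(prev.get(j));
                if (u.size() == prev.get(i).size() + 1) out.add(u);
            }
        return new ArrayList<>(out);
    }
}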

5.2. Generating Association Rules among Infrequent Itemsets. For brevity, we consider only one form of association from each of the positive and negative families, that is, A ⇒ B and A ⇒ ¬B; the other forms can be extracted similarly.

In the description of association rules shown in Algorithm 4, supp(A ∪ B) ≥ minsupp guarantees that the association rule describes the relationship among items of a frequent itemset, whereas supp(A ∪ B) < minsupp guarantees that the rule describes the relationship among items of an infrequent itemset; the subitems of the itemset, however, need to be frequent, as enforced by the conditions supp(A) ≥ minsupp and supp(B) ≥ minsupp. The interestingness measure lift has to be greater than 1, articulating a positive dependency between the itemsets; a lift value of less than 1 articulates a negative relationship between them.

The algorithm generates a complete set of positive and negative association rules from both frequent and infrequent itemsets. Frequent itemsets have traditionally been used to generate positive association rules; however, we argue that items in frequent itemsets can be negatively correlated. This can be illustrated with the following example:

minsupp = 0.2, minconf = 0.6;
supp(A) = 0.7, supp(B) = 0.5, supp(A ∪ B) = 0.25;
conf(A ⇒ B) = 0.3571, lift(A ⇒ B) = 0.7143;
conf(B ⇒ A) = 0.5, lift(B ⇒ A) = 0.7143;
supp(A ∪ ¬B) = 0.7 − 0.25 = 0.45, conf(A ⇒ ¬B) = 0.45/0.7 = 0.6429;
lift(A ⇒ ¬B) = 0.45/(0.7 × 0.5) = 1.2857.

The above example clearly shows that itemsets, despite being frequent, can have negative relationships among their item subsets. Therefore, in Step (2.2) of Algorithm 2, we try to generate negative association rules from frequent itemsets.

Infrequent itemsets, on the other hand, have either been ignored altogether when generating associations or been used mostly to generate only negative association rules. However, infrequent itemsets can hold valid and important positive association rules with high confidence and a strongly positive correlation. This can be illustrated with the following example:

minsupp = 0.3, minconf = 0.7;
supp(A) = 0.8, supp(B) = 0.2, supp(A ∪ B) = 0.2;
conf(A ⇒ B) = 0.25, lift(A ⇒ B) = 1.25;
conf(B ⇒ A) = 1, lift(B ⇒ A) = 1.25.

We can see from this example that (A ∪ B), despite being an infrequent itemset, carries a very strong positive association B ⇒ A, with 100% confidence and a positive correlation. Our proposed algorithm covers the generation of such rules, as explained in Step (3.1).

6. Experimental Results and Discussions

We performed our experiments on medical blog datasets, mostly authored by patients writing about their problems and experiences. The datasets were collected from the following medical blog sites:

(i) Cancer Survivors Network [http://csn.cancer.org];


Inputs: min_sup: minimum support; min_conf: minimum confidence; FIS (frequent itemsets); inFIS (infrequent itemsets)
Output: PAR: set of [+ve] ARs; NAR: set of [−ve] ARs
(1) PAR = φ; NAR = φ
(2) /* generate all association rules from FIS (frequent itemsets) */
for each itemset I in FIS do begin
  for each itemset pair A, B with A ∪ B = I and A ∩ B = φ do begin
  (2.1) /* generate rules of the form A ⇒ B */
    if conf(A ⇒ B) ≥ min_conf && lift(A ⇒ B) > 1
      then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
    else
  (2.2) /* generate rules of the form (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
      if conf(A ⇒ ¬B) ≥ min_conf && lift(A ⇒ ¬B) > 1
        output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
      if conf(¬A ⇒ B) ≥ min_conf && lift(¬A ⇒ B) > 1
        output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
      if conf(¬A ⇒ ¬B) ≥ min_conf && lift(¬A ⇒ ¬B) > 1
        output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
  end for
end for
(3) /* generate all association rules from inFIS */
for each itemset I in inFIS do begin
  for each itemset pair A, B with A ∪ B = I, A ∩ B = φ, supp(A) ≥ minsupp, and supp(B) ≥ minsupp do begin
  (3.1) /* generate rules of the form A ⇒ B */
    if conf(A ⇒ B) ≥ min_conf && lift(A ⇒ B) > 1
      then output the rule (A ⇒ B); PAR = PAR ∪ (A ⇒ B)
    else
  (3.2) /* generate rules of the form (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
      if conf(A ⇒ ¬B) ≥ min_conf && lift(A ⇒ ¬B) > 1
        output the rule (A ⇒ ¬B); NAR = NAR ∪ (A ⇒ ¬B)
      if conf(¬A ⇒ B) ≥ min_conf && lift(¬A ⇒ B) > 1
        output the rule (¬A ⇒ B); NAR = NAR ∪ (¬A ⇒ B)
      if conf(¬A ⇒ ¬B) ≥ min_conf && lift(¬A ⇒ ¬B) > 1
        output the rule (¬A ⇒ ¬B); NAR = NAR ∪ (¬A ⇒ ¬B)
  end for
end for
(4) return PAR and NAR

Algorithm 2
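The per-pair tests of Algorithm 2 can be written compactly once the supports of the negated forms are derived from the three base supports; the Java sketch below (ours, assuming 0 < supp(A), supp(B) < 1 so that no division by zero occurs) illustrates steps (2.1)/(2.2) and (3.1)/(3.2):

import java.util.*;

public class RuleEmitter {

    // For a 2-partition (A, B) of an itemset, the negated-form supports
    // follow from identities over the three base supports:
    //   supp(A ∪ ¬B)  = supp(A) − supp(A ∪ B)
    //   supp(¬A ∪ B)  = supp(B) − supp(A ∪ B)
    //   supp(¬A ∪ ¬B) = 1 − supp(A) − supp(B) + supp(A ∪ B)
    static void emitRules(String a, String b,
                          double sA, double sB, double sAB,
                          double minConf,
                          List<String> par, List<String> nar) {
        double sAnB = sA - sAB, snAB = sB - sAB, snAnB = 1 - sA - sB + sAB;
        if (sAB / sA >= minConf && sAB / (sA * sB) > 1) {
            par.add(a + " => " + b);                       // step (2.1)/(3.1)
        } else {                                           // step (2.2)/(3.2)
            if (sAnB / sA >= minConf && sAnB / (sA * (1 - sB)) > 1)
                nar.add(a + " => not " + b);
            if (snAB / (1 - sA) >= minConf && snAB / ((1 - sA) * sB) > 1)
                nar.add("not " + a + " => " + b);
            if (snAnB / (1 - sA) >= minConf && snAnB / ((1 - sA) * (1 - sB)) > 1)
                nar.add("not " + a + " => not " + b);
        }
    }
}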

Given supp(A ∪ B) ≥ minsupp:
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
then A ⇒ B is a valid positive rule: it has the required minimum confidence, and there is a positive correlation between rule items A and B.
Else, if conf(A ⇒ B) < minconf and lift(A ⇒ B) < 1,
then A ⇒ B is not a valid positive rule: its confidence is lower than the required minimum, and there is a negative correlation between rule items A and B. Therefore, we try to generate a negative association rule from itemset I:
If conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
then A ⇒ ¬B is a valid negative rule: it has the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 3


Given supp(A ∪ B) < minsupp and supp(A ∪ B) ≠ 0,
supp(A) ≥ minsupp, and supp(B) ≥ minsupp:
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
then A ⇒ B is a valid positive rule: it has the required minimum confidence, and there is a positive correlation between rule items A and B.
Else, if conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
then A ⇒ ¬B is a valid negative rule: it has the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 4
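Algorithm 4's preconditions amount to a simple gate on the three supports; a minimal Java rendering (ours, reading the nonzero condition as supp(A ∪ B) ≠ 0) follows:

public class InfrequentGate {
    // Algorithm 4's preconditions: A ∪ B is infrequent but occurs at least
    // once, while A and B are each frequent on their own.
    static boolean qualifies(double sAB, double sA, double sB, double minSupp) {
        return sAB < minSupp && sAB > 0 && sA >= minSupp && sB >= minSupp;
    }
}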

Table 2: Total generated frequent and infrequent itemsets using different support values.

Support   Frequent itemsets   Infrequent itemsets
0.1       146474              268879
0.15      105397              329766
0.2       94467               447614
0.25      79871               492108
0.3       57954               504320

(ii) Care Pages [http://www.carepages.com/forums/cancer].

The blog text was preprocessed before experimentation, that is, stop-word removal, stemming/lemmatization, nonmedical word removal, and so forth. After preprocessing, we assigned weights to terms/items using the IDF scheme in order to select only the important and relevant terms/items in the dataset. The main parameters of the databases are as follows:

(i) the total number of blogs (i.e., transactions) used in this experimentation was 1926;
(ii) the average number of words (attributes) per blog (transaction) was 145;
(iii) the smallest blog contained 79 words;
(iv) the largest blog contained 376 words;
(v) the total number of words (i.e., attributes) was 280254 without stop-word removal;
(vi) the total number of words (i.e., attributes) was 192738 after stop-word removal;
(vii) the total number of words selected using the top-N percentage of IDF words was 81733;
(viii) the algorithm is implemented in Java.

Table 2 summarizes the number of itemsets generated with varying minsup values. We can see that the number of frequent itemsets decreases as we increase the minsup value, while a sharp increase in the number of infrequent itemsets can be observed. This can also be seen in Figure 1.

Figure 1: Frequent and infrequent itemsets generated with varying minimum support values (itemsets versus support; the plotted counts are those listed in Table 2).

Table 3 gives an account of the experimental results for different values of minimum support and minimum confidence. The lift value has to be greater than 1 for a positive relationship between the itemsets; the resulting rule, however, may itself be positive or negative. The total numbers of positive and negative rules generated from both frequent and infrequent itemsets are given.

Although the experimental results depend heavily on the datasets used, they still demonstrate the importance of the IDF factor in selecting the frequent itemsets, along with the generation of negative rules from frequent itemsets and the extraction of positive rules from infrequent itemsets. The number of negative rules generated greatly outnumbers the positive rules, not only because there are far more infrequent itemsets than frequent itemsets but also because the proposed approach finds negative correlations between the frequent itemsets, leading to the generation of negative association rules.

The frequent and infrequent itemset generation using the Apriori algorithm takes only a little extra time compared with traditional frequent itemset discovery using the Apriori algorithm.


Table 3: Positive and negative association rules using varying support and confidence values.

Support   Confidence   PARs from FIs   PARs from IIs   NARs from FIs   NARs from IIs
0.05      0.95         3993            254             27032           836
0.05      0.9          4340            228             29348           1544
0.1       0.9          3731            196             30714           1279
0.1       0.85         3867            246             37832           1170
0.15      0.8          3340            330             26917           1121

Table 4: Positive and negative rules from both frequent and infrequent itemsets.

(a) Positive rules from frequent itemsets

Rule                                   Support   Confidence   Lift
chemo, radiation, treatment → tumor    0.2       1            2.0
surgery, tumor → radiation             0.25      1            1.4286
radiation, surgery → tumor             0.25      0.8333       1.6667
treatment → radiation                  0.45      0.9          1.2857
treatment, tumor → radiation           0.3       1            1.4286
tumor → radiation                      0.5       1            1.4286

(b) Negative rules from frequent itemsets

Rule                                   Support   Confidence   Lift
radiation → ~cancer                    0.5       0.7143       1.0204
~temodar → radiation                   0.4825    0.8333       1.6667
~radiation → glioblastoma              0.65      1            1.4285

(c) Positive rules from infrequent itemsets

Rule                                   Support   Confidence   Lift
cancer, treatment → tumor              0.15      1            2
doctor, temodar → chemo                0.05      1            2.8571
brain, chemo → doctor                  0.15      1            2.2222
mri, tumor → brain                     0.1       1            2.5

(d) Negative rules from infrequent itemsets

Rule                                   Support   Confidence   Lift
chemo → ~surgery                       0.35      1            1.5385
glioblastoma → ~treatment              0.3       1            2.13
chemo → ~glioblastoma                  0.35      0.95         1.4286

This is because each item's support is checked against the threshold support value to classify the item as frequent or infrequent; we therefore obtain the infrequent items in the same pass in which we obtain the frequent items. However, the processing of frequent and infrequent itemsets for the generation of association rules differs. Frequent itemsets generated through the Apriori algorithm have the inherent property that their subsets are also frequent; we cannot guarantee this for infrequent itemsets. Thus, when generating association rules from infrequent itemsets, we impose the additional check that their subsets are frequent.
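One illustrative way to encode this additional check in Java (our sketch; the published algorithm enforces it through the conditions supp(A) ≥ minsupp and supp(B) ≥ minsupp on the rule sides) is to require every (k−1)-subset of the infrequent itemset to be frequent:

import java.util.*;

public class SubsetCheck {
    // Every (k-1)-subset of an infrequent itemset must itself be frequent
    // before we attempt rule generation from it; fis is assumed to hold all
    // mined frequent itemsets.
    static boolean allSubsetsFrequent(Set<String> itemset, Set<Set<String>> fis) {
        for (String item : itemset) {
            Set<String> sub = new TreeSet<>(itemset);
            sub.remove(item);                     // drop one item
            if (!sub.isEmpty() && !fis.contains(sub)) return false;
        }
        return true;
    }
}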

Research on mining association rules among both frequent and infrequent itemsets has been scarce, especially for textual datasets. We have proposed an algorithm that can extract both types of association rules, that is, positive and negative, from both frequent and infrequent itemsets. We give a sample of all four types of association rules extracted by the algorithm.

Table 4 gives a summary of the generated association rules; all four types are illustrated. Table 4(a) shows a sample of positive association rules generated from the frequent itemsets. Table 4(b) shows negative association rules generated from the frequent itemsets, a direction that had not previously been explored by the research community. A sample of positive association rules generated from the infrequent itemsets is given in Table 4(c); this type of rule is potentially useful, and researchers are interested in extracting them, yet to our knowledge no prior work had extracted positive


Figure 2: Positive and negative rules generated with varying minimum support and confidence values (number of rules versus rule support and confidence; the labeled positive-rule counts 4247, 4568, 3927, 4113, and 3670 correspond to the row-wise totals of the PAR columns in Table 3).

association rules from infrequent itemsets in textual data. Table 4(d) shows the results of negative association rules from the infrequent itemsets.

7. Concluding Remarks and Future Work

Identification of associations among symptoms and diseases is important in diagnosis. The field of negative association rule (NAR) mining holds enormous potential to help medical practitioners in this regard, and positive and negative association rule mining (PNARM) together can hugely benefit the medical domain. Positive and negative associations among diseases, symptoms, and laboratory test results can help a medical practitioner reach a conclusion about the presence or absence of a possible disease. There is a need to minimize errors in diagnosis and to maximize the possibility of early disease identification by developing a decision support system that takes advantage of NARs. A positive association rule such as Flu ⇒ Headache can tell us that Headache is experienced by a person who is suffering from Flu. Conversely, a negative association rule such as ¬Throbbing-Headache ⇒ ¬Migraine tells us that, if the Headache experienced by a person is not Throbbing, then he may not have Migraine, with a certain degree of confidence. The applications of this work include the development of medical decision support systems, among others, by finding associations and dissociations among diseases, symptoms, and other health-related terminology. The current algorithm does not account for the context and semantics of the terms/items in the textual data; in the future, we plan to incorporate the context of the features into our work in order to improve the quality and efficacy of the generated association rules.

In this paper, contributions to NARM research were made by proposing an algorithm that efficiently generates negative association rules along with positive association rules. We have proposed a novel method that captures the negative associations among frequent itemsets and also extracts the positive associations among the infrequent itemsets, whereas traditional association rule mining algorithms have focused on frequent items for generating positive association rules and have used infrequent items only for generating negative association rules. The experimental results have demonstrated that the proposed approach is effective, efficient, and promising.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.

[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.

[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory - ICDT '99, pp. 398–416, Springer, 1999.

[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134–145, 1996.

[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236–245, USA, August 2003.

[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671–692, 2001.

[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497–510, 2004.

[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407–413, 2013.

[10] M. Rajman and R. Besancon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50–64, 1997.

[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the ACM SIGMOD International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227–230, 1997.

[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.

[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237–248, Ubooks, 2003.

[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 1, pp. 207–216, 1993.

[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27–38, Springer, 2004.

[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658–665, 2002.

[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100–109, Springer, 2006.

[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716–1721, August 2005.

[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620–629, Springer, 2005.

[20] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194–5213, 2011.

[21] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275–282, Atlantis Press, 2011.

[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442–449, December 2002.

[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228–237, Springer, 2006.

[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229–238, AAAI Press, 1991.

[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining n-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59–67, Springer, 2000.




[19] L Sharma O Vyas U Tiwary and R Vyas ldquoA novel approachof multilevel positive and negative association rule mining forspatial databasesrdquo in Machine Learning and Data Mining inPattern Recognition pp 620ndash629 Springer 2005

[20] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA for-mal model for mining fuzzy rules using the RL representationtheoryrdquo Information Sciences vol 181 no 23 pp 5194ndash52132011

[21] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA fuzzyrule mining approach involving absent itemsrdquo in Proceedings ofthe 7th Conference of the European Society for Fuzzy Logic andTechnology pp 275ndash282 Atlantis Press 2011

[22] A Savasere E Omiecinski and S Navathe ldquoMining for strongnegative rules for statistically dependent itemsrdquo in Proceedingsof the IEEE International Conference on Data Mining (ICDMrsquo02) pp 442ndash449 December 2002

[23] M Morzy ldquoEfficient mining of dissociation rulesrdquo in DataWarehousing and Knowledge Discovery pp 228ndash237 Springer2006

[24] G Piatetsky-Shapiro ldquoDiscovery analysis and presentation ofstrong rulesrdquo inKnowledgeDiscovery inDatabases pp 229ndash238AAAI Press 1991

[25] AW Fu RW Kwong and J Tang ldquoMining n-most interestingitemsetsrdquo in Foundations of Intelligent Systems Lecture Notes inComputer Science pp 59ndash67 Springer 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

The Scientific World Journal 5

(i) supp(mole ∪ ¬cancer) = supp(mole) − supp(cancer ∪ mole) = 0.4 − 0.05 = 0.35 > minsupp, from (6);

(ii) conf(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer)/supp(mole) = 0.35/0.4 = 0.875 > minconf, from (7); therefore we can generate (mole ⇒ ¬cancer) as a negative rule;

(iii) lift(mole ⇒ ¬cancer) = supp(mole ∪ ¬cancer)/(supp(mole) · supp(¬cancer)) = 0.35/(0.4 × 0.4) = 2.1875, which is much greater than 1, showing a strong relation between the presence of mole and the absence of cancer; therefore we can generate (mole ⇒ ¬cancer) as a valid negative rule with 87.5% confidence and a strong association between the presence of mole and the absence of cancer.

The above example clearly shows the importance of infrequent itemsets and the generated negative association rules, and their capability to track important implications/associations which would have been missed when mining only positive association rules.
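To make the arithmetic above easy to reproduce, here is a minimal Java sketch (Java being the implementation language reported in Section 6) that recomputes the three measures for the mole/cancer example; the class and method names are ours and purely illustrative.

```java
public class NegativeRuleMeasures {

    // Identity used above: supp(A ∪ ¬B) = supp(A) − supp(A ∪ B)
    static double suppANotB(double suppA, double suppAB) {
        return suppA - suppAB;
    }

    public static void main(String[] args) {
        double suppMole = 0.4, suppCancerMole = 0.05, suppNotCancer = 0.4;

        double supp = suppANotB(suppMole, suppCancerMole);   // 0.35  > minsupp
        double conf = supp / suppMole;                       // 0.875 > minconf
        double lift = supp / (suppMole * suppNotCancer);     // 2.1875 > 1

        // Prints: supp = 0.3500, conf = 0.8750, lift = 2.1875
        System.out.printf("supp = %.4f, conf = %.4f, lift = %.4f%n", supp, conf, lift);
    }
}
```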

4.3. Proposed Algorithm: Apriori FISinFIS (See Algorithm 1). The Apriori FISinFIS procedure generates all frequent and infrequent itemsets of interest in a given database D, where FIS is the set of all frequent itemsets of interest in D and inFIS is the set of all infrequent itemsets in D. FIS and inFIS contain only frequent and infrequent itemsets of interest, respectively.

The initialization is done in Step (1). Step (2) generates temp₁, all itemsets of size 1; in step (2.1) we generate FIS₁, all frequent itemsets of size 1, while in step (2.2) all infrequent itemsets of size 1 in database D are generated in inFIS₁, in the first pass of D.

Step (3) generates FISₖ and inFISₖ for k ≥ 2 by a loop, where FISₖ is the set of all frequent k-itemsets whose support exceeds the user-defined minimum threshold in the kth pass of D, and inFISₖ is the set of all infrequent k-itemsets whose support falls below that threshold. The loop terminates when all the temporary itemsets have been tried, that is, tempₖ₋₁ = ∅. For each pass of the database in Step (3), say pass k, there are five substeps, as follows.

Step (3.1) generates candidate itemsets Cₖ of all k-itemsets in D, where each k-itemset in Cₖ is generated from two frequent itemsets in tempₖ₋₁. The itemsets in Cₖ are counted in D using a loop in Step (3.2). Step (3.3) calculates the support of each itemset in Cₖ, and step (3.4) stores the generated itemsets in a temporary data structure; we have used a "HashMap" implementation as the temporary data structure in our experimentation.

Then FISₖ and inFISₖ are generated in Steps (4) and (5), respectively. FISₖ is the set of all potentially useful frequent k-itemsets in tempₖ, which have support greater than minsupp; inFISₖ is the set of all infrequent k-itemsets in tempₖ, which have support less than minsupp. FISₖ and inFISₖ are added to FIS and inFIS in Steps (6) and (7). Step (8) increments the itemset size. The procedure ends in Step (9), which outputs the frequent and infrequent itemsets in FIS and inFIS, respectively.

4.4. Algorithm: FISinFIS Based Positive and Negative Association Rule Mining. Algorithm 2 generates positive and negative association rules from both the frequent itemsets (FIS) and the infrequent itemsets (inFIS). Step (1) initializes the positive and negative association rule sets as empty. Step (2) generates association rules from FIS: in step (2.1), positive association rules of the form A ⇒ B or B ⇒ A, which have confidence greater than the user-defined threshold and lift greater than 1, are extracted as valid positive association rules. Step (2.2) generates negative association rules of the forms A ⇒ ¬B, ¬A ⇒ B, ¬A ⇒ ¬B, and so forth; those which have confidence greater than the user-defined threshold and lift greater than 1 are extracted as valid negative association rules (Figure 2).

Step (3) generates association rules from inFIS: in step (3.1), a positive association rule of the form A ⇒ B or B ⇒ A, which has confidence greater than the user-defined threshold and lift greater than 1, is extracted as a valid positive association rule. Step (3.2) generates negative association rules of the form A ⇒ ¬B, ¬A ⇒ B, or ¬A ⇒ ¬B; a rule which has confidence greater than the user-defined threshold and lift greater than 1 is extracted as a valid negative association rule.

5. Discovering Association Rules among Frequent and Infrequent Items

Mining positive association rules from frequent itemsets is a relatively trivial issue and has been extensively studied in the literature. Mining negative association rules of the forms A ⇒ ¬B, ¬A ⇒ B, B ⇒ ¬A, or ¬B ⇒ A, and so forth, from textual datasets, however, is a difficult task, where A ∪ B is a nonfrequent itemset. A database (corpus) D has an exponential number of nonfrequent itemsets; therefore, negative association rule mining requires the examination of a much larger search space than positive association rule mining.

We, in this work, propose that for the itemsets occurring frequently, given the user-defined minsupp, their subitems can be negatively correlated, leading to the discovery of negative association rules. Similarly, the infrequent itemsets may have subitems with a strong positive correlation, leading to the discovery of positive association rules.

Let I be the set of items in database D such that

I = A ∪ B, A ∩ B = φ.

Thresholds minsupp and minconf are given by the user.

5.1. Generating Association Rules among Frequent Itemsets. See Algorithm 3.

Input: TD: transaction (document) database; ms: minimum support threshold
Output: FIS: frequent itemsets; inFIS: infrequent itemsets
(1) initialize FIS = Φ; inFIS = Φ
(2) temp₁ = {A | A ∈ top-N IDF terms} /* get all 1-itemsets among the top-N IDF items */
(2.1) FIS₁ = {A | A ∈ temp₁ and support(A) ≥ ms} /* get frequent 1-itemsets */
(2.2) inFIS₁ = temp₁ − FIS₁ /* all infrequent 1-itemsets */
k = 2 /* initialize for itemsets of size greater than 1 */
(3) while (tempₖ₋₁ ≠ Φ) do begin
(3.1)  Cₖ = generate(tempₖ₋₁, ms) /* candidate k-itemsets */
(3.2)  for each transaction t ∈ TD do begin /* scan database TD */
         Cₜ = subset(Cₖ, t) /* get the candidate itemsets contained in transaction t */
         for each candidate c ∈ Cₜ: c.count++ /* increase the count of each itemset present in the transaction */
       end
(3.3)  c.support = c.count/|TD| /* calculate the support of each candidate k-itemset */
(3.4)  tempₖ = {c | c ∈ Cₖ and c.support > 0} /* store all generated k-itemsets that occur in TD */
(4) FISₖ = {A | A ∈ tempₖ and A.support ≥ ms} /* add to frequent k-itemsets if supp(A) is at least minsupp */
(5) inFISₖ = tempₖ − FISₖ /* add to infrequent k-itemsets having support less than minsupp */
(6) FIS = ∪ₖ FISₖ /* add generated frequent k-itemsets to FIS */
(7) inFIS = ∪ₖ inFISₖ /* add generated infrequent k-itemsets to inFIS */
(8) k++ /* increment itemset size by 1 */
end
(9) return FIS and inFIS

Algorithm 1
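As a rough Java illustration of Steps (3.3)–(5), the fragment below splits counted candidate k-itemsets into FISₖ and inFISₖ in a single pass. The HashMap layout (itemset → absolute count) mirrors the temporary data structure mentioned in Section 4.3, but the exact signatures are our assumption, not the authors' code.

```java
import java.util.*;

class LevelSplit {

    /** Splits counted candidate k-itemsets into frequent (FIS_k) and
     *  infrequent (inFIS_k) sets, as in Steps (3.3)-(5) of Algorithm 1. */
    static void split(Map<Set<String>, Integer> candidateCounts, int totalDocs,
                      double minSupp,
                      Map<Set<String>, Double> fisK, Map<Set<String>, Double> inFisK) {
        for (Map.Entry<Set<String>, Integer> e : candidateCounts.entrySet()) {
            double support = e.getValue() / (double) totalDocs;  // Step (3.3)
            if (support >= minSupp) {
                fisK.put(e.getKey(), support);                   // Step (4)
            } else if (support > 0) {
                inFisK.put(e.getKey(), support);                 // Step (5)
            }
        }
    }
}
```

This single-pass split is also why, as noted later in Section 6, the infrequent itemsets come at almost no extra cost over standard Apriori counting.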

5.2. Generating Association Rules among Infrequent Itemsets. For brevity, we consider only one form of association each from the positive and negative types, that is, A ⇒ B and A ⇒ ¬B; the other forms can be extracted similarly.

In the description of association rules, as shown in Algorithm 4, supp(A ∪ B) ≥ minsupp guarantees that the association rule describes the relationship among items of a frequent itemset, whereas supp(A ∪ B) < minsupp guarantees that the association rule describes the relationship among items of an infrequent itemset; however, the subitems of the itemset need to be frequent, as enforced by the conditions supp(A) ≥ minsupp and supp(B) ≥ minsupp. The interestingness measure lift has to be greater than 1, articulating a positive dependency among the itemsets; a lift value of less than 1 articulates a negative relationship among the itemsets.
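For readers following the derivations, the complement identities implicitly used by Algorithms 3 and 4 (and by the worked examples below) can be stated compactly; these are the standard relations for negated items, written in the paper's supp/conf/lift notation:

```latex
\begin{align*}
\operatorname{supp}(A \cup \neg B) &= \operatorname{supp}(A) - \operatorname{supp}(A \cup B),\\
\operatorname{conf}(A \Rightarrow \neg B) &= \frac{\operatorname{supp}(A \cup \neg B)}{\operatorname{supp}(A)},\\
\operatorname{lift}(A \Rightarrow \neg B) &= \frac{\operatorname{supp}(A \cup \neg B)}
                                                 {\operatorname{supp}(A)\,\operatorname{supp}(\neg B)},
\qquad \operatorname{supp}(\neg B) = 1 - \operatorname{supp}(B).
\end{align*}
```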

The algorithm generates a complete set of positive and negative association rules from both frequent and infrequent itemsets. The frequent itemsets have traditionally been used to generate positive association rules; however, we argue that items in frequent itemsets can be negatively correlated. This can be illustrated using the following example:

minsupp = 0.2, minconf = 0.6,
supp(A) = 0.7, supp(B) = 0.5, supp(A ∪ B) = 0.25,
conf(A ⇒ B) = 0.3571, lift(A ⇒ B) = 0.7143,
conf(B ⇒ A) = 0.5, lift(B ⇒ A) = 0.7143,
supp(A ∪ ¬B) = 0.7 − 0.25 = 0.45, conf(A ⇒ ¬B) = 0.45/0.7 = 0.6429,
lift(A ⇒ ¬B) = 0.45/(0.7 × 0.5) = 1.2857.

The above example clearly shows that itemsets, despite being frequent, can have negative relationships among their item subsets. Therefore, in Step (2.2) of Algorithm 2, we try to generate negative association rules using frequent itemsets.

The infrequent itemsets, on the other hand, have either been totally ignored while generating associations or mostly used to generate only negative association rules. However, infrequent itemsets can hold potentially valid and important positive association rules with high confidence and a strongly positive correlation. This can be illustrated using the following example:

minsupp = 0.3, minconf = 0.7,
supp(A) = 0.8, supp(B) = 0.2, supp(A ∪ B) = 0.2,
conf(A ⇒ B) = 0.25, lift(A ⇒ B) = 1.25,
conf(B ⇒ A) = 1, lift(B ⇒ A) = 1.25.

We can see from the example that (A ∪ B), in spite of being an infrequent itemset, has a very strong positive association B ⇒ A, with 100% confidence and a positive correlation. Our proposed algorithm covers the generation of such rules, as explained in Step (3.1).

6. Experimental Results and Discussions

We performed our experiments on medical blog datasets, mostly authored by patients writing about their problems and experiences. The datasets have been collected from the following medical blog sites:

(i) Cancer Survivors Network [http://csn.cancer.org];

Inputs: min_sup: minimum support; min_conf: minimum confidence; FIS (frequent itemsets); inFIS (infrequent itemsets)
Output: PAR: set of [+ve] ARs; NAR: set of [−ve] ARs
(1) PAR = φ; NAR = φ
(2) /* generating all association rules from FIS (frequent itemsets) */
for each itemset I in FIS do begin
  for each itemset A ∪ B = I, A ∩ B = φ do begin
  (2.1) /* generate rules of the form A ⇒ B */
    if conf(A ⇒ B) ≥ min_conf && lift(A ⇒ B) ≥ 1
      then output the rule (A ⇒ B); PAR = PAR ∪ {A ⇒ B}
    else
  (2.2) /* generate rules of the forms (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
      if conf(A ⇒ ¬B) ≥ min_conf && lift(A ⇒ ¬B) ≥ 1: output the rule (A ⇒ ¬B); NAR = NAR ∪ {A ⇒ ¬B}
      if conf(¬A ⇒ B) ≥ min_conf && lift(¬A ⇒ B) ≥ 1: output the rule (¬A ⇒ B); NAR = NAR ∪ {¬A ⇒ B}
      if conf(¬A ⇒ ¬B) ≥ min_conf && lift(¬A ⇒ ¬B) ≥ 1: output the rule (¬A ⇒ ¬B); NAR = NAR ∪ {¬A ⇒ ¬B}
  end for
end for
(3) /* generating all association rules from inFIS */
for each itemset I in inFIS do begin
  for each itemset A ∪ B = I, A ∩ B = φ, supp(A) ≥ minsupp and supp(B) ≥ minsupp do begin
  (3.1) /* generate rules of the form A ⇒ B */
    if conf(A ⇒ B) ≥ min_conf && lift(A ⇒ B) ≥ 1
      then output the rule (A ⇒ B); PAR = PAR ∪ {A ⇒ B}
    else
  (3.2) /* generate rules of the forms (A ⇒ ¬B), (¬A ⇒ B), and (¬A ⇒ ¬B) */
      if conf(A ⇒ ¬B) ≥ min_conf && lift(A ⇒ ¬B) ≥ 1: output the rule (A ⇒ ¬B); NAR = NAR ∪ {A ⇒ ¬B}
      if conf(¬A ⇒ B) ≥ min_conf && lift(¬A ⇒ B) ≥ 1: output the rule (¬A ⇒ B); NAR = NAR ∪ {¬A ⇒ B}
      if conf(¬A ⇒ ¬B) ≥ min_conf && lift(¬A ⇒ ¬B) ≥ 1: output the rule (¬A ⇒ ¬B); NAR = NAR ∪ {¬A ⇒ ¬B}
  end for
end for
(4) return PAR and NAR

Algorithm 2
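The following condensed Java sketch mirrors the per-split checks of Steps (2.1)–(2.2) for a single pair A, B of a frequent itemset. The negated-side supports are derived with the complement identities given in Section 5.2; the Rule record and the helper names are illustrative assumptions rather than the paper's code.

```java
import java.util.*;

class RuleEmitter {

    record Rule(String form, double conf, double lift) {}

    /** Checks one split A ∪ B = I of a frequent itemset (Steps (2.1)-(2.2)). */
    static void emit(double sA, double sB, double sAB, double minConf,
                     List<Rule> par, List<Rule> nar) {
        double confAB = sAB / sA, liftAB = sAB / (sA * sB);
        if (confAB >= minConf && liftAB >= 1) {                       // Step (2.1): A => B
            par.add(new Rule("A => B", confAB, liftAB));
            return;                                                   // "else" branch below
        }
        double sNotA = 1 - sA, sNotB = 1 - sB;
        double sAnB = sA - sAB;                                       // supp(A ∪ ¬B)
        if (sAnB / sA >= minConf && sAnB / (sA * sNotB) >= 1)         // Step (2.2): A => ¬B
            nar.add(new Rule("A => ~B", sAnB / sA, sAnB / (sA * sNotB)));
        double snAB = sB - sAB;                                       // supp(¬A ∪ B)
        if (snAB / sNotA >= minConf && snAB / (sNotA * sB) >= 1)      // ¬A => B
            nar.add(new Rule("~A => B", snAB / sNotA, snAB / (sNotA * sB)));
        double snAnB = 1 - sA - sB + sAB;                             // supp(¬A ∪ ¬B)
        if (snAnB / sNotA >= minConf && snAnB / (sNotA * sNotB) >= 1) // ¬A => ¬B
            nar.add(new Rule("~A => ~B", snAnB / sNotA, snAnB / (sNotA * sNotB)));
    }
}
```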

Given supp(A ∪ B) ≥ minsupp:
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ B) < minconf and lift(A ⇒ B) < 1,
then A ⇒ B is not a valid positive rule, having lower than the required minimum confidence, and there is a negative correlation between rule items A and B. Therefore, we try to generate a negative association rule from itemset I:
If conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 3

Given supp(A ∪ B) < minsupp, supp(A ∪ B) ≠ 0,
supp(A) ≥ minsupp, and supp(B) ≥ minsupp:
If conf(A ⇒ B) ≥ minconf and lift(A ⇒ B) > 1,
then A ⇒ B is a valid positive rule, having the required minimum confidence, and there is a positive correlation between rule items A and B.
Else if conf(A ⇒ ¬B) ≥ minconf and lift(A ⇒ ¬B) > 1,
then A ⇒ ¬B is a valid negative rule, having the required minimum confidence, and there is a positive correlation between rule items A and ¬B.

Algorithm 4
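Algorithm 4's decision for a single infrequent itemset compresses into one method; this Java rendering is a sketch of ours, with the floating-point preconditions written exactly as the listing states them.

```java
class InfrequentRule {

    /** Applies Algorithm 4 to one infrequent pair (A, B):
     *  returns "A => B", "A => ~B", or null if neither rule qualifies. */
    static String decide(double sA, double sB, double sAB,
                         double minSupp, double minConf) {
        // Preconditions: A ∪ B infrequent but present; A and B frequent.
        if (!(sAB < minSupp && sAB != 0 && sA >= minSupp && sB >= minSupp)) return null;
        if (sAB / sA >= minConf && sAB / (sA * sB) > 1) return "A => B";
        double sAnB = sA - sAB;                        // supp(A ∪ ¬B)
        if (sAnB / sA >= minConf && sAnB / (sA * (1 - sB)) > 1) return "A => ~B";
        return null;
    }
}
```

Plugging in the second worked example from Section 5.2 (sA = 0.8, sB = 0.2, sAB = 0.2, minSupp = 0.3, minConf = 0.7) returns null for this form but, symmetrically with A and B swapped, yields B ⇒ A, matching the example.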

Table 2: Total generated frequent and infrequent itemsets using different support values.

Support    Frequent itemsets    Infrequent itemsets
0.1        146474               268879
0.15       105397               329766
0.2        94467                447614
0.25       79871                492108
0.3        57954                504320

(ii) Care Pages [http://www.carepages.com/forums/cancer].

The blogs' text was preprocessed before the experimentation, that is, stop-word removal, stemming/lemmatization, nonmedical word removal, and so forth. After preprocessing, we assigned weights to terms/items using the IDF scheme in order to select only the important and relevant terms/items in the dataset (a minimal IDF-selection sketch follows the list below). The main parameters of the databases are as follows:

(i) the total number of blogs (i.e., transactions) used in this experimentation was 1926;
(ii) the average number of words (attributes) per blog (transaction) was 145;
(iii) the smallest blog contained 79 words;
(iv) the largest blog contained 376 words;
(v) the total number of words (i.e., attributes) was 280254 without stop-word removal;
(vi) the total number of words (i.e., attributes) was 192738 after stop-word removal;
(vii) the total number of words selected using the top-N percentage of IDF words was 81733;
(viii) the algorithm is implemented in Java.
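As referenced above, term selection relies on IDF weighting. A minimal Java sketch under the usual definition idf(t) = log(|D| / df(t)) is given here; the method name and the representation of each blog as a set of distinct terms are our assumptions, not the paper's implementation.

```java
import java.util.*;
import java.util.stream.*;

class IdfSelector {

    /** Returns the top-n terms ranked by idf(t) = log(|D| / df(t)). */
    static List<String> topNByIdf(List<Set<String>> blogs, int n) {
        int totalDocs = blogs.size();
        // Document frequency: in how many blogs each term occurs.
        Map<String, Long> df = blogs.stream()
                .flatMap(Set::stream)
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return df.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, Long> e) ->
                                Math.log((double) totalDocs / e.getValue())).reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```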

Table 2 summarizes the number of itemsets generated with varying minsupp values. We can see that the number of frequent itemsets decreases as we increase the minsupp value; however, a sharp increase in the number of infrequent itemsets can be observed. This can also be visualized in Figure 1.

Figure 1: Frequent and infrequent itemsets generated with varying minimum support values.

Table 3 gives an account of the experimental results for different values of minimum support and minimum confidence. The lift value has to be greater than 1 for a positive relationship between the itemsets; the resulting rule, however, may itself be positive or negative. The total numbers of positive and negative rules generated from both frequent and infrequent itemsets are given.

Although the experimental results greatly depend on the datasets used, they still demonstrate the importance of the IDF factor in selecting the frequent itemsets, along with the generation of negative rules from frequent itemsets and the extraction of positive rules from infrequent itemsets. The negative rules generated greatly outnumber the positive rules, not only because there are many more infrequent itemsets than frequent itemsets but also because the proposed approach finds negative correlations between the frequent itemsets, leading to the generation of additional negative association rules.

The frequent and infrequent itemset generation using the Apriori algorithm takes only a little extra time compared to traditional frequent itemset mining using the Apriori

Table 3: Positive and negative association rules using varying support and confidence values.

Support    Confidence    PARs from FIs    PARs from IIs    NARs from FIs    NARs from IIs
0.05       0.95          3993             254              27032            836
0.05       0.9           4340             228              29348            1544
0.1        0.9           3731             196              30714            1279
0.1        0.85          3867             246              37832            1170
0.15       0.8           3340             330              26917            1121

Table 4: Positive and negative rules from both frequent and infrequent itemsets.

(a) Positive rules from frequent itemsets

Rule                                    Support    Confidence    Lift
chemo, radiation, treatment → tumor    0.2        1             2.0
surgery, tumor → radiation             0.25       1             1.4286
radiation, surgery → tumor             0.25       0.8333        1.6667
treatment → radiation                  0.45       0.9           1.2857
treatment, tumor → radiation           0.3        1             1.4286
tumor → radiation                      0.5        1             1.4286

(b) Negative rules from frequent itemsets

Rule                                    Support    Confidence    Lift
radiation → ∼cancer                    0.5        0.7143        1.0204
∼temodar → radiation                   0.4825     0.8333        1.6667
∼radiation → glioblastoma              0.65       1             1.4285

(c) Positive rules from infrequent itemsets

Rule                                    Support    Confidence    Lift
cancer, treatment → tumor              0.15       1             2
doctor, temodar → chemo                0.05       1             2.8571
brain, chemo → doctor                  0.15       1             2.2222
mri, tumor → brain                     0.1        1             2.5

(d) Negative rules from infrequent itemsets

Rule                                    Support    Confidence    Lift
chemo → ∼surgery                       0.35       1             1.5385
glioblastoma → ∼treatment              0.3        1             2.13
chemo → ∼glioblastoma                  0.35       0.95          1.4286

algorithm. This is because each item's support is calculated and checked against the threshold support value to classify it as frequent or infrequent; therefore, we obtain the infrequent items in the same pass in which we obtain the frequent items. However, the processing of frequent and infrequent itemsets for the generation of association rules differs. Frequent itemsets generated through the Apriori algorithm have the inherent property that their subsets are also frequent; however, we cannot guarantee that for the infrequent itemsets. Thus, when generating association rules among the infrequent itemsets, we impose an additional check that their subsets are frequent.
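The additional check just described can be implemented with a one-item-removal test: since the frequent itemsets are downward closed, verifying every (k−1)-subset suffices. A small Java sketch follows, with the FIS lookup structure assumed by us.

```java
import java.util.*;

class SubsetCheck {

    /** True iff every (k-1)-subset of an infrequent itemset is frequent;
     *  by the Apriori property this implies all proper subsets are frequent. */
    static boolean allSubsetsFrequent(Set<String> itemset, Set<Set<String>> fis) {
        for (String item : itemset) {
            Set<String> subset = new HashSet<>(itemset);
            subset.remove(item);                          // drop one item
            if (!subset.isEmpty() && !fis.contains(subset)) {
                return false;                             // a subset is not frequent
            }
        }
        return true;
    }
}
```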

Research on mining association rules among both frequent and infrequent itemsets has been few and far between, especially on textual datasets. We have proposed an algorithm which can extract both types of association rules, that is, positive and negative, among both frequent and infrequent itemsets. We give a sample of all four types of association rules extracted using the algorithm.

Table 4 gives a summary of the generated association rules; all four types are illustrated. Table 4(a) shows a sample of positive association rules generated from the frequent itemsets. Table 4(b) shows negative association rules generated from the frequent itemsets; this has not yet been explored by the research community. A sample of positive association rules generated from the infrequent itemsets is given in Table 4(c); this type of association rule is potentially useful, and researchers are interested in extracting them. To our knowledge, no research had been done in this domain of extracting positive

Figure 2: Positive and negative rules generated with varying minimum supports and confidence values.

association rules from infrequent itemsets in textual data before this research. Table 4(d) shows the results of negative association rules from the infrequent itemsets.

7. Concluding Remarks and Future Work

Identification of associations among symptoms and diseases is important in diagnosis. The field of negative association rules (NARs) mining holds enormous potential to help medical practitioners in this regard. Both positive and negative association rule mining (PNARM) can hugely benefit the medical domain. Positive and negative associations among diseases, symptoms, and laboratory test results can help a medical practitioner reach a conclusion about the presence or absence of a possible disease. There is a need to minimize the errors in diagnosis and maximize the possibility of early disease identification by developing a decision support system that takes advantage of the NARs. Positive association rules such as Flu ⇒ Headache can tell us that Headache is experienced by a person who is suffering from Flu. On the contrary, negative association rules such as ¬Throbbing-Headache ⇒ ¬Migraine tell us that if the Headache experienced by a person is not Throbbing, then he may not have Migraine, with a certain degree of confidence. The applications of this work include the development of medical decision support systems, among others, by finding associations and dissociations among diseases, symptoms, and other health-related terminologies. The current algorithm does not account for the context and semantics of the terms/items in the textual data. In future, we plan to assimilate the context of the features into our work in order to improve the quality and efficacy of the generated association rules.

In this paper, contributions to NARM research were made by proposing an algorithm for efficiently generating negative association rules along with positive association rules. We have proposed a novel method that captures negative associations among frequent itemsets and also extracts positive associations among infrequent itemsets, whereas traditional association rule mining algorithms have focused on frequent items for generating positive association rules and have used infrequent items for the generation of negative association rules. The experimental results demonstrate that the proposed approach is effective, efficient, and promising.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.

[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.

[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory—ICDT '99, pp. 398–416, Springer, 1999.

[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134–145, 1996.

[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236–245, USA, August 2003.

[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671–692, 2001.

[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497–510, 2004.

[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407–413, 2013.

[10] M. Rajman and R. Besancon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50–64, 1997.

[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227–230, 1997.

[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.

[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237–248, Ubooks, 2003.

[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 1, pp. 207–216, 1993.

[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27–38, Springer, 2004.

[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658–665, 2002.

[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100–109, Springer, 2006.

[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716–1721, August 2005.

[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620–629, Springer, 2005.

[20] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194–5213, 2011.

[21] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275–282, Atlantis Press, 2011.

[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442–449, December 2002.

[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228–237, Springer, 2006.

[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229–238, AAAI Press, 1991.

[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining n-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59–67, Springer, 2000.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

6 The Scientific World Journal

Input TD Transaction (Document) Databasems minimum support thresholdOutput FIS frequent itemsets inFIS infrequent itemsets(1) initialize FIS = Φ inFIS = Φ(2) temp1 = forall119860 | 119860 isin top(N) IDF terms lowast get all 1-itemsets being in top-N IDF items lowast(21) FIS1 = 119860 | 119860 isin temp1 and support (119860) ge (ms) lowast get frequent 1-itemsets lowast(22) inFIS1 = temp1minus FIS1 lowast all infrequent 1-itemsets lowast119896 = 2 lowast initialize for itemsets greater than 1 lowast(3) while (temp

119896minus1= Φ) do begin

(31) 119862119896= generate (temp

119896minus1 ms) lowast candidate 119896-itemsets lowast

(32) for each transaction 119905 isin TDdo begin lowast scan database TDlowast119862119905= subset(119862

119896 119905) lowast get temp candidates in transaction 119905lowast

for each candidate 119888 isin 119862119905

ccount++ lowast increase count of itemsets if it exists in transaction lowastend

(33) csupport = (119888count|TD|

) lowast calculate support of candidate 119896-itemset lowast

(34) temp119896= 119888 | 119888 isin 119862

119896and (119888support ge minsup) lowast add to temp 119896-itemsets lowast

(4) FISk = 119860 | 119860 isin temp119896and 119860 support gems) lowast add to frequent 119896-itemsets if supp(119860) greater than minsupplowast

(5) inFISk = temp119896minus FIS

119896 lowast add to in-frequent 119896-itemsets having support less than minsupplowast

(6) FIS = cup119896FIS119896 lowast add generated frequent 119896-itemsets to FIS lowast

(7) inFIS = cup119896inFIS

119896 lowast add generated in-frequent 119896-itemsets to inFIS lowast

(8) k++ lowast increment itemset size by 1 lowastend(9) return FIS and inFIS

Algorithm 1

52 Generating Association Rules among Infrequent ItemsetsFor brevity we only consider one form of association fromboth positive and negative that is 119860 rArr 119861 and 119860 rArr not119861 theother forms can similarly be extracted

In the description of association rules as shown inAlgorithm 4 supp(119860 cup 119861) ge minsupp guarantees thatthe association rule describes the relationship among itemsof a frequent itemset whereas supp(119860 cup 119861) lt minsuppguarantees that the association rule describes the relationshipamong items of an infrequent itemset however the subitemsof the itemset need to be frequent as enforced by theconditions supp(119860) ge minsupp and supp(119861) ge minsuppThe interestingness measure lift has to be greater than 1articulating a positive dependency among the itemsets thevalue of lift less than 1 will articulate a negative relationshipamong the itemsets

The algorithm generates a complete set of positive andnegative association rules from both frequent and infrequentitemsets The frequent itemsets have traditionally been usedto generate positive association rules however we argue thatitems in frequent itemsets can be negatively correlated Thiscan be illustrated using the following example

minsupp = 02 minconf = 06supp(119860) = 07 supp(119861) = 05 supp(119860 cup 119861) = 025conf(119860 rArr 119861) = 03571 lift(119860 rArr 119861) = 07143conf(119861 rArr 119860) = 05 lift(119861 rArr 119860) = 07143supp(119860 cup not119861) = 07 minus 025 = 045 conf(119860 rArr not119861) =

04507 = 06429lift(119860 rArr not119861) = 045(07 lowast 05) = 12857

The above example clearly shows that itemsets despitebeing frequent can have negative relationships among theiritem subsetsTherefore in Step (22) of Algorithm 2 we try togenerate negative association rules using frequent itemsets

The infrequent itemsets on the other hand have eitherbeen totally ignored while generating associations or mostlyused to generate only negative association rules Howeverinfrequent itemsets have potentially valid and important pos-itive association rules among them having high confidenceand strongly positive correlationThis can be illustrated usingthe following example

minsupp = 03 minconf = 07supp(119860) = 08 supp(119861) = 02 supp(119860 cup 119861) = 02conf(119860 rArr 119861) = 025 lift(119860 rArr 119861) = 119conf(119861 rArr 119860) = 1 lift(119861 rArr 119860) = 119

We can visualize from the example that (119860 cup 119861) in spiteof being an infrequent itemset has a very strong positiveassociation 119861 rArr 119860 having 100 confidence and a positivecorrelation Our proposed algorithm covers the generationof such rules as explained in Step (31) of our proposedalgorithm

6 Experimental Results and Discussions

We performed our experiments on medical blogs datasetsmostly authored by patients writing about their problems andexperiences The datasets have been collected from differentmedical blog sites

(i) Cancer Survivors Network [httpcsncancerorg]

The Scientific World Journal 7

Inputs min sup minimum supportmin conf minimum confidence FIS (frequent itemsets) inFIS (infrequent itemsets)Output PAR set of [+ve]ARs NAR set of [minusve]ARs(1) PAR = 120593NAR = 120593(2) lowast generating all association rules from FIS (frequent itemsets) lowastFor each itemset I in FISdo begin

for each itemset 119860 cup 119861 = 119868A cap B = 120593

do begin(21) lowast generate rules of the form 119860 rArr 119861 lowast

If conf (A rArr B) ge min conf ampamp lift (A rArr B) ge 1then output the rule (A rArr B)PAR cup (A rArr B)

else(22) lowast generate rules of the form (A rArr notB) and (notA rArr B) lowast

if conf (A rArr notB) ge min conf ampamp lift (A rArr notB) ge 1output the rule (A rArr notB)NAR cup (A rArr notB)

if conf (notA rArr B) ge min conf ampamp lift (notA rArr B) ge 1output the rule (notA rArr B)NAR cup (notA rArr B)

if conf (notA rArr notB) ge min conf ampamp lift (notA rArr notB) ge 1output the rule (notA rArr notB)NAR cup (notA rArr notB)

end forend for(3) lowast generating all association rules from inFIS lowastFor any itemset I in inFISdo begin

For each itemset 119860 cup 119861 = 119868A cap B = 120593 supp(A) geminsupp and supp(B) geminsuppdo begin(31) lowast generate rules of the form 119860 rArr 119861 lowast

If conf(A rArr B) ge min conf ampamp lift(A rArr B) ge 1then output the rule 119860 rArr 119861PAR cup (A rArr B)

else(32) lowast generate rules of the form (A rArr notB) (notA rArr B) and (notA rArr notB) lowast

if conf (A rArr notB) ge min conf ampamp lift (A rArr notB) ge 1output the rule (A rArr notB)NAR cup (A rArr notB)

if conf (notA rArr B) ge min conf ampamp lift (notA rArr B) ge 1output the rule (notA rArr B)NAR cup (notA rArr B)

if conf (notA rArr notB) ge min conf ampamp lift (notA rArr notB) ge 1output the rule (notA rArr notB)NAR cup (notA rArr notB)

end forend for(4) return PAR and NAR

Algorithm 2

Given supp(119860 cup 119861) ge minsuppIf conf (119860 rArr 119861) ge minconf and

lift (119860 rArr 119861) gt 1

then 119860 rArr 119861 is a valid positive rule having required minimum confidence and there is a positive correlation betweenrule items A and B

Else if conf (119860 rArr 119861) lt minconf andlift (119860 rArr 119861) lt 1

then 119860 rArr 119861 is not a valid positive rule having lower than required minimum confidence and there is a negativecorrelation between rule items A and B Therefore we try to generate a negative association rule from itemset 119868

If conf (119860 rArr not119861) ge minconf andlift (119860 rArr not119861) gt 1

then 119860 rArr not119861 is a valid negative rule having required minimum confidence and there is a positive correlation betweenrule items A and notB

Algorithm 3

8 The Scientific World Journal

Given supp(119860 cup 119861) lt minsupp and supp(119860 cup 119861) = 0

supp(119860) ge minsupp andsupp(119861) ge minsupp

If conf(119860 rArr 119861) ge minconf andlift(119860 rArr 119861) gt 1

then 119860 rArr 119861 is a valid positive rule having required minimum confidence and thereis a positive correlation between rule items A and B

Else Ifconf(119860 rArr not119861) ge minconf andlift(119860 rArr not119861) gt 1

then 119860 rArr not119861 is a valid negative rule having required minimum confidence and thereis a positive correlation between rule items A and notB

Algorithm 4

Table 2 Total generated frequent and infrequent itemsets usingdifferent support values

Support Frequent itemsets Infrequent itemsets01 146474 268879015 105397 32976602 94467 447614025 79871 49210803 57954 504320

(ii) Care Pages [httpwwwcarepagescomforumscan-cer]

The blogs text was preprocessed before the experimenta-tion that is stop words removal stemminglemmatizationnonmedical words removal and so forth We assignedweights to termsitems after preprocessing using the IDFscheme for selecting only the important and relevanttermsitems in the dataset The main parameters of thedatabases are as follows

(i) the total number of blogs (ie transactions) used inthis experimentation was 1926

(ii) the average number of words (attributes) per blog(transaction) was 145

(iii) the smallest blog contained 79 words(iv) the largest blog contained 376 words(v) the total number of words (ie attributes) was 280254

without stop words removal(vi) the total number of words (ie attributes) was 192738

after stop words removal(vii) the total number of words selected using top-119873 age

of IDF words was 81733(viii) algorithm is implemented in java

Table 2 summarizes the number of itemsets generatedwith varying minsup values We can see that the number offrequent itemsets decreases as we increase the minsup valueHowever a sharp increase in the number of infrequent item-sets can be observed This can also be visualized in Figure 1

146474105397 94467 79871 57954

268879

329766

447614492108 504320

0

100000

200000

300000

400000

500000

600000

01 015 02 025 03

Item

sets

Support

Frequent itemsetsInfrequent itemsets

Figure 1 Frequent and infrequent itemsets generated with varyingminimum support values

Table 3 gives an account of the experimental resultsfor different values of minimum support and minimumconfidenceThe lift value has to be greater than 1 for a positiverelationship between the itemsets the resulting rule howevermay itself be positive or negativeThe total number of positiverules and negative rules generated from both frequent andinfrequent itemsets is given

Although the experimental results greatly depend on thedatasets used they still flaunt the importance of IDF factorin selecting the frequent itemsets along with the generationof negative rules from frequent itemsets and the extraction ofpositive rules from infrequent itemsetsThenumber of negativerules generated greatly outnumbers the positive rules not onlybecause of the much more infrequent itemsets as comparedto frequent itemsets but also because of finding the negativecorrelation between the frequent itemsets using proposedapproach leading to the generation of negative associationrules

The frequent and infrequent itemset generation usingApriori algorithm takes only a little extra time as comparedto the traditional frequent itemset finding using Apriori

The Scientific World Journal 9

Table 3 Positive and negative association rules using varying support and confidence values

Support Confidence PARs from FIs PARs from IIs NARs from FIs NARs from IIs005 095 3993 254 27032 8360 05 09 4340 228 29348 154401 09 3731 196 30714 127901 085 3867 246 37832 1170015 08 3340 330 26917 1121

Table 4 Positive and negative rules from both frequent and infrequent itemsets

(a) Positive rules from frequent itemsets

Rule Support Confidence Liftchemo radiation treatment rarr tumor 02 1 20surgery tumor rarr radiation 025 1 14286radiation surgery rarr tumor 025 08333 16667treatment rarr radiation 045 09 12857treatment tumor rarr radiation 03 1 14286tumor rarr radiation 05 1 14286

(b) Negative rules from frequent itemsets

Rule Support Confidence Liftradiation rarr simcancer 05 07143 10204simtemodar rarr radiation 04825 08333 16667simradiation rarr glioblastoma 065 1 14285

(c) Positive rules from infrequent itemsets

Rule Support Confidence Liftcancer treatment rarr tumor 015 1 2doctor temodar rarr chemo 005 1 28571brain chemo rarr doctor 015 1 22222mri tumor rarr brain 01 1 25

(d) Negative rules from infrequent itemsets

Rule Support Confidence Liftchemo rarr simsurgery 035 1 15385glioblastoma rarr simtreatment 03 1 213chemo rarr simglioblastoma 035 095 14286

algorithmThis is because each itemrsquos support is calculated forchecking against the threshold support value to be classifiedas frequent and infrequent therefore we get the infrequentitems in the same pass as we get frequent items Howeverthe processing of frequent and infrequent itemsets for thegeneration of association rules is different For frequent itemsgenerated through Apriori algorithm they have an inherentproperty that their subsets are also frequent however wecannot guarantee that for the infrequent itemsets Thus weimpose an additional check on the infrequent itemsets thattheir subsets are frequent when generating association rulesamong them

The researches on mining association rules among thefrequent and infrequent itemsets have been far and fewespecially from the textual datasets We have proposed this

algorithm which can extract both types of association rulesthat is positive and negative among both frequent andinfrequent itemsets We give a sample of all four types ofassociation rules extracted using the algorithm

Table 4 gives a summary of the generated associationrules The four (4) types of generated association rules areillustrated Table 4(a) shows a sample of positive associa-tion rules generated from the frequent itemsets Table 4(b)shows negative association rules generated from the frequentitemsets This has not yet been explored by the researchcommunity Sample of positive association rules generatedfrom the infrequent itemsets are demonstrated in Table 4(c)This type of association rules would potentially be usefuland researchers are interested to extract them There isno research done in this domain of extracting positive

10 The Scientific World Journal

Confidence

Confidence

Positive rules

Positive rules

Negative rules

Negative rules0

5000

10000

15000

20000

25000

30000

35000

40000

005

005

0101

015

095

09

09

085

08

42474568

3927

4113

3670

Num

ber o

r rul

es

Rule support

Figure 2 Positive and negative rules generated with varying minimum supports and confidence values

association rules from infrequent itemsets in the textual databefore this research Table 4(d) shows the results of negativeassociation rules from the infrequent itemsets

7 Concluding Remarks and Future Work

Identification of associations among symptoms and diseasesis important in diagnosis The field of negative associationrules (NARs)mining holds enormous potential to helpmedi-cal practitioners in this regard Both the positive and negativeassociation rule mining (PNARM) can hugely benefit themedical domain Positive and negative associations amongdiseases symptoms and laboratory test results can help amedical practitioner reach a conclusion about the presence orabsence of a possible diseaseThere is a need to minimize theerrors in diagnosis and maximize the possibility of early dis-ease identification by developing a decision support systemthat takes advantage of the NARs Positive association rulessuch as 119865119897119906 rArr 119867119890119886119889119886119888ℎ119890 can tell us thatHeadache is experi-enced by a person who is suffering from Flu On the contrarynegative association rules such as not119879ℎ119903119900119887119887119894119899119892-119867119890119886119889119886119888ℎ119890 rArr

not119872119894119892119903119886119894119899119890 tell us that ifHeadache experienced by a person isnotThrobbing then he may not haveMigraine with a certaindegree of confidence The applications of this work includethe development of medical decision support system amongothers by finding associations and dissociations amongdiseases symptoms and other health-related terminologiesThe current algorithm does not account for the context andsemantics of the termsitems in the textual data In futurewe plan to assimilate the context of the features in our workin order to perk up the quality and efficacy of the generatedassociation rules

In this paper contributions to the NARM research weremade by proposing an algorithm for efficiently generatingnegative association rules along with the positive associationrules We have proposed a novel method that capturesthe negative associations among frequent itemsets and alsoextracts the positive associations among the infrequent item-sets Whereas traditional association rule mining algorithms

have focused on frequent items for generating positiveassociation rules and have used the infrequent items for thegeneration of negative association rules The experimentalresults have demonstrated that the proposed approach iseffective efficient and promising

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.
[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.
[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.
[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory (ICDT '99), pp. 398–416, Springer, 1999.
[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134–145, 1996.
[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236–245, USA, August 2003.
[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671–692, 2001.
[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497–510, 2004.
[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407–413, 2013.
[10] M. Rajman and R. Besançon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50–64, 1997.
[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227–230, 1997.
[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.
[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237–248, Ubooks, 2003.
[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 2, pp. 207–216, 1993.
[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27–38, Springer, 2004.
[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658–665, 2002.
[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100–109, Springer, 2006.
[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716–1721, August 2005.
[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620–629, Springer, 2005.
[20] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194–5213, 2011.
[21] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275–282, Atlantis Press, 2011.
[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442–449, December 2002.
[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228–237, Springer, 2006.
[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229–238, AAAI Press, 1991.
[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining n-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59–67, Springer, 2000.


Page 7: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

The Scientific World Journal 7

Inputs min sup minimum supportmin conf minimum confidence FIS (frequent itemsets) inFIS (infrequent itemsets)Output PAR set of [+ve]ARs NAR set of [minusve]ARs(1) PAR = 120593NAR = 120593(2) lowast generating all association rules from FIS (frequent itemsets) lowastFor each itemset I in FISdo begin

for each itemset 119860 cup 119861 = 119868A cap B = 120593

do begin(21) lowast generate rules of the form 119860 rArr 119861 lowast

If conf (A rArr B) ge min conf ampamp lift (A rArr B) ge 1then output the rule (A rArr B)PAR cup (A rArr B)

else(22) lowast generate rules of the form (A rArr notB) and (notA rArr B) lowast

if conf (A rArr notB) ge min conf ampamp lift (A rArr notB) ge 1output the rule (A rArr notB)NAR cup (A rArr notB)

if conf (notA rArr B) ge min conf ampamp lift (notA rArr B) ge 1output the rule (notA rArr B)NAR cup (notA rArr B)

if conf (notA rArr notB) ge min conf ampamp lift (notA rArr notB) ge 1output the rule (notA rArr notB)NAR cup (notA rArr notB)

end forend for(3) lowast generating all association rules from inFIS lowastFor any itemset I in inFISdo begin

For each itemset 119860 cup 119861 = 119868A cap B = 120593 supp(A) geminsupp and supp(B) geminsuppdo begin(31) lowast generate rules of the form 119860 rArr 119861 lowast

If conf(A rArr B) ge min conf ampamp lift(A rArr B) ge 1then output the rule 119860 rArr 119861PAR cup (A rArr B)

else(32) lowast generate rules of the form (A rArr notB) (notA rArr B) and (notA rArr notB) lowast

if conf (A rArr notB) ge min conf ampamp lift (A rArr notB) ge 1output the rule (A rArr notB)NAR cup (A rArr notB)

if conf (notA rArr B) ge min conf ampamp lift (notA rArr B) ge 1output the rule (notA rArr B)NAR cup (notA rArr B)

if conf (notA rArr notB) ge min conf ampamp lift (notA rArr notB) ge 1output the rule (notA rArr notB)NAR cup (notA rArr notB)

end forend for(4) return PAR and NAR

Algorithm 2

Given supp(119860 cup 119861) ge minsuppIf conf (119860 rArr 119861) ge minconf and

lift (119860 rArr 119861) gt 1

then 119860 rArr 119861 is a valid positive rule having required minimum confidence and there is a positive correlation betweenrule items A and B

Else if conf (119860 rArr 119861) lt minconf andlift (119860 rArr 119861) lt 1

then 119860 rArr 119861 is not a valid positive rule having lower than required minimum confidence and there is a negativecorrelation between rule items A and B Therefore we try to generate a negative association rule from itemset 119868

If conf (119860 rArr not119861) ge minconf andlift (119860 rArr not119861) gt 1

then 119860 rArr not119861 is a valid negative rule having required minimum confidence and there is a positive correlation betweenrule items A and notB

Algorithm 3

8 The Scientific World Journal

Given supp(119860 cup 119861) lt minsupp and supp(119860 cup 119861) = 0

supp(119860) ge minsupp andsupp(119861) ge minsupp

If conf(119860 rArr 119861) ge minconf andlift(119860 rArr 119861) gt 1

then 119860 rArr 119861 is a valid positive rule having required minimum confidence and thereis a positive correlation between rule items A and B

Else Ifconf(119860 rArr not119861) ge minconf andlift(119860 rArr not119861) gt 1

then 119860 rArr not119861 is a valid negative rule having required minimum confidence and thereis a positive correlation between rule items A and notB

Algorithm 4

Table 2 Total generated frequent and infrequent itemsets usingdifferent support values

Support Frequent itemsets Infrequent itemsets01 146474 268879015 105397 32976602 94467 447614025 79871 49210803 57954 504320

(ii) Care Pages [httpwwwcarepagescomforumscan-cer]

The blogs text was preprocessed before the experimenta-tion that is stop words removal stemminglemmatizationnonmedical words removal and so forth We assignedweights to termsitems after preprocessing using the IDFscheme for selecting only the important and relevanttermsitems in the dataset The main parameters of thedatabases are as follows

(i) the total number of blogs (ie transactions) used inthis experimentation was 1926

(ii) the average number of words (attributes) per blog(transaction) was 145

(iii) the smallest blog contained 79 words(iv) the largest blog contained 376 words(v) the total number of words (ie attributes) was 280254

without stop words removal(vi) the total number of words (ie attributes) was 192738

after stop words removal(vii) the total number of words selected using top-119873 age

of IDF words was 81733(viii) algorithm is implemented in java

Table 2 summarizes the number of itemsets generatedwith varying minsup values We can see that the number offrequent itemsets decreases as we increase the minsup valueHowever a sharp increase in the number of infrequent item-sets can be observed This can also be visualized in Figure 1

146474105397 94467 79871 57954

268879

329766

447614492108 504320

0

100000

200000

300000

400000

500000

600000

01 015 02 025 03

Item

sets

Support

Frequent itemsetsInfrequent itemsets

Figure 1 Frequent and infrequent itemsets generated with varyingminimum support values

Table 3 gives an account of the experimental resultsfor different values of minimum support and minimumconfidenceThe lift value has to be greater than 1 for a positiverelationship between the itemsets the resulting rule howevermay itself be positive or negativeThe total number of positiverules and negative rules generated from both frequent andinfrequent itemsets is given

Although the experimental results greatly depend on thedatasets used they still flaunt the importance of IDF factorin selecting the frequent itemsets along with the generationof negative rules from frequent itemsets and the extraction ofpositive rules from infrequent itemsetsThenumber of negativerules generated greatly outnumbers the positive rules not onlybecause of the much more infrequent itemsets as comparedto frequent itemsets but also because of finding the negativecorrelation between the frequent itemsets using proposedapproach leading to the generation of negative associationrules

The frequent and infrequent itemset generation usingApriori algorithm takes only a little extra time as comparedto the traditional frequent itemset finding using Apriori

The Scientific World Journal 9

Table 3 Positive and negative association rules using varying support and confidence values

Support Confidence PARs from FIs PARs from IIs NARs from FIs NARs from IIs005 095 3993 254 27032 8360 05 09 4340 228 29348 154401 09 3731 196 30714 127901 085 3867 246 37832 1170015 08 3340 330 26917 1121

Table 4 Positive and negative rules from both frequent and infrequent itemsets

(a) Positive rules from frequent itemsets

Rule Support Confidence Liftchemo radiation treatment rarr tumor 02 1 20surgery tumor rarr radiation 025 1 14286radiation surgery rarr tumor 025 08333 16667treatment rarr radiation 045 09 12857treatment tumor rarr radiation 03 1 14286tumor rarr radiation 05 1 14286

(b) Negative rules from frequent itemsets

Rule Support Confidence Liftradiation rarr simcancer 05 07143 10204simtemodar rarr radiation 04825 08333 16667simradiation rarr glioblastoma 065 1 14285

(c) Positive rules from infrequent itemsets

Rule Support Confidence Liftcancer treatment rarr tumor 015 1 2doctor temodar rarr chemo 005 1 28571brain chemo rarr doctor 015 1 22222mri tumor rarr brain 01 1 25

(d) Negative rules from infrequent itemsets

Rule Support Confidence Liftchemo rarr simsurgery 035 1 15385glioblastoma rarr simtreatment 03 1 213chemo rarr simglioblastoma 035 095 14286

algorithmThis is because each itemrsquos support is calculated forchecking against the threshold support value to be classifiedas frequent and infrequent therefore we get the infrequentitems in the same pass as we get frequent items Howeverthe processing of frequent and infrequent itemsets for thegeneration of association rules is different For frequent itemsgenerated through Apriori algorithm they have an inherentproperty that their subsets are also frequent however wecannot guarantee that for the infrequent itemsets Thus weimpose an additional check on the infrequent itemsets thattheir subsets are frequent when generating association rulesamong them

The researches on mining association rules among thefrequent and infrequent itemsets have been far and fewespecially from the textual datasets We have proposed this

algorithm which can extract both types of association rulesthat is positive and negative among both frequent andinfrequent itemsets We give a sample of all four types ofassociation rules extracted using the algorithm

Table 4 gives a summary of the generated associationrules The four (4) types of generated association rules areillustrated Table 4(a) shows a sample of positive associa-tion rules generated from the frequent itemsets Table 4(b)shows negative association rules generated from the frequentitemsets This has not yet been explored by the researchcommunity Sample of positive association rules generatedfrom the infrequent itemsets are demonstrated in Table 4(c)This type of association rules would potentially be usefuland researchers are interested to extract them There isno research done in this domain of extracting positive

10 The Scientific World Journal

Confidence

Confidence

Positive rules

Positive rules

Negative rules

Negative rules0

5000

10000

15000

20000

25000

30000

35000

40000

005

005

0101

015

095

09

09

085

08

42474568

3927

4113

3670

Num

ber o

r rul

es

Rule support

Figure 2 Positive and negative rules generated with varying minimum supports and confidence values

association rules from infrequent itemsets in the textual databefore this research Table 4(d) shows the results of negativeassociation rules from the infrequent itemsets

7 Concluding Remarks and Future Work

Identification of associations among symptoms and diseasesis important in diagnosis The field of negative associationrules (NARs)mining holds enormous potential to helpmedi-cal practitioners in this regard Both the positive and negativeassociation rule mining (PNARM) can hugely benefit themedical domain Positive and negative associations amongdiseases symptoms and laboratory test results can help amedical practitioner reach a conclusion about the presence orabsence of a possible diseaseThere is a need to minimize theerrors in diagnosis and maximize the possibility of early dis-ease identification by developing a decision support systemthat takes advantage of the NARs Positive association rulessuch as 119865119897119906 rArr 119867119890119886119889119886119888ℎ119890 can tell us thatHeadache is experi-enced by a person who is suffering from Flu On the contrarynegative association rules such as not119879ℎ119903119900119887119887119894119899119892-119867119890119886119889119886119888ℎ119890 rArr

not119872119894119892119903119886119894119899119890 tell us that ifHeadache experienced by a person isnotThrobbing then he may not haveMigraine with a certaindegree of confidence The applications of this work includethe development of medical decision support system amongothers by finding associations and dissociations amongdiseases symptoms and other health-related terminologiesThe current algorithm does not account for the context andsemantics of the termsitems in the textual data In futurewe plan to assimilate the context of the features in our workin order to perk up the quality and efficacy of the generatedassociation rules

In this paper contributions to the NARM research weremade by proposing an algorithm for efficiently generatingnegative association rules along with the positive associationrules We have proposed a novel method that capturesthe negative associations among frequent itemsets and alsoextracts the positive associations among the infrequent item-sets Whereas traditional association rule mining algorithms

have focused on frequent items for generating positiveassociation rules and have used the infrequent items for thegeneration of negative association rules The experimentalresults have demonstrated that the proposed approach iseffective efficient and promising

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] X Wu C Zhang and S Zhang ldquoEfficient mining of bothpositive and negative association rulesrdquo ACM Transactions onInformation Systems vol 22 no 3 pp 381ndash405 2004

[2] R J Bayardo Jr ldquoEfficiently mining long patterns from data-basesrdquo ACM SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[3] J Han J Pei and Y Yin ldquoMining frequent patterns withoutcandidate generationrdquo ACM SIGMOD Record vol 29 no 2 pp1ndash12 2000

[4] N Pasquier Y Bastide R Taouil and L Lakhal ldquoDiscoveringfrequent closed itemsets for association rulesrdquo in DatabaseTheorymdashICDT rsquo99 pp 398ndash416 Springer 1999

[5] H Toivonen ldquoSampling large databases for association rulesrdquo inProceedings of the International Conference on Very Large DataBases (VLDB rsquo96) pp 134ndash145 1996

[6] J Wang J Han and J Pei ldquoCLOSET+ searching for the beststrategies for mining frequent closed itemsetsrdquo in Proceedingsof the 9th ACMSIGKDD International Conference on KnowledgeDiscovery andDataMining (KDD rsquo03) pp 236ndash245 usa August2003

[7] R JW Cline and KMHaynes ldquoConsumer health informationseeking on the internet the state of the artrdquo Health EducationResearch vol 16 no 6 pp 671ndash692 2001

[8] J M Morahan-Martin ldquoHow internet users find evaluate anduse online health information a cross-cultural reviewrdquo Cyber-psychology and Behavior vol 7 no 5 pp 497ndash510 2004

[9] S Mahmood M Shahbaz and Z Rehman ldquoExtraction ofpositive and negative association rules from text a temporalapproachrdquoPakistan Journal of Science vol 65 pp 407ndash413 2013

The Scientific World Journal 11

[10] M Rajman and R Besancon ldquoText mining natural languagetechniques and text mining applicationsrdquo in Proceedings of the7th IFIP 2 6 Working Conference on Database Semantics (DSrsquo97) pp 50ndash64 1997

[11] B Lent R Agrawal and R Srikant ldquoDiscovering trends in textdatabasesrdquo in Proceedings of the ACM SIGMOD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo93) pp 227ndash230 1997

[12] S Ananiadou and J McNaught Text Mining for Biology andBiomedicine Artech House London UK 2006

[13] J Paralic and P Bednar ldquoText mining for document annotationand ontology supportrdquo in Intelligent Systems at the Service ofMankind pp 237ndash248 Ubooks 2003

[14] R Agrawal T Imielinski and A Swami ldquoMining associationrules between sets of items in large databasesrdquo ACM SIGMODRecord vol 22 no 1 pp 207ndash216 1993

[15] M L Antonie and O Zaıane ldquoMining positive and negativeassociation rules an approach for confined rulesrdquo inKnowledgeDiscovery in Databases PKDD 2004 vol 3202 pp 27ndash38Springer 2004

[16] W U X Dong Z C Qi and S C Zhang ldquoMining both positiveand negative association rulesrdquo in Proceedings of the 19th Inter-national Conference on Machine Learning (ICML rsquo2002) pp658ndash665 2002

[17] X Dong F Sun X Han and R Hou ldquoStudy of positive andnegative association rules based on multi-confidence and chi-squared testrdquo in Advanced Data Mining and Applications pp100ndash109 Springer 2006

[18] M Gan M Zhang and SWang ldquoOne extended form for nega-tive association rules and the correspondingmining algorithmrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) pp 1716ndash1721 August2005

[19] L Sharma O Vyas U Tiwary and R Vyas ldquoA novel approachof multilevel positive and negative association rule mining forspatial databasesrdquo in Machine Learning and Data Mining inPattern Recognition pp 620ndash629 Springer 2005

[20] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA for-mal model for mining fuzzy rules using the RL representationtheoryrdquo Information Sciences vol 181 no 23 pp 5194ndash52132011

[21] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA fuzzyrule mining approach involving absent itemsrdquo in Proceedings ofthe 7th Conference of the European Society for Fuzzy Logic andTechnology pp 275ndash282 Atlantis Press 2011

[22] A Savasere E Omiecinski and S Navathe ldquoMining for strongnegative rules for statistically dependent itemsrdquo in Proceedingsof the IEEE International Conference on Data Mining (ICDMrsquo02) pp 442ndash449 December 2002

[23] M Morzy ldquoEfficient mining of dissociation rulesrdquo in DataWarehousing and Knowledge Discovery pp 228ndash237 Springer2006

[24] G Piatetsky-Shapiro ldquoDiscovery analysis and presentation ofstrong rulesrdquo inKnowledgeDiscovery inDatabases pp 229ndash238AAAI Press 1991

[25] AW Fu RW Kwong and J Tang ldquoMining n-most interestingitemsetsrdquo in Foundations of Intelligent Systems Lecture Notes inComputer Science pp 59ndash67 Springer 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

8 The Scientific World Journal

Given supp(119860 cup 119861) lt minsupp and supp(119860 cup 119861) = 0

supp(119860) ge minsupp andsupp(119861) ge minsupp

If conf(119860 rArr 119861) ge minconf andlift(119860 rArr 119861) gt 1

then 119860 rArr 119861 is a valid positive rule having required minimum confidence and thereis a positive correlation between rule items A and B

Else Ifconf(119860 rArr not119861) ge minconf andlift(119860 rArr not119861) gt 1

then 119860 rArr not119861 is a valid negative rule having required minimum confidence and thereis a positive correlation between rule items A and notB

Algorithm 4

Table 2 Total generated frequent and infrequent itemsets usingdifferent support values

Support Frequent itemsets Infrequent itemsets01 146474 268879015 105397 32976602 94467 447614025 79871 49210803 57954 504320

(ii) Care Pages [httpwwwcarepagescomforumscan-cer]

The blogs text was preprocessed before the experimenta-tion that is stop words removal stemminglemmatizationnonmedical words removal and so forth We assignedweights to termsitems after preprocessing using the IDFscheme for selecting only the important and relevanttermsitems in the dataset The main parameters of thedatabases are as follows

(i) the total number of blogs (ie transactions) used inthis experimentation was 1926

(ii) the average number of words (attributes) per blog(transaction) was 145

(iii) the smallest blog contained 79 words(iv) the largest blog contained 376 words(v) the total number of words (ie attributes) was 280254

without stop words removal(vi) the total number of words (ie attributes) was 192738

after stop words removal(vii) the total number of words selected using top-119873 age

of IDF words was 81733(viii) algorithm is implemented in java

Table 2 summarizes the number of itemsets generatedwith varying minsup values We can see that the number offrequent itemsets decreases as we increase the minsup valueHowever a sharp increase in the number of infrequent item-sets can be observed This can also be visualized in Figure 1

146474105397 94467 79871 57954

268879

329766

447614492108 504320

0

100000

200000

300000

400000

500000

600000

01 015 02 025 03

Item

sets

Support

Frequent itemsetsInfrequent itemsets

Figure 1 Frequent and infrequent itemsets generated with varyingminimum support values

Table 3 gives an account of the experimental resultsfor different values of minimum support and minimumconfidenceThe lift value has to be greater than 1 for a positiverelationship between the itemsets the resulting rule howevermay itself be positive or negativeThe total number of positiverules and negative rules generated from both frequent andinfrequent itemsets is given

Although the experimental results greatly depend on thedatasets used they still flaunt the importance of IDF factorin selecting the frequent itemsets along with the generationof negative rules from frequent itemsets and the extraction ofpositive rules from infrequent itemsetsThenumber of negativerules generated greatly outnumbers the positive rules not onlybecause of the much more infrequent itemsets as comparedto frequent itemsets but also because of finding the negativecorrelation between the frequent itemsets using proposedapproach leading to the generation of negative associationrules

The frequent and infrequent itemset generation usingApriori algorithm takes only a little extra time as comparedto the traditional frequent itemset finding using Apriori

The Scientific World Journal 9

Table 3 Positive and negative association rules using varying support and confidence values

Support Confidence PARs from FIs PARs from IIs NARs from FIs NARs from IIs005 095 3993 254 27032 8360 05 09 4340 228 29348 154401 09 3731 196 30714 127901 085 3867 246 37832 1170015 08 3340 330 26917 1121

Table 4 Positive and negative rules from both frequent and infrequent itemsets

(a) Positive rules from frequent itemsets

Rule Support Confidence Liftchemo radiation treatment rarr tumor 02 1 20surgery tumor rarr radiation 025 1 14286radiation surgery rarr tumor 025 08333 16667treatment rarr radiation 045 09 12857treatment tumor rarr radiation 03 1 14286tumor rarr radiation 05 1 14286

(b) Negative rules from frequent itemsets

Rule Support Confidence Liftradiation rarr simcancer 05 07143 10204simtemodar rarr radiation 04825 08333 16667simradiation rarr glioblastoma 065 1 14285

(c) Positive rules from infrequent itemsets

Rule Support Confidence Liftcancer treatment rarr tumor 015 1 2doctor temodar rarr chemo 005 1 28571brain chemo rarr doctor 015 1 22222mri tumor rarr brain 01 1 25

(d) Negative rules from infrequent itemsets

Rule Support Confidence Liftchemo rarr simsurgery 035 1 15385glioblastoma rarr simtreatment 03 1 213chemo rarr simglioblastoma 035 095 14286

algorithmThis is because each itemrsquos support is calculated forchecking against the threshold support value to be classifiedas frequent and infrequent therefore we get the infrequentitems in the same pass as we get frequent items Howeverthe processing of frequent and infrequent itemsets for thegeneration of association rules is different For frequent itemsgenerated through Apriori algorithm they have an inherentproperty that their subsets are also frequent however wecannot guarantee that for the infrequent itemsets Thus weimpose an additional check on the infrequent itemsets thattheir subsets are frequent when generating association rulesamong them

The researches on mining association rules among thefrequent and infrequent itemsets have been far and fewespecially from the textual datasets We have proposed this

algorithm which can extract both types of association rulesthat is positive and negative among both frequent andinfrequent itemsets We give a sample of all four types ofassociation rules extracted using the algorithm

Table 4 gives a summary of the generated associationrules The four (4) types of generated association rules areillustrated Table 4(a) shows a sample of positive associa-tion rules generated from the frequent itemsets Table 4(b)shows negative association rules generated from the frequentitemsets This has not yet been explored by the researchcommunity Sample of positive association rules generatedfrom the infrequent itemsets are demonstrated in Table 4(c)This type of association rules would potentially be usefuland researchers are interested to extract them There isno research done in this domain of extracting positive

10 The Scientific World Journal

Confidence

Confidence

Positive rules

Positive rules

Negative rules

Negative rules0

5000

10000

15000

20000

25000

30000

35000

40000

005

005

0101

015

095

09

09

085

08

42474568

3927

4113

3670

Num

ber o

r rul

es

Rule support

Figure 2 Positive and negative rules generated with varying minimum supports and confidence values

association rules from infrequent itemsets in the textual databefore this research Table 4(d) shows the results of negativeassociation rules from the infrequent itemsets

7 Concluding Remarks and Future Work

Identification of associations among symptoms and diseasesis important in diagnosis The field of negative associationrules (NARs)mining holds enormous potential to helpmedi-cal practitioners in this regard Both the positive and negativeassociation rule mining (PNARM) can hugely benefit themedical domain Positive and negative associations amongdiseases symptoms and laboratory test results can help amedical practitioner reach a conclusion about the presence orabsence of a possible diseaseThere is a need to minimize theerrors in diagnosis and maximize the possibility of early dis-ease identification by developing a decision support systemthat takes advantage of the NARs Positive association rulessuch as 119865119897119906 rArr 119867119890119886119889119886119888ℎ119890 can tell us thatHeadache is experi-enced by a person who is suffering from Flu On the contrarynegative association rules such as not119879ℎ119903119900119887119887119894119899119892-119867119890119886119889119886119888ℎ119890 rArr

not119872119894119892119903119886119894119899119890 tell us that ifHeadache experienced by a person isnotThrobbing then he may not haveMigraine with a certaindegree of confidence The applications of this work includethe development of medical decision support system amongothers by finding associations and dissociations amongdiseases symptoms and other health-related terminologiesThe current algorithm does not account for the context andsemantics of the termsitems in the textual data In futurewe plan to assimilate the context of the features in our workin order to perk up the quality and efficacy of the generatedassociation rules

In this paper contributions to the NARM research weremade by proposing an algorithm for efficiently generatingnegative association rules along with the positive associationrules We have proposed a novel method that capturesthe negative associations among frequent itemsets and alsoextracts the positive associations among the infrequent item-sets Whereas traditional association rule mining algorithms

have focused on frequent items for generating positiveassociation rules and have used the infrequent items for thegeneration of negative association rules The experimentalresults have demonstrated that the proposed approach iseffective efficient and promising

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] X Wu C Zhang and S Zhang ldquoEfficient mining of bothpositive and negative association rulesrdquo ACM Transactions onInformation Systems vol 22 no 3 pp 381ndash405 2004

[2] R J Bayardo Jr ldquoEfficiently mining long patterns from data-basesrdquo ACM SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[3] J Han J Pei and Y Yin ldquoMining frequent patterns withoutcandidate generationrdquo ACM SIGMOD Record vol 29 no 2 pp1ndash12 2000

[4] N Pasquier Y Bastide R Taouil and L Lakhal ldquoDiscoveringfrequent closed itemsets for association rulesrdquo in DatabaseTheorymdashICDT rsquo99 pp 398ndash416 Springer 1999

[5] H Toivonen ldquoSampling large databases for association rulesrdquo inProceedings of the International Conference on Very Large DataBases (VLDB rsquo96) pp 134ndash145 1996

[6] J Wang J Han and J Pei ldquoCLOSET+ searching for the beststrategies for mining frequent closed itemsetsrdquo in Proceedingsof the 9th ACMSIGKDD International Conference on KnowledgeDiscovery andDataMining (KDD rsquo03) pp 236ndash245 usa August2003

[7] R JW Cline and KMHaynes ldquoConsumer health informationseeking on the internet the state of the artrdquo Health EducationResearch vol 16 no 6 pp 671ndash692 2001

[8] J M Morahan-Martin ldquoHow internet users find evaluate anduse online health information a cross-cultural reviewrdquo Cyber-psychology and Behavior vol 7 no 5 pp 497ndash510 2004

[9] S Mahmood M Shahbaz and Z Rehman ldquoExtraction ofpositive and negative association rules from text a temporalapproachrdquoPakistan Journal of Science vol 65 pp 407ndash413 2013

The Scientific World Journal 11

[10] M Rajman and R Besancon ldquoText mining natural languagetechniques and text mining applicationsrdquo in Proceedings of the7th IFIP 2 6 Working Conference on Database Semantics (DSrsquo97) pp 50ndash64 1997

[11] B Lent R Agrawal and R Srikant ldquoDiscovering trends in textdatabasesrdquo in Proceedings of the ACM SIGMOD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo93) pp 227ndash230 1997

[12] S Ananiadou and J McNaught Text Mining for Biology andBiomedicine Artech House London UK 2006

[13] J Paralic and P Bednar ldquoText mining for document annotationand ontology supportrdquo in Intelligent Systems at the Service ofMankind pp 237ndash248 Ubooks 2003

[14] R Agrawal T Imielinski and A Swami ldquoMining associationrules between sets of items in large databasesrdquo ACM SIGMODRecord vol 22 no 1 pp 207ndash216 1993

[15] M L Antonie and O Zaıane ldquoMining positive and negativeassociation rules an approach for confined rulesrdquo inKnowledgeDiscovery in Databases PKDD 2004 vol 3202 pp 27ndash38Springer 2004

[16] W U X Dong Z C Qi and S C Zhang ldquoMining both positiveand negative association rulesrdquo in Proceedings of the 19th Inter-national Conference on Machine Learning (ICML rsquo2002) pp658ndash665 2002

[17] X Dong F Sun X Han and R Hou ldquoStudy of positive andnegative association rules based on multi-confidence and chi-squared testrdquo in Advanced Data Mining and Applications pp100ndash109 Springer 2006

[18] M Gan M Zhang and SWang ldquoOne extended form for nega-tive association rules and the correspondingmining algorithmrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) pp 1716ndash1721 August2005

[19] L Sharma O Vyas U Tiwary and R Vyas ldquoA novel approachof multilevel positive and negative association rule mining forspatial databasesrdquo in Machine Learning and Data Mining inPattern Recognition pp 620ndash629 Springer 2005

[20] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA for-mal model for mining fuzzy rules using the RL representationtheoryrdquo Information Sciences vol 181 no 23 pp 5194ndash52132011

[21] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA fuzzyrule mining approach involving absent itemsrdquo in Proceedings ofthe 7th Conference of the European Society for Fuzzy Logic andTechnology pp 275ndash282 Atlantis Press 2011

[22] A Savasere E Omiecinski and S Navathe ldquoMining for strongnegative rules for statistically dependent itemsrdquo in Proceedingsof the IEEE International Conference on Data Mining (ICDMrsquo02) pp 442ndash449 December 2002

[23] M Morzy ldquoEfficient mining of dissociation rulesrdquo in DataWarehousing and Knowledge Discovery pp 228ndash237 Springer2006

[24] G Piatetsky-Shapiro ldquoDiscovery analysis and presentation ofstrong rulesrdquo inKnowledgeDiscovery inDatabases pp 229ndash238AAAI Press 1991

[25] AW Fu RW Kwong and J Tang ldquoMining n-most interestingitemsetsrdquo in Foundations of Intelligent Systems Lecture Notes inComputer Science pp 59ndash67 Springer 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

The Scientific World Journal 9

Table 3 Positive and negative association rules using varying support and confidence values

Support Confidence PARs from FIs PARs from IIs NARs from FIs NARs from IIs005 095 3993 254 27032 8360 05 09 4340 228 29348 154401 09 3731 196 30714 127901 085 3867 246 37832 1170015 08 3340 330 26917 1121

Table 4 Positive and negative rules from both frequent and infrequent itemsets

(a) Positive rules from frequent itemsets

Rule Support Confidence Liftchemo radiation treatment rarr tumor 02 1 20surgery tumor rarr radiation 025 1 14286radiation surgery rarr tumor 025 08333 16667treatment rarr radiation 045 09 12857treatment tumor rarr radiation 03 1 14286tumor rarr radiation 05 1 14286

(b) Negative rules from frequent itemsets

Rule Support Confidence Liftradiation rarr simcancer 05 07143 10204simtemodar rarr radiation 04825 08333 16667simradiation rarr glioblastoma 065 1 14285

(c) Positive rules from infrequent itemsets

Rule Support Confidence Liftcancer treatment rarr tumor 015 1 2doctor temodar rarr chemo 005 1 28571brain chemo rarr doctor 015 1 22222mri tumor rarr brain 01 1 25

(d) Negative rules from infrequent itemsets

Rule Support Confidence Liftchemo rarr simsurgery 035 1 15385glioblastoma rarr simtreatment 03 1 213chemo rarr simglioblastoma 035 095 14286

algorithmThis is because each itemrsquos support is calculated forchecking against the threshold support value to be classifiedas frequent and infrequent therefore we get the infrequentitems in the same pass as we get frequent items Howeverthe processing of frequent and infrequent itemsets for thegeneration of association rules is different For frequent itemsgenerated through Apriori algorithm they have an inherentproperty that their subsets are also frequent however wecannot guarantee that for the infrequent itemsets Thus weimpose an additional check on the infrequent itemsets thattheir subsets are frequent when generating association rulesamong them

The researches on mining association rules among thefrequent and infrequent itemsets have been far and fewespecially from the textual datasets We have proposed this

algorithm which can extract both types of association rulesthat is positive and negative among both frequent andinfrequent itemsets We give a sample of all four types ofassociation rules extracted using the algorithm

Table 4 gives a summary of the generated associationrules The four (4) types of generated association rules areillustrated Table 4(a) shows a sample of positive associa-tion rules generated from the frequent itemsets Table 4(b)shows negative association rules generated from the frequentitemsets This has not yet been explored by the researchcommunity Sample of positive association rules generatedfrom the infrequent itemsets are demonstrated in Table 4(c)This type of association rules would potentially be usefuland researchers are interested to extract them There isno research done in this domain of extracting positive

10 The Scientific World Journal

Confidence

Confidence

Positive rules

Positive rules

Negative rules

Negative rules0

5000

10000

15000

20000

25000

30000

35000

40000

005

005

0101

015

095

09

09

085

08

42474568

3927

4113

3670

Num

ber o

r rul

es

Rule support

Figure 2 Positive and negative rules generated with varying minimum supports and confidence values

association rules from infrequent itemsets in the textual databefore this research Table 4(d) shows the results of negativeassociation rules from the infrequent itemsets

7 Concluding Remarks and Future Work

Identification of associations among symptoms and diseasesis important in diagnosis The field of negative associationrules (NARs)mining holds enormous potential to helpmedi-cal practitioners in this regard Both the positive and negativeassociation rule mining (PNARM) can hugely benefit themedical domain Positive and negative associations amongdiseases symptoms and laboratory test results can help amedical practitioner reach a conclusion about the presence orabsence of a possible diseaseThere is a need to minimize theerrors in diagnosis and maximize the possibility of early dis-ease identification by developing a decision support systemthat takes advantage of the NARs Positive association rulessuch as 119865119897119906 rArr 119867119890119886119889119886119888ℎ119890 can tell us thatHeadache is experi-enced by a person who is suffering from Flu On the contrarynegative association rules such as not119879ℎ119903119900119887119887119894119899119892-119867119890119886119889119886119888ℎ119890 rArr

not119872119894119892119903119886119894119899119890 tell us that ifHeadache experienced by a person isnotThrobbing then he may not haveMigraine with a certaindegree of confidence The applications of this work includethe development of medical decision support system amongothers by finding associations and dissociations amongdiseases symptoms and other health-related terminologiesThe current algorithm does not account for the context andsemantics of the termsitems in the textual data In futurewe plan to assimilate the context of the features in our workin order to perk up the quality and efficacy of the generatedassociation rules

In this paper contributions to the NARM research weremade by proposing an algorithm for efficiently generatingnegative association rules along with the positive associationrules We have proposed a novel method that capturesthe negative associations among frequent itemsets and alsoextracts the positive associations among the infrequent item-sets Whereas traditional association rule mining algorithms

have focused on frequent items for generating positiveassociation rules and have used the infrequent items for thegeneration of negative association rules The experimentalresults have demonstrated that the proposed approach iseffective efficient and promising

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] X Wu C Zhang and S Zhang ldquoEfficient mining of bothpositive and negative association rulesrdquo ACM Transactions onInformation Systems vol 22 no 3 pp 381ndash405 2004

[2] R J Bayardo Jr ldquoEfficiently mining long patterns from data-basesrdquo ACM SIGMOD Record vol 27 no 2 pp 85ndash93 1998

[3] J Han J Pei and Y Yin ldquoMining frequent patterns withoutcandidate generationrdquo ACM SIGMOD Record vol 29 no 2 pp1ndash12 2000

[4] N Pasquier Y Bastide R Taouil and L Lakhal ldquoDiscoveringfrequent closed itemsets for association rulesrdquo in DatabaseTheorymdashICDT rsquo99 pp 398ndash416 Springer 1999

[5] H Toivonen ldquoSampling large databases for association rulesrdquo inProceedings of the International Conference on Very Large DataBases (VLDB rsquo96) pp 134ndash145 1996

[6] J Wang J Han and J Pei ldquoCLOSET+ searching for the beststrategies for mining frequent closed itemsetsrdquo in Proceedingsof the 9th ACMSIGKDD International Conference on KnowledgeDiscovery andDataMining (KDD rsquo03) pp 236ndash245 usa August2003

[7] R JW Cline and KMHaynes ldquoConsumer health informationseeking on the internet the state of the artrdquo Health EducationResearch vol 16 no 6 pp 671ndash692 2001

[8] J M Morahan-Martin ldquoHow internet users find evaluate anduse online health information a cross-cultural reviewrdquo Cyber-psychology and Behavior vol 7 no 5 pp 497ndash510 2004

[9] S Mahmood M Shahbaz and Z Rehman ldquoExtraction ofpositive and negative association rules from text a temporalapproachrdquoPakistan Journal of Science vol 65 pp 407ndash413 2013

The Scientific World Journal 11

[10] M Rajman and R Besancon ldquoText mining natural languagetechniques and text mining applicationsrdquo in Proceedings of the7th IFIP 2 6 Working Conference on Database Semantics (DSrsquo97) pp 50ndash64 1997

[11] B Lent R Agrawal and R Srikant ldquoDiscovering trends in textdatabasesrdquo in Proceedings of the ACM SIGMOD InternationalConference on Knowledge Discovery and Data Mining (KDDrsquo93) pp 227ndash230 1997

[12] S Ananiadou and J McNaught Text Mining for Biology andBiomedicine Artech House London UK 2006

[13] J Paralic and P Bednar ldquoText mining for document annotationand ontology supportrdquo in Intelligent Systems at the Service ofMankind pp 237ndash248 Ubooks 2003

[14] R Agrawal T Imielinski and A Swami ldquoMining associationrules between sets of items in large databasesrdquo ACM SIGMODRecord vol 22 no 1 pp 207ndash216 1993

[15] M L Antonie and O Zaıane ldquoMining positive and negativeassociation rules an approach for confined rulesrdquo inKnowledgeDiscovery in Databases PKDD 2004 vol 3202 pp 27ndash38Springer 2004

[16] W U X Dong Z C Qi and S C Zhang ldquoMining both positiveand negative association rulesrdquo in Proceedings of the 19th Inter-national Conference on Machine Learning (ICML rsquo2002) pp658ndash665 2002

[17] X Dong F Sun X Han and R Hou ldquoStudy of positive andnegative association rules based on multi-confidence and chi-squared testrdquo in Advanced Data Mining and Applications pp100ndash109 Springer 2006

[18] M Gan M Zhang and SWang ldquoOne extended form for nega-tive association rules and the correspondingmining algorithmrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) pp 1716ndash1721 August2005

[19] L Sharma O Vyas U Tiwary and R Vyas ldquoA novel approachof multilevel positive and negative association rule mining forspatial databasesrdquo in Machine Learning and Data Mining inPattern Recognition pp 620ndash629 Springer 2005

[20] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA for-mal model for mining fuzzy rules using the RL representationtheoryrdquo Information Sciences vol 181 no 23 pp 5194ndash52132011

[21] M Delgado M D Ruiz D Sanchez and J M Serrano ldquoA fuzzyrule mining approach involving absent itemsrdquo in Proceedings ofthe 7th Conference of the European Society for Fuzzy Logic andTechnology pp 275ndash282 Atlantis Press 2011

[22] A Savasere E Omiecinski and S Navathe ldquoMining for strongnegative rules for statistically dependent itemsrdquo in Proceedingsof the IEEE International Conference on Data Mining (ICDMrsquo02) pp 442ndash449 December 2002

[23] M Morzy ldquoEfficient mining of dissociation rulesrdquo in DataWarehousing and Knowledge Discovery pp 228ndash237 Springer2006

[24] G Piatetsky-Shapiro ldquoDiscovery analysis and presentation ofstrong rulesrdquo inKnowledgeDiscovery inDatabases pp 229ndash238AAAI Press 1991

[25] AW Fu RW Kwong and J Tang ldquoMining n-most interestingitemsetsrdquo in Foundations of Intelligent Systems Lecture Notes inComputer Science pp 59ndash67 Springer 2000

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article Negative and Positive Association Rules ...downloads.hindawi.com/journals/tswj/2014/973750.pdfResearch Article Negative and Positive Association Rules Mining from

10 The Scientific World Journal

Confidence

Confidence

Positive rules

Positive rules

Negative rules

Negative rules0

5000

10000

15000

20000

25000

30000

35000

40000

005

005

0101

015

095

09

09

085

08

42474568

3927

4113

3670

Num

ber o

r rul

es

Rule support

Figure 2 Positive and negative rules generated with varying minimum supports and confidence values

association rules from infrequent itemsets in the textual databefore this research Table 4(d) shows the results of negativeassociation rules from the infrequent itemsets

7 Concluding Remarks and Future Work

Identification of associations among symptoms and diseasesis important in diagnosis The field of negative associationrules (NARs)mining holds enormous potential to helpmedi-cal practitioners in this regard Both the positive and negativeassociation rule mining (PNARM) can hugely benefit themedical domain Positive and negative associations amongdiseases symptoms and laboratory test results can help amedical practitioner reach a conclusion about the presence orabsence of a possible diseaseThere is a need to minimize theerrors in diagnosis and maximize the possibility of early dis-ease identification by developing a decision support systemthat takes advantage of the NARs Positive association rulessuch as 119865119897119906 rArr 119867119890119886119889119886119888ℎ119890 can tell us thatHeadache is experi-enced by a person who is suffering from Flu On the contrarynegative association rules such as not119879ℎ119903119900119887119887119894119899119892-119867119890119886119889119886119888ℎ119890 rArr

not119872119894119892119903119886119894119899119890 tell us that ifHeadache experienced by a person isnotThrobbing then he may not haveMigraine with a certaindegree of confidence The applications of this work includethe development of medical decision support system amongothers by finding associations and dissociations amongdiseases symptoms and other health-related terminologiesThe current algorithm does not account for the context andsemantics of the termsitems in the textual data In futurewe plan to assimilate the context of the features in our workin order to perk up the quality and efficacy of the generatedassociation rules

In this paper contributions to the NARM research weremade by proposing an algorithm for efficiently generatingnegative association rules along with the positive associationrules We have proposed a novel method that capturesthe negative associations among frequent itemsets and alsoextracts the positive associations among the infrequent item-sets Whereas traditional association rule mining algorithms

have focused on frequent items for generating positiveassociation rules and have used the infrequent items for thegeneration of negative association rules The experimentalresults have demonstrated that the proposed approach iseffective efficient and promising

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] X. Wu, C. Zhang, and S. Zhang, "Efficient mining of both positive and negative association rules," ACM Transactions on Information Systems, vol. 22, no. 3, pp. 381–405, 2004.

[2] R. J. Bayardo Jr., "Efficiently mining long patterns from databases," ACM SIGMOD Record, vol. 27, no. 2, pp. 85–93, 1998.

[3] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM SIGMOD Record, vol. 29, no. 2, pp. 1–12, 2000.

[4] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, "Discovering frequent closed itemsets for association rules," in Database Theory—ICDT '99, pp. 398–416, Springer, 1999.

[5] H. Toivonen, "Sampling large databases for association rules," in Proceedings of the International Conference on Very Large Data Bases (VLDB '96), pp. 134–145, 1996.

[6] J. Wang, J. Han, and J. Pei, "CLOSET+: searching for the best strategies for mining frequent closed itemsets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '03), pp. 236–245, USA, August 2003.

[7] R. J. W. Cline and K. M. Haynes, "Consumer health information seeking on the internet: the state of the art," Health Education Research, vol. 16, no. 6, pp. 671–692, 2001.

[8] J. M. Morahan-Martin, "How internet users find, evaluate, and use online health information: a cross-cultural review," Cyberpsychology and Behavior, vol. 7, no. 5, pp. 497–510, 2004.

[9] S. Mahmood, M. Shahbaz, and Z. Rehman, "Extraction of positive and negative association rules from text: a temporal approach," Pakistan Journal of Science, vol. 65, pp. 407–413, 2013.


[10] M. Rajman and R. Besancon, "Text mining: natural language techniques and text mining applications," in Proceedings of the 7th IFIP 2.6 Working Conference on Database Semantics (DS '97), pp. 50–64, 1997.

[11] B. Lent, R. Agrawal, and R. Srikant, "Discovering trends in text databases," in Proceedings of the ACM SIGMOD International Conference on Knowledge Discovery and Data Mining (KDD '97), pp. 227–230, 1997.

[12] S. Ananiadou and J. McNaught, Text Mining for Biology and Biomedicine, Artech House, London, UK, 2006.

[13] J. Paralic and P. Bednar, "Text mining for document annotation and ontology support," in Intelligent Systems at the Service of Mankind, pp. 237–248, Ubooks, 2003.

[14] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," ACM SIGMOD Record, vol. 22, no. 1, pp. 207–216, 1993.

[15] M. L. Antonie and O. Zaïane, "Mining positive and negative association rules: an approach for confined rules," in Knowledge Discovery in Databases: PKDD 2004, vol. 3202, pp. 27–38, Springer, 2004.

[16] X. Wu, C. Zhang, and S. Zhang, "Mining both positive and negative association rules," in Proceedings of the 19th International Conference on Machine Learning (ICML '02), pp. 658–665, 2002.

[17] X. Dong, F. Sun, X. Han, and R. Hou, "Study of positive and negative association rules based on multi-confidence and chi-squared test," in Advanced Data Mining and Applications, pp. 100–109, Springer, 2006.

[18] M. Gan, M. Zhang, and S. Wang, "One extended form for negative association rules and the corresponding mining algorithm," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), pp. 1716–1721, August 2005.

[19] L. Sharma, O. Vyas, U. Tiwary, and R. Vyas, "A novel approach of multilevel positive and negative association rule mining for spatial databases," in Machine Learning and Data Mining in Pattern Recognition, pp. 620–629, Springer, 2005.

[20] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A formal model for mining fuzzy rules using the RL representation theory," Information Sciences, vol. 181, no. 23, pp. 5194–5213, 2011.

[21] M. Delgado, M. D. Ruiz, D. Sanchez, and J. M. Serrano, "A fuzzy rule mining approach involving absent items," in Proceedings of the 7th Conference of the European Society for Fuzzy Logic and Technology, pp. 275–282, Atlantis Press, 2011.

[22] A. Savasere, E. Omiecinski, and S. Navathe, "Mining for strong negative rules for statistically dependent items," in Proceedings of the IEEE International Conference on Data Mining (ICDM '02), pp. 442–449, December 2002.

[23] M. Morzy, "Efficient mining of dissociation rules," in Data Warehousing and Knowledge Discovery, pp. 228–237, Springer, 2006.

[24] G. Piatetsky-Shapiro, "Discovery, analysis, and presentation of strong rules," in Knowledge Discovery in Databases, pp. 229–238, AAAI Press, 1991.

[25] A. W. Fu, R. W. Kwong, and J. Tang, "Mining n-most interesting itemsets," in Foundations of Intelligent Systems, Lecture Notes in Computer Science, pp. 59–67, Springer, 2000.
