IEEE 2011 Third International Conference on Advanced Computing (ICoAC) - Chennai, India...


978-1-4673-0671-3/11/$26.00©2011 IEEE 210 IEEE-ICoAC 2011

A Comparative Study on Existing Methodologies to Predict Dominating Patterns amongst Biological Sequences

G. Lakshmi Priya
Department of Computer Science & Engineering
J. J. College of Engineering and Technology, Tiruchirapalli, Tamil Nadu, India
glpriya.research11@gmail.com

Shanmugasundaram Hariharan
Department of Information Technology
Pavendar Bharathidasan College of Engg. & Technology, Tiruchirapalli, Tamil Nadu, India
mailtos.hariharan@gmail.com

Abstract—Data mining is the process of extracting patterns from very large amounts of data, such as biological datasets. Data mining algorithms can reveal biologically relevant associations between different genes and gene expression. Several data mining techniques are available for predicting frequent patterns; one of them, association rule mining, can be applied to crucial problems in biological science. In the literature, various algorithms have been employed to generate frequent patterns for distinct applications, but they have limitations in space, time complexity and accuracy. To overcome these drawbacks, this paper studies existing algorithms for generating frequent patterns from biological sequences. The literature survey shows that a significant number of methods have been developed for predicting associative patterns. The proposed system is to be developed for solving problems in biological science. A biological sequence may be a collection of DNA sequences, gene expression sequences or protein sequences for a specific viral disease. Amino acids are the building blocks of proteins, which are organic compounds made up of amino acids arranged in a linear chain and folded into a globular form. The proposed approach will not only predict frequent patterns but will also account for time complexity and space, and predict accurate solutions to the problem at hand. Taking these three factors into consideration, an efficient algorithm can be identified for predicting the dominating amino acids for any specific biological application.

Keywords-Data Mining, Clustering techniques, Association Rules and Bioinformatics

I. INTRODUCTION

Data mining refers to extracting, or “mining”, knowledge from large amounts of data. It has been defined as “the process of discovering meaningful new correlations, patterns and trends by digging into large amounts of data stored in warehouses”. Data mining is also called Knowledge Discovery in Databases (KDD). Modern computer, network and sensor technologies have made data collection and organization much easier, and data sets have grown in size and complexity. However, the captured data must be converted into information and knowledge to become useful. Data mining is the entire process of applying computer-based methodology, including new techniques, for knowledge discovery from data.

Data mining approaches seem ideally suited to biological data, since biology is data-rich but lacks a comprehensive theory of life’s organization at the molecular level. Extensive databases of biological information create both challenges and opportunities for the development of novel KDD methods. Mining biological data helps to extract useful knowledge from the massive datasets gathered in biology and in related life sciences such as medicine and neuroscience.

A crucial challenge for the future of bioinformatics is putting these data to work. Life scientists now hope to plan large experiments, collect large amounts of data, analyze them, compare data across experiments, and eventually combine all of that information to improve basic theories, biotechnology and medicine. Fig. 1 shows the average trends in bioinformatics research areas.

[Figure 1: bar chart titled “Biological Science Research Areas”; x-axis “Methodologies” with the categories Networks & Modeling; Sequence Motifs, Alignment & Families; Protein Structure & Modeling Methods; Sequence & Phylogeny; Gene Structure; Regulation & Modeling; series “Average”; y-axis 0% to 30%. Bar labels in extracted order: 6%, 25%, 19%, 26%, 12%, 12%.]

Figure 1. Average trends in biological science research areas

Over the past two decades, researchers in the biomedical field have faced an explosive growth of biomedical data, ranging from data collected in pharmaceutical studies and cancer therapy investigations to data produced by genomics and proteomics research through the discovery of sequential patterns, gene functions and protein-protein interactions. The rapid progress of biotechnology and biological data analysis methods has led to the emergence and fast growth of a promising new field called bio data mining. Applications of data mining to biological data include gene finding, protein function domain detection, function motif detection, protein function inference, disease diagnosis, disease prognosis, disease treatment optimization, protein and gene interaction network reconstruction, data cleansing, and protein sub-cellular location prediction.

A. Amino acids

Amino acids play central roles both as building blocks of proteins and as intermediates in metabolism. The chemical properties of the amino acids in a protein determine its biological activity. Proteins not only catalyze all or most of the reactions in living cells; they also control virtually all cellular processes. The general structure of an α-amino acid is depicted in Fig. 2, which shows the amino group on the left, the carboxyl group on the right, and R, the side chain specific to each amino acid.

B. Protein Structure

Proteins are an important class of biological macromolecules present in all biological organisms, made up of elements such as carbon, hydrogen, nitrogen, phosphorus, oxygen and sulfur. The elements of a protein and its tertiary structure are depicted in Fig. 3. A protein structure has four distinct aspects: primary, secondary, tertiary and quaternary structure.

Figure 2. Structure of an α-amino acid

Figure 3. The elements and tertiary structure of a protein

C. General Structure and Functions of Amino Acids

Amino acids are the building blocks of proteins. Proteins are large molecules composed of one or more chains of amino acids in a specific order, which is determined by the base sequence of nucleotides in the gene that encodes the protein.

Amino acids combine in a condensation reaction that releases a water molecule, and the resulting amino acid residues are held together by a peptide bond. Proteins are defined by their unique sequence of amino acid residues; amino acids can be linked in varying sequences to form a vast variety of proteins. Cells use twenty standard amino acids in protein biosynthesis, and these are specified by the general genetic code. Table I lists the essential amino acids and Table II lists the non-essential amino acids. The one-letter and three-letter codes for amino acids used in the knowledgebase are those adopted by the Commission on Biochemical Nomenclature of the IUPAC-IUB.

TABLE I. LIST OF ESSENTIAL AMINO ACIDS

Amino Acid 3-Letter 1-Letter

Arginine Arg R

Histidine His H

Isoleucine Ile I

Leucine Leu L

Lysine Lys K

Methionine Met M

Phenylalanine Phe F

Threonine Thr T

Tryptophan Trp W

Valine Val V

TABLE II. LIST OF NON-ESSENTIAL AMINO ACIDS

Amino Acid 3-Letter 1-Letter

Alanine Ala A

Asparagine Asn N

Aspartate Asp D

Cysteine Cys C

Glutamate Glu E

Glutamine Gln Q

Glycine Gly G

Proline Pro P

Serine Ser S

Tyrosine Tyr Y
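The codes in Tables I and II are what a mining pipeline over protein sequences typically works with. As a minimal sketch (the dictionary and function names are illustrative, not from any library), the two tables can be encoded as lookups that convert three-letter residue names into a one-letter sequence and report how much of a sequence consists of the amino acids Table I lists as essential:

```python
# Three-letter -> one-letter codes, split exactly as in Table I and Table II.
ESSENTIAL = {
    "Arg": "R", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Thr": "T", "Trp": "W", "Val": "V",
}
NON_ESSENTIAL = {
    "Ala": "A", "Asn": "N", "Asp": "D", "Cys": "C", "Glu": "E",
    "Gln": "Q", "Gly": "G", "Pro": "P", "Ser": "S", "Tyr": "Y",
}
THREE_TO_ONE = {**ESSENTIAL, **NON_ESSENTIAL}
ESSENTIAL_ONE_LETTER = set(ESSENTIAL.values())


def to_one_letter(residues):
    """Convert three-letter codes (any letter case, e.g. "MET") to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def essential_fraction(sequence):
    """Fraction of residues in a one-letter sequence that Table I lists as essential."""
    if not sequence:
        return 0.0
    return sum(aa in ESSENTIAL_ONE_LETTER for aa in sequence) / len(sequence)


seq = to_one_letter(["Met", "Ala", "Lys", "Trp"])
print(seq)                      # MAKW
print(essential_fraction(seq))  # 0.75
```

The essential/non-essential split follows the tables above as given; other classifications (for example, treating arginine as conditionally essential) would only change the `ESSENTIAL` dictionary.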

D. Bioinformatics Datasets

Bioinformatics is the science and technique of organizing and analyzing biological data. It conceptualizes biology in terms of molecules and applies informatics techniques to understand and organize the information associated with these molecules on a large scale.

The Swiss-Prot group develops, annotates and maintains the UniProtKB/Swiss-Prot protein sequence database, the most widely used protein information resource in the world. The Bioinformatics Database group also develops and maintains other databases, including PROSITE, a database of protein families and domains, and ENZYME, a database of enzyme nomenclature. The group also co-heads the development and maintenance of the ExPASy proteomics website. UniProtKB consists of two sections:

• UniProtKB/Swiss-Prot: a manually annotated and reviewed protein sequence database

• UniProtKB/TrEMBL: an automatically annotated, unreviewed protein sequence database
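Both UniProtKB sections can be downloaded as FASTA files, and in the UniProt FASTA header convention the database prefix tells the sections apart: headers start with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries. The sketch below is a minimal reader under the assumption of well-formed FASTA text; the function names are illustrative, and the accessions and sequences in the example are hypothetical placeholders, not real UniProt entries.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; wrapped sequence lines are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def section_of(header: str) -> str:
    """Map the UniProtKB database prefix ('sp' or 'tr') to its section name."""
    prefix = header.split("|", 1)[0]
    return {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}.get(prefix, "unknown")


# Hypothetical records: accessions and names are placeholders only.
example = """>sp|ACC001|PROT1_HUMAN Example reviewed protein
MKTAYIAKQR
QISFVKSHFS
>tr|ACC002|ACC002_HUMAN Example unreviewed protein
MSLLTEVETP
"""
for header, seq in read_fasta(example):
    print(section_of(header), header.split("|")[1], len(seq))
```

Keeping only `sp` records is one simple way to restrict an analysis to the manually reviewed section.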

E. Paper Organization

Section I introduces data mining concepts and functionalities and the applications of data mining in biological science, including protein structure and amino acids. Section II reviews the use of the Apriori algorithm and its variants in various environments and applications. Section III describes the crucial challenges faced in biological science and presents the proposed architecture for identifying an efficient algorithm among association rule mining algorithms. Section IV discusses the drawbacks faced in biological science, concludes the paper, and outlines future research on mining frequent itemsets.

II. RELATED WORK

A literature survey is the act of studying the existing system and analyzing the need for the system to be reengineered. It is the first and foremost step in designing software, and it has been defined as “a study of the operations or a set of connected elements and of the inner connection between the elements”. The literature survey covers the following application areas:

• Knowledge Discovery in Databases
• Bio-Medical Applications
• Granular Computing and W3C Applications
• Soft Computing Applications

Zalak [1] proposed a Modified RAAT algorithm to overcome drawbacks of the Apriori and RAAT algorithms. It improves the efficiency of generating candidate itemsets with more than two items by using a modified tag parameter with three arguments: the minimum item number, the maximum item number and the total number of items in a particular transaction. By computing the change in the support parameter, Modified RAAT can also be used to find emerging patterns such as jumping emerging patterns (JEPs); a JEP is a pattern whose support changes abruptly from zero to nonzero.
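The JEP idea can be illustrated independently of RAAT. In the usual emerging-pattern formulation, an itemset's growth rate is its support in a target dataset divided by its support in a background dataset, and a JEP is an itemset whose background support is zero while its target support is nonzero (an infinite growth rate). The sketch below checks only that definition by brute force over small itemsets; it is not the Modified RAAT algorithm of [1], and the function names and both toy datasets are invented for illustration.

```python
from itertools import combinations
from math import inf


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, background, target):
    """Support in `target` divided by support in `background` (inf for a JEP)."""
    s_bg, s_tg = support(itemset, background), support(itemset, target)
    if s_bg == 0:
        return inf if s_tg > 0 else 0.0
    return s_tg / s_bg


def jumping_emerging_patterns(background, target, max_size=2):
    """Itemsets of up to `max_size` items absent from `background` but present in `target`."""
    items = sorted(set().union(*target))
    jeps = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            if growth_rate(frozenset(combo), background, target) == inf:
                jeps.append(combo)
    return jeps


# Toy data: items are amino-acid letters; both datasets are made up.
background = [frozenset("AG"), frozenset("AGL"), frozenset("GL")]
target = [frozenset("AGW"), frozenset("AW"), frozenset("GL")]
print(jumping_emerging_patterns(background, target))  # [('W',), ('A', 'W'), ('G', 'W')]
```

An itemset such as ('A',) has equal support in both datasets (growth rate 1.0) and is therefore not emerging at all.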

A. Knowledge Discovery in Large Databases

Yiwu [2] proposed the IApriori algorithm to overcome two aspects that limit the efficiency of the Apriori algorithm: (i) frequent scanning of the database and (ii) the large number of candidate itemsets. The author concluded that IApriori can reduce the database scanning time and optimize the join procedure over the generated frequent itemsets, thereby reducing the size of the candidate itemsets.
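For reference, the classic level-wise Apriori that IApriori and the other variants in this section modify can be sketched as follows: one database scan per level to count candidates, a join step that merges frequent (k-1)-itemsets sharing their first k-2 items, and a prune step that discards any candidate with an infrequent (k-1)-subset. This is a minimal in-memory sketch assuming sortable items and an absolute `min_support` count; it does not include IApriori's own optimizations, which are described in [2].

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets.

    transactions: iterable of item collections; min_support: absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: one scan to count single items.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = sorted(frequent)
        prev_set = set(prev)
        candidates = []
        # Join step: merge (k-1)-itemsets that agree on their first k-2 items.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                a, b = prev[i], prev[j]
                if a[:-1] != b[:-1]:
                    continue
                cand = a + (b[-1],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
        # One database scan per level to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


db = ["ACD", "BCE", "ABCE", "BE"]   # each string is one transaction of single-letter items
freq = apriori(db, min_support=2)
print(sorted(freq.items()))
```

The per-level scan inside the `while` loop is exactly the repeated database scanning that IApriori, V_Apriori and several other surveyed variants try to reduce.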

In The Research of Improved Apriori Algorithm for Mining Association Rules, the authors [3] proposed the RAAT (Reduced Apriori Algorithm using Tag) algorithm, which removes redundant pruning operations on C2 (candidate 2-itemsets) to save time and increase efficiency. RAAT also optimizes the subset operation, using a transaction tag to speed up support calculations.
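The effect of a transaction tag can be illustrated with a cheap pre-check before the full subset test. The sketch assumes integer item IDs and a tag holding the minimum item, maximum item and item count of each transaction (the three arguments described for the Modified RAAT tag in [1]). A candidate whose smallest item is below the tag minimum, whose largest item is above the tag maximum, or which has more items than the transaction cannot be contained in it, so the subset test is skipped. How RAAT itself builds and uses its tag may differ; the names `Tag`, `make_tag` and `count_support` are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Tag(NamedTuple):
    low: int    # smallest item id in the transaction
    high: int   # largest item id in the transaction
    size: int   # number of items in the transaction


def make_tag(transaction: FrozenSet[int]) -> Tag:
    return Tag(min(transaction), max(transaction), len(transaction))


def count_support(candidate: Tuple[int, ...],
                  tagged: List[Tuple[Tag, FrozenSet[int]]]) -> Tuple[int, int]:
    """Return (support count, number of full subset checks actually performed).

    `candidate` must be sorted ascending, so candidate[0] and candidate[-1] are min and max.
    """
    count = checks = 0
    for tag, items in tagged:
        # Tag pre-check: impossible containment is rejected without a subset test.
        if candidate[0] < tag.low or candidate[-1] > tag.high or len(candidate) > tag.size:
            continue
        checks += 1
        if items.issuperset(candidate):
            count += 1
    return count, checks


db = [frozenset(t) for t in ([1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5], [6, 7])]
tagged = [(make_tag(t), t) for t in db]
print(count_support((2, 5), tagged))  # (3, 3): two of five transactions skipped by the tag
```

The tags are computed once per transaction, so the saving grows with the number of candidates counted against each transaction.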

A New Improvement on Apriori Algorithm [4] proposed the high dimension oriented (HDO) Apriori algorithm to improve on the classical Apriori algorithm when mining association rules in high-dimensional data. Building on classical Apriori, HDO cuts down the redundant generation of identical sub-itemsets from candidate itemsets by pruning candidate itemsets using the infrequent itemsets of lower dimension. As a result, HDO can achieve higher efficiency than the original algorithm when the dimension of the data is high.

Han Feng [5] proposed the AprioriMend algorithm, based on an analysis of the classic method of mining association rules, to overcome the weaknesses of the Apriori algorithm. AprioriMend uses the given minimum support to prune the database by deleting unneeded items, then groups the pruned database by transaction length, building sub-group tables so that all the characteristics of the frequent itemsets can be found quickly from the group table. As a result, AprioriMend is more efficient than the Apriori algorithm.
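The two preprocessing steps described for AprioriMend, deleting items that cannot meet the minimum support and grouping the pruned transactions by length, can be sketched as below. This is an illustration under assumptions (absolute support count, transactions as item collections, hypothetical function names); the sub-group tables of [5] are not reproduced. The grouping pays off because a k-itemset can only occur in a transaction with at least k items, so counting a k-candidate needs to scan only groups of length k or more.

```python
from collections import Counter, defaultdict


def prune_and_group(transactions, min_support):
    """Drop items below min_support, then group the pruned transactions by length."""
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, n in item_counts.items() if n >= min_support}
    groups = defaultdict(list)
    for t in transactions:
        pruned = frozenset(t) & frequent_items
        if pruned:                      # an empty transaction supports no itemset
            groups[len(pruned)].append(pruned)
    return dict(groups)


def count_with_groups(candidate, groups):
    """Count a candidate by scanning only the groups long enough to contain it."""
    cand = frozenset(candidate)
    return sum(1 for length, ts in groups.items() if length >= len(cand)
               for t in ts if cand <= t)


groups = prune_and_group(["ACD", "BCE", "ABCE", "BE"], min_support=2)
print(sorted(groups))                      # [2, 3, 4]
print(count_with_groups("BCE", groups))    # 2, found by scanning groups 3 and 4 only
```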

Mining Association Rules between Sets of Items in Large Databases [6] proposed an algorithm that, given a large collection of basket-data transactions, generates all significant association rules between sets of items that satisfy a specified minimum support and minimum confidence. Each transaction consists of the items purchased by a customer in one visit. The algorithm incorporates buffer management and novel estimation and pruning techniques.
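Once frequent itemsets and their support counts are known, rules X ⇒ Y are generated by splitting each frequent itemset into an antecedent X and consequent Y and keeping splits whose confidence, support(X ∪ Y) / support(X), meets the threshold. The same confidence test underlies the later step of this paper that keeps rules with confidence greater than 95%. To stay self-contained, the sketch finds frequent itemsets by brute-force enumeration over a toy basket database (suitable only for tiny data); in practice the counts would come from Apriori, DIC or FP-growth. The function names and data are illustrative.

```python
from itertools import combinations


def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for every strong rule.

    min_support is an absolute count; the reported support is a fraction of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s_xy = support_count(itemset, transactions)
            if s_xy < min_support:
                continue
            # Split the frequent itemset into every non-empty antecedent X and consequent Y.
            for r in range(1, k):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    confidence = s_xy / support_count(x, transactions)
                    if confidence >= min_confidence:
                        rules.append((tuple(sorted(x)), tuple(sorted(itemset - x)),
                                      s_xy / n, confidence))
    return rules


baskets = ["ACD", "BCE", "ABCE", "BE"]
for rule in association_rules(baskets, min_support=2, min_confidence=0.95):
    print(rule)
```

On this data, five rules pass the 0.95 confidence threshold, for example A ⇒ C with support 0.5 and confidence 1.0, while C ⇒ A is rejected because only two of the three transactions containing C also contain A.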

B. Bio-Medical Applications

Association Rule Mining in Genomics, proposed by Anandhavalli [7], has two major goals for analyzing massive genomic data: (i) to determine how the expression of a particular gene might affect the expression of other genes, and (ii) to determine which genes are expressed as a result of certain cellular conditions, using association and clustering concepts. The author selected an efficient algorithm to facilitate these analyses; the number of passes over the data was not a major factor to be considered. Finally, the author concluded that the number of genes in a single transaction was very large.


The Apriori Property of Sequence Pattern Mining with Wildcard Gaps, proposed by Fan Min [8], gives an alternative definition of the number of offset sequences, obtained by adding dummy characters at the tail of the sequence. Earlier pattern growth algorithms mined frequent patterns with periodic wildcard gaps, defining pattern frequency as the number of pattern occurrences divided by the number of offset sequences; under that definition, some uninteresting patterns could be frequent and the Apriori property did not hold. With the proposed definition, these uninteresting patterns are no longer frequent and the Apriori property holds, so an Apriori-style algorithm can mine all frequent patterns with minimal effort.

Chien-Hua Wang [9] proposed Fuzzy Frequent Pattern growth (FFP-growth) to derive fuzzy association rules. In this approach, fuzzy partition methods are first applied to determine a membership function for the quantitative value of each transaction item, and FFP-growth is then used to carry out the mining. FFP-growth does not need to generate candidate itemsets and reduces the cost of repeated database scanning. Compared with other pattern mining algorithms, it achieved better execution efficiency.
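The fuzzy partition step, which turns each quantitative item value into membership degrees before mining, can be sketched with triangular membership functions. The three regions (Low, Middle, High), their breakpoints and the example value are assumptions made for illustration; [9] may choose its membership functions differently, and the FFP-growth mining step itself is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_partition(x, lo, hi):
    """Split [lo, hi] into overlapping Low / Middle / High regions.

    Low peaks at lo, Middle at the midpoint, High at hi; inside [lo, hi] the three
    membership degrees always sum to 1.
    """
    mid = (lo + hi) / 2
    span = mid - lo
    return {
        "Low": triangular(x, lo - span, lo, mid),
        "Middle": triangular(x, lo, mid, hi),
        "High": triangular(x, mid, hi, hi + span),
    }


# A quantitative item value of 2.5 on an assumed 0..10 scale.
print(fuzzy_partition(2.5, 0, 10))  # {'Low': 0.5, 'Middle': 0.5, 'High': 0.0}
```

Each (item, region) pair, weighted by its membership degree, then plays the role that a plain item plays in crisp frequent pattern mining.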

In A Vector Operation Based Fast Association Rules Mining Algorithm, Zhi Liu [10] proposed a vector-operation-based association rule mining algorithm (V_Apriori) that solves the problem of multiple database scans. With this algorithm, the transaction database is scanned only once to generate a Boolean matrix stored in bit mode, which greatly saves memory. Frequent itemsets are found through AND operations on the vectors in the matrix, and the number of candidate itemsets is reduced significantly. Compared with the traditional Apriori algorithm, V_Apriori improves both time and space efficiency.
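The Boolean-matrix idea can be sketched with Python integers used as bit vectors: after a single scan, each item has a vector whose bit i is set when transaction i contains the item, and the support count of an itemset is the number of set bits in the AND of its items' vectors. This only illustrates the bit-vector representation and support counting; the candidate generation of V_Apriori [10] is not reproduced, and the function names are illustrative.

```python
from functools import reduce
from operator import and_


def build_bit_vectors(transactions):
    """Single database scan: item -> int whose bit i marks membership in transaction i."""
    vectors = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors):
    """Support count of a non-empty itemset: popcount of the AND of its items' vectors."""
    if any(item not in vectors for item in itemset):
        return 0
    combined = reduce(and_, (vectors[item] for item in itemset))
    return bin(combined).count("1")


db = ["ACD", "BCE", "ABCE", "BE"]
vectors = build_bit_vectors(db)
print(format(vectors["B"], "04b"))   # 1110: transactions 1, 2 and 3 contain B
print(support("BCE", vectors))       # 2
```

Because every support count after the first scan is a few word-level AND operations, the database itself is never read again.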

The authors of [11] proposed a data mining system for assessing heart-event-related risk factors using association analysis based on the Apriori algorithm. The events investigated were myocardial infarction (MI), percutaneous coronary intervention (PCI) and coronary artery bypass graft surgery (CABG). Several factors contribute to the development of a coronary heart event, and these risk factors can be classified into two categories: (i) non-modifiable factors, which cannot be altered by intervention, such as age, gender, family history and genetic attributes; and (ii) modifiable factors, currently including smoking, elevated cholesterol, hypertension and diabetes. Modifiable factors can be monitored and lowered with a doctor’s advice and medication so that the incidence of heart episodes is reduced. As reported in the EUROASPIRE I, II and III surveys, a patient’s risk of coronary heart disease (CHD) may be reduced through proper control of these factors. Thus, data mining could help identify high- and low-risk subgroups of patients, a decisive factor in selecting medical or surgical therapy.

Effective Algorithm of Mining Frequent Itemsets for Association Rules [12] proposed the AprioriFREQ algorithm, an improvement of the Apriori algorithm that combines itemset generation with the anti-monotone property of itemsets. The author introduced the concepts of generation and the ordinal itemsets tree to describe the vegetal ability of the supersets of frequent itemsets in the lattice. From this study, the author concluded that not all frequent itemsets are vegetal itemsets, whereas all vegetal itemsets are frequent itemsets. AprioriFREQ can reduce the cardinal number of candidate itemsets and effectively improve the efficiency of mining association rules when the distribution of the cardinal number of itemsets is symmetrical; otherwise, its efficiency is the same as that of Apriori.

C. Granular Computing and W3C Applications

Vaibhav Sharma [13] proposed a granular-computing-based algorithm for extracting association rules. The author compares the running time of the presented algorithm with that of the Apriori algorithm and illustrates its running procedure with a real-world example. The proposed algorithm efficiently reduces the number of candidate elements and avoids repeatedly scanning the information table.

Dhivya [14] studied the CT-Apriori and CT-PRO algorithms, which use compressed tree structures for pattern mining; CT-PRO relies on a data structure named the Compressed FP-Tree (CFP-Tree) and performed better than the other algorithms studied. The CT-Apriori algorithm uses a compact tree structure, called the CT-tree, to compress the original transactional data and to generate frequent patterns quickly by skipping the initial database scan and greatly reducing the I/O time per database scan. The CFP-Tree used by CT-PRO is more compact than the FP-Tree of the FP-Growth algorithm. CT-PRO divides the CFP-Tree into several projections, each also represented by a CFP-Tree, and then mines all frequent patterns in each projection. The execution speed results also indicated that CT-PRO was the fastest of the algorithms compared.

PCAR: an Efficient Approach for Mining Association Rules was proposed by Peien [15]. The Pruning-Classification Association Rule (PCAR) algorithm is intended for mining association rules over large volumes of items with low itemset frequencies. PCAR first deletes infrequent items from the itemsets, then classifies itemsets by frequency, and finally discovers the frequent itemsets. Because it reduces the number of candidate itemsets, operation time and memory requirements can be decreased.

D. Soft Computing Applications

Mirela Pater [16] proposed an efficient version of the Apriori algorithm, Depth First Multi-Level Apriori (DFMLA), for mining multi-level association rules in large databases to solve the market-basket problem. Mining rules at multiple levels can provide more information to users and enhance the flexibility and power of data mining systems. DFMLA exploits multi-level databases by using the information gained from studying items at one concept level when studying the items at the following concept levels, so it finds new rules using the knowledge gained from previously found rules. If a rule fails at the first concept level, many rules at the following concept levels are not examined; thus, the more concept levels a database has, the faster DFMLA obtains results compared with other algorithms. The author concludes that DFMLA is a simple, practical, straightforward and fast algorithm for finding all frequent itemsets.

III. PROPOSED SYSTEM

With the existing methodologies, it is not easy to solve crucial problems in biological science in emergencies. In the biomedical field, huge volumes of data keep increasing over time, which calls for an efficient algorithm for predicting frequent patterns from biological sequences. Because of their time complexity, space requirements and accuracy, the existing algorithms do not satisfy researchers or solve real-time problems in various situations. The proposed system, illustrated in Fig. 4 and Fig. 5, aims to address these problems in two steps:

1) Find an efficient algorithm based on three factors: time, memory and precision.

2) Find the dominating frequent patterns in biological sequences using association rule mining algorithms.

As noted in the literature, genome-sequencing projects [17] are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, that is, clustering, has become one of the principal research objectives in structural and functional genomics. Computer programs that automatically and accurately classify sequences into families have become a necessity.

A significant number of methods have addressed the clustering of protein sequences, and most of them fall into three major groups: hierarchical, graph-based and partitioning methods. Among sequence clustering methods, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are widely used in other fields, few applications have been found in protein sequence clustering. It has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether they can be efficient compared with published methods.
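To make the partitioning idea concrete, the sketch below clusters protein sequences with plain k-means on their 20-dimensional amino-acid composition vectors. Both the composition features and the initialization (the first k sequences seed the centroids) are simplifying assumptions chosen for a short, deterministic example; [17] evaluates partitioning algorithms on protein data with its own sequence-similarity measures, which are not reproduced here. The sequences are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids in a non-empty one-letter sequence."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


seqs = ["KKKRRKAK", "GGAGSGAG", "KRKKRKKA", "GAGGSAGG"]   # hypothetical sequences
print(kmeans([composition(s) for s in seqs], k=2))     # [0, 1, 0, 1]
```

Composition vectors ignore residue order, which is exactly why alignment-based similarities are usually preferred for protein families; the sketch only shows the partitioning mechanics.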

Association rule mining algorithms are generally used to mine frequent itemsets. Frequent itemsets can be generated in two ways:

a) Mining frequent patterns with candidate generation:
o Apriori algorithm
o Dynamic Itemset Counting (DIC) algorithm

b) Mining frequent patterns without candidate generation:
o Frequent Pattern-Tree (FP-Tree) growth algorithm
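For the “without candidate generation” branch, a compact FP-growth sketch is given below. One pass counts items; a second builds an FP-tree in which each transaction's frequent items are inserted in descending frequency order; mining then recurses on the conditional pattern base collected through each item's node links. This is a minimal teaching version (absolute support count, sortable items, no single-path shortcut), not a tuned implementation; the DIC algorithm is not sketched.

```python
from collections import defaultdict


class Node:
    """FP-tree node: an item, its count, a parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (header, frequent): header maps item -> list of its tree nodes (node links),
    frequent maps item -> support count for items meeting min_support.
    """
    counts = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Keep only frequent items, most frequent first (ties broken by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every frequent itemset; transactions are (items, count)."""
    header, frequent = build_tree(transactions, min_support)
    for item, support in frequent.items():
        yield tuple(sorted(suffix + (item,))), support
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recursively mine the conditional FP-tree built from that base.
        yield from fp_growth(base, min_support, suffix + (item,))


db = ["ACD", "BCE", "ABCE", "BE"]
result = dict(fp_growth([(t, 1) for t in db], min_support=2))
print(sorted(result.items()))
```

The tree is scanned instead of the database during mining, which is the property CT-PRO and FFP-growth in Section II build on.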

The proposed model, PROCAD, shown in Fig. 4, compares candidate algorithms to identify an efficient one and then uses it to discover dominating patterns, moving from known to unknown sequences. Fig. 5 focuses on applying association rule mining algorithms efficiently to protein sequence datasets and predicting frequent patterns with and without candidate generation; the comparative study then identifies an efficient algorithm, as illustrated in Fig. 6.
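The comparison step of PROCAD (time and memory per algorithm) can be prototyped with Python's standard `time.perf_counter` and `tracemalloc` modules. In the sketch below, `naive_miner` is a hypothetical stand-in (a brute-force frequent-itemset counter) so that the harness runs on its own; in the proposed system, the Apriori, DIC and FP-Tree implementations would be passed in instead. Precision needs a reference answer set, so here it is only approximated by checking whether each miner's output agrees with the first miner's.

```python
import time
import tracemalloc
from itertools import combinations


def naive_miner(transactions, min_support):
    """Hypothetical baseline: enumerate every itemset and keep the frequent ones."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            n = sum(1 for t in transactions if t.issuperset(combo))
            if n >= min_support:
                found[combo] = n
    return found


def profile(miner, transactions, min_support):
    """Run one miner; return (result, elapsed seconds, peak traced memory in bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = miner(transactions, min_support)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def compare(miners, transactions, min_support):
    """Profile each named miner and check its output against the first miner's output."""
    reference = None
    report = {}
    for name, miner in miners.items():
        result, elapsed, peak = profile(miner, transactions, min_support)
        if reference is None:
            reference = result
        report[name] = {"seconds": elapsed, "peak_bytes": peak,
                        "matches_reference": result == reference}
    return report


db = ["ACD", "BCE", "ABCE", "BE"]
for name, stats in compare({"naive": naive_miner}, db, 2).items():
    print(name, stats)
```

Wall-clock timings on tiny inputs are noisy; a real comparison would repeat each run and use datasets of realistic size.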

[Figure 4: block diagram of PROCAD. Blocks: Mining Methods, evaluated by Time Analysis, Memory Analysis and Precision; Comparative Analysis (Graphical View, Report View); Associative Pattern Discovery; Frequent Pattern Discovery; Confidence Measures; Report (Analyzed Report, Graphical View). The algorithms Apriori, DIC and FP-Tree are listed in three of the blocks.]

Figure 4. Architecture of the proposed model to find an efficient algorithm based on time, memory and precision.

[Figure 5: flow of the Associative Pattern Discovery module. Load datasets (select the protein sequence); Assign Transaction IDs (from the datasets, divide attributes into equal intervals and assign each a unique TID); Frequent Pattern Discovery (Apriori algorithm); Confidence Measures (apply association rules to the obtained frequent itemsets); Report (graphical view, analyzed report).]

Figure 5. Architecture of the Associative Pattern Discovery Module.
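One possible reading of the “Assign Transaction IDs” step in Fig. 5 is to cut each protein sequence into equal-length intervals and treat the set of amino acids in each interval as one transaction with its own TID. The sketch below follows that assumption; the window length, the TID format and the example sequence are hypothetical choices, not taken from the paper.

```python
def sequence_to_transactions(seq_id, sequence, window):
    """Split a one-letter protein sequence into consecutive windows of `window` residues.

    Returns {TID: frozenset of amino acids}; the last window may be shorter.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    transactions = {}
    for n, start in enumerate(range(0, len(sequence), window), start=1):
        tid = f"{seq_id}-T{n:03d}"          # hypothetical TID format
        transactions[tid] = frozenset(sequence[start:start + window])
    return transactions


tx = sequence_to_transactions("SEQ1", "MKTAYIAKQRQISF", window=5)
for tid, items in tx.items():
    print(tid, "".join(sorted(items)))
# SEQ1-T001 AKMTY
# SEQ1-T002 AIKQR
# SEQ1-T003 FIQS
```

Using sets discards repeated residues within a window, which matches itemset semantics; keeping counts would require a quantitative (or fuzzy) encoding instead.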


[Figure 6: flowchart. Data sets (human data (real time); ExPASy protein data sets) are clustered. Three parallel branches each read the protein sequence data and generate frequent itemsets with the Apriori, DIC and FP-Tree algorithms respectively, then apply strong association rules to the obtained frequent itemsets. The dominating amino acids causing the viral disease are predicted and analyzed with confidence greater than 95%.]

Figure 6. Comparative study on predicted results.

IV. CONCLUSION AND FUTURE WORK

The literature study shows that an efficient algorithm is required to predict frequent patterns. The survey concludes that frequent itemsets could be generated from clustered protein sequences that cause viral disease in humans, and that among the generated frequent itemsets a few amino acids may be found to be strongly associated. Using the results retrieved from the clustered protein sequences, the focus is then on the most dominating amino acids, identified by forming association rules. Various protein sequences could be applied to the proposed system, which is being developed to identify the dominating amino acids. Further investigation will mine frequent itemsets with and without candidate generation and compare their results with the predicted results. In future, this work could be extended to protein sequences that cause other viral diseases, such as flu, dengue fever, viral fever and swine flu. Finally, this work may be beneficial in preparing medicines to cure diseases caused by these viral infections in emergencies.

REFERENCES

[1] Zalak V. Vyas, Amit P. Ganatra, Dr. Y. P. Kosta and C. K. Bhesadadia, “Modified RAAT (Reduced Apriori Algorithm using Tag) for Efficiency Improvement with EP (Emerging Patterns) and JEP (Jumping EP)”, International Conference on Advances in Computer Engineering, DOI 10.1109/ACE.2010.92.

[2] Yiwu Xie, Yutong Li, Chunli Wang and Mingyu Lu, “The Optimization and Improvement of the Apriori Algorithm”, Intelligent Information Technology Application Workshops, 2008, IITAW '08, pp. 1101 – 1103.

[3] Fangyi Wang, Erkang Wang and Bowen Chen, “The Research of Improved Apriori Algorithm for Mining Association Rules”, 11th IEEE International Conference on Communication Technology, ICCT 2008, ISBN: 978-1-4244-2250-0, pp. 513-516.

[4] Lei Ji, Baowen Zhang and Jianhua Li, “A New Improvement on Apriori Algorithm”, International Conference on Computational Intelligence and Security, Vol-1, pp. 84-844.

[5] HAN Feng, ZHANG Shu-mao and DU Ying-shuang, “The analysis and improvement of Apriori algorithm”, Journal of Communication and Computer, ISSN 1548-7709, 2008, USA, Volume 5.

[6] Agrawal, Imielinski and Swami, “Mining Association Rules between Sets of Items in Large Databases”, Proceedings of the ACM SIGMOD Conference on Management of Data, 1993, pp. 207-216.

[7] Anandhavalli, Ghose and Gauthaman, “Association Rule Mining in Genomics”, International Journal of Computer Theory and Engineering, 2010, Vol. 2, No. 2.

[8] Fan Min, Youxi Wu and Xindong Wu, “The Apriori Property of Sequence Pattern Mining with Wildcard Gaps” IEEE International Conference on Bioinformatics and Biomedicine Workshops, 2010.

[9] Chien-Hua Wang, Wei-Hsuan Lee and Chin-Tzong Pang, “Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules” World Academy of Science, Engineering and Technology 65, 2010.

[10] Zhi Liu, Guoming Sang and Mingyu Lu, “A Vector Operation Based Fast Association Rules Mining Algorithm”, International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, 2009.

[11] Karaolis, Moutiris, Papaconstantinou and Pattichis, “Association Rule Analysis for the Assessment of the Risk of Coronary Heart Events” Annual International Conference of the IEEE EMBS Minneapolis, Minnesota, USA, 2009.

[12] Pei-qi Lin, Zeng-zhi Li and Yin-lung Zhao, “Effective Algorithm of Mining Frequent Itemsets for Association Rules”, Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 2004.

[13] Vaibhav Sharma and Sufyan Beg, “A Probabilistic Approach to Apriori Algorithm” IEEE International Conference on Granular Computing, 2010.

[14] Dhivya and Kalpana, “A Study on the Performance of CT-APRIORI and CT-PRO Algorithms using Compressed Structures for Pattern Mining” Journal of Global Research in Computer Science Volume 1, No.2, 2010.

[15] Peien Feng, Hui Zhang, Qingying Qiu and Zhaoxia Wang (2008) “PCAR: an Efficient Approach for Mining Association Rules” Fifth International Conference on Fuzzy Systems and Knowledge Discovery.

[16] Mirela Pater and Daniela E. Popescu, “Market-Basket Problem Solved With Depth First Multi-Level Apriori Mining Algorithm”, 3rd International Workshop on Soft Computing Applications, 2009.

[17] Sondes Fayech, Nadia Essoussi and Mohamed Linam, “Partitioning clustering algorithms for protein sequence data sets”, Published in BioData Mining 2009, vol. 2:3.
