leelavathi-bakthavathchalam
8/12/2019 Mining Best Utility Pattern from RFID Data
including marketing [10], manufacturing, process
control, and fraud detection [9], bioinformatics,
information retrieval, adaptive hypermedia,
electronic commerce and network management [4].
Data mining tasks are of two types: descriptive mining and predictive mining [5]. Descriptive mining portrays the fundamental characteristics or common properties of the data in the database. Predictive mining extracts patterns from the data so that predictions can be made. Predictive mining methods include tasks such as classification, regression and deviation detection.
Many new and emerging applications arise from mining information from huge databases. One field in which sequential pattern mining is applied is Radio Frequency Identification (RFID). RFID is a high-speed, real-time, precise information gathering and processing technology that uniquely identifies objects by means of radio-frequency signals [6]. RFID technology helps a wide variety of organizations and individuals, for instance hospitals and patients, retailers and customers, and manufacturers and distributors throughout the supply chain, to achieve substantial productivity gains and efficiencies [11]. Motivated by long sequences in text data, biological data, software engineering, and sensor networks, mining repetitive gapped subsequences was studied to capture the occurrences of sequential patterns repeating within each sequence of a large database and then use them as features for classification or prediction. RFID tags differ considerably from printed barcodes in their capacity to hold data, the range at which they can be read, and the absence of line-of-sight constraints [12].
The goal of sequential pattern mining is to find all frequent sequential patterns that satisfy a user-specified minimum support. Sequential pattern mining approaches are usually generate-and-test (also known as Apriori), pattern growth (also known as divide-and-conquer), or vertical format methods [13]. Most of the many approaches [15] proposed for sequential pattern mining focus on two issues: (1) improving the efficiency of the mining process and (2) extending the mining of sequential patterns to other types of time-related patterns [16]. The problem of discovering sequential patterns was motivated by the retailing industry. However, the results are applicable to numerous scientific and business domains, such as stock and market basket analysis, natural disasters (e.g. earthquakes), DNA sequence analysis, gene structure analysis, web log click stream analysis, and so on [18]. Time is the most important factor for this task, especially when the results are needed within a limited period [17].
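The notion of a frequent sequential pattern under a minimum support can be illustrated with a minimal Python sketch. It is only a direct support check, not an Apriori, pattern-growth or vertical-format miner, and it makes simplifying assumptions: each sequence is a plain list of single items (here, hypothetical tag read locations), and the names `is_subsequence`, `support`, `frequent_patterns` and `DATABASE` are introduced purely for illustration.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so later items must appear after earlier ones.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


def frequent_patterns(candidates, database, min_support):
    """Keep the candidate patterns whose support reaches `min_support`."""
    return [p for p in candidates if support(p, database) >= min_support]


# Hypothetical read-location sequences of four tags (not data from the paper).
DATABASE = [
    ["dock", "shelf", "pack", "gate"],
    ["dock", "pack", "gate"],
    ["shelf", "dock", "gate"],
    ["dock", "shelf", "gate"],
]

CANDIDATES = [["dock", "gate"], ["shelf", "pack"], ["dock", "shelf"]]
FREQUENT = frequent_patterns(CANDIDATES, DATABASE, min_support=0.5)
```

With a minimum support of 0.5, the pattern dock, gate (contained in all four sequences) and the pattern dock, shelf (contained in two) are kept, while shelf, pack (contained in one) is discarded.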
Although the efficiency of mining the complete set of sequential patterns has been improved considerably, sequential pattern mining still faces hard challenges in both effectiveness and efficiency in many cases. A large database may contain a huge number of sequential patterns, of which only a small subset usually interests a user. Presenting the complete set of sequential patterns would make the mining result hard to understand and hard to use [22]. A Genetic Algorithm (GA) is employed to optimize the cost of the interesting sequential patterns. GA optimizers are robust, and they work well with discontinuous and non-differentiable functions on which customary local optimizers fail. These optimization techniques use processes such as genetic recombination, mutation, and natural selection in a design based on the concepts of evolution.
Even though efficient algorithms have been proposed for mining, mining a large number of sequential patterns from huge databases remains a computationally expensive task. This work proposes an effective data mining system that generates the optimum sequential pattern. The main aim of the work is to develop a utility-based RFID data mining technique that discovers an optimum sequential pattern on the basis of its utility. The rest of the paper is organized as follows: Section 2 describes some recent related works. Section 3 gives a brief account of GA, and Section 4 details the proposed method, the optimization of sequential patterns using GA. Experimental results and analysis of the proposed methodology are discussed in Section 5. Finally, concluding remarks are provided in Section 6.
VIPS VIVEKANANDA JOURNAL OF RESEARCH
RELATED WORKS
Researchers have proposed numerous approaches for effective data mining. This section briefly reviews some important contributions from the existing literature.
J. Hu and A. Mojsilovic [18] presented an algorithm for frequent item set mining that identifies high-utility item combinations. In contrast to the customary association rule and frequent item mining methods, the objective of the algorithm was to locate segments of data, defined through combinations of some items (rules), which satisfy certain conditions as a group and maximize a predefined objective function. They formulated the task as an optimization problem, presented an efficient approximation to solve it by means of specialized partition trees, called High-Yield Partition Trees, and examined the performance of different splitting strategies. The algorithm was tested on real-world datasets and achieved very good results.
Jian Pei et al. [19] argued that constraints are vital for numerous sequential pattern mining applications. Nevertheless, no systematic study was available on constraint-based sequential pattern mining. Their paper investigated that issue and pointed out that the framework developed for constrained frequent-pattern mining did not fit the task well. An extended framework was therefore developed on the basis of a sequential pattern growth methodology. Their study showed that, under this new framework, constraints can be effectively and efficiently pushed deep into sequential pattern mining. Furthermore, their framework can be extended to constraint-based structured pattern mining as well.
Themis P. Exarchos et al. [21] presented a two-stage methodology for sequence classification that utilizes sequential pattern mining and optimization. In the first stage, a sequence classification model based on a set of sequential patterns was defined, and two sets of weights were introduced, one for the patterns and one for the classes. In the second stage, an optimization technique was employed to estimate the weight values that achieve the best classification accuracy. The methodology was evaluated extensively by varying the number of sequences, the number of patterns and the number of classes, and it was compared with similar sequence classification approaches.
S. Shankar et al. [22] noted the well-accepted fact that the process of data mining produces numerous patterns from the given data. Discovering frequent item sets and association rules is among the most important tasks in data mining, and several efficient algorithms for it are available in the literature. In recent years, incorporating utility considerations in data mining tasks has been gaining popularity. Certain association rules improve business value, and the data mining community has long recognized such rules as interesting. The discovery of frequent item sets and association rules from transaction databases benefits numerous business applications. Their paper presented a comprehensive survey and study of a variety of existing techniques for frequent item set mining and association rule mining with utility considerations.
Mourad Ykhlef and Hebah ElGibreen [23] described how mining sequential patterns in large databases has become a vital data mining task with broad applications. It is an important task in the field of data mining that describes potential sequenced relationships among items in a database. Numerous algorithms have been introduced for this task. Conventional algorithms find the exact optimal sequential pattern rule, but they take a long time, particularly when applied to large databases. More recently, evolutionary algorithms, namely Particle Swarm Optimization and Genetic Algorithm, have been proposed and applied to this problem. To speed up the convergence of evolutionary algorithms, their paper introduced a new hybrid evolutionary algorithm, referred to as SP-GAPSO, that combines Genetic Algorithm (GA) with Particle Swarm Optimization (PSO) to mine sequential patterns.
GENETIC ALGORITHM (GA)
A genetic algorithm (GA) is a search and optimization technique inspired by nature's evolutionary processes. A population of candidates iterates through multiple generations of selection, crossover, and mutation until an optimized solution survives, much in the manner of survival of the fittest. GAs are computer-based optimization techniques that use Darwinian evolution in nature as a model [24]. They gained wide popularity through the work of Holland (1975). They are usually employed for problems that have an immense and complex search space with a large number of local optima [27]. The strength of GAs lies in the fact that the search space is traversed in parallel by randomly generating solutions, which are continually evaluated with a fitness function [25]. In general, a GA has three search phases: (1) creating an initial population; (2) evaluating the population with a fitness function; and (3) producing a new population [21]. In a GA, the solutions are termed individuals or chromosomes [27]. The genetic search starts with a randomly generated population in which a fitness function evaluates every individual.
The individuals of the current and subsequent generations are duplicated or eliminated on the basis of their fitness values. Further generations are produced [21] by applying the GA operators, i.e. reproduction, crossover and mutation, which are applied sequentially to each individual with certain probabilities [23], [22]. The first operator, the reproduction operator (elitism), produces one or more copies of any individual that possesses a high fitness value; otherwise, the individual is removed from the solution pool [29]. The crossover operator takes two randomly chosen parent individuals as input and combines them to generate two children. The combination is performed by choosing two crossover points in the strings of the parents and then exchanging the genes between these two points [26]. The next step in each generation is the mutation of individuals through the alteration of parts of their genes [30]. Mutation introduces variation into the population of the succeeding generation by altering a gene of a chromosome. Its main goal is to ensure that the search algorithm does not get trapped in a local optimum [22]. It is also used to make sure that all possible alleles can enter the population and hence to preserve population diversity [21]. Mutation is thus a very important component of GAs: it is the variation operator that produces diversity [28].
AN EFFICIENT DATA MINING SYSTEM BASED ON GA
In previous work, RFID data were effectively warehoused by means of a novel data cleaning, transformation and loading technique proposed specifically for RFID data. That work also showed that the proposed RFID data mining system efficiently mined the required knowledge from the warehoused RFID data. The present work aims to discover an optimum sequential pattern on the basis of its assigned cost, termed utility. A GA-based technique is employed to identify the optimal sequential pattern. After fuzzy rules are created from the sequential patterns, the GA-based method identifies the optimal sequential patterns according to their assigned utility. The fitness function of the GA discovers the sequential pattern with maximum profit. For ease of understanding, the optimal sequential pattern of RFID data is outlined in the following sub-section before the proposed mining system is detailed.
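The GA operators described above (reproduction with elitism, two-point crossover, mutation) and a utility-driven fitness function can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the system's actual implementation: the chromosome encoding (a fixed-length list of location names), the per-location utility values in `UTILITY`, the additive fitness, and all function names and parameter defaults are hypothetical.

```python
import random

# Hypothetical per-location utility (profit) values; not taken from the paper.
UTILITY = {"dock": 1.0, "shelf": 2.5, "pack": 4.0, "gate": 3.0}
LOCATIONS = list(UTILITY)


def fitness(pattern):
    """Assumed fitness: total utility of the locations in the pattern."""
    return sum(UTILITY[loc] for loc in pattern)


def two_point_crossover(a, b, rng):
    """Exchange the genes lying between two random crossover points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(pattern, rng, rate):
    """Replace each gene by a random location with probability `rate`."""
    return [rng.choice(LOCATIONS) if rng.random() < rate else g for g in pattern]


def run_ga(length=5, pop_size=20, generations=30, rate=0.1, seed=0):
    """Return the best pattern found and the best fitness seen per generation."""
    rng = random.Random(seed)
    pop = [[rng.choice(LOCATIONS) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]              # reproduction: fitter half survives
        children = [list(p) for p in parents[:2]]   # elitism: two best copied unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            for child in two_point_crossover(p1, p2, rng):
                children.append(mutate(child, rng, rate))
        pop = children[:pop_size]
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the two fittest individuals are copied unchanged into every new generation, the best fitness recorded in `history` never decreases. A call such as `best, history = run_ga()` returns the highest-utility pattern found together with that trace.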
CONCLUSION
In this paper, we have presented a data mining system for mining information about the type of movement of the tags that are attached to warehouse goods. The proposed system mined knowledge from the warehoused data by generating the I-dataset, mining sequential patterns, and then generating fuzzy rules from the sequential patterns. After that, the sequential patterns are optimized using GA on the basis of their assigned utility. The outcome of the system, optimized fuzzy rules with the corresponding profit, details the type of tag movement with a fuzzy score. Given a part of the movement of a tag (which indirectly refers to a product), the fuzzy rules capture the remaining path of the tag (product). In this manner, combinations of tags of different lengths have been taken into consideration and their movement has been understood. The movements are considered only for some important tags and combinations, not for all tags and their combinations. From the implementation results and comparative analysis, we observed that the proposed system efficiently identifies the optimum sequential pattern. Thus, with the help of the presented optimized data mining system, goods in large warehouses can be tracked efficiently. Because we concentrated only on the optimized sequential patterns, the cost of mining the sequential patterns is minimized. The extracted information would be helpful for warehouse management.
REFERENCES
1. Bin Li and Dennis Shasha, "Free Parallel Data Mining", ACM SIGMOD Record, Vol. 27, No. 2, pp. 541-543, June 1998.
2. Anand, Bell and Hughes, "EDM: A general framework for data mining based on evidence theory", Data and Knowledge Engineering, Vol. 18, No. 3, pp. 189-223, 1996.
3. Agrawal, Imielinski and Swami, "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.
4. Chen and Liu, "Data mining from 1994 to 2004: an application-oriented review", International Journal of Business Intelligence and Data Mining, Vol. 1, No. 1, pp. 4-11, 2005.
5. Yashpal Singh and Alok Singh Chauhan, "Neural Networks In Data Mining", Journal of Theoretical and Applied Information Technology, Vol. 5, No. 6, pp. 36-42, 2009.
6. C.M. Roberts, "Radio frequency identification (RFID)", Computers & Security, Vol. 25, pp. 18-26, 2006.
7. Hatim A. Aboalsamh, "A novel Boolean algebraic framework for association and pattern mining", WSEAS Transactions on Computers, Vol. 7, No. 8, pp. 1352-1361, August 2008.
8. Sathiyamoorthi and Murali Bhaskaran, "Data Mining for Intelligent Enterprise Resource Planning System", International Journal of Recent Trends in Engineering, Vol. 2, No. 3, pp. 1-5, November 2009.
9. Jayanthi Ranjan and Vishal Bhatnagar, "A Review of Data Mining Tools In Customer Relationship Management", Journal of Knowledge Management Practice, Vol. 9, No. 1, March 2008.
10. Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan and Michael E. Welge, "Knowledge management and data mining for marketing", Decision Support Systems, Vol. 31, No. 1, pp. 127-137, May 2001.
11. Asghar Sabbaghi and Ganesh Vaidyanathan, "Effectiveness and Efficiency of RFID technology in Supply Chain Management: Strategic values and Challenges", Journal of Theoretical and Applied Electronic Commerce Research, Vol. 3, No. 2, pp. 71-81, 2008, ISSN 0718-1876.
12. Asif, Z., Mandviwalla, M., "Integrating the supply chain with RFID: a technical and business analysis", Communications of the Association for Information Systems, Vol. 15, No. 24, pp. 393-427, 2005.
13. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal and Mei-Chun Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, pp. 1-17, October 2004.
14. M.S. Chen, J. Han, P.S. Yu, "Data mining: an overview from a database perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
15. Yen-Liang Chen and Ya-Han Hu, "Constraint-based sequential pattern mining: The consideration of recency and compactness", Decision Support Systems, Vol. 42, pp. 1203-1215, 2006.
16. Kuen-Fang Jea, Ke-Chung Lin and I-En Liao, "Mining hybrid sequential patterns by hierarchical mining technique", International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009.
17. Dhany Saputra, Dayang R.A. Rambli and Oi Mean Foong, "Mining Sequential Patterns Using I-PrefixSpan", International Journal of Computer Science and Engineering, Vol. 2, No. 2, pp. 49-554, 2008.
18. J. Hu and A. Mojsilovic, "High-utility pattern mining: A method for discovery of high-utility item sets", Pattern Recognition, Vol. 40, pp. 3317-3324, 2007.
19. Jian Pei, Jiawei Han and Wei Wang, "Constraint-based sequential pattern mining: The pattern-growth methods", Journal of Intelligent Information Systems, Vol. 28, No. 2, pp. 133-160, April 2007.
20. Shigeaki Sakurai, Youichi Kitahara and Ryohei Orihara, "A Sequential Pattern Mining Method based on Sequential Interestingness", International Journal of Computational Intelligence, Vol. 4, No. 4, pp. 252-260, 2008.
21. Themis P. Exarchos, Markos G. Tsipouras, Costas Papaloukas and Dimitrios I. Fotiadis, "A two-stage methodology for sequence classification based on sequential pattern mining and optimization", Data & Knowledge Engineering, Vol. 66, pp. 467-487, 2008.
22. Shankar and Purusothaman, "Utility Sentient Frequent Itemset Mining and Association Rule Mining: A Literature Survey and Comparative Study", International Journal of Soft Computing Applications, Vol. 10, No. 4, pp. 81-95, 2009.
23. Mourad Ykhlef and Hebah ElGibreen, "Mining Sequential Patterns Using Hybrid Evolutionary Algorithm", World Academy of Science, Engineering and Technology, Vol. 60, pp. 863-870, 2009.
24. Jyothi Pillai and O.P. Vyas, "Overview of Itemset Utility Mining and its Applications", International Journal of Computer Applications (0975-8887), Vol. 5, No. 11, pp. 9-13, August 2010.
25. M. Sedighizadeh and A. Rezazadeh, "Using Genetic Algorithm for Distributed Generation Allocation to Reduce Losses and Improve Voltage Profile", World Academy of Science, Engineering and Technology, Vol. 37, 2008.
26. P. Radhakrishnan, V.M. Prasad and M.R. Gopalan, "Optimizing Inventory Using Genetic Algorithm for Efficient Supply Chain Management", Journal of Computer Science, Vol. 5, No. 3, pp. 233-241, 2009.
27. Basheer M. Al-Maqaleh and Kamal K. Bharadwaj, "Genetic Programming Approach to Hierarchical Production Rule Discovery", World Academy of Science, Engineering and Technology, Vol. 11, pp. 43-46, 2005.
28. Timo Mantere, "A Min-Max Genetic Algorithm with Alternating Multiple Sorting for Solving Constrained Problems", in Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 2006.
29. Pedro A. Diaz-Gomez and Dean F. Hougen, "Improved Off-Line Intrusion Detection Using A Genetic Algorithm", Proceedings of the Seventh International Conference on Enterprise Information Systems, pp. 66-73, May 25-28, Miami, USA, 2005.
30. S. Ramanarayana Reddy, "Selection of RTOS for an Efficient Design of Embedded Systems", International Journal of Computer Science and Network Security, Vol. 6, No. 6, pp. 29-37, June 2006.
31. Stefan Schenk and Klaus Hanke, "Combining Genetic Algorithms With Imperfect And Subdivided Features For The Automatic Registration Of Point Clouds (GAREG-ISF)", Proceedings of the 3rd ISPRS International Workshop, Vol. 38.
32. Imtiaz Korejo, Shengxiang Yang and Changhe Li, "A Comparative Study of Adaptive Mutation Operators for Genetic Algorithms", in Proceedings of the 8th Metaheuristic International Conference, July 13-16, 2009.
33. Mike Sewell, Jagath Samarabandu, Ranga Rodrigo, and Kenneth McIsaac, "The Rank-scaled Mutation Rate for Genetic Algorithms", International Journal of Information Technology, Vol. 3, No. 1, 2006.
34. Zorana Bankovic, Jos M. Moya, Alfaro Araujo, Slobodan Bojanic and Octavio Nieto-Taladriz, "A Genetic Algorithm-based Solution for Intrusion Detection", Journal of Information Assurance and Security, Vol. 4, pp. 192-199, 2009.