Mining Best Utility Pattern from RFID Data

  • 8/12/2019 Mining Best Utility Pattern from RFID Data

    including marketing [10], manufacturing, process control, fraud
    detection [9], bioinformatics, information retrieval, adaptive
    hypermedia, electronic commerce and network management [4].
    Descriptive mining and predictive mining are the two types of data
    mining tasks [5]. Descriptive mining portrays the fundamental
    characteristics or common properties of the data in the database.
    Predictive mining infers patterns from the data so that predictions
    can be made; its methods include tasks such as classification,
    regression and deviation detection.

    Mining information from huge databases has opened up many new and
    emerging applications. One field that incorporates sequential pattern
    mining is Radio Frequency Identification (RFID). RFID is a high-speed,
    real-time, precise information gathering and processing technology
    that uses radio-frequency signals to identify objects uniquely [6].
    RFID technology helps a wide variety of organizations and individuals,
    for instance hospitals and patients, retailers and customers, and
    manufacturers and distributors throughout the supply chain, to achieve
    substantial productivity gains and efficiencies [11]. Motivated by long
    sequences in text data, biological data, software engineering, and
    sensor networks, mining repetitive gapped subsequences was studied to
    capture the occurrences of sequential patterns repeating within each
    sequence of a large database and then use them as features for
    classification or prediction. RFID tags differ markedly from printed
    barcodes in their capacity to hold data, the range at which they can
    be read, and the absence of line-of-sight constraints [12].

    The goal of sequential pattern mining is to find all frequent
    sequential patterns with a user-specified minimum support. Sequential
    pattern mining approaches are usually generate-and-test (also known as
    Apriori), pattern growth (also known as divide-and-conquer) or vertical
    format methods [13]. Most of the many approaches [15] proposed for
    sequential pattern mining focus on two issues: (1) improving the
    efficiency of the mining process and (2) extending the mining of
    sequential patterns to other types of time-related patterns [16]. The
    problem of discovering sequential patterns was motivated by the
    retailing industry. However, the results apply to numerous scientific
    and business domains, such as stock and market basket analysis, natural
    disasters (e.g. earthquakes), DNA sequence analysis, gene structure
    analysis, web log click stream analysis, and so on [18]. Time is the
    most important factor for this task, mainly when the results are needed
    within a limited period [17].
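    The notion of a frequent sequential pattern under a minimum support
    can be illustrated with a minimal Python sketch. It assumes a
    simplified setting in which every sequence is a list of single items
    (for example, reader locations visited by a tag) and a pattern is
    supported by a sequence when its items occur in the same order, with
    gaps allowed. The function names and the sample data are hypothetical
    and are not taken from any of the cited algorithms.

```python
from typing import List, Sequence, Tuple

# Hypothetical tag read sequences (one list of reader locations per tag).
READS: List[List[str]] = [
    ["dock", "shelf", "checkout"],
    ["dock", "checkout"],
    ["shelf", "dock", "checkout"],
    ["dock", "shelf"],
]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: Sequence[Sequence[str]]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(1 for seq in database if is_subsequence(pattern, seq))
    return hits / len(database)


def frequent_patterns(
    candidates: Sequence[Sequence[str]],
    database: Sequence[Sequence[str]],
    min_support: float,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Keep the candidate patterns whose support reaches `min_support`."""
    result = []
    for pattern in candidates:
        s = support(pattern, database)
        if s >= min_support:
            result.append((tuple(pattern), s))
    return result
```

    With the sample data, `support(["dock", "checkout"], READS)` is 0.75,
    so this pattern is frequent for any minimum support up to 0.75. The
    sketch only tests given candidates; generating the candidates
    efficiently is what the Apriori-style, pattern-growth and vertical
    format approaches address.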

    Although the efficiency of mining the complete set of sequential
    patterns has improved considerably, sequential pattern mining still
    faces hard challenges in both effectiveness and efficiency in many
    cases. A large database may contain a huge number of sequential
    patterns, yet a user is often interested in only a small subset of
    them. Presenting the complete set of sequential patterns would make
    the mining result hard to understand and hard to use [22].

    To optimize the cost of the interesting sequential patterns, a Genetic
    Algorithm (GA) is employed. GA optimizers are robust and work well with
    discontinuous and non-differentiable functions, where customary local
    optimizers fail. These optimization techniques use processes such as
    genetic recombination, mutation, and natural selection in a design
    based on the concepts of evolution.

    Even with the efficient algorithms that have been proposed, mining a
    large number of sequential patterns from huge databases is a
    computationally expensive task. In this work, an effective data mining
    system that generates the optimum sequential pattern is proposed. The
    main aim of the work is to develop a utility-based RFID data mining
    technique that discovers an optimum sequential pattern based on its
    utility. The rest of the paper is organized as follows: Section 2
    describes some of the recent related works. Section 3 gives a brief
    account of GA, and Section 4 details the proposed method, the
    optimization of sequential patterns using GA. Experimental results and
    analysis of the proposed methodology are discussed in Section 5.
    Finally, concluding remarks are provided in Section 6.

    VIPS VIVEKANANDA JOURNAL OF RESEARCH(2)

    RELATED WORKS

    Researchers have proposed numerous approaches for effective data
    mining. This section briefly reviews some important contributions from
    the existing literature.

    J. Hu and A. Mojsilovic [18] presented an algorithm for frequent item
    set mining that identifies high-utility item combinations. In contrast
    to customary association rule and frequent item mining methods, the
    objective of the algorithm was to locate segments of data, defined
    through combinations of some items (rules), which satisfy certain
    conditions as a group and maximize a predefined objective function.
    They formulated the task as an optimization problem, presented an
    efficient approximation to solve it using specialized partition trees,
    called High-Yield Partition Trees, and examined the performance of
    different splitting strategies. The algorithm was tested on real-world
    datasets and achieved very good results.

    Jian Pei et al. [19] argued that constraints are essential for many
    sequential pattern mining applications. Nevertheless, no systematic
    study was available on constraint-based sequential pattern mining.
    Their paper investigated that issue and pointed out that the framework
    developed for constrained frequent-pattern mining did not fit that
    task well. An extended framework was developed on the basis of a
    sequential pattern growth methodology. Their study showed that under
    this new framework constraints can be effectively and efficiently
    pushed deep into sequential pattern mining. Furthermore, their
    framework can be extended to constraint-based structured pattern
    mining as well.

    Themis P. Exarchos et al. [21] presented a two-stage methodology for
    sequence classification that utilizes sequential pattern mining and
    optimization. In the first stage, a sequence classification model,
    based on a set of sequential patterns, was defined and two sets of
    weights were introduced, one for the patterns and the other for the
    classes. In the second stage, an optimization technique was employed
    to estimate the weight values that achieve the best classification
    accuracy. The methodology was evaluated extensively by varying the
    number of sequences, the number of patterns and the number of classes,
    and it was compared with similar sequence classification approaches.

    S. Shankar et al. [22] noted that it is well accepted that the process
    of data mining produces numerous patterns from the given data.
    Discovering frequent item sets and association rules is among the most
    important tasks in data mining, and several efficient algorithms for
    it are available in the literature. In recent years, incorporating
    utility considerations in data mining tasks has been gaining
    popularity. Certain association rules improve business value, and the
    data mining community has long recognized such rules as interesting.
    The discovery of frequent item sets and association rules from
    transaction databases benefits numerous business applications. Their
    paper presented a comprehensive survey and study of existing
    techniques for frequent item set mining and association rule mining
    with utility considerations.

    Mourad Ykhlef and Hebah ElGibreen [23] described how mining sequential
    patterns in large databases has become a vital data mining task with
    broad applications. It is an important task in data mining that
    describes potential sequenced relationships among items in a database.
    Many different algorithms have been introduced for this task.
    Conventional algorithms find the exact optimal sequential pattern
    rules, but they take a long time, particularly when applied to large
    databases. More recently, evolutionary algorithms, namely Particle
    Swarm Optimization and Genetic Algorithm, have been proposed and
    applied to this problem. Their paper introduced a new hybrid
    evolutionary algorithm that combines Genetic Algorithm (GA) with
    Particle Swarm Optimization (PSO) to mine sequential patterns, so as
    to speed up the convergence of evolutionary algorithms. Their
    algorithm was referred to as SP-GAPSO.

    GENETIC ALGORITHM (GA)

    A genetic algorithm (GA) is a search and optimization technique
    inspired by nature's evolutionary processes. A population of
    candidates iterates through multiple generations of selection,
    crossover, and mutation until an optimized solution survives, much in
    the manner of survival of the fittest. GAs are computer-based
    optimization techniques that use Darwinian evolution as a model [24].
    The work of Holland (1975) made them widely popular. They are usually
    employed for problems that have an immense and complex search space
    with a large number of local optima [27]. The strength of GAs lies in
    the fact that the search space is traversed in parallel by randomly
    generating solutions, which are continually evaluated with a fitness
    function [25]. Generally, a GA has three search phases: (1) creating
    an initial population; (2) evaluating the population with a fitness
    function; (3) producing a new population [21]. In a GA, the solutions
    are termed individuals or chromosomes [27]. The genetic search starts
    with a randomly generated population in which a fitness function
    evaluates every individual.

    The individuals of the current and following generations are
    duplicated or eliminated on the basis of their fitness values. Further
    generations are produced by applying the GA operators [21], i.e.
    reproduction, crossover and mutation, which are applied sequentially
    to each individual with certain probabilities [23], [22]. The first
    operator, the reproduction operator (elitism), produces one or more
    copies of any individual that possesses a high fitness value;
    otherwise, the individual is removed from the solution pool [29]. The
    crossover operator takes two randomly chosen parent individuals as
    input and combines them to generate two children. The combination
    takes place by choosing two crossover points in the strings of the
    parents and then exchanging the genes between these two points [26].
    The next step in each generation is the mutation of individuals
    through the alteration of parts of their genes [30]. Mutation brings
    variation into the population of the succeeding generation by altering
    a gene of a chromosome. Its main goal is to make sure that the search
    algorithm does not get stuck in a local optimum [22]. It is used to
    make sure that all possible alleles can enter the population and hence
    preserves the population diversity [21]. It is a very important
    component of GAs, acting as a variation operator that produces
    diversity [28].

    AN EFFICIENT DATA MINING SYSTEM BASED ON GA

    The RFID data has been effectively warehoused by means of a novel data
    cleaning, transformation and loading technique proposed specifically
    for RFID data. The previous works illustrated that the proposed novel
    RFID data mining system efficiently mined the required knowledge from
    the warehoused RFID data. The present work is intended to discover an
    optimum sequential pattern based on its cost, termed the assigned
    utility. A GA-based technique is employed to identify the optimal
    sequential pattern. After the fuzzy rules are created from the
    sequential patterns, the GA-based method identifies the optimal
    sequential patterns according to their assigned utility. The fitness
    function of the GA discovers the sequential pattern with maximum
    profit. For ease of understanding, the optimal sequential pattern of
    RFID data is outlined in the following sub-section, before the
    proposed mining system is detailed.
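    The GA cycle described above (initial population, fitness evaluation,
    reproduction with elitism, two-point crossover, mutation) combined
    with a utility-driven fitness can be sketched in Python as follows.
    This is only an illustrative sketch under stated assumptions: the toy
    read sequences, the per-location profit table, the fitness definition
    (number of supporting sequences multiplied by the summed item profit),
    the fixed-length chromosome encoding and the truncation selection of
    parents are all hypothetical and are not the encoding or fitness
    function of the proposed system.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: tag read sequences and an assumed profit per location.
DATABASE: List[List[str]] = [
    ["dock", "shelf", "pack", "ship"],
    ["dock", "pack", "ship"],
    ["dock", "shelf", "ship"],
    ["shelf", "pack", "ship"],
]
PROFIT: Dict[str, float] = {"dock": 1.0, "shelf": 2.0, "pack": 3.0, "ship": 4.0}
ITEMS: List[str] = sorted(PROFIT)


def occurs_in(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def fitness(pattern: Sequence[str]) -> float:
    """Assumed utility: supporting-sequence count times the sum of item profits."""
    count = sum(occurs_in(pattern, seq) for seq in DATABASE)
    return count * sum(PROFIT[item] for item in pattern)


def two_point_crossover(a: List[str], b: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Exchange the genes between two randomly chosen crossover points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(individual: List[str], rate: float, rng: random.Random) -> List[str]:
    """Replace each gene by a random item with probability `rate`."""
    return [rng.choice(ITEMS) if rng.random() < rate else gene for gene in individual]


def run_ga(length: int = 3, pop_size: int = 20, generations: int = 30,
           elite: int = 2, mutation_rate: float = 0.1, seed: int = 0) -> Tuple[List[str], List[float]]:
    """Return the best pattern found and the best fitness recorded per generation."""
    rng = random.Random(seed)
    # (1) Create a random initial population.
    population = [[rng.choice(ITEMS) for _ in range(length)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # (2) Evaluate the population with the fitness function.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # (3) Produce a new population: elitism, then crossover and mutation.
        next_population = [list(ind) for ind in population[:elite]]
        parents = population[: pop_size // 2]  # truncation selection (an assumption)
        while len(next_population) < pop_size:
            parent1, parent2 = rng.sample(parents, 2)
            for child in two_point_crossover(parent1, parent2, rng):
                if len(next_population) < pop_size:
                    next_population.append(mutate(child, mutation_rate, rng))
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

    Calling `best, history = run_ga()` returns a pattern of three
    locations together with the best fitness seen in each generation.
    Because the elite individuals are copied unchanged, the recorded best
    fitness never decreases from one generation to the next; whether the
    global optimum is reached depends on the parameters and the seed.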

    CONCLUSION

    In this paper, we have presented a data mining system for mining
    information about the types of movement of the tags that are attached
    to warehouse goods. The proposed system mined knowledge from the
    warehoused data by generating the I-dataset, mining sequential
    patterns and then generating fuzzy rules from the sequential patterns.
    After that, the sequential patterns are optimized using GA on the
    basis of their assigned utility. The outcome of the system, optimized
    fuzzy rules with their corresponding profit, describes the type of tag
    movement with a fuzzy score. Given a part of the movement of a tag
    (which indirectly refers to a product), the fuzzy rules capture the
    remaining path of the tag (product). In this manner, combinations of
    tags of different lengths have been considered and their movement has
    been understood. The movements are considered only for some important
    tags and combinations, not for all tags and their combinations. From
    the implementation results and comparative analysis, we observed that
    our proposed system efficiently identifies the optimum sequential
    pattern. So, with the help of the presented optimized data mining
    system, tracking of goods in large warehouses can be executed
    efficiently. Because we concentrated only on the optimized sequential
    patterns, the cost of mining the sequential patterns is minimized. The
    extracted information would be helpful for warehouse management.

    REFERENCES

    1. Bin Li and Dennis Shasha, "Free Parallel Data Mining", ACM SIGMOD Record, Vol. 27, No. 2, pp. 541-543, June 1998.
    2. Anand, Bell and Hughes, "EDM: A general framework for data mining based on evidence theory", Data and Knowledge Engineering, Vol. 18, No. 3, pp. 189-223, 1996.
    3. Agrawal, Imielinski and Swami, "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.
    4. Chen and Liu, "Data mining from 1994 to 2004: an application-oriented review", International Journal of Business Intelligence and Data Mining, Vol. 1, No. 1, pp. 4-11, 2005.
    5. Yashpal Singh and Alok Singh Chauhan, "Neural Networks In Data Mining", Journal of Theoretical and Applied Information Technology, Vol. 5, No. 6, pp. 36-42, 2009.
    6. C.M. Roberts, "Radio frequency identification (RFID)", Computers & Security, Vol. 25, pp. 18-26, 2006.
    7. Hatim A. Aboalsamh, "A novel Boolean algebraic framework for association and pattern mining", WSEAS Transactions on Computers, Vol. 7, No. 8, pp. 1352-1361, August 2008.
    8. Sathiyamoorthi and Murali Bhaskaran, "Data Mining for Intelligent Enterprise Resource Planning System", International Journal of Recent Trends in Engineering, Vol. 2, No. 3, pp. 1-5, November 2009.
    9. Jayanthi Ranjan and Vishal Bhatnagar, "A Review of Data Mining Tools In Customer Relationship Management", Journal of Knowledge Management Practice, Vol. 9, No. 1, March 2008.
    10. Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan and Michael E. Welge, "Knowledge management and data mining for marketing", Decision Support Systems, Vol. 31, No. 1, pp. 127-137, May 2001.
    11. Asghar Sabbaghi and Ganesh Vaidyanathan, "Effectiveness and Efficiency of RFID technology in Supply Chain Management: Strategic values and Challenges", Journal of Theoretical and Applied Electronic Commerce Research, Vol. 3, No. 2, pp. 71-81, 2008, ISSN 07181876.
    12. Asif, Z., Mandviwalla, M., "Integrating the supply chain with RFID: a technical and business analysis", Communications of the Association for Information Systems, Vol. 15, No. 24, pp. 393-427, 2005.
    13. Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, Qiming Chen, Umeshwar Dayal and Mei-Chun Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, pp. 1-17, October 2004.
    14. M.S. Chen, J. Han, P.S. Yu, "Data mining: an overview from a database perspective", IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
    15. Yen-Liang Chen and Ya-Han Hu, "Constraint-based sequential pattern mining: The consideration of recency and compactness", Decision Support Systems, Vol. 42, pp. 1203-1215, 2006.
    16. Kuen-Fang Jea, Ke-Chung Lin and I-En Liao, "Mining hybrid sequential patterns by hierarchical mining technique", International Journal of Innovative Computing, Information and Control, Vol. 5, No. 8, August 2009.
    17. Dhany Saputra, Dayang R.A. Rambli and Oi Mean Foong, "Mining Sequential Patterns Using I-PrefixSpan", International Journal of Computer Science and Engineering, Vol. 2, No. 2, pp. 49-554, 2008.
    18. J. Hu and A. Mojsilovic, "High-utility pattern mining: A method for discovery of high-utility item sets", Pattern Recognition, Vol. 40, pp. 3317-3324, 2007.
    19. Jian Pei, Jiawei Han and Wei Wang, "Constraint-based sequential pattern mining: The pattern-growth methods", Journal of Intelligent Information Systems, Vol. 28, No. 2, pp. 133-160, April 2007.
    20. Shigeaki Sakurai, Youichi Kitahara and Ryohei Orihara, "A Sequential Pattern Mining Method based on Sequential Interestingness", International Journal of Computational Intelligence, Vol. 4, No. 4, pp. 252-260, 2008.
    21. Themis P. Exarchos, Markos G. Tsipouras, Costas Papaloukas and Dimitrios I. Fotiadis, "A two-stage methodology for sequence classification based on sequential pattern mining and optimization", Data & Knowledge Engineering, Vol. 66, pp. 467-487, 2008.
    22. Shankar and Purusothaman, "Utility Sentient Frequent Itemset Mining and Association Rule Mining: A Literature Survey and Comparative Study", International Journal of Soft Computing Applications, Vol. 10, No. 4, pp. 81-95, 2009.
    23. Mourad Ykhlef and Hebah ElGibreen, "Mining Sequential Patterns Using Hybrid Evolutionary Algorithm", World Academy of Science, Engineering and Technology, Vol. 60, pp. 863-870, 2009.
    24. Jyothi Pillai and O.P. Vyas, "Overview of Itemset Utility Mining and its Applications", International Journal of Computer Applications (0975-8887), Vol. 5, No. 11, pp. 9-13, August 2010.
    25. M. Sedighizadeh and A. Rezazadeh, "Using Genetic Algorithm for Distributed Generation Allocation to Reduce Losses and Improve Voltage Profile", World Academy of Science, Engineering and Technology, Vol. 37, 2008.
    26. P. Radhakrishnan, V.M. Prasad and M.R. Gopalan, "Optimizing Inventory Using Genetic Algorithm for Efficient Supply Chain Management", Journal of Computer Science, Vol. 5, No. 3, pp. 233-241, 2009.
    27. Basheer M. Al-Maqaleh and Kamal K. Bharadwaj, "Genetic Programming Approach to Hierarchical Production Rule Discovery", World Academy of Science, Engineering and Technology, Vol. 11, pp. 43-46, 2005.
    28. Timo Mantere, "A Min-Max Genetic Algorithm with Alternating Multiple Sorting for Solving Constrained Problems", in Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence, 2006.
    29. Pedro A. Diaz-Gomez and Dean F. Hougen, "Improved Off-Line Intrusion Detection Using A Genetic Algorithm", Proceedings of the Seventh International Conference on Enterprise Information Systems, pp. 66-73, May 25-28, Miami, USA, 2005.
    30. S. Ramanarayana Reddy, "Selection of RTOS for an Efficient Design of Embedded Systems", International Journal of Computer Science and Network Security, Vol. 6, No. 6, pp. 29-37, June 2006.
    31. Stefan Schenk and Klaus Hanke, "Combining Genetic Algorithms With Imperfect And Subdivided Features For The Automatic Registration Of Point Clouds (GAREG-ISF)", Proceedings of the 3rd ISPRS International Workshop, Vol. 38,
    32. Imtiaz Korejo, Shengxiang Yang and Changhe Li, "A Comparative Study of Adaptive Mutation Operators for Genetic Algorithms", in Proceedings of the 8th Metaheuristic International Conference, July 13-16, 2009.
    33. Mike Sewell, Jagath Samarabandu, Ranga Rodrigo, and Kenneth McIsaac, "The Rank-scaled Mutation Rate for Genetic Algorithms", International Journal of Information Technology, Vol. 3, No. 1, 2006.
    34. Zorana Bankovic, José M. Moya, Alfaro Araujo, Slobodan Bojanic and Octavio Nieto-Taladriz, "A Genetic Algorithm-based Solution for Intrusion Detection", Journal of Information Assurance and Security, Vol. 4, pp. 192-199, 2009.