CHAPTER II
REVIEW OF LITERATURE
Databases keep growing rapidly because of the availability of powerful
and affordable database systems. This explosive growth in data and databases
has generated an urgent need for new techniques and tools that can intelligently
and automatically transform the processed data into useful information and
knowledge. Consequently, data mining has become a research area of
increasing importance. To design an effective data mining technique, several
issues have to be taken into account, such as the types of data, the efficiency
and scalability of data mining algorithms, the usefulness of the results, the
different sources of data, and the protection of privacy and data security.
The problem of privacy-preserving data mining has received
considerable attention because of recent concerns about the privacy of the
underlying data.
Privacy-preserving data mining considers the problem of running data
mining algorithms on large databases consisting of confidential data that is not
supposed to be revealed even to the party running the algorithm. The sharing of
data and/or knowledge may come at a cost to privacy. If the
data contains business (or organizational) information, then the disclosure of
this data or of any knowledge extracted from it may reveal
sensitive trade secrets, which can provide a significant advantage to
competitors and could cause the data holder to lose business.
When knowledge is shared, privacy can be achieved during association
rule mining in different ways using various methods.
Several approaches have been proposed in the research literature to offer
privacy in data mining. The majority of the proposed approaches can be
classified along two principal research directions: (1) data hiding approaches
and (2) knowledge hiding approaches. The data hiding approaches aim at the
removal of confidential or private information from the original data prior to its
disclosure by applying techniques such as perturbation, sampling, generalization
or suppression, and transformation using cryptographic techniques. The
knowledge hiding approaches aim to protect the sensitive data mining results,
which are produced by applying data mining tools to the original database,
rather than the raw data itself. This direction mainly deals
with heuristic, reconstruction and blocking techniques.
Privacy-preserving distributed data mining is a multidisciplinary field
and requires close cooperation between researchers and practitioners from the
fields of cryptography, data mining, public policy and law. Now, the question is
how to compute the results without pooling the data in a way that reveals
nothing but the final results of the data mining computation. This question of
privacy-preserving data mining is actually a special case of a long-studied
problem in cryptography called secure multiparty computation. This problem
deals with a setting where a set of parties with private inputs wishes to jointly
compute some function of their inputs. This joint computation should have the
property that the parties learn the correct output and nothing else, even if some
of the parties maliciously collude to obtain more information. Clearly, a
protocol is needed to solve privacy-preserving distributed data mining
problems.
Basically, there are four cryptography-based methods for privacy-preserving
distributed data mining: Secure Sum, Secure Set Union,
Secure Size of Set Intersection, and Scalar Product. Secure Sum is a simple example
of secure multiparty computation and is used when three or more parties need
to compute a sum securely and no collusion occurs. Secure Set Union methods
are useful in data mining where each party needs to contribute rules, frequent item
sets and so on without revealing their owner. When several parties each hold their
own set of items from a common domain, the Secure Size of Set Intersection
method is used to securely compute the cardinality of the intersection of the
parties' local sets. Scalar Product is a powerful component technique and
can be used to solve data mining problems by computing the scalar product of
two vectors securely. These techniques are mainly used with different
data distribution models, such as horizontal and vertical data distributions, for
obtaining data mining results without violating privacy constraints.
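The Secure Sum primitive described above can be illustrated with a minimal sketch that simulates the usual ring-based protocol in a single process. It assumes honest, non-colluding parties and a public modulus larger than the true total; the names secure_sum and MODULUS are hypothetical and chosen only for this illustration.

```python
import secrets

MODULUS = 2**32  # assumed public upper bound on the total (hypothetical value)


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a ring-based secure sum among three or more parties.

    The initiating party masks its value with a random number, every
    following party adds its own value modulo `modulus`, and the
    initiator removes the mask at the end.
    """
    if len(private_values) < 3:
        raise ValueError("secure sum needs at least three parties")
    mask = secrets.randbelow(modulus)            # known only to party 0
    running = (mask + private_values[0]) % modulus
    for value in private_values[1:]:             # message passed around the ring
        running = (running + value) % modulus    # each party sees only a masked total
    return (running - mask) % modulus            # party 0 unmasks the final sum


local_counts = [120, 75, 310]                    # e.g. local support counts per site
print(secure_sum(local_counts))                  # 505
```

Because every intermediate value is offset by the random mask, a single party learns nothing about the other inputs beyond the final total. The sketch gives no protection if the neighbours of a party collude, which matches the no-collusion condition stated above.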
Association rule mining was first introduced in 1993. Since its
inception, association rule mining has become one of the core data-mining tasks
and has attracted tremendous interest among researchers and practitioners.
Privacy preserving association rule mining is one of the most popular
pattern discovery methods in the new and rapidly emerging research area of
privacy preserving data mining. Several privacy-preserving techniques for
association rule mining have been proposed in recent years. Some
approaches and algorithms have been developed for centralized data, while
others address a distributed data scenario. Distributed data scenarios can be further
classified as horizontal data distribution, vertical data distribution and mixed
data distribution. The approaches for privacy preserving association rule
mining fall into three categories: heuristic-based
techniques, reconstruction-based techniques, and cryptography-based techniques.
The following sections review the literature related to data
mining, association rule mining, and privacy preserving data mining in centralized
as well as distributed database environments, with emphasis on
privacy preserving association rule mining in both centralized and
distributed environments.
2.1 Data Mining
In [6], the requirements and challenges of data mining are studied, such as
the handling of different types of data, the efficiency and scalability of data mining
algorithms, the usefulness, certainty, and expressiveness of data mining results,
the expression of various kinds of data mining requests and results, interactive
mining of knowledge at multiple abstraction levels, mining information from
different sources of data, and the protection of privacy and data security. A
comprehensive overview of recently developed data mining techniques is given
in light of these requirements and challenges, with the aim of
understanding user behavior, improving services, and increasing
business opportunities. A classification of the available data mining techniques
and a comparative study of each technique are also presented.
The different tasks of data mining and the various applications in which
data mining techniques can be used are addressed in [7].
An overview of the tasks involved in a knowledge discovery system and the
approaches to solving these tasks is provided in [8]. The authors also
described the software tools available for knowledge discovery
tasks and proposed a feature classification scheme that can be used to
study knowledge discovery and data mining software tools. The tools are classified
by their general characteristics, database connectivity and data mining
characteristics. The authors further investigated 43 software products,
some of which are research prototypes and some commercial packages. From
their analysis, they specified the features that knowledge discovery
software should have in order to accommodate novice users as well as experienced
analysts effectively, and discussed the issues that are not yet addressed or
solved.
A comprehensive review of different classification techniques in data
mining is presented and different kinds of classification techniques such as
decision tree induction, Bayesian networks, k-nearest neighbor techniques are
discussed with algorithms [9]. They also presented methods of case-based
reasoning, genetic algorithm and fuzzy logic techniques with suitable examples.
Distributed data mining has attracted the attention of researchers because
of its potential for dealing with distributed data and its performance advantages.
A basic introduction to distributed data mining and the issues to be considered in
it are presented in [10]. The author also addressed the
various problems that exist in distributed data mining, its significance,
and its progress with classification,
association and clustering techniques.
2.2 Association Rule Mining
R. Agrawal et al. proposed an algorithm [11] for finding association rules
between items or item sets in a database of transactions, where each
transaction consists of the items purchased by a customer in one visit. The authors
incorporated buffer management and novel estimation and pruning techniques
into the proposed algorithm to find significant association rules. They presented the
formal definition of an association rule and algorithms for finding large item sets as
well as association rules. The results of applying the proposed algorithm to market
basket data obtained from a large retailing company are presented, and they
showed that the proposed algorithm is effective.
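The two standard measures behind an association rule, support and confidence, can be made concrete with a small sketch. It uses the usual textbook definitions rather than the exact formulation of [11]; the function names and the toy basket data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)


# Hypothetical market basket data, one set of items per customer visit.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 2 of the 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```

A rule such as bread -> milk is reported only if both values reach user-chosen minimum thresholds.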
Two new algorithms, Apriori and AprioriTid, are presented in [12] for mining
association rules; they are fundamentally different from the previously known
algorithms. From an empirical evaluation, the authors showed that the new
algorithms outperform the known algorithms by factors ranging from three for
small problems to more than an order of magnitude for large problems. They
combined the features of these two algorithms into a new algorithm
called AprioriHybrid, which scales linearly with the number of
transactions and the number of items in the database. In [13], the authors
extended association rule mining with generalization by defining hierarchies over
items in order to capture interesting rules at all levels of multiple hierarchies. They
developed a method for finding association rules in large databases with a
generalization hierarchy. They also conducted experiments with a supermarket
dataset to show the method's effectiveness.
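The level-wise idea behind Apriori can be sketched as follows. This is a simplified illustration of candidate generation and pruning, not the optimized implementation evaluated in [12]; the function name, the use of an absolute support count and the toy data are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support_count} for all frequent itemsets.

    Level-wise search: frequent k-itemsets are joined into (k+1)-item
    candidates, and a candidate is kept only if all of its k-subsets
    are frequent. `min_support` is an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Count the support of every candidate in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent


# Hypothetical toy data.
db = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
found = apriori(db, min_support=2)
for itemset, count in found.items():
    print(sorted(itemset), count)
```

The prune step is what keeps the number of counted candidates small: in the toy data the three-item candidate is discarded without being counted, because one of its two-item subsets is infrequent.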
In [14], the author presented a survey of large-scale parallel and distributed
data mining algorithms and systems. Research issues and challenges that must
be overcome in designing and implementing successful tools for large-scale
data mining are also discussed.
A pattern decomposition algorithm is proposed in [15] to reduce the size of the
dataset on each pass when mining all frequent patterns in a large
database. This algorithm minimizes the cost of
candidate set generation and also saves a great amount of counting time. The
empirical evaluation showed that the algorithm outperforms Apriori by one
order of magnitude and is more scalable.
The authors in [16] discussed the basic concepts of association rule
mining and the existing association rule mining techniques. They also discussed
ways to improve the efficiency of association rule mining algorithms, such as
reducing the number of passes over the database, sampling the database, adding
extra constraints on the structure of patterns, and parallelization. The
authors also presented the categories of databases to which association rules
are applied and the progress made in recent years in this area, such as
redundant association rules, rare association rules, generalized association
rules and negative association rules. Other measures of the interestingness of an
association are also addressed in this article.
In [17], the authors studied association rule mining and presented
an improved mining algorithm based on support, confidence
and interestingness, which uses the interestingness measure to filter out rules
that are not useful. Useless rules are discarded, yielding more reasonable
association rules, including rules with negative items. The algorithm was
developed on this basis and applied to the 2002 student score list of the computer
specialized field at Inner Mongolia University of Science and Technology to
mine association rules.
2.3 Privacy Preserving Data Mining
In 1996, Clifton et al. [18] presented a number of ideas for protecting the
privacy of individuals in a database. The authors provided examples which
indicate that applying data mining algorithms to a database
can reveal critical information to business rivals. Clifton [19] presented a technique
to prevent the disclosure of sensitive information by releasing only samples of
the original data. This technique is applicable independently of the specific data
mining algorithm to be used. Clifton et al. [20] introduced some definitions for
PPDM and discussed some metrics for information disclosure in data mining. In
[21], the authors defined privacy preserving data mining (PPDM) as data
mining methods which have to meet two targets: (1) meeting privacy
requirements and (2) providing valid data mining results. They described the
problems in defining what information is private in data mining and discussed
how privacy can be violated in data mining. Privacy preservation in data mining
based on users' personal information and on information concerning their collective
activity is also addressed.
The authors in [22] proposed a classification of privacy preserving data
mining techniques along different dimensions: data distribution, data
modification, data mining algorithm, data or rule hiding, and privacy preservation.
They also discussed the methods that exist in each class
along each dimension. The existing methodologies are discussed for different
privacy preserving data mining tasks such as classification, association
rule mining and clustering. They evaluated the algorithms
related to heuristic-based techniques, cryptography-based techniques, and
reconstruction-based techniques for these data mining tasks.
The article [23] shows how technology from the security community can
change data mining for the better, providing all its benefits while still
maintaining privacy. The authors discussed existing privacy
preserving algorithms for centralized as well as distributed
environments. For centralized database applications, the authors discussed
various existing methodologies such as de-identification, data perturbation,
randomization, and reconstruction-based techniques with suitable examples. The
basic idea in distributed applications is that the parties hold their own data but
cooperate to get the final result; the authors provided a solution using the secure
multiparty computation concept. They also discussed the advantages and drawbacks
of various secure sum computation methods.
The state of the art in the area of privacy preserving data mining
(PPDM) techniques is discussed by the authors in [24]. The authors presented
a classification of privacy preserving techniques based on five dimensions:
data distribution, data modification, data mining algorithm, data or rule
hiding, and privacy preservation. They also discussed heuristic-based
methodologies for classification, association rule mining and clustering,
as well as cryptography-based techniques for vertically partitioned and
horizontally partitioned databases in a distributed environment for
association rule mining and classification. A solution to the privacy preserving
clustering problem based on the expectation-maximization algorithm
is also discussed in this paper. They also studied reconstruction-based techniques
for binary and categorical data.
An overview of the state-of-the-art in PPDM and some current
suggestions for proceeding towards standardization in PPDM are summarized in
[25]. This is followed by considerations of how PPDM could be improved
based on the European Directive 95/46/EC, additionally taking into account
procedural and process-related considerations.
The aim of PPDM algorithms is the extraction of relevant knowledge
from large amounts of data while protecting sensitive data or information.
Several existing data mining techniques that incorporate privacy protection
mechanisms, such as association rule mining, classification and clustering
techniques, are discussed in [26]. The authors discussed how to
determine suitable algorithms for various data mining techniques that protect
sensitive data or information by modifying the original database
before releasing it to the intended parties. They also presented a
comprehensive set of criteria for existing PPDM algorithms, which
helps the designer determine which algorithm meets specific requirements.
The authors also defined parameters such as efficiency, scalability,
level of privacy, data quality and hiding failure. They then evaluated a set of
association rule hiding algorithms against these parameters and showed the quality
and performance of each methodology.
The authors in [27] proposed a classification of PPDM techniques along
different dimensions: data distribution, data modification, data
mining algorithm, data or rule hiding, and privacy preservation. They also
discussed the methods that exist in each class along each
dimension. The existing methodologies are discussed for different privacy
preserving data mining tasks such as classification, association rule
mining and clustering. They evaluated the algorithms
related to heuristic-based techniques, cryptography-based techniques, and
reconstruction-based techniques for these data mining tasks.
The problem of protecting sensitive knowledge in large databases is
addressed in [28]. The authors introduced an efficient algorithm, called the
Sliding Window Algorithm (SWA), that improves the
balance between protection of sensitive knowledge and pattern discovery.
The experimental results revealed that SWA
is effective and can achieve significant improvement over the previous
approaches.
In [29], the authors proposed a privacy-preserving distributed association rule
mining protocol based on a new semi-trusted mixer model, which
can protect the privacy of each distributed database against a coalition of up
to n − 2 other data sites, or even against the mixer if the mixer does not collude with any
data site.
The authors in [30] addressed PPDM technique and presented national
security applications where privacy is the main concern. They viewed the
privacy problem as a form of inference problem and introduced the notion of
privacy constraints. They described an approach for privacy constraint
processing. Finally, some directions for future research on privacy related to
data mining are presented.
The authors surveyed the current state of the art in Statistical Disclosure
Control methods for protecting individual data (microdata) [31]. A
classification of microdata protection methods into perturbative masking
methods, non-perturbative masking methods and synthetic microdata generation
is presented. They discussed several information loss and disclosure risk
measures and then analyzed several ways of combining them to assess the
performance of the various methods. Additive noise, microaggregation, rank
swapping, rounding, resampling and so on are perturbative methods.
Sampling, global recoding, top coding, bottom coding and local suppression are
non-perturbative methods, which do not rely on distortion of the original data
but on partial suppressions or reductions. Methods for categorical as well as
continuous data are presented.
The authors in [32] emphasized the important aspects such as
identification of suitable evaluation criteria and the development of related
benchmarks required in the design of privacy preserving data mining
algorithms. In this article, they also discussed issues related to recent research in
the privacy preserving data mining to balance the trade-off between the right to
privacy and the need of knowledge discovery. From their analysis, they pointed
out that no privacy preserving algorithm exists that outperforms all the others on
all possible criteria and therefore they provided a comprehensive view on a set
of metrics related to existing privacy preserving algorithms. These metrics can
be used to evaluate the privacy preserving techniques.
The state of the art in privacy preserving data mining techniques was reviewed
in [33]. The authors addressed methods for privacy preserving data mining
such as randomization, k-anonymization and distributed privacy-preserving
data mining. The computational and theoretical limits associated
with privacy preservation over high dimensional data sets were also presented.
The need to sanitize the output of data mining applications for privacy
preservation is also discussed.
The authors of [34] restated several privacy preserving data mining
technologies clearly and then analyzed the merits and shortcomings of
these technologies. They explained the concepts involved in discovering
knowledge from large databases with various privacy preserving data mining
techniques such as k-anonymity, the perturbation approach, cryptographic
techniques, randomized response techniques, and the condensation approach. They
also illustrated the working of each method with a suitable database.
Aris Gkoulalas-Divanis et al. [35] discussed the two broad categories of
privacy preserving data mining, which prevents leakage of private and sensitive
information when data or information is shared with many people. The
authors also discussed the privacy issues with microdata and reviewed existing
methodologies such as data modification approaches and synthetic data
generation approaches. The existing methodologies for privacy
preserving data mining in distributed database environments as well as in
collaborative environments are discussed with secure multiparty computation
schemes, particularly for horizontal and vertical data distributions.
The second category, which preserves the privacy of sensitive mined results
in association rule mining, is termed
association rule hiding and is studied along three principal directions: heuristic
approaches, border-based approaches, and exact approaches. The merits and
demerits of each approach are also analyzed, which may encourage
researchers to study them further.
The authors presented the goals of privacy preserving data mining and
discussed its main classes: privacy-preserving association rule mining,
privacy-preserving classification mining, and
privacy-preserving clustering mining [36].
Y. Lindell et al. studied the basic paradigms and notions of secure
multiparty computation and discussed their relevance to the field of
privacy-preserving data mining [37]. They described some simple
protocols that are often used as basic building blocks, or primitives, of secure
computation protocols, such as oblivious transfer and oblivious polynomial
evaluation, which are two-party protocols, and homomorphic encryption, which
is an encryption system with special properties. The authors discussed the
relationship between secure multiparty computation and privacy-preserving
data mining, and showed which problems it solves and which problems it does
not. They also addressed generic protocols that implement secure
computation for any probabilistic polynomial time function and explained
that the protocols differ between the scenario with two parties and
the multiparty scenario with m > 2 parties. They also
highlighted common errors that may occur in secure protocols in order to inform
researchers who design such protocols.
The wide availability of personal data has made privacy
preserving data mining an important issue, especially in finding association
rules. The authors of [38] addressed the issue of protecting the data before it is
published and categorized the existing methodologies into two groups: k-anonymity
and probability-based methodologies. They analyzed several existing
privacy preserving data mining techniques and discussed the
merits and demerits of each one.
Privacy preserving association rule mining is widely used in many real
applications. The following section discusses the earlier work related to privacy
preserving association rule mining in different database environments like
centralized and distributed.
2.4 Privacy Preserving Association Rule Mining
Data and knowledge hiding are two research directions that investigate
how the privacy of raw data, or information, can be maintained either before or
after the course of mining association rules. Focusing on the knowledge
hiding thread, the authors presented a taxonomy and surveyed recent approaches
that have been applied to the association rule hiding problem [39]. They also
provided a thorough comparison of the surveyed approaches which are used for
other data mining tasks, focusing on their metrics. The metrics used
to evaluate the performance of the approaches are also presented in this article.
Privacy preserving data mining is a novel research area that aims to protect
sensitive knowledge from disclosure. The authors presented a
detailed overview and classification of approaches which can be applied to
knowledge hiding in the context of association rule mining [40]. Evaluation
metrics which can be used to evaluate the performance of various hiding
algorithms are also presented in this article.
Various proposed methodologies and algorithms for
privacy preserving association rule mining are summarized in [41], together with
the advantages and disadvantages of each methodology. The
authors classified the methodologies into three groups: heuristic-based
techniques, reconstruction-based techniques, and cryptography-based
techniques, and discussed each of them. In [42], an overview of
knowledge hiding methodologies related to classification, clustering and
sequence discovery is given.
After the events of September 11, 2001, the use of multiple government
and private databases for the identification of possible perpetrators of future
attacks has received more attention in the United States and elsewhere,
accompanied by an unprecedented expansion of federal
government data mining activities, many involving databases containing
personal information. The authors in [43] claimed that prospective data mining
could be used to find the “signature” of terrorist cells embedded in larger
networks. In this article they focused on the matching problem across databases
and the concept of “selective revelation” and their confidentiality implications.
The authors in [44] surveyed the literature on different existing privacy
preserving data mining approaches, along with details showing how to develop
specific solutions within each. They studied privacy preserving data mining
techniques that belong to predictive and descriptive models.
Different data partitioning methods, namely homogeneous and heterogeneous
partitioning, are addressed. The protocols for various situations in
privacy preserving data mining are broadly classified
into two categories, data perturbation and secure multiparty
computation, each of which may be further divided into techniques such as
cryptographic techniques, k-anonymity, blocking, randomization and
so on. For each protocol, the authors analyzed the performance to show its
effectiveness in terms of security, computation and communication
complexity.
A survey of privacy protection in a distributed environment with four
cryptography-based techniques is given in [45]. The authors described a
privacy preserving technique for learning Bayesian networks on vertically
partitioned databases between two sites. Three privacy-preserving data mining
techniques in a fully distributed setting are also presented.
An overview of distributed data mining applications and algorithms for
peer-to-peer environments is given in [46]. The authors addressed the
problems of existing privacy-preserving multi-party data
mining techniques. This article offered a more realistic formulation of the
PPDM problem as a multi-party game and focused on some recent results.
The earlier work related to privacy preserving association rule mining in
centralized as well as distributed database environments is reviewed in the following
sections.
2.5 Privacy Preserving Association Rule Mining in Centralized
Database Environment
The approaches to privacy preserving association rule
mining are heuristic, border-based and exact approaches. The earlier research
work related to heuristic approaches is presented below:
Extensive research in the area of statistics has been reported
[47,48] on providing statistical information without compromising sensitive
information about individuals, which is relevant to privacy preserving
association rule mining. Evfimievski et al. presented a new framework for
preserving privacy using a randomization technique [49]. They analyzed
randomization and presented the privacy breaches that may occur.
They proposed a class of randomization operators and demonstrated its efficiency
in limiting the breaches by comparison with randomization. Experimental
results of the algorithm on real datasets are also given in this article. In [50], the
authors generalized privacy preserving association rule mining by allowing
different attributes to have different levels of privacy. Different randomization
factors are used for different attributes in the randomization process, and they
developed an efficient algorithm called Recursive Estimation to estimate the
support of an item set in this framework. They also showed that non-uniform
randomization factors improve the accuracy compared to uniform
randomization approaches.
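The idea of per-attribute randomization factors can be illustrated for the simplest case, the support of a single item. The sketch below flips the presence bit of each item with an item-specific probability and then inverts the expected distortion; it is a generic randomized-response estimate and not the Recursive Estimation algorithm of [50]. The names randomize, estimate_support and keep_prob are hypothetical.

```python
import random


def randomize(transactions, items, keep_prob, rng):
    """Flip the presence bit of each item with probability 1 - keep_prob[item]."""
    noisy = []
    for t in transactions:
        row = set()
        for item in items:
            present = item in t
            if rng.random() >= keep_prob[item]:
                present = not present            # this bit is flipped
            if present:
                row.add(item)
        noisy.append(row)
    return noisy


def estimate_support(observed_support, p):
    """Recover the true support s from the randomized data.

    With keep probability p, the expected observed support is
    p * s + (1 - p) * (1 - s), which is inverted here.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information about the item")
    return (observed_support - (1 - p)) / (2 * p - 1)


# Hypothetical example: a more sensitive item gets a lower keep probability.
rng = random.Random(7)
items = ["aspirin", "insulin"]
keep_prob = {"aspirin": 0.9, "insulin": 0.7}
db = [{"aspirin"} if rng.random() < 0.3 else set() for _ in range(10000)]
noisy = randomize(db, items, keep_prob, rng)
observed = sum("aspirin" in t for t in noisy) / len(noisy)
print(estimate_support(observed, keep_prob["aspirin"]))  # close to the true support
```

A keep probability nearer to 0.5 gives stronger privacy for that attribute but a noisier estimate, which is the trade-off that non-uniform randomization factors are meant to tune.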
In [51], the authors proposed a privacy preserving association rule mining
algorithm called DDIL, based on data disturbance and inquiry limitation. The
proposed method disturbs and hides the original data with a high degree of
privacy preservation. In particular, a highly effective method of generating frequent
items from the transformed data sets is proposed. From the experimental study they
showed that the proposed methods are effective in balancing privacy and accuracy.
The term “association rule hiding” was first mentioned
in 1999 by Atallah et al. [52]. Data sanitization, that is, reducing
the support of a sensitive item set to hide it from disclosure, was first
proposed there to solve the association rule hiding problem. They also proved that
optimal sanitization is an NP-hard problem.
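Support-based sanitization as described above can be sketched with a naive heuristic: delete one item of the sensitive item set from just enough supporting transactions to push its support count below the mining threshold. This is only an illustration of the principle, not the algorithm of [52], and it makes no attempt at optimality (which is NP-hard, as noted above). The name hide_itemset and the choice of which item to delete are assumptions.

```python
def hide_itemset(transactions, sensitive, min_support):
    """Lower the support count of `sensitive` below `min_support`.

    Naive heuristic: in supporting transactions, delete one item of the
    sensitive itemset until too few transactions still contain it.
    The input is left untouched; a sanitized copy is returned.
    """
    sensitive = frozenset(sensitive)
    victim = sorted(sensitive)[0]                # arbitrary but deterministic choice
    sanitized = [set(t) for t in transactions]   # work on a copy
    supporting = [t for t in sanitized if sensitive <= t]
    excess = len(supporting) - (min_support - 1) # how many supporters must be altered
    for t in supporting[:max(excess, 0)]:
        t.discard(victim)
    return sanitized


# Hypothetical toy database; {"a", "b"} is the sensitive item set.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
released = hide_itemset(db, {"a", "b"}, min_support=2)
print(released)
```

Every deleted item also lowers the support of non-sensitive item sets that contain it, which is the side effect that the hiding strategies reviewed in this section try to limit.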
A key problem that is still not sufficiently investigated is the need to
balance the confidentiality of the disclosed data with the legitimate needs of the
data users. Dasseni et al. developed three strategies [53] to hide sensitive
association rules in the mining process based on two approaches. These
strategies work on either the support or the confidence of the rule, decreasing
one of these until the sensitive rule is hidden. The proposed algorithms are based on
single-rule heuristic hiding approaches, acting on either the support or the
confidence through the antecedent or consequent of the sensitive rule.
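One way to lower the confidence of a sensitive rule, in the spirit of the strategies above, is to remove a consequent item from transactions that fully support the rule until the confidence drops below the threshold. The sketch below is an assumption-laden illustration, not a reproduction of the algorithms in [53]; the function name and the item-selection rule are hypothetical.

```python
def hide_rule_by_confidence(transactions, antecedent, consequent, min_confidence):
    """Return a sanitized copy in which confidence(antecedent -> consequent)
    is below `min_confidence`.

    Removing a consequent item leaves the antecedent count unchanged and
    lowers the count of transactions supporting the whole rule.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    rule_items = antecedent | consequent
    victim = sorted(consequent)[0]               # arbitrary but deterministic choice
    sanitized = [set(t) for t in transactions]
    n_antecedent = sum(1 for t in sanitized if antecedent <= t)
    for t in sanitized:
        n_rule = sum(1 for u in sanitized if rule_items <= u)
        if n_antecedent == 0 or n_rule / n_antecedent < min_confidence:
            break                                # rule is already hidden
        if rule_items <= t:
            t.discard(victim)
    return sanitized


# Hypothetical toy database: confidence(a -> b) is 3/4 before sanitization.
db = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"a"}]
print(hide_rule_by_confidence(db, {"a"}, {"b"}, min_confidence=0.5))
```

In the toy data two transactions are altered, after which the rule a -> b holds in one of the four transactions that contain a.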
A new informative rule set is defined in [54], which generates prediction
sequences equal to those generated by the association rule set under the confidence
priority. The authors presented an algorithm to generate the informative
rule set directly, without generating all frequent item sets first. It requires fewer
database accesses than unconstrained direct methods. From the experimental
results, they showed that the informative rule set is smaller than both the
association rule set and the non-redundant association rule set.
Oliveira et al. introduced an approach for hiding multiple association
rules that requires only two scans of the database [55]. An index file is
created in the first scan to speed up the process of finding sensitive
transactions and to retrieve them quickly. In the second scan, the algorithms
sanitize the database by selectively removing the smallest number of individual
items needed to hide the sensitive knowledge. A novel feature of these
approaches is that they take into account not only the impact of sanitization
on hiding the sensitive patterns, but also its impact on the non-sensitive
knowledge.
Loss of privacy is a main concern for many people when knowledge is
discovered from large databases using data mining techniques. In [56], the
authors presented a scheme based on probabilistic distortion of user data that
can simultaneously provide a high degree of privacy to the user and retain a
high level of accuracy in the mining results. In the same article, the authors
evaluated their algorithm on real and synthetic datasets and showed that it
preserves privacy while providing accurate results to the users without
generating spurious rules.
Two sanitizing algorithms, the round robin algorithm and the random
algorithm, are proposed for balancing privacy and knowledge discovery in
privacy preserving association rule mining [57]. These algorithms require only
two scans regardless of the database size and the number of restrictive
association rules that must be protected: the first scan builds the index
(inverted file) that speeds up the sanitization process, and the second scan
sanitizes the original database. The authors compared their algorithms with
previously proposed algorithms in terms of effectiveness and scalability. The
analysis in this article showed that the proposed sanitizing algorithms improve
significantly on the previous algorithms for hiding sensitive association
rules. The authors also showed that their sanitization methods are robust in
the sense that no de-sanitization is possible.
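The two-scan structure can be sketched as follows. This is a simplified
illustration under stated assumptions, not the algorithms of [57]: the
function names are hypothetical, every sensitive transaction is sanitized (no
disclosure threshold is modelled), and the round robin policy is reduced to
taking the items of a restrictive pattern in turn as victims.

```python
from collections import defaultdict

def build_index(db, patterns):
    """Scan 1: map each transaction id to the restrictive patterns it contains."""
    index = defaultdict(list)
    for tid, t in enumerate(db):
        for pid, p in enumerate(patterns):
            if p <= t:
                index[tid].append(pid)
    return index

def sanitize_round_robin(db, patterns):
    """Scan 2: remove one victim item per sensitive transaction, taking the
    items of each restrictive pattern in turn."""
    db = [set(t) for t in db]                  # work on a copy
    patterns = [frozenset(p) for p in patterns]
    index = build_index(db, patterns)
    turn = [0] * len(patterns)                 # next victim position per pattern
    for tid, t in enumerate(db):
        for pid in index.get(tid, []):
            p = patterns[pid]
            if p <= t:                         # may already be broken by an earlier removal
                victims = sorted(p)
                t.discard(victims[turn[pid] % len(victims)])
                turn[pid] += 1
    return db

out = sanitize_round_robin(
    [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}],
    [{"a", "b"}],
)
```

Alternating the victim item spreads the loss of support over all items of the
pattern instead of removing a single item from every sensitive transaction.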
In [58], the authors proposed two algorithms, ISL (Increase Support of
LHS) and DSR (Decrease Support of RHS), to automatically hide informative
association rule sets without pre-mining and selection of hidden rules. An
analysis is performed to illustrate the effectiveness of the proposed
algorithms. They also recommended appropriate usage of the proposed algorithms
based on the characteristics of databases.
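The effect of increasing the support of the left-hand side can be shown with a
small sketch. It only illustrates the principle behind ISL and is not the
algorithm of [58]; the function names and the order in which transactions are
visited are assumptions.

```python
def count(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def rule_confidence(db, lhs, rhs):
    base = count(db, lhs)
    return count(db, lhs | rhs) / base if base else 0.0

def increase_lhs_support(db, lhs, rhs, min_conf):
    """Add the LHS items to transactions that contain neither the full LHS nor
    the full RHS, so that supp(LHS) grows while supp(LHS u RHS) does not."""
    db = [set(t) for t in db]      # work on a copy
    for t in db:
        if rule_confidence(db, lhs, rhs) < min_conf:
            break
        if not lhs <= t and not rhs <= t:
            t |= lhs               # in-place update of the copied transaction
    return db

db = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}, {"c"}, {"b", "c"}]
hidden = increase_lhs_support(db, {"a"}, {"b"}, min_conf=0.7)
```

Unlike item deletion, this strategy can fail when too few suitable
transactions exist, in which case the loop ends with the rule still above the
threshold.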
The authors in [59] proposed two distortion-based heuristic algorithms
to selectively hide the sensitive association rules while accepting limited
side effects. The first algorithm, called Priority-based Distortion Algorithm
(PDA), reduces the confidence of a sensitive association rule by changing 1s
to 0s in items belonging to the rule’s consequent. The second algorithm,
called Weight-based Sorting Distortion Algorithm (WDA), concentrates on
optimizing the hiding process in an attempt to achieve the fewest side effects
and the minimum complexity. They showed that both PDA and WDA produce hiding
solutions of better quality in terms of computational complexity and privacy.
Menon et al. studied the problem of finding frequent item sets for
privacy preserving association rule mining while maximizing the accuracy of
the shared database. The authors presented an optimal approach for hiding
sensitive item sets while keeping the number of modified transactions to a
minimum. They also showed that the proposed approach works well on databases
with millions of transactions and presented experimental results with real as
well as synthetic data [60].
Chih-Chia Weng et al. proposed an efficient algorithm called Fast
Hiding Sensitive Association Rules (FHSAR) for hiding sensitive association
rules [61]. The algorithm can completely hide any given sensitive association
rules by scanning the database only once, which significantly reduces the
execution time. Experimental results showed that FHSAR outperforms previous
works in terms of execution time and side effects. The number of new rules
generated in the hiding process is minimized and is independent of the size of
the database. In addition to the FHSAR algorithm, the authors also proposed two
heuristic approaches for improving the performance of the algorithm. First, a
heuristic function is used to obtain a prior weight for each transaction, by
which the order in which transactions are modified can be decided efficiently.
Second, the correlations between the sensitive association rules and each
transaction in the original database are analyzed.
To achieve privacy preserving association rule mining in a centralized
database, a new algorithm is presented in [62]; after the mining phase, a
filter is used to weed out or hide the restricted association rules that were
discovered. Before the algorithm is applied, the data structure of the
database and the set of sensitive association rules are analyzed to build a
more effective model. This new algorithm can be used to balance privacy
preservation and knowledge discovery in association rule mining.
A new association rule hiding algorithm is proposed in [63] and
described in detail; it aims to hide simple rules, including single rules and
composed rules. Weak association rules and strong association rules are
distinguished in this work so that the sanitization process can be carried out
easily and effectively while avoiding side effects. Four item modification
methods are designed for updating the selected weak association transactions.
Only a small number of transactions need to be updated, which keeps the
original features of the mined dataset. The modification factor is used to
achieve a hiding rate of 100% and to reduce the number of lost rules and newly
generated rules. The authors showed that the proposed approach is robust for
privacy preserving association rule mining.
In the approach proposed in [64], an efficient data structure (FCET) is
used to store maximal frequent item sets to support scalability. The authors
also proposed a new framework, called the greedy approximation approach, that
combines efficient techniques for hiding sensitive rules with four lemmas and
a transaction retrieval engine based on the FCET index tree. The items in the
transactions are modified according to the four lemmas and their strategies.
The authors of [65] proposed a greedy approach, a variant of the greedy
approximation algorithm called the greedy exhausted algorithm, which also
hides sensitive rules by bringing their confidence or support below a
user-specified threshold. Their experimental results showed that both methods
work well for hiding sensitive rules completely, but the latter algorithm also
considers the cost of side effects and, based on this cost, makes suitable
modifications to the database to reduce the side effects further.
A new approach called ISSRH (Increase Support Sensitive Rule Hiding)
was proposed in [66] to hide sensitive association rules that contain
sensitive items. The approach has six steps, each of which performs a specific
task; one of the steps uses a clustering technique to group similar items
related to the specified sensitive rules. This approach considers different
characteristics while hiding sensitive rules to increase the efficiency of the
algorithm. The authors presented an algorithm for the proposed approach and
illustrated the results with examples. They analyzed the algorithm and showed
its effectiveness in terms of privacy, computational cost, number of database
scans and minimal modifications.
The authors investigated the issue of exact knowledge hiding and
proposed three schemes that are suitable for identifying exact solutions of
high quality [67]. They also introduced a structural decomposition that
partitions the original constraint satisfaction problem (CSP) into numerous
independent components, together with a parallelization framework that can be
applied to all three schemes and dramatically improves the runtime of the
hiding algorithm. This novel framework for decomposing and solving in parallel
the hiding problems handled by the exact hiding approaches is efficient on
large databases and significantly decreases their runtime. The authors
conducted experiments and showed the effectiveness of the approaches in
providing high quality knowledge hiding solutions.
Yuhong Guo investigated efficient reconstruction-based techniques for
association rule hiding and proposed a frequent pattern tree based method for
inverse frequent item set mining, which is used in a reconstruction-based
framework for privacy preserving association rule mining [68]. The proposed
model has three phases. The first phase generates frequent item sets from the
original database. The second phase runs a sanitization algorithm over the
frequent item sets by selecting a hiding strategy and identifying the
sensitive frequent item sets according to the sensitive association rules. The
third phase generates the sanitized database by using an inverse frequent item
set mining algorithm and then releases this database. Hiding effects, data
utility and time complexity are considered as performance measures for the
proposed protocol and are analyzed in this article.
A real example of the individual identifiability problem in privacy
preserving data mining is given in [69]. Suppose medical data is disclosed
without names and addresses; linking it with publicly available voter
registration records using birth date, gender, and postal code may still
reveal the name and address corresponding to each medical record. This raises
a key point: the absence of an individual’s identity in the data is not
sufficient, since joining the data with other sources may reveal the identity
of the individual. The authors proposed an approach using the concepts from
[70] and introduced the quasi-identifier to solve this problem. k-anonymity is
used, which states that no record may be unique in its quasi-identifiers:
there must be at least k records with the same quasi-identifier.
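The k-anonymity condition stated above can be checked mechanically. The
following is a minimal sketch; the function name, the record layout and the
sample values are hypothetical and serve only as an illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values that occurs in
    `records` is shared by at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())

# Hypothetical released table: identity removed, quasi-identifiers generalized.
records = [
    {"birth_year": 1980, "gender": "F", "postal": "600**", "diagnosis": "flu"},
    {"birth_year": 1980, "gender": "F", "postal": "600**", "diagnosis": "asthma"},
    {"birth_year": 1975, "gender": "M", "postal": "641**", "diagnosis": "flu"},
    {"birth_year": 1975, "gender": "M", "postal": "641**", "diagnosis": "diabetes"},
]
qi = ["birth_year", "gender", "postal"]
```

With these sample records the table is 2-anonymous but not 3-anonymous, since
each quasi-identifier combination is shared by exactly two records.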
Y. Saygin et al. developed two algorithms that generate a sanitized
database from the original database by replacing the values of items with
unknown values in selected transactions to hide the sensitive association
rules [71]. The first algorithm hides the rules by reducing the minimum
support of the item sets that generated the sensitive rules, and the second
reduces the minimum confidence of the rules in two different ways. In the
first method, the confidence of a rule is reduced by replacing 1s with ?s,
while the second method replaces 0s with ?s. Each algorithm is analyzed and
shown to be effective in hiding sensitive frequent item sets in order to hide
sensitive association rules.
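Once unknowns are present, the support of an item set is no longer a single
number but an interval. The sketch below illustrates this; the dictionary
representation of transactions and the function names are assumptions made
for illustration and are not taken from [71].

```python
def support_interval(db, itemset):
    """Return (min_count, max_count) for `itemset` when values may be 1, 0 or '?'.
    The minimum treats every '?' as 0, the maximum treats every '?' as 1."""
    lo = sum(1 for t in db if all(t.get(i, 0) == 1 for i in itemset))
    hi = sum(1 for t in db if all(t.get(i, 0) in (1, "?") for i in itemset))
    return lo, hi

def block(db, tid, item):
    """Return a copy of db in which `item` of transaction `tid` is unknown."""
    out = [dict(t) for t in db]
    out[tid][item] = "?"
    return out

db = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 0},
    {"a": 0, "b": 1},
]
ones_blocked = block(db, 0, "b")    # a 1 replaced by ?: minimum support drops
zeros_blocked = block(db, 2, "b")   # a 0 replaced by ?: maximum support rises
```

Replacing a 1 lowers the minimum support, whereas replacing a 0 raises the
maximum support; both widen the interval in which the true value lies.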
The authors studied various data altering techniques for hiding
association rules, classification rules and clustering rules [72]. Usually the
entire data mining process needs to be executed to find the rules to be
hidden. The authors proposed two algorithms, ISL (Increase Support of LHS) and
DSR (Decrease Support of RHS), that replace data with unknowns in the database
so that sensitive predictive rules containing specified items on the left-hand
side of the rule cannot be inferred through association rule mining. They
analyzed the performance of the algorithms in terms of privacy, number of
database scans and number of hidden rules pruned. Compared with the approach
in [73], this approach hides all the rules containing hidden items on the
left-hand side.
Pontikakis et al. argued that the main disadvantage of blocking is that
an adversary can disclose the hidden association rules simply by identifying
the generating item sets that contain question marks and can lead to rules
with a maximum confidence above the minimum confidence threshold. The authors
proposed a blocking algorithm that avoids disclosing sensitive patterns by
generating rules that did not exist in the original dataset. To balance the
trade-off between the level of privacy and data utility, the proposed
algorithm incorporates a safety margin [74].
Data reconstruction methods put the original data aside and start by
sanitizing the so-called “knowledge base”. The newly released data is then
reconstructed from the sanitized knowledge base. This idea was first described
in [75], but the proposed approach is still incomplete and limited in several
respects: it gives no concrete guidance on how to sanitize the item set
lattice according to the sensitive association rules, and the feasibility of
the data reconstruction process is restricted to knowledge sanitization
processes that produce an item set lattice with a consistent support value
configuration. This method cannot guarantee finding a consistent configuration
within polynomial time.
The problem of finding an appropriate balance between the need for
privacy and information discovery on frequent patterns is discussed in [76].
The authors proposed an innovative technique for hiding sensitive patterns. In
their approach, a sanitization matrix is defined. By multiplying the original
transaction database by the sanitization matrix, a new database, called the
sanitized database, is obtained which protects the sensitive item sets. En Tzu
Wang et al. also studied the same problem in [74] and proposed a method based
on the sanitization concept. A probability policy is additionally adopted in
this method against the recovery of sensitive patterns, so that forward
inference attacks are avoided completely when the confidence level given by
the users approaches 1. They also discussed the efficiency of the proposed
method.
Some of the relevant works that use border-based approaches are as
follows:
The first frequent item set hiding methodology based on border
revision, in which the border of the non-sensitive frequent item sets is used
to track the impact of altering transactions in the original database, was
proposed by Sun & Yu in [77].
A study on hiding sensitive frequent item sets in privacy preserving
association rule mining is presented in [78]; the transactions in the database
are modified with attention to the quality of the sanitized database,
especially to preserving the non-sensitive frequent item sets. To preserve the
non-sensitive frequent item sets, the authors proposed a border-based approach
that efficiently evaluates the impact of any modification to the database
during the hiding of sensitive frequent item sets; the quality of the database
can be well maintained by greedily selecting the modifications with minimal
side effects. The authors also analyzed the performance of their approach in
terms of privacy and cost. They showed that the proposed approach, using
border revision, finds solutions that satisfy the privacy preserving goals
effectively.
The authors in [79] presented a new algorithm for sanitizing raw data
from sensitive knowledge in the context of association rule mining. This
approach relies on the maxmin criterion, a method in decision theory for
maximizing the minimum gain, and builds upon the border theory of frequent
item sets. They showed experimentally that the proposed method is efficient.
The earlier works in privacy preserving association rule mining based on
the exact approach are presented as follows:
A novel methodology based on the exact approach is proposed in [80] to
find an optimal solution for association rule hiding without producing any
side effects, using the border revision concept and an integer programming
technique. The authors formulated the hiding process as a constraint
satisfaction problem, so that solving the association rule hiding problem
amounts to determining a sanitized database that satisfies the constraints.
Minimizing the distance between the original database and the sanitized
database is the main idea of this approach in finding an optimal solution.
They also demonstrated the effectiveness of the algorithm on a suitable
database.
The authors in [81] proposed a novel methodology based on the exact
approach which provides an optimal solution for hiding sensitive frequent item
sets. This approach minimally extends the original database by a synthetically
generated database, called the extended database, and formulates the
construction of the extended database as a constraint satisfaction problem,
which is then solved using Binary Integer Programming (BIP). They also showed
that privacy preserving association rule mining using a hybrid approach
provides an approximate solution close to the optimal one when an ideal
solution, that is, one without any side effects, does not exist.
A. Gkoulalas-Divanis et al. proposed a novel approach based on the
exact approach to find an optimal solution for the association rule hiding
problem while avoiding side effects [82]. The approach adopts the border
revision concept and an integer programming technique to find an optimal
solution for a newly created database called the extended database. The
approach works by determining a minimally extended database based on the
positive border item sets, the negative border item sets and their threshold
values. The hiding problem is then formulated as a constraint satisfaction
problem, and finally BIP is applied to find the solution for the extended
database. The authors showed that this approach finds optimal solutions for an
extended database that are of higher quality than those of previously
developed exact approaches.
The authors in [83] proposed a two-phase iterative algorithm for hiding
sensitive association rules, which extends the functionality of the inline
algorithm by computing the original and revised borders in a transactional
database for a given minimum support threshold. The approach has two phases
that iterate until either an exact solution of the given problem instance is
identified or a prespecified number of subsequent iterations have taken place.
The two-phase iterative algorithm is never worse than the inline algorithm,
since its worst performance equals the performance of the inline algorithm.
The experimental results indicate that there are several settings in which the
two-phase iterative algorithm finds an optimal hiding solution. The two-phase
iterative algorithm can capture all the exact solutions that can be identified
by the inline approach.
The authors investigated the issue of exact knowledge hiding and
proposed three schemes that are suitable for identifying exact solutions of
high quality [84]. They also introduced a structural decomposition that
partitions the original CSP into numerous independent components, together
with a parallelization framework that can be applied to all three schemes and
dramatically improves the runtime of the hiding algorithm. A novel framework
for the decomposition and parallel solving of the hiding problems handled by
the exact hiding approaches is also presented. This framework is efficient for
large databases and significantly decreases runtime. The authors conducted
experiments and showed the effectiveness of the approaches in providing high
quality knowledge hiding solutions.
A. Gkoulalas-Divanis et al. addressed many issues related to privacy
preserving data mining, association rule hiding, classes of association rule
hiding methodologies, and also rule hiding in classification, privacy
preserving clustering and sequence hiding [85]. They also presented the goals
of association rule hiding. They described many border-based and exact
approaches and presented algorithms for them, such as BBA and the Max-Min
algorithm, which use the border revision concept, and Menon’s, Inline,
Two-phase iterative and Hybrid algorithms, which are based on the exact
approach. The authors conducted several experiments to demonstrate the
effectiveness of each algorithm with examples and also discussed the
difficulties that arise in some situations.
2.6 Privacy preserving Association Rule Mining in Distributed
Database Environment
In a distributed database environment, the database may be partitioned
horizontally, vertically or in mixed mode. Some of the relevant works on
privacy preserving association rule mining when the data is partitioned
horizontally are presented as follows:
The problem of determining who is richer without either party
disclosing their wealth is known as the two millionaires’ problem, and it
belongs to secure multi-party computation. The authors proposed protocols for
the two millionaires’ problem and also for the multi-party case [86].
Yao first postulated the two-party computation problem and developed a
provably secure solution [87]. The author proposed a new tool for controlling
the knowledge transfer process in cryptographic protocol design. The author
showed how two parties A and B can interactively generate a random integer
N = p·q such that its secret, that is, the prime factors (p, q), is hidden
from either party individually but is recoverable jointly if desired. Using
this concept, a two-party protocol was proposed in which parties with private
values i and j compute any polynomially computable functions f(i,j) and g(i,j)
with minimal knowledge transfer. A framework for secure multiparty computation
is developed in [88], where it is proved that computing a function privately
is equivalent to computing it securely. This protocol is extended to
multiparty computations by Goldreich et al. [89].
It is important to investigate efficient methods for distributed mining
of association rules when the database is large and high scalability of the
distributed system with easy partitioning is required. The study in [90]
discloses some interesting relationships between locally large and globally
large item sets and proposes an interesting distributed association rule
mining algorithm, FDM (Fast Distributed Mining of association rules), which
generates a small number of candidate sets and substantially reduces the
number of messages to be passed when mining association rules. The authors
analyzed the performance of the proposed algorithm and showed that FDM
performs better than the direct application of a sequential algorithm.
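One relationship between locally large and globally large item sets follows
directly from the definition of support: an item set that is globally large
must be locally large at one site at least. The sketch below illustrates how
this property prunes candidates; it is not the FDM algorithm of [90], and the
function names are hypothetical.

```python
def local_count(site, itemset):
    """Number of transactions at one site that contain `itemset`."""
    return sum(1 for t in site if itemset <= t)

def is_locally_large(site, itemset, min_sup):
    return local_count(site, itemset) >= min_sup * len(site)

def is_globally_large(sites, itemset, min_sup):
    total = sum(local_count(s, itemset) for s in sites)
    return total >= min_sup * sum(len(s) for s in sites)

def candidate_sites(sites, itemset, min_sup):
    """Sites at which the item set is locally large. An empty list means the
    item set cannot be globally large and needs no global count exchange."""
    return [i for i, s in enumerate(sites) if is_locally_large(s, itemset, min_sup)]

sites = [
    [{"a", "b"}, {"a"}, {"b"}],         # site 0
    [{"a", "b"}, {"a", "b"}, {"c"}],    # site 1
]
```

If the local count is below the threshold at every site, the sum of the local
counts is below the global threshold as well, so only item sets with a
non-empty site list need their counts to be exchanged.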
Naor M. et al. addressed the Oblivious Transfer protocol, which is used
to provide secure communication between a sender and a receiver [91]. In this
protocol one party, the sender, transmits part of its inputs to another party,
the receiver, in a way that protects both of them. The sender is assured that
the receiver does not receive more information than it is entitled to, while
the receiver is assured that the sender does not learn which part of the
inputs it received. This protocol is used as a key component in many
applications of cryptography. The authors analyzed the performance of the
protocol and presented its merits and demerits.
In [92], the authors showed that two of the private scalar product
protocols proposed previously are insecure. They described a private scalar
product protocol based on homomorphic encryption and demonstrated the
efficiency of the protocol on massive datasets.
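How an additively homomorphic cryptosystem yields a private scalar product can
be illustrated with a toy sketch. The choice of the Paillier cryptosystem, the
tiny primes and all function names are assumptions made for illustration; the
source does not name the cryptosystem used in [92], and the parameters below
offer no security.

```python
import math
import random

def keygen(p=53, q=61):
    """Toy Paillier key pair from two tiny primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)      # valid for the generator g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (n+1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def private_scalar_product(x, y):
    """Party A holds x and the private key; party B holds y."""
    n, priv = keygen()
    n2 = n * n
    enc_x = [encrypt(n, xi) for xi in x]       # A sends E(x_i) to B
    acc = encrypt(n, 0)                        # B starts from a fresh E(0)
    for c, yi in zip(enc_x, y):
        acc = (acc * pow(c, yi, n2)) % n2      # E(a)^b = E(a*b), E(a)*E(b) = E(a+b)
    return decrypt(n, priv, acc)               # A decrypts the sum of x_i * y_i

result = private_scalar_product([1, 0, 1, 1, 0, 1], [1, 1, 0, 1, 0, 1])
```

In this sketch party B only sees ciphertexts of x, and party A only sees the
encrypted final sum; the result is correct as long as the scalar product is
smaller than n.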
For many data mining applications, data is typically represented as
attribute vectors, and the scalar (dot) product can be considered one of the
fundamental operations. The authors in [93] presented a very efficient and
practical secure scalar product protocol for horizontally partitioned
databases. They compared it with the most common scalar product protocols and
also demonstrated the efficiency of the proposed protocol on a real data set.
Clifton proposed a toolkit of components that can be combined for
specific privacy preserving data mining applications [94]. The authors showed
how the components of the toolkit can be used to solve different privacy
preserving data mining problems. They presented four efficient protocols for
privacy preserving computations that can be used to support data mining:
secure sum, secure set union, secure size of set intersection, and scalar
product. They also demonstrated the use of some of these protocols for privacy
preserving data mining problems in a distributed environment.
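The secure sum component is commonly described as a ring protocol in which the
initiating site masks its value with a random number. The sketch below
simulates that description in a single process; the function name and the
modulus parameter are assumptions, and the sum of all values must be smaller
than the modulus.

```python
import random

def secure_sum(values, modulus):
    """Simulate ring-based secure sum over the sites' private values.
    Site 0 adds a random mask; every other site adds its own value to the
    running total it receives; site 0 finally removes the mask."""
    mask = random.randrange(modulus)           # known only to site 0
    running = (mask + values[0]) % modulus     # message sent by site 0
    for v in values[1:]:
        running = (running + v) % modulus      # each site sees only a masked total
    return (running - mask) % modulus          # site 0 recovers the true sum

total = secure_sum([5, 12, 7], modulus=1000)
```

Because the intermediate totals are uniformly distributed modulo the chosen
modulus, a single site learns nothing about the other values, although two
colluding neighbours of a site could recover that site's value.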
In [95], the authors proposed a unique approach for mining knowledge
from grid-scale systems while ensuring that the data is cryptographically
safe, using a third party model. The proposed algorithm, called
Private-Majority-Rule, is a k-private distributed association rule mining
algorithm that involves no global communication patterns and dynamically
adjusts to changes in the data or to the failure and recovery of resources.
The architecture adopts Majority-Rule, a highly scalable distributed
association rule mining algorithm [96]. Through simulations with thousands of
resources, the authors showed that the algorithm quickly converges to the
correct result while using reasonable communication.
In many real-life applications data is split between multiple
organizations, and these organizations wish to use all of the data to create
more accurate predictive models while revealing neither their training data
nor the instances to be classified. To address this issue, the authors of [97]
presented a privacy preserving Naive Bayes classifier for horizontally
partitioned data.
Most privacy preserving distributed data mining algorithms are based on
perturbation or secure multi-party computation and accept a reduction in
accuracy and some overheads. Alex et al. offer a new approach to privacy
preserving distributed data mining that uses neither secure computation nor
perturbation. They adopted two new entities, termed miner and calculator,
which do not possess databases, and developed three algorithms based on this
new approach for three cases: horizontally partitioned data, vertically
partitioned data, and any data mining method in a distributed database
environment. The authors compared the computation and communication overheads
of each algorithm with those of secure computation and perturbation [98].
The article [99] makes primary contributions on two different grounds.
First, it explores independent component analysis as a possible tool for
breaching privacy in deterministic multiplicative perturbation-based models
such as random orthogonal transformation and random rotation. Then, it
proposes an approximate random projection-based technique to improve the
level of privacy protection while still preserving certain statistical
characteristics of the data. An extensive theoretical analysis and experimental
results are also presented in this paper. The authors proved that the proposed
technique is effective and can be successfully used for different types of
privacy-preserving data mining applications.
In many data mining applications, data comes from different parties with
different schemas. In [100], the authors addressed the problem of
privacy-preserving frequent pattern mining in many-to-many schemas across two
dimension sites, where the sites are not trusted and are semi-honest. A method
based on the concept of semi-join is proposed to address this issue; unlike
commonly used methods, it does not involve data encryption. Experimental
results are presented to study the efficiency of the proposed methods.
A new protocol [101] has been proposed for privacy preserving
association rule mining in horizontally partitioned databases. The same method
also provides the additional benefit of discovering association rules
privately. The authors also showed that the protocol is more efficient than
previous methods. The protocol supports privacy goals such as the following:
every party can access only its own data, no party is able to learn the links
between other parties and their data, and no party learns any transactions of
the other parties' databases.
The authors in [102] proposed a new algorithm for the semi-honest model
with negligible collision probability; it is a modified version of an
algorithm for privacy preserving association rule mining on distributed
homogeneous databases. The new algorithm adopts a public key cryptosystem,
which avoids the overheads involved in using a commutative encryption system.
The authors showed that the new algorithm is faster than the old one while
maintaining the privacy and accuracy of the results. One of the important
features of this algorithm is scalability, meaning that the same algorithm can
be extended to any number of sites. From the experimental results, the authors
showed that the new algorithm performs better than the previous algorithms in
computation, communication, time and accuracy, because its total
bit-communication cost is a function of N, where N is the number of sites.
The author studied the problem of privacy preserving association rule
mining when the data are partitioned and placed at different locations and no
site owner is willing to provide their data or information to anyone at any
other site. Algorithms for privacy preserving association rule mining over
horizontally, vertically and mixed partitioned databases are presented in this
thesis work [103]. Several experiments are conducted for each partitioning
method to analyze the performance and to find the limitations that may exist
in each method.
Kantarcioglu proposed methods to mine horizontally partitioned data
without violating privacy and discussed how to use the data mining results
while preserving privacy. The proposed methods incorporate cryptographic
techniques to minimize the information shared, while adding as little overhead
as possible to the mining and processing task [104].
Secure mining of association rules over horizontally partitioned
databases, using cryptographic techniques to minimize the information shared
at the cost of added overhead in the mining process, is presented in [105].
Using cryptographic tools, the authors proposed two protocols to mine
distributed association rules on horizontally partitioned data securely in the
semi-honest model. The communication and computation costs of mining with the
two protocols are discussed.
A new solution for privacy preserving association rule mining is
proposed in [106]; it integrates the advantages of two approaches, one that
protects private data by using an extended role-based access control approach
and one that adopts cryptographic techniques when sensitive information is to
be preserved. The data is classified into sensitive objects and non-sensitive
objects. Sensitive objects are encrypted and stored, and a permitted user is
allowed to access the sensitive objects only after decryption, which ensures
privacy. The authors showed that, by using these techniques, the new algorithm
minimizes information loss and privacy loss. The cryptographic technique helps
to store sensitive data and to provide access to the stored data based on an
individual’s role, which ensures that the data is safe from privacy breaches.
Support vector machine classification is one of the most widely used
classification methodologies in data mining and machine learning; it is based
on solid theoretical foundations and has wide practical application. A
privacy-preserving algorithm for support vector machine nonlinear
classification in horizontally partitioned databases was developed in [107].
This algorithm adopts secure set intersection cardinality to securely compute
the Gram matrix.
The earlier research work on privacy preserving association rule mining
when the database is partitioned vertically or in mixed mode is given below:
In [108], the authors addressed the problem of association rule mining in a
vertically partitioned database using a cryptography-based approach. Each site
holds some attributes of each transaction, and the sites wish to collaborate to
identify globally valid association rules. The authors defined global frequent
item sets for the vertically partitioned case, where every site possesses a
different set of attributes for a common set of tuples. Based on a secure
scalar product protocol, they developed an algorithm for the two-party case
that finds global frequent item sets and their supports efficiently without
violating either party’s privacy constraints. An analysis of the security and
communication cost of the proposed algorithm is presented, and it indicates
that the algorithm is efficient.
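The reduction from support counting to a scalar product can be sketched as
follows. Each site builds a binary vector marking the transactions that contain
all of its own items from the candidate item set, and the dot product of the
two vectors is the global support count. The item names, the sample data and
the threshold are illustrative assumptions, and scalar_product is a plain,
non-secure stand-in for the secure protocol.

```python
def itemset_indicator(columns, items):
    """1 for each transaction that contains every listed item at this site."""
    n_rows = len(next(iter(columns.values())))
    return [int(all(columns[item][row] for item in items))
            for row in range(n_rows)]


def scalar_product(x, y):
    # Non-secure stand-in (assumption): the two sites would run a secure
    # scalar product protocol here instead of sharing x and y.
    return sum(a * b for a, b in zip(x, y))


# Hypothetical vertical partition: same five transactions, different attributes.
site_a = {"bread": [1, 1, 0, 1, 1], "milk": [1, 0, 1, 1, 1]}
site_b = {"butter": [1, 1, 0, 1, 0]}

x = itemset_indicator(site_a, ["bread", "milk"])  # local view at site A
y = itemset_indicator(site_b, ["butter"])         # local view at site B

support_count = scalar_product(x, y)  # transactions containing all three items
min_support_count = 2                 # illustrative threshold
is_frequent = support_count >= min_support_count
```

Only the final count is needed to decide whether the item set is globally
frequent, which is why a secure scalar product is sufficient in the two-party
case.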
The authors in [109] reviewed several private scalar product protocols and
analyzed their insecurity. They proposed a two-party scalar product protocol
with an untrusted third party using algebraic computations. They also analyzed
the proposed protocol and showed that it is secure and practical, with low
communication and computation costs.
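To make the role of an untrusted third party concrete, the following minimal
sketch shows a well-known commodity-server style construction. It is not
necessarily the algebraic protocol of [109]. The third party only hands out
correlated random values and never sees the inputs, and the two parties end
with additive shares v1 and v2 of the scalar product. The modulus M and all
function names are illustrative assumptions, and the sketch simulates all three
roles in one process.

```python
import secrets

M = 2**61 - 1  # illustrative modulus; the true dot product must be below M


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % M


def rand_vec(n):
    return [secrets.randbelow(M) for _ in range(n)]


def commodity_server(n):
    """Third party: correlated randomness with ra + rb = Ra . Rb (mod M)."""
    ra_vec, rb_vec = rand_vec(n), rand_vec(n)
    ra = secrets.randbelow(M)
    rb = (dot(ra_vec, rb_vec) - ra) % M
    return (ra_vec, ra), (rb_vec, rb)


def private_scalar_product(x, y):
    """Alice holds x, Bob holds y; returns shares with v1 + v2 = x . y (mod M)."""
    (ra_vec, ra), (rb_vec, rb) = commodity_server(len(x))
    x_masked = [(a + r) % M for a, r in zip(x, ra_vec)]  # Alice -> Bob
    y_masked = [(b + r) % M for b, r in zip(y, rb_vec)]  # Bob -> Alice
    v2 = secrets.randbelow(M)                            # Bob's random share
    t = (dot(x_masked, y) + rb - v2) % M                 # Bob -> Alice
    v1 = (t - dot(ra_vec, y_masked) + ra) % M            # Alice's share
    return v1, v2


v1, v2 = private_scalar_product([1, 0, 1, 1], [1, 1, 0, 1])
result = (v1 + v2) % M
```

Each party sees only masked vectors, so neither input is revealed directly in
the semi-honest setting, provided the third party does not collude with either
side.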
The privacy-preserving association rule mining problem for the setting with one
data miner and multiple data providers is addressed in [110], and a new
approach is proposed. Based on an algebraic technique combined with a
randomization technique, the proposed approach mines association rules
accurately while disclosing less private information.
The problem of finding association rules for vertically partitioned databases
while preserving the confidentiality of each database at all sites is addressed
in [111]. The authors proposed two algorithms for discovering frequent item
sets and for calculating the confidence of the rules. They also analyzed the
privacy properties of the algorithms and compared them to existing algorithms.
In their study [112], the authors developed a privacy-preserving version of
the popular clustering algorithm DBSCAN, which is based on a density-based
notion of clustering that allows the discovery of clusters of arbitrary shape.
DBSCAN uses R-trees to support efficient associative queries, requires only two
input parameters, and offers some support in determining appropriate values for
them. They also showed that privacy-preserving DBSCAN requires
privacy-preserving R-trees, which the developed algorithm provides.
In [113], a simple technique for transforming categorical and numeric
sensitive data, using a mapping table and a graded grouping technique
respectively, is discussed, with distributed data treated as centralized. The
authors also applied the proposed technique to data mining tasks such as
classification, clustering and association rule mining and analyzed the results.
The secure scalar product protocol is an important fundamental protocol in
secure multi-party computation. Based on an additively homomorphic public-key
cryptosystem, the authors of [114] developed a new secure scalar product
protocol under the semi-honest model with low communication complexity.
Furthermore, they applied it to the privacy-preserving decision of the position
relationship between space vectors.
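How additive homomorphism yields a scalar product can be sketched with a toy
Paillier cryptosystem. This is a minimal illustration and not the protocol of
[114]. Paillier is chosen here only as a familiar additively homomorphic
scheme, and the small primes, function names and message flow are assumptions.
The sketch requires Python 3.8 or later for the modular inverse via pow.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier keys with g = n + 1; real keys need large random primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # inverse of lam mod n
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (1 + n)^m = 1 + m*n (mod n^2), blinded by r^n
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def secure_scalar_product(x, y):
    n, private_key = keygen()                 # Alice's key pair
    encrypted_x = [encrypt(n, xi) for xi in x]  # Alice -> Bob
    # Bob: the product of c_i^y_i is an encryption of sum(x_i * y_i)
    acc = encrypt(n, 0)                       # fresh randomness hides Bob's y
    for ci, yi in zip(encrypted_x, y):
        acc = acc * pow(ci, yi, n * n) % (n * n)
    return decrypt(n, private_key, acc)       # Alice decrypts x . y mod n


product = secure_scalar_product([3, 1, 4], [2, 7, 1])
```

Bob works only on ciphertexts, so he never sees Alice's vector, and a single
ciphertext is returned, which is consistent with the low communication cost
such protocols aim for. In this simplified variant Alice learns the full
product; variants that return additive shares have Bob add a random mask
before replying.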
In medical domains, medical practitioners use many data mining techniques to
make the right decisions. Because medical data holds patients’ personal
information, revealing all of it is an adverse situation, which makes it
necessary to prevent private information such as patients’ personal
identification information from being disclosed. In [115], the authors
addressed the preservation of privacy in classification techniques for two
cases: centralized and distributed database environments. They proposed an
architecture for privacy preservation in classification for a mixed partitioned
distributed database model, which combines vertical and horizontal
partitioning, using the Breast cancer dataset.
Databases in distributed applications are commonly partitioned in two ways,
horizontally and vertically, but many distributed applications also use mixed
partitioning. The authors in [116] addressed the issue of privacy-preserving
association rule mining in a mixed partitioned database and developed an
algorithm, a modification of the algorithm in [90], based on a cryptographic
technique. The algorithm was evaluated against metrics and was shown to find
global results efficiently without violating privacy constraints.
In [117], the authors discussed broad areas of privacy-preserving data
mining and the underlying algorithms and methodologies, such as randomization
and the k-anonymity model. They also discussed the existing methodologies for
privacy-preserving association rule mining in distributed environments with
different partitioning methods, such as horizontal and vertical models, and
showed the limitations of each method. The authors proposed a novel approach to
preserve privacy in association rule mining that applies a secure hash function
to semi-anonymized sensitive attributes to eliminate the possibility of
reconstructing the original data.
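The idea of hashing semi-anonymized attributes can be sketched as follows. This
is a minimal illustration under stated assumptions and not the construction of
[117]: the sensitive value is first generalized (here, an age is mapped to a
ten-year range) and the generalized value is then replaced by a keyed SHA-256
digest. The use of HMAC with a secret key, the record fields and the function
names are all hypothetical choices made for this example.

```python
import hashlib
import hmac

# Hypothetical secret held by the data owner; it hinders dictionary attacks
# on the small set of possible generalized values.
SECRET_KEY = b"example-key-known-only-to-the-data-owner"


def generalize_age(age, width=10):
    """Semi-anonymize a numeric value by mapping it to a fixed-width range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def hash_attribute(name, value, key=SECRET_KEY):
    """Deterministic one-way token for an attribute value."""
    message = f"{name}={value}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def protect(record):
    """Replace sensitive fields with tokens; keep non-sensitive fields as-is."""
    return {
        "age": hash_attribute("age", generalize_age(record["age"])),
        "diagnosis": hash_attribute("diagnosis", record["diagnosis"]),
        "item": record["item"],
    }


r1 = protect({"age": 34, "diagnosis": "flu", "item": "aspirin"})
r2 = protect({"age": 37, "diagnosis": "flu", "item": "aspirin"})
r3 = protect({"age": 41, "diagnosis": "flu", "item": "aspirin"})
```

Because the tokens are deterministic, equal generalized values map to equal
tokens, so co-occurrence counts used by association rule mining are preserved,
while the exact original values cannot be read back from the digests.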
Danfeng Yao et al. presented a private distributed scalar product protocol in
[118], which can be used for obtaining trust values from private
recommendations. They proposed a credential-based trust model in which the
trustworthiness of a user is computed based on his or her affiliations and role
assignments. This trust model is shown to be simple to compute and scalable to
many users.
Privacy-preserving data mining is receiving more attention from researchers
seeking effective solutions that do not produce side effects. There are many
real applications in which privacy-preserving data mining techniques are used,
including surveillance, which is naturally regarded as a “privacy-violating”
application. A number of techniques that use privacy-preserving data mining
algorithms have been discussed for facial de-identification, bio-surveillance,
and identity theft [119, 120 & 121].