Dm_NN


  • 8/6/2019 Dm_NN


    Data Mining Based on Neural Networks

    Foreword: The First Question That Comes to Mind

    What Is a Neural Network? The Short Answer

    A neural network is modeled on its biological counterpart. The fundamental processing element of a neural network is the neuron, which:

    1. Receives inputs from other sources
    2. Combines them in some way
    3. Performs a generally nonlinear operation on the result
    4. Outputs the final result
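    The four steps a neuron performs can be sketched in a few lines of Python. This is only an illustrative sketch: the weighted sum as the combining step, the logistic sigmoid as the nonlinear operation, and the names `neuron`, `weights` and `bias` are assumptions, not something the text prescribes.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One artificial neuron: weighted sum followed by a sigmoid (assumed choices)."""
    # Steps 1-2: receive the inputs and combine them, here as a weighted sum.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step 3: apply a nonlinear operation (the logistic sigmoid is one common choice).
    # Step 4: return the result as the neuron's output.
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical input values and weights, for illustration only.
output = neuron([1.0, 0.5], [0.4, -0.2], bias=0.1)
```

    Any other combining rule or nonlinearity could be substituted without changing the four-step structure.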


    A human brain has on the order of 100 billion neurons.

    An ant brain has about 250,000 neurons.

    Introduction:

    Neural networks are being applied ever more widely in data mining. Although they can have complex structures, long training times, and results that are hard to interpret, they tolerate noisy data well and achieve high accuracy, which makes them attractive for data mining.

    This report examines data mining based on neural networks in detail, along with the key techniques and approaches for implementing it.

    With the continuous development of database technology and the widespread use of database management systems, the volume of data stored in databases is growing rapidly, and much important information lies hidden in these large volumes of data. If this information can be extracted from the database, it can create substantial potential profit for companies. The technology of mining information from massive databases is known as data mining.

    Data mining tools can forecast future trends and activities to support decision-making. For example, by analyzing a company's entire database, data mining tools can answer questions such as "Which customers are most likely to respond to our e-mail marketing campaign, and why?" Some data mining tools can also solve traditional problems that used to consume much time, because they can rapidly browse the entire database and find useful information that experts had overlooked.

    A neural network is a parallel processing network that simulates the intuitive, image-based thinking of humans. It is derived from research on biological neural networks by simplifying, summarizing, and refining the features of biological neurons and networks. It uses non-linear mapping, parallel processing, and the structure of the network itself to express the knowledge that relates inputs to outputs.

    Initially, the prospects for neural networks in data mining were not regarded as promising, mainly because of their complex structure, poor interpretability, and long training times. However, their advantages, such as high tolerance of noisy data and low error rates, together with the continuing improvement of network training algorithms and especially of network pruning and rule extraction algorithms, have made neural networks increasingly favored by data mining users.


    NEURAL NETWORK METHOD IN DATA MINING:

    There are seven common data mining methods and techniques: statistical analysis, rough sets, covering positive and rejecting inverse cases, formula discovery, fuzzy methods, visualization, and neural networks. Here we focus on the neural network method.

    The neural network method is used for classification, clustering, feature mining, prediction, and pattern recognition. It imitates the neuron structure of animals and is based on the M-P (McCulloch-Pitts) model and the Hebb learning rule, so in essence it is a distributed matrix structure. By training on the mining data, the method gradually computes the weights of the network's connections through repeated iteration or cumulative calculation.

    Neural network models can be broadly divided into the following three types:

    (1) Feed-forward networks: represented by the perceptron, the back-propagation (BP) model, and the function network; mainly used for prediction and pattern recognition.

    (2) Feedback networks: represented by the discrete and continuous Hopfield models; mainly used for associative memory and optimization.

    (3) Self-organizing networks: represented by the adaptive resonance theory (ART) model and the Kohonen model; mainly used for cluster analysis.

    At present, the neural network most commonly used in data mining is the BP network.

    Of course, the artificial neural network is still a developing science, and some of its theory has not yet taken shape, for example on convergence, stability, local minima, and parameter adjustment.

    For the BP network, the most frequent problems are that training is slow, that it may fall into a local minimum, and that the training parameters are difficult to determine.

    To address these problems, some researchers have combined artificial neural networks with genetic algorithms and achieved better results.

    Artificial neural networks are characterized by distributed information storage, parallel processing, reasoning, and self-organizing learning, and they can rapidly fit non-linear data, so they can solve many problems that are difficult for other methods.


    DATA MINING PROCESS BASED ON NEURAL NETWORK:

    The data mining process in general consists of three main phases:

    1. Data preparation
    2. Data mining
    3. Expression and interpretation of the results

    Data mining based on neural networks consists of:

    1. Data preparation
    2. Rules extracting
    3. Rules assessment

    A. Data Preparation


    Data preparation defines and processes the data to be mined so that it fits a specific data mining method. It is the first important step in data mining and plays a decisive role in the entire process.

    It mainly includes the following four processes.

    1) Data cleaning
    Data cleaning fills in missing values, eliminates noisy data, and corrects inconsistencies in the data.

    2) Data selection
    Data selection chooses the data range and the rows to be used in this mining task.

    3) Data preprocessing
    Data preprocessing further processes (enhances) the cleaned data that has been selected.

    4) Data expression
    Data expression transforms the preprocessed data into a form that the neural-network-based data mining algorithm can accept.

    Data mining based on neural networks can handle only numerical data, so symbolic data must be transformed into numerical data. The simplest method is to build a table giving a one-to-one correspondence between the symbols and the numbers. A more complex approach is to use an appropriate hash function to generate a unique numerical value from a given string.

    Although relational databases have many data types, they essentially reduce to three logical data types: symbolic data, discrete numerical data, and continuous numerical data.


    The symbol "Apple" in the figure can be transformed into the corresponding discrete numerical data by using a symbol table or a hash function. The discrete numerical data can then be quantified into continuous numerical data, or encoded into coded data.
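    The transformation of a symbol such as "Apple" can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`build_symbol_table`, `hash_code`, `scale`, `one_hot`), the SHA-256-based hash, the min-max style scaling, and the one-hot coding are all choices made here for the example, not the document's prescribed method. Note that a hash restricted to a fixed range can collide, so uniqueness is not guaranteed in practice.

```python
import hashlib

def build_symbol_table(symbols):
    """Symbol table: map each distinct symbol to an integer, in order of first appearance."""
    table = {}
    for s in symbols:
        if s not in table:
            table[s] = len(table)
    return table

def hash_code(symbol, buckets=2**32):
    """Alternative: derive a number from the string with a hash function (collisions possible)."""
    digest = hashlib.sha256(symbol.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def scale(index, size):
    """Quantify a discrete code into a continuous value in [0, 1]."""
    return index / (size - 1) if size > 1 else 0.0

def one_hot(index, size):
    """Encode a discrete code as a vector with a single 1.0 entry."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical column of symbolic data.
fruits = ["Apple", "Pear", "Apple", "Plum"]
table = build_symbol_table(fruits)
apple_code = table["Apple"]
apple_scaled = scale(apple_code, len(table))
apple_one_hot = one_hot(apple_code, len(table))
```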

    B. Rules Extracting

    There are many methods for extracting rules. The most commonly used are the LRE method, the black-box method, fuzzy rule extraction, rule extraction from recurrent networks, the binary input and output rule extraction algorithm (BIO-RE), the partial rule extraction algorithm (Partial-RE), and the full rule extraction algorithm (Full-RE).

    C. Rules Assessment

    Although the objective of rule assessment depends on the specific application, in general the rules can be assessed against the following objectives:

    (1) Find the optimal sequence of extracted rules, so that it gives the best results on the given data set;
    (2) Test the accuracy of the extracted rules;
    (3) Determine how much knowledge in the neural network has not been extracted;
    (4) Detect inconsistencies between the extracted rules and the trained neural network.
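    Objectives (2) and (4) can be illustrated with a simple agreement measure. In this sketch, `extracted_rule` and `trained_network` are hypothetical stand-ins (a real network would be a trained model), and the inputs and labels are made-up toy values; the idea is only that accuracy compares the rule with the true labels, while consistency (often called fidelity) compares the rule with the network's own outputs.

```python
def agreement(predictions, reference):
    """Fraction of positions where two lists of labels agree."""
    matches = sum(p == r for p, r in zip(predictions, reference))
    return matches / len(reference)

def extracted_rule(x):
    # Hypothetical rule extracted from the network: "if x > 0.5 then class 1".
    return 1 if x > 0.5 else 0

def trained_network(x):
    # Hypothetical stand-in for the trained network's decision.
    return 1 if x > 0.4 else 0

inputs = [0.1, 0.45, 0.6, 0.9]   # toy data, for illustration only
true_labels = [0, 0, 1, 1]

rule_out = [extracted_rule(x) for x in inputs]
net_out = [trained_network(x) for x in inputs]

accuracy = agreement(rule_out, true_labels)  # objective (2): rule vs. true labels
fidelity = agreement(rule_out, net_out)      # objective (4): rule vs. network
```

    A fidelity below 1.0 points at inputs where the rule and the network disagree, which is where unextracted knowledge (objective 3) is likely to sit.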

    IV. DATA MINING TYPES BASED ON NEURAL NETWORK

    There are hundreds of types of data mining based on neural networks, but only two are widely used: data mining based on the self-organizing neural network and data mining based on the fuzzy neural network.

    A. Data Mining Based on Self-Organization Neural Network

    Self-organization is a process of learning without a teacher. Through this learning, important characteristics or inherent knowledge in a set of data can be discovered, such as the characteristics of its distribution or its clustering according to certain features.

    The Finnish scholar T. Kohonen proposed that neighboring modules in a neural network, like neurons in the brain, play different roles; through interaction they can develop adaptively into specialized detectors for different signals.

    Because neurons in different regions of the brain play different roles, they are sensitive to different input patterns. Kohonen also proposed a learning scheme that maps the input signals onto a low-dimensional space while ensuring that inputs with the same characteristics correspond to neighboring regions of that space. This is the so-called self-organizing feature map (SOFM).
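    A self-organizing feature map can be sketched in pure Python as below. This is a simplified illustration, not Kohonen's full algorithm as published: it assumes a one-dimensional row of units, a Gaussian neighborhood, and linearly decaying learning rate and radius, and all names and parameter values (`train_som`, `best_unit`, `n_units`, `lr`, `radius`) are hypothetical.

```python
import math
import random

def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def train_som(data, n_units=4, epochs=50, lr=0.5, radius=1.0, seed=0):
    """Train a 1-D map: the winner and its grid neighbors move toward each input."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrink step size and neighborhood over time
        sigma = radius * decay + 1e-9
        for x in data:
            winner = best_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood measured on the 1-D grid of units.
                h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * decay * h * (x[d] - w[d])
    return weights

# Toy two-dimensional inputs forming two loose groups (illustrative values).
data = [[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]
weights = train_som(data)
mapped = [best_unit(weights, x) for x in data]
```

    After training, `mapped` gives the one-dimensional position each two-dimensional input is assigned to, which is the low-dimensional mapping the text describes.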


    B. Data Mining Based on Fuzzy Neural Network

    Although neural networks have strong capabilities for learning, classification, association, and memory, the greatest difficulty in using them for data mining is that the output cannot be explained intuitively.

    Introducing fuzzy processing into the neural network not only increases its capacity to express outputs but also makes the system more stable. The fuzzy neural networks frequently used in data mining are the fuzzy perceptron model, the fuzzy BP network, the fuzzy clustering Kohonen network, the fuzzy inference network, and the fuzzy ART model.

    Of these, the fuzzy BP network is developed from the traditional BP network. In the traditional BP network, if a sample belongs to the k-th category, the output value of the k-th output node is 1 and the output values of all other output nodes are 0; that is, the target output can only be 0 or 1 and carries no ambiguity. In the fuzzy BP network, the expected output value of a sample is replaced by the sample's expected membership in each category.

    After being trained on the samples and their expected memberships in each category, the fuzzy BP network can reflect the membership relation between inputs and outputs in the training set, and can give the memberships of a pattern to be recognized during data mining.
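    The difference between the two kinds of training target can be sketched as follows. The text does not say how the expected memberships are obtained; this example assumes, purely for illustration, that they are computed from a sample's distances to class prototypes using the membership formula of fuzzy c-means. The function names and the `m` (fuzziness) parameter are hypothetical.

```python
def crisp_target(k, n_classes):
    """Traditional BP target: 1 for the k-th output node, 0 for all others."""
    return [1.0 if i == k else 0.0 for i in range(n_classes)]

def fuzzy_target(distances, m=2.0):
    """Fuzzy BP target: membership of the sample in each class.

    Assumed here: memberships follow the fuzzy c-means formula,
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with m > 1.
    """
    if any(d == 0 for d in distances):
        # A sample sitting exactly on a prototype belongs fully to it.
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** exponent for dj in distances)
            for di in distances]

crisp = crisp_target(1, 3)
# Hypothetical distances from one sample to three class prototypes.
memberships = fuzzy_target([1.0, 2.0, 4.0])
```

    The memberships sum to 1 and decrease with distance, so the network is trained toward graded rather than all-or-nothing outputs.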

    The fuzzy clustering Kohonen network is fuzzy not only in its output expression: it also introduces the sample memberships into the update rules for the weight coefficients, so that the weight updates are fuzzified as well.

    V. KEY TECHNIQUES AND APPROACHES OF IMPLEMENTATION

    A. Effective Combination of Neural Network and Data Mining Technology

    Current implementations mostly use an original ANN software package or are adapted from existing ANN development tools. To integrate the two technologies effectively, the data mining workflow must be understood in depth, and the data model and application interfaces must be described in a standardized form; only then can the two technologies work together to complete data mining tasks. An approach that organically combines ANN and data mining technologies should therefore be found in order to improve and optimize data mining.


    B. Effective Combination of Knowledge Processing and Neural Computation

    The following indicators can be used to evaluate whether a data mining algorithm is good:
    (1) Whether it models well in the presence of noise and incomplete data;
    (2) Whether the model can be understood by users and used for decision-making;
    (3) Whether the model can accept domain knowledge (rule input and extraction) to improve modeling quality.

    Existing neural networks score high on modeling quality but low on the latter two indicators. To users, a neural network is effectively a black box; this limitation means that the classification and prediction process cannot be understood by users or used directly for decision-making.

    For data mining, it is not enough to rely on the results the neural network model provides, because before making important decisions users need to understand the rationale and justification for them.

    Therefore, a knowledge base should be established in ANN-based data mining so that domain knowledge and the knowledge learned by the ANN can be incorporated into the system during the mining process. That is, ANN-based data mining needs knowledge-processing methods to extract knowledge from the mining process and to fuse knowledge processing with neural computation.

    In addition, an effective decision and explanation mechanism should be built into the system to improve the validity and practicality of ANN-based data mining.

    C. Input/Output Interface

    Because obtaining data through neural network tools or software packages lags behind current needs, good interfaces to relational databases, multi-dimensional databases, and data warehouses should be established to meet the needs of data mining.

    At present, data mining is a new and important area of research, and neural networks are well suited to data mining problems because of their robustness, self-organizing adaptivity, parallel processing, distributed storage, and high degree of fault tolerance.

    Combining data mining methods with neural network models can greatly improve the efficiency of data mining. The combination is already widely used and will receive more and more attention.

    Data mining tools are widely used to solve real-world problems in engineering, science, and business. As the number of data mining software vendors increases, however, it has become more challenging to assess which of their rapidly updated tools are most effective for a given application.


    Such judgment is particularly useful for the high-end products because of the investment (money and time) required to become proficient in their use.

    Reviews by objective testers are very useful in the selection process, but most published to date have provided somewhat limited critiques and haven't uncovered the critical benefits and shortcomings that can probably only be discovered after using a tool for an extended period on real data.

    Here, five of the most highly acclaimed data mining tools are compared on a fraud detection application, with descriptions of their distinctive strengths and weaknesses and the lessons learned by the authors while evaluating the products.

    Decision Trees under Neural Networks:


    Polynomial Networks:


    Consensus Models (Data Points)

    Contributory Models

    Properties of Algorithms for Data Mining:


    Case Study Observation:

    Data Mining Tools Can:
      Enhance the inference process
      Speed up the design cycle

    Data Mining Tools Cannot:
      Substitute for statistical and domain expertise

    Users are advised to:
      Get training on tools
      Be alert for product upgrades


    Case Study:

    Data mining, which emerged during the late 1980s, made great strides during the 1990s. In the last few years, almost every meeting that has anything to do with databases, neural networks, genetic algorithms, e-commerce, or artificial intelligence has had a theme or session on data mining and data warehousing.

    Since data mining is a multidisciplinary field, researchers from different disciplines are gradually being attracted to this new frontier of research. Vast amounts of operational data are routinely collected and stored away in the archives of many organizations.

    In the last five years, there has been a tremendous improvement in hardware, leading to new computer programs that sift massive amounts of operational data, recognize patterns, and provide hints for formulating hypotheses for tactical and strategic decision-making, and that can now be executed in a reasonable time.

    Some of the popular data mining techniques applicable to databases are divided into:

    1. Traditional Techniques (Statistics, Neighborhood and Clustering)
    2. New Generation Techniques (Decision Tree, Neural Network and Association Rule)

    Traditional Techniques:

    The main techniques that are used the vast majority of the time on existing business problems can be used for mining library databases as well. This section covers only those techniques that work consistently, are equally useful for library databases, and are understandable and explainable.

    Strictly speaking, statistics and statistical techniques are not data mining techniques; they were in use long before the term "data mining" was coined for business applications. However, statistical techniques are driven by the data and are used to discover patterns and build predictive models. For a given data mining problem, one must choose between statistical methods and other data mining techniques, so it is important to have some idea of how statistical techniques work and how they can be applied.

    Difference between statistics and data mining:

    The techniques used in data mining are successful for precisely the same reasons that statistical techniques are successful. So what is the difference? Why aren't we as excited about statistics as we are about data mining? There are several reasons.

    1. The first is that data mining techniques such as CART, neural networks, and nearest neighbor tend to be more robust to messy real-world data and to use by less expert users.

    2. The other reason is that the time is right. Because computers are now used for closed-loop business data storage and generation, large quantities of data are available to users.

    Likewise, computer hardware has improved by orders of magnitude in storing and processing data, which makes some of the most powerful data mining techniques feasible today.

    New Generation Techniques

    The data mining techniques in this section are the most often used of those developed over the last two decades of research in data mining.

    A. Decision Trees:


    Older decision tree techniques such as CHAID are heavily used, but newer techniques such as CART are gaining wider acceptance. A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification.

    Decision trees can be used for exploratory analysis, data preprocessing, and prediction. Decision tree algorithms build their trees in very similar ways: they look at all possible distinguishing questions that could break up the original training data set into segments that are nearly homogeneous with respect to the classes being predicted.

    Some decision tree algorithms use heuristics to pick the questions, or even pick them at random. CART picks its questions in a very unsophisticated way: it tries them all. After it has tried them all, CART picks the best one, uses it to split the data into two more organized segments, and then again asks all possible questions of each new segment individually.

    Most decision tree algorithms stop growing the tree when one of three criteria is met:

    1. The segment contains only one record. There is no further question that could refine a segment of just one.

    2. All the records in the segment have identical characteristics. There is no reason to keep asking questions, since all the remaining records are the same.

    3. The improvement is not substantial enough to warrant making the split.
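    The "try every question, keep the best" search and the three stopping criteria can be sketched as below. This is an illustrative sketch rather than the CART implementation: it assumes numeric predictors, questions of the form "is feature f <= t?", Gini impurity as the quality measure, and a hypothetical `min_gain` threshold for criterion 3.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every (feature, threshold) question; return (impurity, feature, threshold)."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a question that separates nothing is not a split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def should_stop(rows, labels, min_gain=0.01):
    """Apply the three stopping criteria to one segment."""
    if len(rows) <= 1:                       # 1. only one record
        return True
    if all(r == rows[0] for r in rows):      # 2. identical characteristics
        return True
    split = best_split(rows, labels)
    if split is None:
        return True
    return gini(labels) - split[0] < min_gain  # 3. improvement too small

# Toy training segment with one numeric predictor (illustrative values).
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["a", "a", "b", "b"]
split = best_split(rows, labels)
```

    A full tree builder would call `best_split` recursively on each resulting segment until `should_stop` returns true.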

    B. CART:

    CART stands for Classification and Regression Trees. It is a data exploration and prediction algorithm developed by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone, and it is described in detail in their book Classification and Regression Trees.

    C. CHAID:

    Another decision tree technique, as popular as CART, is CHAID, or Chi-Square Automatic Interaction Detector.

    D. Neural Networks:


    To be more precise, one should speak of an artificial neural network. True neural networks are biological systems that detect patterns, make predictions, and learn.

    Artificial neural networks derive their name from their historical development, which started with the premise that machines could be made to think if scientists found ways to mimic the structure and functioning of the human brain on a computer. A neural network is loosely based on how some people believe the human brain is organized and how it learns.

    There are two main structures of consequence in a neural network:

    1. The node, which loosely corresponds to the neuron in the human brain.

    2. The link, which loosely corresponds to the connections between neurons in the human brain.

    To make a prediction, the neural network accepts the values of the predictors on what are called the input nodes. These become the values of those nodes, and they are multiplied by the values stored in the links.

    The products are then added together at the node at the far right (the output node), a special threshold function is applied, and the resulting number is the prediction. In this case the resulting number is 001.640.
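    The flow from input nodes through links to the output node can be sketched as a small forward pass. The predictor values, link weights, and the 0.5 cut-off of the `step` threshold function below are hypothetical, and the layered `forward` helper generalizes slightly beyond the single output node described above so that the same code also covers networks with intermediate layers.

```python
def step(value, threshold=0.5):
    """A simple threshold function: fire (1.0) at or above the threshold, else 0.0."""
    return 1.0 if value >= threshold else 0.0

def forward(inputs, layers, activation):
    """Propagate values through the network.

    `layers` is a list of weight matrices; each row holds the link weights
    feeding one node of the next layer.
    """
    values = inputs
    for weights in layers:
        values = [activation(sum(w * v for w, v in zip(row, values)))
                  for row in weights]
    return values

# Hypothetical predictor values on two input nodes and link weights to one output node.
predictors = [0.5, 1.0]
links = [[[0.5, 0.25]]]

weighted_sum = forward(predictors, links, lambda v: v)  # before thresholding
prediction = forward(predictors, links, step)           # after the threshold function
```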

    Neural networks can be used for clustering, outlier analysis, feature extraction, and prediction. There are literally hundreds of variations on the back-propagation feed-forward neural network.

    There are, however, two other neural network architectures that are used often. Kohonen feature maps are often used for unsupervised learning and clustering, and radial basis function networks are used for supervised learning and in some ways represent a hybrid between nearest neighbor and neural network classification.

    Case Study:

    Archaeological research has always been data driven. Nowadays, owing to the rapid development of computing technology, archaeological databases are starting to play a major role in archaeological studies.

    Although the amount of archaeological data has increased significantly, efficient methods for analyzing this data are still lacking. The data mining process may resolve this problem by helping us transform the data into significant information. This thesis tests the possibility of applying data mining techniques to archaeological databases, focusing on pottery data.

    Data mining is the science of extracting useful information from large data sets. It was necessary to develop various unique methods for creating and preparing the data for this kind of process and analysis. Wide data tables with many variables were created. The methods by which this was done, and their implementation, added another aspect to this research.

    The typological issue was also approached from various angles, in an attempt to find methods by which real types would be revealed and identified. The purpose of the various methods was neither to accept nor to reject any existing hypotheses, but rather to help find patterns and relationships within the data that could prove worthy of further investigation as to their archaeological meaning.

    The research was done on two case studies. The primary one was based on 280 whole vessels found at the site of Tell es-Safi (the biblical Philistine town of Gath). In this case study, different methods for the detailed recording of the pottery were also examined. This data had sampling problems because of its relatively small size and its lack of spatial and chronological variability.

    The second case study focused on processing and analyzing data that had already been published. Here a large data table of about 8000 pottery sherds found at the site of Tel Batash (the biblical town of Timnah) was used.

    Clustering and Nearest Neighbor:

    Clustering and nearest neighbor prediction are among the oldest techniques used in data mining. Clustering means that records are grouped, or clustered, together.

    Nearest neighbor is a prediction technique quite similar to clustering: to predict the value for one record, look for records with similar predictor values in the historical database and use the prediction value from the record nearest to the unclassified record.

    Simply stated, the nearest neighbor algorithm holds that objects near each other will have similar prediction values as well. Thus, if you know the prediction value of one object, you can predict it for its nearest neighbors. Because clustering and nearest neighbor prediction can classify records dynamically, they provide a solid foundation for a new computerized classification scheme to replace classical, manual schemes such as DDC (Dewey Decimal Classification) and CC (Colon Classification). Before such techniques are used for classification, however, a set of base classes like those of DDC or CC is required, and a set of appropriate subject headings has to be identified.

    One improvement usually made to the basic nearest neighbor algorithm is to take a vote among the nearest neighbors rather than relying on the single nearest neighbor to the unclassified record. The distance to the nearest neighbor provides a level of confidence: if the neighbor is very close or an exact match, there is much higher confidence in the prediction than if the nearest record is a great distance from the unclassified record.

    The degree of homogeneity among the predictions of the nearest neighbors can also be used. If all the nearest neighbors make the same prediction, there is much higher confidence in the prediction than if half the records made one prediction and the other half made another.
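    Voting among the nearest neighbors, together with the two confidence signals just described, can be sketched as follows. The record layout, the Euclidean distance, the choice of `k`, and the toy history are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(history, query, k=3):
    """Predict by majority vote among the k nearest historical records.

    `history` is a list of (predictor_values, prediction_value) pairs.
    Returns (predicted value, share of neighbors that agree,
    distance to the single nearest neighbor).
    """
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], query))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    agreement = count / len(nearest)                     # homogeneity of the neighbors
    nearest_distance = math.dist(nearest[0][0], query)   # closeness of the best match
    return label, agreement, nearest_distance

# Toy historical database (illustrative values only).
history = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
           ((5, 5), "B"), ((6, 5), "B")]

exact = knn_predict(history, (0, 0))
mixed = knn_predict(history, (5, 6))
```

    A high agreement share and a small nearest distance both argue for more confidence in the prediction; here `mixed` has a split vote and so deserves less confidence than `exact`.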

    Clustering, also called unsupervised classification, is the process of grouping physical or abstract objects into classes of similar objects; it is the method by which like records are grouped together. Usually this is done to give the end user a high-level view of what is going on in the database. Two such clustering systems are the PRIZM system from Claritas Corporation and MicroVision from Equifax Corporation.

    Hierarchical and Non-Hierarchical Clustering:

    There are two main types of clustering techniques: those that create a hierarchy of clusters and those that do not. The hierarchical techniques create a hierarchy of clusters from small to big. The main reason for this is that clustering is an unsupervised learning technique, so there is no single correct number of clusters, and a hierarchy lets the user choose the level of detail. The hierarchy is usually viewed as a tree in which the smallest clusters merge to create the next highest level of clusters, and those in turn merge to create the level above. The hierarchy is created by the algorithm that builds the clusters. There are two main types of hierarchical clustering algorithms:

    Agglomerative clustering techniques start with as many clusters as there are records, each cluster containing just one record. The clusters nearest to each other are merged to form the next largest cluster. This merging continues until a hierarchy of clusters is built with a single cluster containing all the records at the top of the hierarchy.
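    The agglomerative procedure can be sketched as below. The text does not specify how the distance between two clusters is measured; this example assumes single linkage (distance between the closest pair of members) and one-dimensional records, purely to keep the illustration short.

```python
def agglomerate(points):
    """Bottom-up clustering of 1-D values; returns the clusters at every level."""
    clusters = [[p] for p in points]          # start: one cluster per record
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]    # merge the two nearest clusters
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels

# Toy records (illustrative values only).
levels = agglomerate([1.0, 2.0, 10.0, 11.0, 30.0])
cluster_counts = [len(level) for level in levels]
```

    Each entry of `levels` is one level of the hierarchy, from single-record clusters at the bottom to one all-inclusive cluster at the top.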

    Divisive clustering techniques take the opposite approach from agglomerative techniques. They start with all the records in one cluster, then try to split that cluster into smaller pieces, and then in turn try to split those smaller pieces.

    Of these two techniques, the agglomerative techniques are the more commonly used for clustering and have more algorithms developed for them.

    Diagram showing a hierarchy of clusters. Clusters at the lowest level are merged together to form larger clusters at the next level of the hierarchy.

    Non-Hierarchical Clustering:

    There are two main non-hierarchical clustering techniques. Both are very fast to compute on the database but have some drawbacks.

    1. The first are the single-pass methods. They derive their name from the fact that the database needs to be passed through only once to create the clusters (i.e., each record is read from the database only once).

    2. The others are called reallocation methods. They get their name from the movement, or reallocation, of records from one cluster to another in order to create better clusters. Reallocation techniques do use multiple passes through the database but are relatively fast in comparison with the hierarchical techniques.
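    Both families can be sketched on one-dimensional records. The concrete algorithms here are assumptions chosen as familiar representatives: a "leader"-style rule with a distance `threshold` for the single-pass method, and a k-means-style loop for reallocation. The starting centers and the threshold are hypothetical values.

```python
def single_pass(records, threshold):
    """Read each record once: join the first cluster whose leader is close enough,
    otherwise start a new cluster with this record as its leader."""
    leaders, clusters = [], []
    for r in records:
        for idx, leader in enumerate(leaders):
            if abs(r - leader) <= threshold:
                clusters[idx].append(r)
                break
        else:
            leaders.append(r)
            clusters.append([r])
    return clusters

def reallocate(records, centers, passes=10):
    """Repeatedly move records to their nearest center and recompute the centers."""
    groups = [[] for _ in centers]
    for _ in range(passes):
        groups = [[] for _ in centers]
        for r in records:
            nearest = min(range(len(centers)), key=lambda i: abs(r - centers[i]))
            groups[nearest].append(r)
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:   # no record changed cluster: stop early
            break
        centers = new_centers
    return centers, groups

# Toy records (illustrative values only).
records = [1.0, 1.5, 9.0, 2.0, 9.5]
one_pass_clusters = single_pass(records, threshold=2.0)
centers, groups = reallocate(records, centers=[0.0, 5.0])
```

    The single-pass result depends on the order in which records are read, which is one of the drawbacks alluded to above; reallocation trades extra passes for clusters that no longer depend on that order in the same way.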

    Sample Data Set:


    Correlation of national health statistics with pediatric cancer survival rates

    Factor             Description                                           Correlation coefficient
    Gov_spending_pc    Government annual healthcare expenditure per capita   0.939
    Total_spending_pc  Total annual healthcare expenditure per capita        0.872
    GDPpc              Per capita gross domestic product                     0.777
    GNIpc              Per capita gross national income                      0.756
    Phys_per1000       Physicians per thousand population                    0.749
    Nurse_per1000      Nurses per thousand population                        0.712
    HDI                Human development index                               0.631
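    The text does not say which correlation coefficient the table reports. Assuming it is the usual Pearson coefficient, it could be computed as in the sketch below. The `spending` and `survival` values are made-up toy numbers for illustration and are not the data behind the table.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-country values: spending per capita and survival rate.
spending = [100.0, 200.0, 300.0, 400.0]
survival = [0.50, 0.60, 0.65, 0.80]
r = pearson(spending, survival)
```

    Values near 1 indicate that the factor rises almost linearly with the survival rate; a strong correlation alone does not establish that the factor causes the difference.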

    Conclusion:

    For efficient and effective library administration and extension of services, library automation and digital libraries are needed. But automation alone is not the solution unless we are able to uncover the hidden information in the large volume of data. This can be done by applying data mining.

    We can now take a glance at the possibilities opening up in the new age of data mining:

    1. Dynamic Classification:

    For this purpose, clustering, nearest neighbor, neural network, and decision tree techniques are very useful.

    2. Sequence Analysis:

    Statistical analysis is used to identify unlinked documents that members are likely to read together. It examines the paths that users follow while searching for information and can help identify which documents users are likely to use together.

    3. Association Analysis:

    By applying association rule mining techniques and algorithms such as Apriori, Partition, Pincer-Search, Dynamic Itemset Counting, FP-tree Growth, and many newer algorithms for finding association rules, one can take advantage of these rules in making strategic decisions.

    At present, data mining is a new and important area of research, and neural networks are well suited to data mining problems because of their robustness, self-organizing adaptivity, parallel processing, distributed storage, and high degree of fault tolerance.

    Combining data mining methods with neural network models can greatly improve the efficiency of data mining. The combination is already widely used and will receive more and more attention.

    Key for Data Mining Efficiency: Neural Network Model