[IEEE 2006 International Conference - Modern Problems of Radio Engineering, Telecommunications, and Computer Science - Lviv, Ukraine (2006.02.28-2006.03.4)] 2006 International Conference


Intellectual Analysis of Data in the Development of Expert Systems

Valeriy Dubrovin, Larisa Deynega

Abstract - The use of intellectual analysis of data to build the initial base of precedents for an expert system is proposed. The possibilities of this technology are considered. A classification of existing systems for the intellectual analysis of data is given, and their advantages and drawbacks are enumerated.

Keywords - expert system, base of precedents, data mining

I. INTRODUCTION

Expert systems (ES) are one of the promising directions in the development of artificial intelligence. An ES in the modern understanding is a precedent-based ES. Such systems differ fundamentally from their predecessors in that their knowledge base (KB) is formed not only by logical rules but also by a base of precedents (BP). A precedent is a description of a problem or situation together with a detailed account of the actions undertaken in that situation to solve the problem [1].

This raises the problem of creating an initial BP for testing the ES under development. Over years of operation, any enterprise accumulates large amounts of data, which contain all the main material needed to form a BP. Employing staff analysts to extract knowledge from a database (DB) is expensive and economically inefficient. To shift much of the analytical work to modern technology, it is necessary to use technology that automatically extracts new, nontrivial knowledge from data while guaranteeing its statistical significance. This paper considers the possibilities of the intellectual analysis of data (IAD) as a means of creating a test BP for an ES and enumerates packages that implement IAD methods.

II. INTELLECTUAL ANALYSIS OF DATA

IAD (or data mining) is a collection of mathematical models, numerical methods, software and information technologies that find in empirical data information amenable to interpretation and, on the basis of this information, synthesize previously unknown, nontrivial and practically useful knowledge for achieving specified goals.

Many companies offer services in the field of intellectual analysis of data that involve the following work:

- studying the accumulated statistics;
- revealing regularities;
- building data models;
- verifying and validating the data models;
- putting the models into practice [2].

Valeriy Dubrovin - Zaporizhzhya National Technical University, Jukovsky Str., 64, Zaporijjya, 69063, UKRAINE, E-mail: vxdubrov@zntu.edu.ua

Larisa Deynega - Zaporizhzhya National Technical University, Jukovsky Str., 64, Zaporijjya, 69063, UKRAINE, E-mail: larisa@zntu.edu.ua

Data mining technologies study empirical data and discover in them hidden regularities of the different types enumerated below.

Association (identification). If a certain fact 1 is part of a certain event, then another fact 2, connected with the first, will also be part of the same event with a calculated probability. For instance, if fault 1 has occurred in a technical system, it will cause fault 2 with a probability of 70%.

Sequence (forecasting). If a certain event 1 has occurred, then another event 2, connected with the first, will occur with a calculated probability after a specified length of time. Sequential patterns are similar to associations, the only difference being that the linked events are separated in time.
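As a minimal sketch of the two regularities above, the probabilities can be estimated as simple relative frequencies. The function names, the fault log and the event log below are hypothetical illustrations, not data or methods from the cited sources.

```python
def rule_confidence(events, antecedent, consequent):
    """Estimate P(consequent in event | antecedent in event) from observed events."""
    with_antecedent = [e for e in events if antecedent in e]
    if not with_antecedent:
        return 0.0
    both = sum(1 for e in with_antecedent if consequent in e)
    return both / len(with_antecedent)


def sequence_confidence(log, first, second, max_delay):
    """Share of `first` occurrences followed by `second` within `max_delay` time units."""
    starts = [t for t, name in log if name == first]
    if not starts:
        return 0.0
    followed = sum(
        1 for t in starts
        if any(name == second and t < t2 <= t + max_delay for t2, name in log)
    )
    return followed / len(starts)


# Hypothetical fault log: each set lists the faults seen in one incident.
incidents = [
    {"fault-1", "fault-2"},
    {"fault-1", "fault-2"},
    {"fault-1"},
    {"fault-2"},
    {"fault-1", "fault-2", "fault-3"},
]
print(rule_confidence(incidents, "fault-1", "fault-2"))  # 3 of 4 incidents -> 0.75

# Hypothetical time-stamped log of (time, event name) pairs.
event_log = [(0, "event-1"), (3, "event-2"), (10, "event-1"),
             (30, "event-2"), (40, "event-1"), (42, "event-2")]
print(sequence_confidence(event_log, "event-1", "event-2", 5))  # 2 of 3 occurrences
```

The only difference between the two functions is the time window, which mirrors the distinction drawn in the text.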

Classification. On the basis of information about the properties of an object, the object is assigned a definite discrete value of the index (identifier) by which the classification is conducted. Classification algorithms make it possible to classify objects according to criteria known beforehand.

Clustering. Objects that are most similar in their attributes are united in groups (clusters) so that objects that differ strongly from each other fall into different clusters. Clustering is similar to classification, but unlike the latter, the classes (clusters) of objects are not known beforehand and are formed in the process of clustering. The quality of clustering is determined by the similarity of objects within a class and the degree of difference between classes.
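The quality criterion just described can be sketched, under the assumption that similarity is measured by Euclidean distance, as a comparison of the average distance within clusters with the average distance between clusters. The function names and sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def mean_within_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_between_distance(clusters):
    """Average distance between pairs of points taken from different clusters."""
    pairs = [dist(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(pairs) / len(pairs)


# Two hypothetical clusters of 2-D objects.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0)],
]
within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(within < between)  # a good partition has small within and large between distances
```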

Prediction. Past actual values of quantities are used to predict future values of these or other quantities on the basis of knowledge of the dependences between them and of statistics. Regression is one of two methods of prediction. This method uses the present actual values of quantities to predict future values on the basis of trends and current statistics. The second method of prediction is time series, which predict the values of a variable depending on time.
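A minimal sketch of the regression method, assuming the simplest case of a straight-line trend fitted by ordinary least squares. The function name and the sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical past values that follow value = 2 * period + 10 exactly.
periods = [1, 2, 3, 4, 5]
values = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(periods, values)
forecast = slope * 6 + intercept  # extrapolate the trend to period 6
print(forecast)  # 22.0
```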

III. SYSTEMS OF INTELLECTUAL ANALYSIS OF DATA

The basis of data mining systems is the detection of various regularities in data. A classification of such systems by the methods applied in them is given below [2,3].

1. Statistical packages.

Although the latest versions of almost all well-known statistical packages include elements of data mining along with traditional statistical methods, their main attention is still devoted to classical methods: correlation, regression, factor analysis and others. A drawback of systems of this class is that they require special training of the user. Examples of the most powerful and widespread statistical packages are SAS (SAS Institute), SPSS (SPSS), STATGRAPHICS, STATISTICA, STADIA and others.

2. Neural networks.

TCSET'2006, February 28-March 4, 2006, Lviv-Slavsko, Ukraine


This is a large class of systems whose architecture tries to imitate the structure of nervous tissue built from neurons. In one of the most widespread architectures, the multilayer perceptron with backpropagation of error, the work of neurons is emulated within a hierarchical network in which every neuron of a higher layer is connected by its inputs to the outputs of the neurons of the layer below. The neurons of the lowest layer receive the values of the input parameters on the basis of which decisions must be made, the development of a situation forecast, and so on. The main drawback of the neural network paradigm is the need for a very large training sample. Another substantial shortcoming is that even a trained neural network is a black box. Examples of neural network systems are BrainMaker (CSS), NeuroShell (Ward Systems Group) and OWL (HyperLogic).
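The layered connection scheme can be illustrated by a minimal forward pass. This is only a sketch: the sigmoid activation, the weights and the function names are assumptions made for illustration, and training by backpropagation of error is not shown.

```python
from math import exp


def sigmoid(z):
    """Squash a weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def layer_output(inputs, weights, biases):
    """Each neuron sums its weighted inputs plus a bias and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Pass values from the lowest layer upward through every higher layer."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_output(activations, weights, biases)
    return activations


# Hypothetical network: 2 inputs, a hidden layer of 2 neurons, 1 output neuron.
network = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),
    ([[1.2, -0.7]], [0.05]),
]
output = forward([1.0, 0.5], network)
print(output)
```

The output is a bare number with no accompanying rule, which is the "black box" property noted in the text.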

3. Case-based reasoning systems.

To make a forecast or choose a correct decision, these systems find close analogues of the current situation in the past and choose the answer that was correct for them. This method is therefore also called the nearest neighbour method. The result of this method is not the creation of a rule or the formulation of a dependence: it solves the task of prediction rather than that of finding dependences. It is simple and can be implemented very efficiently, but it requires a large amount of memory, because the whole existing database is used when finding the value of the dependent variable for a new record [4]. Examples of such systems are KATE tools (Acknosoft, France) and Pattern Recognition Workbench (Unica, USA).
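A minimal sketch of the nearest neighbour method, assuming numeric features compared by Euclidean distance. The stored cases, their labels and the function name are hypothetical.

```python
from math import dist


def nearest_neighbour(cases, query):
    """Return the stored answer of the past case closest to `query`."""
    features, answer = min(cases, key=lambda case: dist(case[0], query))
    return answer


# Hypothetical base of past cases: (temperature, vibration) -> diagnosis.
past_cases = [
    ((70.0, 1.2), "normal"),
    ((95.0, 3.4), "overheating"),
    ((72.0, 1.0), "normal"),
    ((98.0, 3.9), "overheating"),
]
print(nearest_neighbour(past_cases, (93.0, 3.0)))  # overheating
```

Every call scans all stored cases, which illustrates why the whole database must be kept available.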

4. Decision trees.

Decision trees create a hierarchical, tree-shaped structure of classifying rules of the type "IF... THEN...". To decide to which class an object or situation belongs, one answers the questions placed at the nodes of the tree, starting from its root. Questions have the form "is the value of parameter A greater than x?". If the answer is positive, one passes to the right node of the next level; if negative, to the left node; then the question associated with that node follows. The advantage of the approach is its clarity. However, the problem of the statistical significance of the resulting rules is very acute for decision trees. The most widespread systems are See5/C5.0 (RuleQuest, Australia), Clementine (Integral Solutions, Great Britain), SIPINA (University of Lyon, France) and IDIS (Information Discovery, USA).
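The traversal described above can be sketched as follows. The `Node` class, the parameter names and the thresholds are hypothetical; the tree is written by hand here, whereas a real system would build it from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    parameter: str               # name of parameter A in the question "A > x?"
    threshold: float             # the value x
    left: Union["Node", str]     # followed when the answer is negative
    right: Union["Node", str]    # followed when the answer is positive


def classify(tree, sample):
    """Walk from the root, answering one question per node, until a class is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.right if sample[node.parameter] > node.threshold else node.left
    return node


# Hypothetical two-level tree for a technical system.
tree = Node("temperature", 90.0,
            left=Node("vibration", 2.0, left="normal", right="check bearings"),
            right="overheating")

print(classify(tree, {"temperature": 80.0, "vibration": 2.5}))  # check bearings
```

Each path from the root to a leaf reads as one rule, for example IF temperature is not above 90 AND vibration is above 2 THEN "check bearings".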

5. Evolutionary programming.

An example of a system taking this approach is PolyAnalyst. In it, hypotheses about the form of the dependence of the target variable on other variables are formulated as programs in an internal programming language. The construction of programs proceeds as evolution in the world of programs. When the system finds a program that expresses the sought dependence sufficiently accurately, it begins to introduce small modifications into it and selects, among the child programs built in this way, those that improve the accuracy [4].
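The modify-and-select loop can be illustrated by a heavily simplified sketch. It is not PolyAnalyst's algorithm: as an assumption, a "program" is reduced to a pair of coefficients (a, b) of the expression a * x + b, and a mutation is a small random change of those coefficients. All names and data are hypothetical.

```python
import random


def squared_error(program, data):
    """Inaccuracy of the candidate expression a * x + b on the data."""
    a, b = program
    return sum((a * x + b - y) ** 2 for x, y in data)


def evolve(data, generations=200, children=20, step=0.5, seed=0):
    """Repeatedly mutate the best program and keep children that are more accurate."""
    rng = random.Random(seed)
    best = (0.0, 0.0)
    best_error = squared_error(best, data)
    for _ in range(generations):
        for _ in range(children):
            child = (best[0] + rng.gauss(0, step), best[1] + rng.gauss(0, step))
            error = squared_error(child, data)
            if error < best_error:  # selection: only improvements survive
                best, best_error = child, error
    return best, best_error


# Hypothetical data generated by the dependence y = 3 * x + 1.
data = [(x, 3 * x + 1) for x in range(5)]
program, error = evolve(data)
print(program, error)
```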

6. Limited enumeration algorithms.

These algorithms calculate the frequencies of combinations of simple logical events in subgroups of data. Examples of simple logical events are X = a; X < a; X > a; a < X < b and others, where X is some parameter and "a" and "b" are constants. On the basis of an analysis of the calculated frequencies, a conclusion is drawn about the usefulness of one or another combination for establishing associations in data, for classification, prediction and so on. The most prominent modern representative of this approach is the WizWhy system.
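A minimal sketch of the idea, assuming that combinations are limited to conjunctions of at most two simple events and that usefulness is measured by the share of covered records belonging to the target class. The records, the events and the function name are hypothetical, and this is not the WizWhy implementation.

```python
from itertools import combinations


def enumerate_rules(records, target, events, max_len=2):
    """Score every conjunction of up to `max_len` simple events as a rule for `target`."""
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(events, size):
            covered = [r for r in records if all(test(r) for _, test in combo)]
            if covered:
                hits = sum(1 for r in covered if r["class"] == target)
                names = " AND ".join(name for name, _ in combo)
                scored.append((hits / len(covered), len(covered), names))
    return sorted(scored, reverse=True)


# Hypothetical records and simple logical events of the form X < a, X > a.
records = [
    {"X": 1, "Y": 5, "class": "fault"},
    {"X": 2, "Y": 7, "class": "fault"},
    {"X": 6, "Y": 6, "class": "ok"},
    {"X": 7, "Y": 2, "class": "ok"},
    {"X": 2, "Y": 1, "class": "ok"},
]
events = [
    ("X < 3", lambda r: r["X"] < 3),
    ("X > 5", lambda r: r["X"] > 5),
    ("Y > 4", lambda r: r["Y"] > 4),
]
best = enumerate_rules(records, "fault", events)[0]
print(best)  # (1.0, 2, 'X < 3 AND Y > 4')
```

The best combination reads directly as a rule: IF X < 3 AND Y > 4 THEN fault, supported by two records.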

IV. CONCLUSIONS

Among data mining systems, two independent broad classes can be distinguished: 1) neural network algorithms; 2) algorithms that search for logical rules of the form IF ... THEN...

Neural network algorithms can be assigned to the area of "knowledge discovery" only with a very large stretch. They remain a "black box" and are not suitable for the development of an expert system, in which one of the most important functions is explaining the decisions made.

Logical rules of the form IF ... THEN... solve the tasks of prediction, classification, pattern recognition, DB segmentation, extraction of "hidden" knowledge from data, data interpretation, establishment of associations in a DB and others. Logical methods work under conditions of heterogeneous information. Their results are effective and easily interpreted.

One of the problems of the known logical methods of finding regularities is that they do not support generalization of the found rules or the search for an optimal composition of such rules [5]. At the same time, these functions are essential for the construction of a KB and BP, which requires the ability to introduce concepts, metaconcepts and semantic relations on the basis of a great number of fragments of knowledge about a subject domain. Consequently, the development of new systems that implement these functions and use known data mining approaches and algorithms in the class of IF... THEN... rules is a topical task. Using such systems as part of an ES will make it possible to create a KB and BP for it without engaging a large staff of experts.

REFERENCES

[1] L.R. Chernyakhovskaya, N.O. Nikulina, et al. Development of a dynamic model of the control process in problem situations on the basis of a knowledge base of precedents (in Russian). // Upravlenie v slozhnykh sistemakh, 1999, No. 2, pp. 207-212.

[2] V. Dyuk, A. Samoilenko. Data Mining: a training course (in Russian). - St. Petersburg: Piter, 2001. - 368 p.

[3] M. Kiselev, E. Solomatin. Knowledge mining tools in business and finance (in Russian). // Otkrytye sistemy, No. 4, 1997, pp. 41-44.

[4] S. Arsenyev. Extraction of knowledge from medical databases (in Russian). // http://infovisor.ivanovo.ru/rus/press/paper05.htm

[5] V. Dyuk. Data Mining: the state of the problem, new solutions (in Russian). // http://www.inftech.webservis.ru/
