Tadeusz Łuba, Institute of Telecommunications
Methods of Logic Synthesis and Their Application in Data Mining
Presentation given at KNU Daegu (Korea), 25.11.2012
Warsaw University of Technology
Faculty of Electronics and Information Technology
Logic synthesis vs. Data Mining
• applicability of the logic synthesis algorithms in Data Mining
• data mining extends the application of LS to:
  – medicine
  – pharmacology
  – banking
  – linguistics
  – telecommunication
  – environmental engineering
Data Mining is the process of automatic discovery of significant and previously unknown information from large databases.
Data mining is also called knowledge discovery in databases.
It is able to:
– diagnose the patient
– decide on granting a loan to a bank customer
– classify data
– make a survey
Gaining knowledge from databases
At the abstract level of data mining algorithms, this means using the procedures of:
– reduction of attributes,
– generalization of decision rules,
– making hierarchical decisions.
These algorithms are similar to those used in logic synthesis!
Data mining (rule induction) vs. logic synthesis (logic minimization)
Data mining systems
RSES – Rough Set Exploration System: http://logic.mimuw.edu.pl/~rses/
ROSETTA – Rough Set Toolkit for Analysis of Data, Biomedical Centre (BMC), Uppsala, Sweden: http://www.lcb.uu.se/tools/rosetta/
Breast Cancer Database:
• Number of instances: 699 training cases
• Number of attributes: 10
• Classification: 2 classes
Sources: Dr. William H. Wolberg (physician), University of Wisconsin Hospital, Madison, Wisconsin, USA
Attributes: 1. Clump Thickness, 2. Uniformity of Cell Size, 3. Uniformity of Cell Shape, …, 9. Mitoses
Diagnosis of breast cancer
Breast Cancer database
ID      x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
1000025  5  1  1  1  2  1  3  1  1  2
1002945  5  4  4  5  7 10  3  2  1  2
1015425  3  1  1  1  2  2  3  1  1  2
1016277  6  8  8  1  3  4  3  7  1  2
1017023  4  1  1  3  2  1  3  1  1  2
1017122  8 10 10  8  7 10  9  7  1  4
1018099  1  1  1  1  2 10  3  1  1  2
1018561  2  1  2  1  2  1  3  1  1  2
1033078  2  1  1  1  2  1  1  1  5  2
1033078  4  2  1  1  2  1  2  1  1  2
1035283  1  1  1  1  1  1  3  1  1  2
1036172  2  1  1  1  2  1  2  1  1  2
1041801  5  3  3  3  2  3  4  4  1  4
1043999  1  1  1  1  2  3  3  1  1  2
1044572  8  7  5 10  7  9  5  5  4  4
RULE_SET breast_cancer
RULES 35
(x9=1)&(x8=1)&(x2=1)&(x6=1)=>(x10=2)
(x9=1)&(x2=1)&(x3=1)&(x6=1)=>(x10=2)
(x9=1)&(x8=1)&(x4=1)&(x3=1)=>(x10=2)
(x9=1)&(x4=1)&(x6=1)&(x5=2)=>(x10=2)
…
(x9=1)&(x6=10)&(x1=10)=>(x10=4)
(x9=1)&(x6=10)&(x5=4)=>(x10=4)
(x9=1)&(x6=10)&(x1=8)=>(x10=4)

REDUCTS (27)
{ x1, x2, x3, x4, x6 }
{ x1, x2, x3, x5, x6 }
{ x2, x3, x4, x6, x7 }
{ x1, x3, x4, x6, x7 }
{ x1, x2, x4, x6, x7 }
…
{ x3, x4, x5, x6, x7, x8 }
{ x3, x4, x6, x7, x8, x9 }
{ x4, x5, x6, x7, x8, x9 }
Increasing requirements
References: [1]
We are overwhelmed with data!
UC Irvine Machine Learning Repository
Breast Cancer Database – 10 attr
Audiology Database – 71 attr
Dermatology Database – 34 attr
Are existing methods and algorithms for data mining sufficiently efficient?
Why is that? How can these algorithms be improved?
Classic method
Discernibility matrix (DM)
Discernibility function (DF)
The DF is a conjunction of clauses; each clause is a disjunction of attributes.
The key issue is to transform the DF from CNF to DNF.
Every monomial of the resulting DNF corresponds to a reduct.
This transformation is NP-hard.
References: [9]
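The classic route can be sketched in a few lines of Python (the four-row decision table and the brute-force subset search are illustrative assumptions; real tools such as RSES use far more elaborate machinery):

```python
from itertools import combinations

# Toy decision table (hypothetical data, for illustration): each row is
# (values of attributes a0..a3, decision).
rows = [
    ((1, 0, 1, 0), 0),
    ((1, 1, 0, 0), 1),
    ((0, 0, 1, 1), 0),
    ((0, 1, 1, 1), 1),
]

# Discernibility matrix: for every pair of rows with different decisions,
# the set of attributes on which the rows differ forms one clause of the DF.
clauses = [
    frozenset(i for i in range(4) if u[i] != v[i])
    for (u, du), (v, dv) in combinations(rows, 2)
    if du != dv
]

# A reduct is a minimal attribute set intersecting every clause - one
# monomial of the CNF-to-DNF transformation. The transformation is
# NP-hard, so brute force over all subsets works only at toy sizes.
reducts = []
for k in range(1, 5):
    for cand in combinations(range(4), k):
        s = set(cand)
        if all(s & c for c in clauses) and not any(r <= s for r in reducts):
            reducts.append(s)
# Here a single attribute (a1) discerns every mixed pair: reducts == [{1}]
```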
The method can be significantly improved…
…by using a typical logic synthesis procedure: Boolean function complementation.
Instead of transforming CNF to DNF, we represent the CNF as a binary matrix M.
M is treated as the cover of a Boolean function F.
We then compute the complement of that function.
F is always a unate function!

M (columns x1 x2 x3 x4):
1101   (x1 + x2 + x4)
0011   (x3 + x4)
1100   (x1 + x2)
1001   (x1 + x4)

[Karnaugh map of the function; the complement's cubes, with literals flipped, give x1x3 + x2x4 + x1x4]
Using the Complement Theorem…

.i 4
.o 1
.p 4
11-1 1
--11 1
11-- 1
1--1 1
.end

fM = x1x2x4 + x3x4 + x1x2 + x1x4

F = (x1 + x2 + x4)(x3 + x4)(x1 + x2)(x1 + x4) =
  = (x1 + x2)(x1 + x4)(x3 + x4) = (x1 + x2x4)(x3 + x4) = x1x3 + x2x4 + x1x4

The same result!
Discernibility function — the key element

Fast Complementation Algorithm
Recursive Complementation Theorem:

F̄ = xj·F̄xj + x̄j·F̄x̄j

where Fxj is called the cofactor of F with respect to variable xj.
The problem of complementing function F is transformed into the problem of finding the complements of two simpler cofactors.
Unate Complementation
The entire process reduces to three simple calculations:
– the choice of the splitting variable,
– calculation of cofactors,
– testing the rules for termination.

For a unate function: F = xj·F1 + F0 (F1, F0 — the cofactors with respect to xj and x̄j)

[diagram: matrix M is split into Cofactor 1 and Cofactor 0; each cofactor is complemented recursively and the partial complements are merged]
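The three steps above can be sketched recursively in Python (a simplified illustration of unate complementation, not the actual cover routines of ESPRESSO). Fed with the four clauses of the earlier matrix M, the complement's cubes are exactly the reducts:

```python
from collections import Counter

def unate_complement(cover):
    """Complement a positive-unate cover; cubes are frozensets of
    variable indices. If the cubes encode the clauses of a
    discernibility function, the complement's cubes (read as sets of
    variables) are the reducts."""
    if not cover:                  # empty cover: F = 0, so F' = 1
        return [frozenset()]
    if any(not c for c in cover):  # a universal cube: F = 1, so F' = 0
        return []
    # 1. choice of the splitting variable (the most frequent one)
    xj = Counter(v for c in cover for v in c).most_common(1)[0][0]
    # 2. calculation of cofactors: F = xj*F1 + F0
    f1 = [c - {xj} for c in cover if xj in c] + [c for c in cover if xj not in c]
    f0 = [c for c in cover if xj not in c]
    # 3. merging: F' = F1' + xj'*F0' (the two tests above are the termination rules)
    result = unate_complement(f1) + [c | {xj} for c in unate_complement(f0)]
    # absorption: drop any cube that strictly contains another cube
    return [c for c in result if not any(d < c for d in result)]

# Clauses of F = (x1+x2+x4)(x3+x4)(x1+x2)(x1+x4) as rows of matrix M:
clauses = [frozenset(s) for s in ({1, 2, 4}, {3, 4}, {1, 2}, {1, 4})]
reducts = unate_complement(clauses)   # cubes {x1,x3}, {x1,x4}, {x2,x4}
```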
An Example

F = x4·x6·(x1 + x2)(x3 + x5 + x7)(x2 + x3)(x2 + x7)

M (columns x7 x6 x5 x4 x3 x2 x1):
1000010   (x2 + x7)
0000110   (x2 + x3)
1010100   (x3 + x5 + x7)
0000011   (x1 + x2)

.i 7
.o 1
.p 6
11----- 1
--1-1-1 1
-11---- 1
-1----1 1
---1--- 1
-----1- 1
.end
An Example (continued)

[recursion trace: M is split on successive variables (x2, x1, x7); the cofactors are complemented and the partial complements are merged]
Result of the complementation: x2x3 + x2x5 + x2x7 + x1x3x7

Reducts:
{x1,x3,x4,x6,x7}
{x2,x3,x4,x6}
{x2,x4,x5,x6}
{x2,x4,x6,x7}
Verification

(x1 + x2)(x3 + x5 + x7)(x2 + x3)(x2 + x7) =
= (x2 + x1)(x2 + x3)(x2 + x7)(x3 + x5 + x7) =
= (x2 + x1x3x7)(x3 + x5 + x7) =
= x2x3 + x2x5 + x2x7 + x1x3x7

Appending {x4,x6} to each monomial gives:
{x2,x3,x4,x6} {x2,x4,x5,x6} {x2,x4,x6,x7} {x1,x3,x4,x6,x7}

The same set of reducts!
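Using nothing beyond the slide's clauses and reducts, a brute-force Python check over all 2^7 assignments confirms that the original CNF (including the x4 and x6 clauses) agrees everywhere with the DNF built from the four reducts:

```python
from itertools import product

# Clauses of F = x4*x6*(x1+x2)(x3+x5+x7)(x2+x3)(x2+x7), variables x1..x7.
clauses = [{4}, {6}, {1, 2}, {3, 5, 7}, {2, 3}, {2, 7}]
# The four reducts obtained by complementation.
reducts = [{1, 3, 4, 6, 7}, {2, 3, 4, 6}, {2, 4, 5, 6}, {2, 4, 6, 7}]

def cnf(assign):
    # product of clauses: every clause must contain a true variable
    return all(any(assign[v] for v in clause) for clause in clauses)

def dnf(assign):
    # sum of reduct monomials: some reduct has all its variables true
    return any(all(assign[v] for v in r) for r in reducts)

agree = all(
    cnf(a) == dnf(a)
    for bits in product((0, 1), repeat=7)
    for a in [dict(zip(range(1, 8), bits))]
)
```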
Calculating reducts using the standard method:
Boolean function KAZ:
.type fr
.i 21
.o 1
.p 31
100110010110011111101 1
111011111011110111100 1
001010101000111100000 1
001001101100110110001 1
100110010011011001101 1
100101100100110110011 1
001100100111010011011 1
001101100011011011001 1
110110010011001001101 1
100110110011010010011 1
110011011011010001100 1
010001010000001100111 0
100110101011111110100 0
111001111011110011000 0
101101011100010111100 0
110110000001010100000 0
110110110111100010111 0
110000100011110010001 0
001001000101111101101 0
100100011111100110110 0
100011000110011011110 0
110101000110101100001 0
110110001101101100111 0
010000111001000000001 0
001001100101111110000 0
100100111111001110010 0
000010001110001101101 0
101000010100001110000 0
101000110101010011111 0
101010000001100011001 0
011100111110111101111 0
.end
All solutions: 5574
With the smallest number of arguments (5 attributes): 35
Computation time: RSES = 70 min; proposed method = 234 ms
18000 times faster!
Conclusion
The new method reduces the computation time by several orders of magnitude.
How will this acceleration affect computation speed for typical databases?
Experimental results

database                  | attr. | inst. | RSES/ROSETTA             | compl. method | reducts | compl. method (least) | reducts (least)
house                     | 17    | 232   | 1s                       | 187ms         | 4       | 171ms                 | 1 (8 attr)
breast-cancer-wisconsin   | 10    | 699   | 2s                       | 823ms         | 27      | 826ms                 | 24 (5 attr)
KAZ                       | 22    | 31    | 70min                    | 234ms         | 5574    | 15ms                  | 35 (5 attr)
trains                    | 33    | 10    | out of memory (5h 38min) | 6ms           | 689     | 1ms                   | 1 (1 attr)
agaricus-lepiota-mushroom | 23    | 8124  | 29min                    | 4m 47s        | 507     | 4m 51s                | 3 (4 attr)
urology                   | 36    | 500   | out of memory (12h)      | 42s 741ms     | 23437   | 2s 499ms              | 1 (2 attr)
audiology                 | 71    | 200   | out of memory (1h 17min) | 14s 508ms     | 37367   | 920ms                 | 1 (1 attr)
dermatology               | 35    | 366   | out of memory (3h 27min) | 3m 32s        | 143093  | 1s 474ms              | 27 (6 attr)
lung-cancer               | 57    | 32    | out of memory (5h 20min) | 111h 57m      | 3604887 | 486ms                 | 613 (4 attr)
The absolute triumph of the complementation method!
Further possibilities…
of applying logic synthesis methods to problems of Data Mining
RSES vs Espresso
RSES input as a PLA file:
.i 7
.o 1
.type fr
.p 9
1000101 0
1011110 0
1101110 0
1110111 0
0100101 1
1000110 1
1010000 1
1010110 1
1110101 1
.e
ESPRESSO result:
f = x2·x̄4·x5·x̄6 + x1·x̄2·x̄4·x̄7
TABLE extlbis
ATTRIBUTES 8
x1 numeric 0
x2 numeric 0
x3 numeric 0
x4 numeric 0
x5 numeric 0
x6 numeric 0
x7 numeric 0
x8 numeric 0
OBJECTS 9
1 0 0 0 1 0 1 0
1 0 1 1 1 1 0 0
1 1 0 1 1 1 0 0
1 1 1 0 1 1 1 0
0 1 0 0 1 0 1 1
1 0 0 0 1 1 0 1
1 0 1 0 0 0 0 1
1 0 1 0 1 1 0 1
1 1 1 0 1 0 1 1
(x1=1)&(x5=1)&(x6=1)&(x2=1)=>(x8=0)
(x1=1)&(x2=0)&(x5=1)&(x3=0)&(x4=0)&(x6=0)=>(x8=0)
(x4=0)&(x1=1)&(x2=0)&(x7=0)=>(x8=1)
(x2=1)&(x4=0)&(x5=1)&(x6=0)=>(x8=1)
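A quick Python cross-check (the cube encoding is an illustrative assumption) that the two product terms implied by the RSES rules for x8=1 cover every positive object of the table and none of the negative ones:

```python
# Objects of TABLE extlbis as x1..x7 strings, split by decision x8.
on_set  = ["0100101", "1000110", "1010000", "1010110", "1110101"]   # x8 = 1
off_set = ["1000101", "1011110", "1101110", "1110111"]              # x8 = 0

# The two product terms implied by the RSES rules for x8=1.
cubes = [{2: 1, 4: 0, 5: 1, 6: 0},   # (x2=1)&(x4=0)&(x5=1)&(x6=0)
         {1: 1, 2: 0, 4: 0, 7: 0}]   # (x4=0)&(x1=1)&(x2=0)&(x7=0)

def covered(word, cube):
    # word is a string of x1..x7 values; the cube fixes some of them
    return all(int(word[i - 1]) == v for i, v in cube.items())

covers_on = all(any(covered(w, c) for c in cubes) for w in on_set)
hits_off  = any(covered(w, c) for c in cubes for w in off_set)
```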
Hierarchical decision making

[diagram: attributes B enter decision table DT(G), producing an intermediate decision; attributes A together with the intermediate decision enter decision table DT(H), producing the final decision]

F = H(A, G(B))

A serial decomposition F = H(A, G(B)) exists iff there is a partition PG ≥ P(B) such that P(A)·PG ≤ PF.

Is it possible to use the decomposition to solve difficult tasks of data mining?
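The existence test behind such a decomposition can be sketched with the classic column-multiplicity check (the toy function and the attribute split are assumptions for illustration):

```python
from collections import defaultdict
from itertools import product

def column_multiplicity(table, split):
    """table: dict mapping attribute tuples to decisions; the first
    `split` attributes form the free set A, the rest the bound set B.
    Returns the number of distinct columns of the partition matrix:
    G(B) must distinguish at least that many values for a serial
    decomposition F = H(A, G(B)) to exist."""
    columns = defaultdict(dict)
    for attrs, decision in table.items():
        a, b = attrs[:split], attrs[split:]
        columns[b][a] = decision       # one column per B-pattern
    return len({tuple(sorted(col.items())) for col in columns.values()})

# Toy function (an assumption for illustration): F(a, b1, b2) = a XOR (b1 AND b2).
table = {(a, b1, b2): a ^ (b1 & b2) for a, b1, b2 in product((0, 1), repeat=3)}
```

With A = {a} and B = {b1, b2}, `column_multiplicity(table, 1)` is 2: only two distinct columns occur, so a single binary signal G(b1, b2) = b1 AND b2 suffices as the intermediate decision.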
Decomposition — data compression

The data set HOUSE (1984 United States Congressional Voting Records Database):

democrat   n y y n y y n n n n n n y y y y
republican n y n y y y n n n n n y y y n y
democrat   y y y n n n y y y n y n n n y y
democrat   y y y n n n y y y n n n n n y y
democrat   y n y n n n y y y y n n n n y y
democrat   y n y n n n y y y n y n n n y y
democrat   y y y n n n y y y n y n n n y y
republican y n n y y n y y y n n y y y n y
…
democrat   y y y n n n y y y n y n n n y y
republican y y n y y y n n n n n n y y n y
republican n y n y y y n n n y n y y y n n
democrat   y n y n n n y y y y y n y n y y
democrat   y n y n n n y y y n n n n n n y
democrat   y n y n n n y y y n n n n n y y

Decomposing the table into components G and H yields a 68% space reduction.
Summary
• Typical logic synthesis algorithms and methods are
effectively applicable to seemingly different modern
problems of data mining
• It is also important to study the theoretical foundations of new concepts in data mining, e.g. functional decomposition
• Solving these challenges requires the cooperation of
specialists from different fields of knowledge
References
1. Abdullah, S., Golafshan, L., Mohd Zakree Ahmad Nazri: Re-heat simulated annealing algorithm for rough set attribute reduction. International Journal of the Physical Sciences 6(8), 2083–2089 (2011)
2. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast Discovery of Association Rules. In: Advances in KDD, pp. 307–328. AAAI, Menlo Park (1996)
3. An, A., Shan, N., Chan, C., Cercone, N., Ziarko, W.: Discovering rules for water demand prediction: an enhanced rough-set approach. Engineering Applications of Artificial Intelligence 9, 645–653 (1996)
4. Bazan, J., Nguyen, H.S., Nguyen, S.H., Synak, P., Wróblewski, J.: Rough set algorithms in classification problem. In: Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, vol. 56, pp. 49–88. Physica-Verlag, Heidelberg (2000)
5. Bazan, J., Skowron, A., Synak, P.: Dynamic Reducts as a Tool for Extracting Laws from Decision Tables. In: Raś, Z.W., Zemankova, M. (eds.) ISMIS 1994. LNCS (LNAI), vol. 869, pp. 346–355. Springer, Heidelberg (1994)
6. Bazan, J.G., Szczuka, M.S.: RSES and RSESlib - A Collection of Tools for Rough Set Computations. In: Rough Sets and Current Trends in Computing, pp. 106–113 (2000)
7. Bazan, J.G., Nguyen, H.S., Nguyen, S.H., Synak, P., Wroblewski, J.: Rough set algorithms in classification problem. In: Polkowski, L., Tsumoto, S., Lin, T.Y. (eds.) Rough Set Methods and Applications, pp. 49–88 (2000)
8. Beynon, M. Reducts within the variable precision rough sets model: a further investigation, European Journal of Operational Research, 134, 592-605, 2001.
9. Borowik, G., Łuba, T., Zydek, D.: Features Reduction using logic minimization techniques. In: Intl. Journal of Electronics and Telecommunications, vol. 58, No.1, pp. 71-76, (2012)
10. Brayton, R.K., Hachtel, G.D., McMullen, C.T., Sangiovanni-Vincentelli, A.: Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers (1984)
11. Brzozowski, J.A., Łuba, T.: Decomposition of boolean functions specified by cubes. Journal of Multi-Valued Logic & Soft Computing 9, 377–417 (2003)
12. Dash, R., Dash, R., Mishra, D.: A hybridized rough-PCA approach of attribute reduction for high dimensional data set. European Journal of Scientific Research 44(1), 29–38 (2010)
13. Feixiang, Z., Yingjun, Z., Li, Z.: An efficient attribute reduction in decision information systems. In: International Conference on Computer Science and Software Engineering. pp. 466–469. Wuhan, Hubei (2008), DOI: 10.1109/CSSE.2008.1090
14. Grzenda, M.: Prediction-Oriented Dimensionality Reduction of Industrial Data Sets. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds.) Modern Approaches in Applied Intelligence, LNAI 6703, pp. 232–241 (2011)
15. Hedar, A.R., Wang, J., Fukushima, M.: Tabu search for attribute reduction in rough set theory. Journal of Soft Computing – A Fusion of Foundations, Methodologies and Applications 12(9), 909–918 (Apr 2008), DOI: 10.1007/s00500-007-0260-1
16. Herbert, J.P. and Yao, J.T. Rough set model selection for practical decision making, Proceedings of the 4th International Conference on Fuzzy Systems and Knowledge Discovery, 203-207, 2007.
17. Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H.: TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal 42(2), 100–111 (1999)
18. Inuiguchi, M.: Several approaches to attribute reduction in variable precision rough set model. In: Modeling Decisions for Artificial Intelligence, pp. 215–226 (2005)
19. Jelonek, J., Krawiec, K., Stefanowski, J.: Comparative study of feature subset selection techniques for machine learning tasks. In: Proceedings of IIS, Malbork, Poland, pp. 68–77 (1998)
20. Jensen R., Shen Q. Semantics-preserving dimensionality reduction: Rough and fuzzy rough-based approaches. IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 1457–1471, (2004)
21. Jing, S., She, K.: Heterogeneous attribute reduction in noisy system based on a generalized neighborhood rough sets model. World Academy of Science, Engineering and Technology 75, 1067–1072 (2011)
22. Kalyani, P., Karnan, M.: A new implementation of attribute reduction using quick relative reduct algorithm. International Journal of Internet Computing 1(1), 99–102 (2011)
23. Katzberg, J.D. and Ziarko, W. Variable precision rough sets with asymmetric bounds, in: W. Ziarko (Ed.) Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer, London, 167-177, 1994.
24. Kryszkiewicz, M., Cichoń, K.: Towards scalable algorithms for discovering rough set reducts. In: Peters, J., Skowron, A., Grzymała-Busse, J., Kostek, B., Świniarski, R., Szczuka, M. (eds.) Transactions on Rough Sets I, Lecture Notes in Computer Science, vol. 3100, pp. 120–143. Springer, Berlin/Heidelberg (2004), DOI: 10.1007/978-3-540-27794-1_5
40. Pawlak, Z., Skowron, A.: Rudiments of rough sets. Information Sciences 177, 3–27 (2007)
41. Pei, X., Wang, Y.: An approximate approach to attribute reduction. International Journal of Information Technology 12(4), 128–135 (2006)
42. Rawski, M., Borowik, G., Łuba, T., Tomaszewicz, P., Falkowski, B.J.: Logic synthesis strategy for FPGAs with embedded memory blocks. Electrical Review 86(11a), 94–101 (2010)
43. Shan, N., Ziarko, W., Hamilton, H.J., Cercone, N.: Discovering Classification Knowledge in Databases Using Rough Sets. In: Proceedings of KDD, pp. 271–274 (1996)
44. Skowron, A.: Boolean Reasoning for Decision Rules Generation. In: Komorowski, J., Raś, Z.W. (eds.) ISMIS 1993. LNCS, vol. 689, pp. 295–305. Springer, Heidelberg (1993)
45. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support – Handbook of Application and Advances of the Rough Sets Theory. Kluwer Academic Publishers (1992)
46. Slezak, D.: Approximate Reducts in Decision Tables. In: Proceedings of IPMU, Granada, Spain, vol. 3, pp. 1159–1164 (1996)
47. Slezak, D.: Searching for Frequential Reducts in Decision Tables with Uncertain Objects. In: Polkowski, L., Skowron, A. (eds.) RSCTC 1998. LNCS, vol. 1424, pp. 52–59. Springer, Heidelberg (1998)
48. Slezak, D.: Association Reducts: Complexity and Heuristics. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H.S., Słowiński, R. (eds.) RSCTC 2006. LNCS, vol. 4259, pp. 157–164. Springer, Heidelberg (2006)
49. Slezak, D. and Ziarko, W. Attribute reduction in the Bayesian version of variable precision rough set model, Electronic Notes in Theoretical Computer Science, 82, 263-273, 2003.
50. Słowinski, R. (ed.): Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, vol. 11. Kluwer Academic Publishers, Dordrecht (1992)
51. Słowiński, K., Sharif, E.: Rough Sets Analysis of Experience in Surgical Practice. International Workshop: Rough Sets: State of The Art and Perspectives, Poznan-Kiekrz (1992)
52. Stepaniuk, J.: Approximation Spaces, Reducts and Representatives. In: Rough Sets in Data Mining and Knowledge Discovery. Springer, Berlin (1998)
53. Swiniarski, R.W.: Rough sets methods in feature reduction and classification. International Journal of Applied Mathematics and Computer Science 11, 565–582 (2001)
54. Swiniarski, R.W. and Skowron, A. Rough set methods in feature selection and recognition, Pattern Recognition Letters, 24, 833-849, 2003
55. Wang, C., Ou, F.: An attribute reduction algorithm based on conditional entropy and frequency of attributes. In: Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation. ICICTA ’08, vol. 1, pp. 752–756. IEEE Computer Society, Washington, DC, USA (2008), DOI: 10.1109/ICICTA.2008.95
56. Wang, G., Yu, H. and Yang, D. Decision table reduction based on conditional information entropy, Chinese Journal of Computers, 25, 759-766, 2002.
57. Wang, G.Y., Zhao, J., Wu, J.: A comparative study of algebra viewpoint and information viewpoint in attribute reduction. Fundamenta Informaticae 68, 1–13 (2005)
58. Wróblewski, J.: Finding Minimal Reducts Using Genetic Algorithms. In: Proceedings of JCIS,Wrightsville Beach, NC, September/October 1995, pp. 186–189 (1995)
59. Wu, W.Z., Zhang, M., Li, H.Z. and Mi, J.S. Knowledge reduction in random information systems via Dempster-Shafer theory of evidence, Information Sciences, 174, 143-164, 2005.
60. Yao, Y., Zhao, Y.: Attribute reduction in decision-theoretic rough set models. Information Sciences 178(17), 3356–3373 (2008), DOI: 10.1016/j.ins.2008.05.010
61. Zhang, W.X., Mi, J.S. and Wu, W.Z. Knowledge reduction in inconsistent information systems, Chinese Journal of Computers, 1, 12-18, 2003.
62. Zhao, Y., Luo, F., Wong, S.K.M. and Yao, Y.Y. A general definition of an attribute reduction, Proceedings of the Second Rough Sets and Knowledge Technology, 101-108, 2007.
63. ROSE2 – Rough Sets Data Explorer, http://idss.cs.put.poznan.pl/site/rose.html
64. ROSETTA – A Rough Set Toolkit for Analysis of Data, http://www.lcb.uu.se/tools/rosetta/
65. RSES – Rough Set Exploration System, http://logic.mimuw.edu.pl/~rses/
1. ROSETTA – A Rough Set Toolkit for Analysis of Data, http://www.lcb.uu.se/tools/rosetta/
2. RSES – Rough Set Exploration System, http://logic.mimuw.edu.pl/~rses/
3. Borowik, G., Łuba, T., Zydek, D.: Features Reduction using logic minimization techniques. Intl. Journal of Electronics and Telecommunications 58(1), 71–76 (2012)
4. Skowron, A., Rauszer, C.: The discernibility matrices and functions in information systems. In: Słowiński, R. (ed.) Intelligent Decision Support – Handbook of Application and Advances of the Rough Sets Theory. Kluwer Academic Publishers (1992)
5. Tadeusiewicz, R.: The role of digital technologies in social communication, culture, and education. PPT presentation.