24
GUI implementation for Supervised and Unsupervised SUBDUE System

GUI implementation for Supervised and Unsupervised SUBDUE System

  • View
    230

  • Download
    0

Embed Size (px)

Citation preview

Page 1: GUI implementation for Supervised and Unsupervised SUBDUE System

GUI implementation for Supervised and Unsupervised SUBDUE System

Page 2: GUI implementation for Supervised and Unsupervised SUBDUE System

Introduction

Data Mining is a very vast field, which is rapidly developing today. It is the use of artificial intelligence to find common, interesting, or previously unknown patterns in large databases.

One method for discovering knowledge in structural data is the identification of common substructures (concepts represented as graphs) within the data.

2

Page 3: GUI implementation for Supervised and Unsupervised SUBDUE System

Introduction (Contd..)

The SUBDUE system, developed by Cook and Holder [Cook and Holder, 1999] performs data mining on databases represented as graphs, i.e. it discovers interesting substructures in structural data.

This project deals with the conversion of a textual representation of the data into a graphical visualization of the data The input data given to the program is the outputs provided by the existing SUBDUE.

3

Page 4: GUI implementation for Supervised and Unsupervised SUBDUE System

The Subdue System

The SUBDUE system is a data-mining tool that discovers interesting substructures in structural data

By compressing previously discovered substructures in the data multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data.

4

Page 5: GUI implementation for Supervised and Unsupervised SUBDUE System

Unsupervised Subdue

5

Example of the input graph to the unsupervised version of the subdueand the discovered substructure.

Page 6: GUI implementation for Supervised and Unsupervised SUBDUE System

Unsupervised (Contd..)

The substructure discovery algorithm used by SUBDUE is beam search. A substructure consists of a definition and a set of instances. The substructure definition is a connected set of vertices and edges that define a SUBGRAPH within G. An instance of a substructure is a SUBGRAPH of G that matches, graph theoretically, to the definition. The algorithm for the discovery is given below.

6

Page 7: GUI implementation for Supervised and Unsupervised SUBDUE System

Unsupervised Contd.)

SUBDUE (Graph, BeamWidth, MaxBest, MaxSubSize, Limit)

ParentList={}

ChildList={}

BestList={}

ProcessedSubs=0

Create a substructure from each unique vertex label and its single-vertex instances; insert the resulting substructures in ParentList

While ProcessedSubs < = Limit and ParentList is not empty do

While ParentList is not empty do

Parent = RemoveHead (ParentList)

Extend each instance of Parent in all possible ways

Group the extended instances into Child substructures

For each Child do

If SizeOf (Child) <= MaxSubSize then

Evaluate the Child

7

Page 8: GUI implementation for Supervised and Unsupervised SUBDUE System

Unsupervised Contd.)

8

Insert Child in ChildList in order by value

If Length (ChildList)>BeamWidth then

Destroy the substructure at the end of the ChildList

ProcessedSubs = ProcessedSubs + 1

Insert Parent in BestList in order by value

If Length (BestList) > MaxBest then

Destroy the substructure at the end of the BestList

Switch ParentList and ChildList

return BestList

Page 9: GUI implementation for Supervised and Unsupervised SUBDUE System

Supervised

SUBDUE performs supervised graph based relational concept learning. The SUBDUE concept learner accepts both a positive graph and a negative graph , and evaluates substructures based on their compression of the positive graph and lack of compression on the negative graph.

9

Page 10: GUI implementation for Supervised and Unsupervised SUBDUE System

Supervised Contd.

10

Page 11: GUI implementation for Supervised and Unsupervised SUBDUE System

Supervised (Contd.)

In the figure above each object has a shape and is related to other objects using the binary relation on and shape. The discovered substructure is as shown in the figure. As seen above the SUBDUE discovered substructure is from the positive example and not from the negative example. For this example the best substructure, which gives the maximum compression, is triangle on a square.

11

Page 12: GUI implementation for Supervised and Unsupervised SUBDUE System

Subdue - Chemical Compounds

A DNA sequence can be represented as a very simple linear graph, and higher-level relationships between different parts of a sequence can be mapped to additional edges in graph.

SUBDUE system discovers patterns in the input graph in polynomial time.

SUBDUE system is capable of discovering known patterns in the DNA sequence of yeast,as well as patterns in yeast DNA that are known to be important in other organism, but which have not yet been shown to play a role in yeast.

12

Page 13: GUI implementation for Supervised and Unsupervised SUBDUE System

Subdue - Chemical Compounds

A DNA sequence can be represented as a very simple linear graph, and higher-level relationships between different parts of a sequence can be mapped to additional edges in graph.

SUBDUE system discovers patterns in the input graph in polynomial time.

SUBDUE system is capable of discovering known patterns in the DNA sequence of yeast,as well as patterns in yeast DNA that are known to be important in other organism, but which have not yet been shown to play a role in yeast.

13

Page 14: GUI implementation for Supervised and Unsupervised SUBDUE System

Subdue - Chemical Compounds

The figure shown above is the backbone representation which gives more meaningful graphs then the linear representation.

This representation separated the base names (A, C, T, G) from the vertices representing themselves.

The backbone representation mimics the actual chemical structure of the DNA molecule, in which the DNA bases are connected by deoxyribose sugars to a linear phosphate backbone.

14

Page 15: GUI implementation for Supervised and Unsupervised SUBDUE System

GUI Design

Requirements

The requirements of the GUI are as follows.

File Dialog Boxes should be added for better user access to the input files.

The entire visual representation of the graphs needs to be shown on the screen. Sometimes these representations exceed the length of the screen. To accommodate these large graphs, scrollbars need to be incorporated into the design.

15

Page 16: GUI implementation for Supervised and Unsupervised SUBDUE System

GUI Design

User interfaces must be provided so that the user can interact with the GUI for displaying the results of each and every iteration. So a button called “ Next Iteration”, which activates the display of substructures on the screen, needs to be incorporated.

A Button called the “Compress” button should be provided to the GUI. This button enables the user to see the compressed graph.

16

Page 17: GUI implementation for Supervised and Unsupervised SUBDUE System

GUI Design

For the supervised version of SUBDUE, both negative and positive graphs need to be displayed.

The vertices of the graph should display their labels inside the vertices.

Since directed edges are used, the arrows with appropriate directions should be displayed.

The language to be used for implementation should be portable and be able to run from a browser. It should also have good GUI components. So JDK 1.2 was used to implement the program.

17

Page 18: GUI implementation for Supervised and Unsupervised SUBDUE System

Implemenation

Unsupervised SUBDUE Unsupervised SUBDUE GUI requires two input files. A

position file which determines the graph position and the output file from the SUBDUE which is parsed as per the requirements of the GUI by the conversion program.

The driver class of the applet is the unsupervised class. This class initializes the applet. When user clicks the next iteration button canvas2 class is invoked and best substructures found in that iteration is displayed. These substructures are arranged by their MDL value.

18

Page 19: GUI implementation for Supervised and Unsupervised SUBDUE System

Implemenation

Unsupervised SUBDUE When user clicks the compress button then

canvas1 class is invoked. This class compresses the input graph by replacing the instances of the best substructure of the iteration by single vertices. The compressed graph will be further compressed when the using the results of that iteration.when-compressed button is clicked. The flow diagrams are as shown below.

19

Page 20: GUI implementation for Supervised and Unsupervised SUBDUE System

Implemenation

Supervised GUI Implemenation Supervised SUBDUE GUI requires three input files.Two

position files which determines the positive graph position and the negative graph position and the output file from the SUBDUE which is parsed as per the requirements of the GUI by the conversion program.

The driver class of the applet is the supervised class. This class initializes the applet. When user clicks the next iteration button canvas2 class is invoked and best substructures found in that iteration is displayed. These substructures are arranged by their MDL value.

20

Page 21: GUI implementation for Supervised and Unsupervised SUBDUE System

Implemenation

Supervised GUI Implemenation When user clicks the compress button then

canvas1 class is invoked. This class compresses the input graph by replacing the instances of the best substructure of the iteration by single vertices. The compressed graph will be further compressed when the using the results of that iteration.when-compressed button is clicked. The flow diagrams are as shown below.

21

Page 22: GUI implementation for Supervised and Unsupervised SUBDUE System

Implemenation

Parser for the GUI input. The Conversion program parses the output from

the subdue to make it compatible to the GUI program. This can parse both supervised and unsupervised GUI.The conversion program takes in longer phrases and replaces it by shorter ones It eliminates blank lines and arranges the output in the format required for the GUI input.

Each line starts with a small phrase for E.g. It(iteration number ), C(for compression), v(vertices),e(edges),Val(Value), It(instances).

22

Page 23: GUI implementation for Supervised and Unsupervised SUBDUE System

Implementation

Parser for the GUI input.

The phrase “Best substructures” indicates the starting of the iteration and the phrase “Graph is compressed using best substructure.” Indicates a compressed graph information

23

Page 24: GUI implementation for Supervised and Unsupervised SUBDUE System

Demo Of the Project

24