Project title: SCALABLE CONSTRAINED SPECTRAL CLUSTERING

ABSTRACT:

Constrained spectral clustering (CSC) algorithms have shown great promise in significantly improving clustering accuracy by encoding side information into spectral clustering algorithms. However, existing CSC algorithms are inefficient in handling moderate and large datasets. In this paper, we aim to develop a scalable and efficient CSC algorithm by integrating sparse coding based graph construction into a framework called constrained normalized cuts.

EXISTING SYSTEM:

Data in a wide variety of areas are growing to large scales. For many traditional learning-based data mining algorithms, it is a big challenge to efficiently mine knowledge from fast-growing data such as information streams, images and even videos.

To overcome this challenge, it is important to develop scalable learning algorithms.

DISADVANTAGES OF EXISTING SYSTEM:

Existing CSC algorithms are inefficient in handling moderate and large datasets. They lack an integration of the constrained normalized cuts with the sparse coding based graph construction, and hence a formulation of the scalable constrained normalized-cuts problem.

PROPOSED SYSTEM:

In this project, we develop an efficient and scalable CSC algorithm that handles moderate and large datasets well. The SCACS algorithm can be understood as a scalable version of the well-designed but less efficient algorithm known as Flexible Constrained Spectral Clustering (FCSC).

To the best of our knowledge, our algorithm is the first efficient and scalable version in this area. It is derived by integrating two recent studies: the constrained normalized cuts and the graph construction method based on sparse coding.
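As an illustration of how side information can enter a constrained spectral clustering method, the sketch below builds a pairwise constraint matrix from a few labelled instances. It assumes the common encoding of +1 for a must-link pair, -1 for a cannot-link pair and 0 when nothing is known. The function name `build_constraint_matrix` and the toy labels are hypothetical and are not taken from the project.

```python
from typing import Dict, List


def build_constraint_matrix(n: int, labelled: Dict[int, int]) -> List[List[int]]:
    """Return an n x n matrix Q encoding pairwise side information.

    Assumed encoding (a common convention, not confirmed by the slides):
      Q[i][j] = +1 if i and j carry the same label (must-link),
      Q[i][j] = -1 if i and j carry different labels (cannot-link),
      Q[i][j] =  0 if either instance is unlabelled.
    A labelled instance is trivially linked to itself, so Q[i][i] = +1.
    """
    q = [[0] * n for _ in range(n)]
    items = list(labelled.items())
    for i, label_i in items:
        for j, label_j in items:
            q[i][j] = 1 if label_i == label_j else -1
    return q


# Toy example: 5 instances, of which 0, 2 and 3 are labelled.
labelled = {0: 0, 2: 0, 3: 1}
Q = build_constraint_matrix(5, labelled)
for row in Q:
    print(row)
```

The dense list-of-lists form is used only for readability. On moderate or large datasets a sparse representation would be the more natural choice, since most entries are zero.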

ADVANTAGES OF PROPOSED SYSTEM:

We randomly sample s_c labelled instances from a given input dataset, and then obtain the side information from their labels. The clustering accuracy is evaluated by the best matching rate (ACC).

Let h be the resulting label vector obtained from a clustering algorithm. Let g be the ground truth label vector.
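A minimal sketch of the best matching rate, assuming the usual definition: the fraction of instances whose cluster label in h agrees with g under the best one-to-one mapping from cluster labels to class labels. The function name `best_matching_accuracy` is hypothetical. The search tries every mapping, so it is only practical for a small number of clusters.

```python
from itertools import permutations
from typing import Sequence


def best_matching_accuracy(h: Sequence[int], g: Sequence[int]) -> float:
    """ACC: best agreement between h and g over one-to-one label mappings."""
    if len(h) != len(g) or len(h) == 0:
        raise ValueError("h and g must be non-empty and of equal length")
    cluster_ids = sorted(set(h))
    class_ids = sorted(set(g))
    # Pad with None so every cluster gets a distinct target even when there
    # are more clusters than classes; None never matches a true label.
    targets = class_ids + [None] * max(0, len(cluster_ids) - len(class_ids))
    best_hits = 0
    for perm in permutations(targets, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(1 for hi, gi in zip(h, g) if mapping[hi] == gi)
        best_hits = max(best_hits, hits)
    return best_hits / len(h)


h = [1, 1, 0, 0, 2, 2]  # labels produced by a clustering algorithm
g = [0, 0, 1, 1, 2, 1]  # ground truth labels
acc = best_matching_accuracy(h, g)
print(acc)
```

For many clusters, the exhaustive search would normally be replaced by an assignment solver such as the Hungarian algorithm.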

HARDWARE REQUIREMENTS:

System : Pentium IV 2.4 GHz.

Hard Disk : 40 GB.

Monitor : 15 VGA Colour.

Mouse : Logitech.

RAM : 512 MB.

SOFTWARE REQUIREMENTS:

Operating system : Windows XP/7.

Coding Language : JAVA/J2EE

IDE : Eclipse

Database : MYSQL

Modules with their explanation

A. Text anomaly detection

B. Link anomaly detection

C. Decision factor

A. Text anomaly detection

A dataset from a social networking site such as Facebook or Twitter is given to the text anomaly detection module. Content preprocessing is the next step; it consists of the following processes:

Word extraction: Words are extracted from the text shared by users on the social networking site.

Stemming: Variant forms of a word are reduced to a common form, that is, the root or stem of the word.

Weight assignment: Each word extracted in the previous steps is assigned a weight depending on the prediction made from that word.

Frequency of words: The number of times a particular word appears in a given time period is calculated.
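The preprocessing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual code: the suffix list in `SUFFIXES` is a deliberately tiny, hypothetical stand-in for a real stemmer, the posts are assumed to come from a single time period, and weight assignment is left out because the slides do not say how weights are derived.

```python
import re
from collections import Counter
from typing import Iterable, List

# Hypothetical, deliberately small rule set; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def extract_words(text: str) -> List[str]:
    """Word extraction: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word: str) -> str:
    """Stemming: strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_frequencies(posts: Iterable[str]) -> Counter:
    """Frequency of words: count stemmed words over the posts of one period."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(stem(word) for word in extract_words(post))
    return counts


posts = ["Trending topics are trending again", "New topic trends today"]
freq = word_frequencies(posts)
print(freq.most_common(3))
```

Counting per time period then amounts to calling `word_frequencies` once for each batch of posts that falls into the same window.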

B. Link anomaly detection

The dataset from the social networking site is also given to the link anomaly detection module. The step performed in this module is as follows:

a) Clustering of vertices having the same features: Vertices are clustered according to similar communication behavior, and a profile is built for each cluster. Individual vertex profiles are also built from the communication behavior of each vertex.
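One simple way to picture this step is sketched below. It assumes, purely for illustration, that communication behavior is summarised by how many messages a vertex sends and receives, and that vertices with identical summaries form a cluster. The slides do not specify the features or the clustering rule, so the names `vertex_profiles` and `cluster_by_profile` and the toy edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Profile = Tuple[int, int]  # (messages sent, messages received)


def vertex_profiles(edges: Iterable[Tuple[str, str]]) -> Dict[str, Profile]:
    """Build one profile per vertex from (sender, receiver) pairs."""
    sent: Dict[str, int] = defaultdict(int)
    received: Dict[str, int] = defaultdict(int)
    for sender, receiver in edges:
        sent[sender] += 1
        received[receiver] += 1
    vertices = set(sent) | set(received)
    return {v: (sent[v], received[v]) for v in vertices}


def cluster_by_profile(profiles: Dict[str, Profile]) -> Dict[Profile, List[str]]:
    """Group vertices with the same profile; the key is the cluster profile."""
    clusters: Dict[Profile, List[str]] = defaultdict(list)
    for vertex, profile in profiles.items():
        clusters[profile].append(vertex)
    return {profile: sorted(members) for profile, members in clusters.items()}


# Toy communication log: a and d only send, b and c only receive.
edges = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c")]
profiles = vertex_profiles(edges)
clusters = cluster_by_profile(profiles)
print(clusters)
```

Exact matching of profiles is the crudest possible grouping. A similarity-based method, such as the spectral clustering this project is about, would group vertices whose behavior is close rather than identical.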

C. Decision factor

The results obtained from the link anomaly module and the text anomaly module are compared in the decision factor, and the final anomaly is predicted.

SDLC

Spiral Model design

The spiral model has four phases. A software project repeatedly passes through these phases in iterations called spirals.

Identification

This phase starts with gathering the business requirements in the baseline spiral. In the subsequent spirals, as the product matures, identification of system requirements, subsystem requirements and unit requirements is done in this phase.

This also includes understanding the system requirements through continuous communication between the customer and the system analyst. At the end of the spiral the product is deployed in the identified market.

Following is a diagrammatic representation of the spiral model listing the activities in each phase.

Design

The design phase starts with the conceptual design in the baseline spiral and involves architectural design, logical design of modules, physical product design and final design in the subsequent spirals.

Construct or Build

The construct phase refers to production of the actual software product at every spiral. In the baseline spiral, when the product is just thought of and the design is being developed, a POC (Proof of Concept) is developed in this phase to get customer feedback. Then, in the subsequent spirals, with higher clarity on requirements and design details, a working model of the software called a build is produced with a version number. These builds are sent to the customer for feedback.

Evaluation and Risk Analysis

Risk analysis includes identifying, estimating, and monitoring technical feasibility and management risks, such as schedule slippage and cost overrun. After testing the build, at the end of the first iteration, the customer evaluates the software and provides feedback.

Based on the customer evaluation, the software development process enters the next iteration and subsequently follows the linear approach to implement the feedback suggested by the customer. The process of iterations along the spiral continues throughout the life of the software.

Use Case Diagram

Activity Diagram

Data Flow Diagram (Level 0 and Level 1)

Elements: Twitter Trends, Search Word, Key Based Detection, Location Based Detection, Location, Retweet Count.

Data Flow Diagram (Level 2)

Elements: Search Word, Key Based Detection, Location Based Detection, Retweet Count, Spectral Clustering, Aggregation.

ER Diagram

USER : name, gender, age, dob, weight, father_name, mother_name, siblings, birth_state, current_city, Industry

FILM : film_name, dor, c_id

LOCATION : id, name, text, hash_tag, location, retweet count, created at

HASH TAG : id, name, text, hash_tag, location, retweet count, created at

RETWEET : id, name, text, hash_tag, location, retweet count, created at

Relationships : act, search (cardinalities marked 1 and N in the diagram)

Conclusion:

We have developed a new k-way scalable constrained spectral clustering algorithm based on a closed-form integration of the constrained normalized cuts and the sparse coding based graph construction.

With less side information, our algorithm can obtain significant improvements in accuracy compared to the unsupervised baseline.

With less computational time, our algorithm can obtain high clustering accuracies close to those of the state-of-the-art.

Thank You