cse802/clusteringSlides.pdf

k-means

• Gaussian mixture model

• Maximize the likelihood

Data: $\{x_1, x_2, \ldots, x_n\}$; Centers: $\{c_1, c_2, \ldots, c_k\}$

$$P(x_i \mid c_j) = \frac{1}{\sqrt{2\pi}} \exp\!\Big(-\frac{1}{2}\,\|x_i - c_j\|^2\Big)$$

k-means

Minimize

Sum of squared errors (SSE) criterion (k clusters and n samples)

$$P(x_i \mid c_j) = \frac{1}{\sqrt{2\pi}} \exp\!\Big(-\frac{1}{2}\,\|x_i - c_j\|^2\Big)$$

Maximizing the likelihood under this model is equivalent to minimizing

$$\min \sum_{j=1}^{k} \sum_{x_i \in C_j} \|x_i - c_j\|^2$$
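The SSE criterion above is typically minimized with Lloyd's algorithm, which alternates an assignment step and a mean-update step. A minimal pure-Python sketch (function and variable names are illustrative, not from the slides):

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest
    center and recompute each center as the mean of its cluster."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the closest center for each point.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its cluster.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, labels
```

Each iteration can only decrease the SSE, so the procedure converges to a local minimum of the criterion.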

k-means

k-means works well when clusters are "linearly separable" and spherical


k-means

SSE criterion doesn’t always work

k-means

What about data which contain arbitrarily shaped clusters of different densities?

The Kernel Trick Revisited


Map points to "feature space" using a basis function: $x \mapsto \varphi(x)$

Replace the dot product $\varphi(x) \cdot \varphi(y)$ (for similarity computation between points $x$ and $y$) with the kernel entry $K(x, y)$

Mercer’s condition: To expand a kernel function K(x, y) into a dot product, i.e., K(x, y) = φ(x)·φ(y), K(x, y) must be a positive semi-definite function; that is, for any function f(x) whose squared integral $\int f(x)^2\,dx$ is finite, the following inequality holds:

$$\int\!\!\int f(x)\, K(x, y)\, f(y)\, dx\, dy \ge 0$$

Kernel k-means

Minimize the sum of squared error:

k-means: $\displaystyle \min \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\|x_i - c_j\|^2$, with $u_{ij} \in \{0, 1\}$ and $\sum_{j=1}^{k} u_{ij} = 1$

Kernel k-means

Minimize the sum of squared error:

k-means: $\displaystyle \min \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\|x_i - c_j\|^2$, with $u_{ij} \in \{0, 1\}$ and $\sum_{j=1}^{k} u_{ij} = 1$

Replace $x$ with $\varphi(x)$:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\|\varphi(x_i) - \tilde{c}_j\|^2$$

Kernel k-means

Cluster centers: $\displaystyle \tilde{c}_j = \frac{1}{n_j} \sum_{i=1}^{n} u_{ij}\,\varphi(x_i)$, where $n_j = \sum_{i=1}^{n} u_{ij}$

Substitute for the centers:

$$\sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\|\varphi(x_i) - \tilde{c}_j\|^2 = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\Big\|\varphi(x_i) - \frac{1}{n_j} \sum_{l=1}^{n} u_{lj}\,\varphi(x_l)\Big\|^2$$

Kernel k-means

• Use the kernel trick:

$$\sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}\,\|\varphi(x_i) - \tilde{c}_j\|^2 = \mathrm{trace}(K) - \mathrm{trace}(U' K U)$$

• Optimization problem:

$$\min\ \mathrm{trace}(K) - \mathrm{trace}(U' K U) \iff \max\ \mathrm{trace}(U' K U)$$

• K is the n x n kernel matrix, U is the normalized cluster membership matrix
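Because the objective depends on the data only through K, distances to the implicit feature-space centers can be computed from kernel entries alone: $\|\varphi(x_i) - \tilde{c}_j\|^2 = K_{ii} - \frac{2}{n_j}\sum_{l \in C_j} K_{il} + \frac{1}{n_j^2}\sum_{l,m \in C_j} K_{lm}$. A small sketch of the resulting update rule (illustrative naming; assumes a precomputed kernel matrix K given as a list of lists):

```python
def kernel_kmeans(K, k, labels, iters=50):
    """Kernel k-means: reassign each point to the cluster whose implicit
    feature-space center is nearest, using only kernel matrix entries."""
    n = len(K)
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == j] for j in range(k)]
        # Third term of the distance: within-cluster kernel sum / n_j^2.
        self_term = []
        for C in clusters:
            if C:
                self_term.append(sum(K[l][m] for l in C for m in C) / len(C) ** 2)
            else:
                self_term.append(float('inf'))  # empty cluster: never chosen
        new = []
        for i in range(n):
            dists = []
            for j, C in enumerate(clusters):
                if not C:
                    dists.append(float('inf'))
                    continue
                cross = sum(K[i][l] for l in C) / len(C)   # second term / n_j
                dists.append(K[i][i] - 2 * cross + self_term[j])
            new.append(min(range(k), key=lambda j: dists[j]))
        if new == labels:   # converged: assignments stable
            break
        labels = new
    return labels
```

With a linear kernel $K_{il} = x_i' x_l$ this reduces exactly to ordinary k-means, which is a convenient sanity check.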

Example

$k = 2$ — [figure: data points in the $(x_1, x_2)$ plane]

Example

$k = 2$ — [figure: k-means clusters of the same data in the $(x_1, x_2)$ plane]

Example

[Figure: data points in the $(x_1, x_2)$ plane]

Example

Polynomial kernel: $K(x, y) = (x' y)^2$

$$\varphi\big((x_1, x_2)\big) = \big(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\big) = (z_1, z_2, z_3)$$

[Figure: data mapped from the $(x_1, x_2)$ plane to $(z_1, z_2, z_3)$ feature space]
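The identity $K(x, y) = (x'y)^2 = \varphi(x) \cdot \varphi(y)$ for this feature map can be checked numerically; a small sketch (names are illustrative):

```python
import math

def poly_kernel(x, y):
    # Degree-2 polynomial kernel: K(x, y) = (x . y)^2
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    # Explicit feature map: (x1^2, sqrt(2) x1 x2, x2^2)
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))
```

Evaluating the kernel costs one 2-D dot product, while the explicit map works in 3-D; for higher-degree kernels the gap grows quickly, which is the point of the kernel trick.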


k-means Vs. Kernel k-means

$k = 2$ — [figure: k-means clusters vs. kernel k-means clusters]

Performance of Kernel k-means

Evaluation of the performance of clustering algorithms in kernel-induced feature space, Pattern Recognition, 2005

Limitations of Kernel k-means

• More complex than k-means

• Need to compute and store n x n kernel matrix

• Appropriate kernel function has to be determined

• Largest n that can be handled?


• Example: Intel Xeon E7-8837 processor (Q2’11), octa-core, 2.8 GHz, 4 TB max memory

• Can hold the kernel matrix for fewer than 1 million points, even with single-precision numbers

• Computing the kernel matrix alone may take several days

“Big data” Volume* – Big data comes in one size: large

*Defn. due to IBM

Data Volume

Application                   Clustering task                              Size of data   Number of features
Document retrieval            Group documents of similar topics            10^9           10^4
Gene analysis                 Group genes with similar expression levels   10^6           10^2
Image retrieval               Quantize low-level features                  10^9           10^2
Earth science data analysis   Derive climate indices                       10^5           10^2

“Big data” Velocity – Often time-sensitive, big data must be used as it is streaming

“Big data” Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more

Large Scale Clustering

Deals with the first issue related to big data – the volume of data

Issues:

Computational Complexity

Hardware Limitations

Application Requirements

MapReduce Framework

How to distribute k-means?


Two methods

• Distribute distance computation

k-means Clustering with MapReduce - I

Distribute the cost of distance computation

Cluster centers maintained in global memory

Divide points among map tasks

Parallel k-means clustering based on MapReduce, Cloud computing, 2009

k-means Clustering with MapReduce - I

Map function

Find the closest center for data point

Intermediate output: Closest cluster index

Combine function

Partially sum the values of the points assigned to the same cluster, keep track of number of points in the cluster

Reduce function

Compute new centers from the output of combine function

Parallel k-means clustering based on MapReduce, Cloud computing, 2009
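The three phases above can be sketched in plain Python (a single-process simulation of one iteration, not the Hadoop implementation from the cited paper; names are illustrative):

```python
def map_fn(point, centers):
    """Map: emit (index of the closest center, (point, count 1))."""
    j = min(range(len(centers)),
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
    return j, (point, 1)

def combine_fn(pairs):
    """Combine: partially sum points assigned to the same cluster and
    keep track of the number of points per cluster."""
    partial = {}
    for j, (p, c) in pairs:
        s, n = partial.get(j, ((0.0,) * len(p), 0))
        partial[j] = (tuple(a + b for a, b in zip(s, p)), n + c)
    return list(partial.items())

def reduce_fn(partials):
    """Reduce: new center = total sum / total count for each cluster."""
    sums, counts = {}, {}
    for j, (s, n) in partials:
        if j in sums:
            sums[j] = tuple(a + b for a, b in zip(sums[j], s))
        else:
            sums[j] = s
        counts[j] = counts.get(j, 0) + n
    return {j: tuple(v / counts[j] for v in sums[j]) for j in sums}
```

Because the combiner ships only per-cluster sums and counts (not raw points) to the reducer, communication per mapper is O(k·d) instead of O(n·d).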


How to distribute k-means?

Two methods

• Distribute distance computation

• Distribute clustering task

k-means Clustering with MapReduce - II

Distribute the cost of clustering

Map function

Cluster the partition into k clusters

Intermediate output: Clusters of the partition

Reduce function

Cluster the cluster centers from the map output to obtain the new centers

Fast clustering using MapReduce, KDD, 2011

k-means Clustering with MapReduce - II

No global storage required

Approximate solution

Clustering error (SSE) < 2 * optimal clustering error

Fast clustering using MapReduce, KDD, 2011
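The two-stage scheme can be sketched as a single-process simulation (not the cited paper's implementation; the inner routine is a plain Lloyd's iteration and all names are illustrative):

```python
import random

def local_kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) run inside one task."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(p, centers[c])))
            groups[j].append(p)
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def map_fn(partition, k):
    """Map: cluster one data partition; emit its k local centers."""
    return local_kmeans(partition, k)

def reduce_fn(all_local_centers, k):
    """Reduce: cluster the collected local centers into k final centers."""
    return local_kmeans(all_local_centers, k, seed=1)
```

Each mapper works only on its own partition, so no global center storage is needed; the reducer sees just (number of partitions × k) points, which is why the result is an approximation rather than the exact k-means solution.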

Machine Learning on Mapreduce

Mahout – scalable implementation of major clustering and classification algorithms on Hadoop

Open source

Java and Maven based

Large Scale Kernel Clustering

An n x n kernel matrix K for a data set with n points: when n ~ 10^6, storing K requires more than 1 TB of memory, and computing it is highly expensive.

Approximate Kernel k-means

Low rank approximation

Use a small portion of the kernel matrix for clustering.

(n-m) x (n-m) chunk of the kernel matrix need not be computed

$$\hat{K} = K_B\, K_{mm}^{-1}\, K_B' \qquad (n \times n) \approx (n \times m)\,(m \times m)^{-1}\,(m \times n)$$

where $K_B$ is the sampled $n \times m$ block and $K_{mm}$ the $m \times m$ block of the kernel matrix.

Approximate Kernel k-means: Solution to Large Scale Kernel Clustering, KDD, 2011
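This low-rank (Nystrom-style) approximation can be sketched with NumPy (illustrative naming; uses a pseudo-inverse for the m x m block and a linear kernel for the sanity check):

```python
import numpy as np

def nystrom_approx(X, m, kernel, seed=0):
    """Sample m points, compute only the n x m block K_B and the m x m
    block K_mm, and approximate the full kernel as K_B K_mm^+ K_B'."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    K_B = kernel(X, X[idx])          # n x m block (columns = sampled points)
    K_mm = kernel(X[idx], X[idx])    # m x m block among sampled points
    return K_B @ np.linalg.pinv(K_mm) @ K_B.T

def linear_kernel(A, B):
    # Linear kernel: pairwise dot products between rows of A and B.
    return A @ B.T
```

Only the n x m and m x m blocks are ever formed, so memory drops from O(n^2) to O(nm); when the kernel matrix truly has rank at most m (e.g. a linear kernel on m-dimensional data), the approximation is exact.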

Approximate Kernel k-means

Cluster centers – a linear combination of the sampled points:

$$\tilde{c}_j = \sum_{i=1}^{m} \alpha_{ij}\,\varphi(\hat{x}_i)$$

Approximation error: the clustering error obtained with $\hat{K}$ exceeds the optimal clustering error by a term that shrinks as the sample size $m$ grows.

Approximate Kernel k-means: Solution to Large Scale Kernel Clustering, KDD, 2011

Approximate Kernel k-means

Performance of Approximate Kernel k-means


MNIST data set (70,000 data points)

                                      Kernel calculation   Clustering
Kernel k-means                        514 seconds          3953 seconds
Approximate kernel k-means (m=1000)   8 seconds            75 seconds

About 98% reduction in time

Almost the same clustering error as kernel k-means

Performance of Approximate Kernel k-means

Network Intrusion data set ( > 4 million data points)

• Kernel k-means not possible on a “normal” system

• Requires 64 TB of memory

• Approximate kernel k-means with just 40 GB memory

                                      Kernel calculation   Clustering
Approximate kernel k-means (m=1000)   52 seconds           433 seconds

Summary

• Kernel k-means

• Performs better than k-means

• Kernel clustering algorithms, in general, are more complex than linear clustering algorithms

• Large scale clustering

• Distributed and approximate variants of existing algorithms required for clustering large data
