Clustering Analysis (of Spatial Data, using Peano Count Trees) (P-tree technology is patented by NDSU)
Notes: 1. Over 100 slides; we will not go through each in detail.
Clustering Methods
A Categorization of Major Clustering Methods:
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods
- Model-based methods
Clustering Methods Based on Partitioning
Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
Given k, find a partition of k clusters that optimizes the chosen partitioning criterion.
- k-means (MacQueen, 1967): each cluster is represented by the center of the cluster.
- k-medoids or PAM method (Partitioning Around Medoids) (Kaufman & Rousseeuw, 1987): each cluster is represented by one object in the cluster (~ the middle object or median-like object).
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in 4 steps (assumes the partitioning criterion is: maximize intra-cluster similarity and minimize inter-cluster similarity. Of course, a heuristic is used; the method isn't really an optimization).
Partition objects into k nonempty subsets (or pick k initial means).
Compute the mean (center) or centroid of each cluster of the current partition (if one started with k means, this step is already done). The centroid is ~ the point that minimizes the sum of dissimilarities from the mean, or the sum of the square errors from the mean. Assign each object to the cluster with the most similar (closest) center.
Go back to Step 2
Stop when the new set of means doesn't change (or some other stopping condition is met).
[Figure: k-means iterations, Steps 1-4, on a 10 x 10 scatter plot.]
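As a concrete illustration of these four steps, here is a minimal k-means sketch (NumPy; the random initialization, the empty-cluster guard, and all names are our choices, not part of the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means; X is an (n, p) array of n objects with p variables."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # Step 1: pick k initial means
    for _ in range(max_iter):
        # Step 3: assign each object to the most similar (closest) center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute the mean (centroid) of each cluster
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                # stop: means don't change
            break
        centers = new_centers                                # Step 4: go back to Step 2
    return centers, labels
```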
The K-Medoids Clustering Method
Find representative objects, called medoids (a medoid must be an actual object in the cluster, whereas the mean seldom is).
PAM (Partitioning Around Medoids, 1987):
- starts from an initial set of medoids
- iteratively replaces one of the medoids by a non-medoid; if the swap improves the aggregate similarity measure, retain it. Do this over all medoid/non-medoid pairs.
- PAM works for small data sets, but does not scale to large data sets.
CLARA (Clustering LARge Applications) (Kaufmann & Rousseeuw, 1990): sub-samples the data, then applies PAM.
CLARANS (Clustering Large Applications based on RANdom Search) (Ng & Han, 1994): randomizes the sampling.
PAM (Partitioning Around Medoids) (1987)
Use real objects to represent the clusters:
1. Select k representative objects arbitrarily.
2. For each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_i,h.
3. For each pair of i and h, if TC_i,h < 0, i is replaced by h; then assign each non-selected object to the most similar representative object.
4. Repeat steps 2-3 until there is no change.
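A sketch of this swap loop, assuming the aggregate measure is the total distance of each object to its nearest medoid (the TC notation follows the slide; everything else is illustrative):

```python
import numpy as np

def pam(D, k, max_iter=50):
    """PAM sketch; D is an (n, n) dissimilarity matrix."""
    n = len(D)
    medoids = list(range(k))                           # 1. select k representatives arbitrarily
    cost = lambda meds: D[:, meds].min(axis=1).sum()   # total distance to nearest medoid
    for _ in range(max_iter):
        changed = False
        for i in list(medoids):                        # 2. every (selected i, non-selected h) pair
            for h in range(n):
                if h in medoids:
                    continue
                trial = [h if m == i else m for m in medoids]
                if cost(trial) - cost(medoids) < 0:    # TC_i,h < 0: the swap improves things
                    medoids, changed = trial, True
        if not changed:                                # 4. repeat until there is no change
            break
    labels = D[:, medoids].argmin(axis=1)              # 3. assign to most similar representative
    return medoids, labels
```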
CLARA (Clustering LARge Applications) (1990)
CLARA (Kaufmann and Rousseeuw, 1990) draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output.
Strength: deals with larger data sets than PAM.
Weaknesses: efficiency depends on the sample size; a good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased.
CLARANS (Randomized CLARA) (1994)
CLARANS (A Clustering Algorithm based on RANdomized Search) (Ng & Han, 1994) draws a sample of neighbors dynamically.
The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids.
If a local optimum is found, CLARANS starts from a new randomly selected node in search of a new local optimum (Genetic-Algorithm-like). Finally the best local optimum is chosen after some stopping condition.
It is more efficient and scalable than both PAM and CLARA.
Distance-based partitioning has drawbacks:
- Simple and fast, O(N), but:
- The number of clusters, K, has to be chosen arbitrarily, before it is known what the correct number of clusters is.
- Produces round-shaped clusters, not arbitrary shapes (Chameleon data set below).
- Sensitive to the selection of the initial partition; may converge to a local minimum of the criterion function if the initial partition is not well chosen. [Figure: correct result vs. k-means result.]
Distance-based partitioning (cont.)
If we start with A, B, and C as the initial centroids around which the three clusters are built, then we end up with the partition {{A}, {B, C}, {D, E, F, G}} shown by ellipses.
Whereas the correct three-cluster solution is obtained by choosing, for example, A, D, and F as the initial cluster means (rectangular clusters).
A Vertical Data Approach
Partition the data set using rectangular P-trees (a gridding).
These P-trees can be viewed as a grouping (partition) of data
Prune out outliers by disregarding those sparse values.
Input: total number of objects (N), percentage of outliers (t). Output: grid P-trees after pruning.
(1) Choose the grid P-tree with the smallest root count (Pgc)
(2) outliers := outliers OR Pgc
(3) if (outliers/N < t) go to (1); else stop
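A sketch of this pruning loop, using plain per-cell counts to stand in for P-tree root counts (the stopping test in step (3) is our reading of the truncated condition):

```python
def prune_outliers(cell_counts, N, t):
    """cell_counts: {cell id: point count}; N: total objects; t: outlier fraction."""
    cells, outliers = dict(cell_counts), 0
    while cells:
        gc = min(cells, key=cells.get)           # (1) cell with the smallest root count
        if (outliers + cells[gc]) / N > t:       # (3) pruning it would exceed the budget
            break
        outliers += cells.pop(gc)                # (2) OR the cell into the outlier set
    return cells, outliers
```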
Distance Function
Data Matrix: n objects x p variables.
Dissimilarity Matrix: n objects x n objects.
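For example, a small data matrix and its derived dissimilarity matrix, using Euclidean distance (the metric choice is ours):

```python
import numpy as np

X = np.array([[3, 4], [2, 6], [4, 5]])      # data matrix: n = 3 objects, p = 2 variables
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))        # dissimilarity matrix: n x n, symmetric, zero diagonal
```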
AGNES (Agglomerative Nesting)
Introduced in Kaufmann and Rousseeuw (1990).
Uses the single-link method (the distance between two sets is the minimum pairwise distance).
Merges the nodes that are most similar.
Eventually all nodes belong to the same cluster.
DIANA (Divisive Analysis)
Introduced in Kaufmann and Rousseeuw (1990).
Inverse order of AGNES: initially all objects are in one cluster, which is then split according to some criterion (e.g., again maximizing some aggregate measure of pairwise dissimilarity).
Eventually each node forms a cluster on its own.
Contrasting Clustering Techniques
Partitioning algorithms: partition a dataset into k clusters, e.g., k = 3.
Hierarchical algorithms: create a hierarchical decomposition of ever-finer partitions, e.g., top down (divisive) or bottom up (agglomerative).
Hierarchical Clustering
Hierarchical Clustering (top down)
In either case, one gets a nice dendrogram in which any maximal anti-chain (no two nodes linked) is a clustering (partition).
Hierarchical Clustering (cont.)
Recall that any maximal anti-chain (a maximal set of nodes in which no two are chained) is a clustering (a dendrogram offers many).
Hierarchical Clustering (cont.)
But the horizontal anti-chains are the clusterings resulting from the top-down (or bottom-up) method(s).
Hierarchical Clustering (cont.)
Most hierarchical clustering algorithms are variants of the single-link, complete-link, or average-link methods.
Of these, single-link and complete link are most popular.
In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn one from each cluster.
In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between pairs of patterns drawn one from each cluster.
In the average-link algorithm, the distance between two clusters is the average of all pairwise distances between pairs of patterns drawn one from each cluster (in the vector-space case this is closely related to the distance between the means, which is easier to calculate).
Distance Between Clusters
- Single link: smallest distance between any pair of points from the two clusters.
- Complete link: largest distance between any pair of points from the two clusters.
- Average link: average distance between points from the two clusters.
- Centroid: distance between the centroids of the two clusters.
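A sketch computing all four inter-cluster distances just listed (NumPy; function and key names are ours):

```python
import numpy as np

def cluster_distances(A, B):
    """A, B: arrays of points from two clusters; returns the four linkage distances."""
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "single":   pairwise.min(),                        # smallest pairwise distance
        "complete": pairwise.max(),                        # largest pairwise distance
        "average":  pairwise.mean(),                       # mean over all pairs
        "centroid": np.linalg.norm(A.mean(0) - B.mean(0)), # distance between centroids
    }
```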
Single Link vs. Complete Link (cont.)
[Figure panels: single link works but complete link doesn't; complete link works but single link doesn't (two examples); single link works, complete link doesn't; single link doesn't work, complete link does (1-cluster, noise, 2-cluster).]
Hierarchical vs. Partitional
Hierarchical algorithms are more versatile than partitional algorithms. For example, the single-link clustering algorithm works well on data sets containing non-isotropic (non-roundish) clusters, including well-separated, chain-like, and concentric clusters, whereas a typical partitional algorithm such as the k-means algorithm works well only on data sets having isotropic clusters. On the other hand, the time and space complexities of the partitional algorithms are typically lower than those of the hierarchical algorithms.
More on Hierarchical Clustering Methods
Major weaknesses of agglomerative clustering methods:
- do not scale well: time complexity of at least O(n^2), where n is the total number of objects
- can never undo what was done previously (greedy algorithm)
Integration of hierarchical with distance-based clustering:
BIRCH (1996): uses a Clustering Feature tree (CF-tree) and incrementally adjusts the quality of sub-clusters.
CURE (1998): selects well-scattered points from the cluster and then shrinks them towards the center of the cluster by a specified fraction
CHAMELEON (1999): hierarchical clustering using dynamic modeling
Density-Based Clustering Methods
Clustering based on density (a local cluster criterion), such as density-connected points.
Major features:
- Discover clusters of arbitrary shape
- Handle noise
- One scan
- Need density parameters as termination condition
Several interesting studies:
- DBSCAN: Ester, et al. (KDD'96)
- OPTICS: Ankerst, et al. (SIGMOD'99)
- DENCLUE: Hinneburg & Keim (KDD'98)
- CLIQUE: Agrawal, et al. (SIGMOD'98)
Density-Based Clustering: Background
Two parameters:
- Eps: maximum radius of the neighbourhood
- MinPts: minimum number of points in an Eps-neighbourhood of that point
N_Eps(p) = {q in D | dist(p,q) <= Eps}
Directly (density-)reachable: a point p is directly density-reachable from a point q wrt Eps, MinPts if
1) p belongs to N_Eps(q), and
2) q is a core point: |N_Eps(q)| >= MinPts
Density-Based Clustering: Background (II)
Density-reachable: a point p is density-reachable from a point q wrt Eps, MinPts if there is a chain of points p1, ..., pn, with p1 = q and pn = p, such that p_{i+1} is directly density-reachable from p_i.
Density reachability is reflexive and transitive, but not symmetric, since only core objects can be density-reachable from each other.
Density-connected: a point p is density-connected to a point q wrt Eps, MinPts if there is a point o such that both p and q are density-reachable from o wrt Eps, MinPts.
Density reachability is not symmetric; density connectivity inherits the reflexivity and transitivity and provides the symmetry. Thus density connectivity is an equivalence relation and therefore gives a partition (clustering).
[Figure: chains illustrating density-reachability and density-connectivity.]
DBSCAN: Density Based Spatial Clustering of Applications with Noise
Relies on a density-based notion of cluster: a cluster is defined as an equivalence class of density-connected points.
This gives the transitive property for the density-connectivity binary relation, so it is an equivalence relation whose classes form a partition (clustering), by the usual correspondence between equivalence relations and partitions.
Discovers clusters of arbitrary shape in spatial databases with noise.
[Figure: core, border, and outlier points; Eps = 1 cm, MinPts = 3.]
DBSCAN: The Algorithm
1. Arbitrarily select a point p.
2. Retrieve all points density-reachable from p wrt Eps, MinPts.
3. If p is a core point, a cluster is formed (note: it doesn't matter which of the core points within a cluster you start at, since density reachability is symmetric on core points).
4. If p is a border point or an outlier, no points are density-reachable from p, and DBSCAN visits the next point of the database. Keep track of such points; if they don't get scooped up by a later core point, then they are outliers.
5. Continue the process until all of the points have been processed.
What about a simpler version of DBSCAN:
- Define core points and core neighborhoods the same way.
- Define an (undirected graph) edge between two points if they cohabit a core neighborhood.
- The connectivity-component partition is the clustering.
Other related methods? How does vertical technology help here? Gridding?
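A compact sketch of the algorithm as stated, using brute-force neighborhood retrieval rather than a spatial index (all names are ours):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Returns one cluster id per point; -1 marks outliers (noise)."""
    n = len(X)
    # N_Eps(p) for every p, by brute force (an index or P-tree gridding would speed this up)
    nbrs = [np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps) for i in range(n)]
    labels, cid = np.full(n, -1), 0
    for p in range(n):
        if labels[p] != -1 or len(nbrs[p]) < min_pts:
            continue                       # already clustered, or p is not a core point
        labels[p], seeds = cid, list(nbrs[p])
        while seeds:                       # grow the cluster via density-reachability
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cid            # border points get scooped up here
                if len(nbrs[q]) >= min_pts:
                    seeds.extend(nbrs[q])  # q is core: expand through its neighborhood
        cid += 1
    return labels
```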
OPTICS: Ordering Points To Identify the Clustering Structure
Ankerst, Breunig, Kriegel, and Sander (SIGMOD'99). http://portal.acm.org/citation.cfm?id=304187
Addresses the shortcoming of DBSCAN, namely choosing parameters.
Develops a special order of the database wrt its density-based clustering structure. This cluster ordering contains information equivalent to the density-based clusterings corresponding to a broad range of parameter settings.
Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure
OPTICS: Does this order resemble the Total Variation order?
OPTICS: Some Extensions from DBSCAN
Index-based: k = number of dimensions, N = number of points (20), p = 75%, M = N(1-p) = 5.
Complexity: O(kN^2).
Core distance.
Reachability distance: r(p, o) = max(core-distance(o), d(o, p)).
[Figure: MinPts = 5, Eps = 3 cm; r(p1, o) = 2.8 cm, r(p2, o) = 4 cm.]
[Figure: reachability-distance plotted over the cluster order of the objects; some values undefined.]
DENCLUE: Using Density Functions
DENsity-based CLUstEring by Hinneburg & Keim (KDD'98).
Major features:
- Solid mathematical foundation
- Good for data sets with large amounts of noise
- Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
- Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45, claimed by the authors???)
- But needs a large number of parameters
DENCLUE: Technical Essence
Uses grid cells, but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure.
Influence function: describes the impact of a data point within its neighborhood; F(x,y) measures the influence that y has on x.
A very good influence function is the Gaussian, F(x,y) = e^(-d^2(x,y)/(2*sigma^2)).
Others include functions similar to the squashing functions used in neural networks.
One can think of the influence function as a measure of the contribution to the density at x made by y.
The overall density of the data space can be calculated as the sum of the influence functions of all data points.
Clusters can then be determined mathematically by identifying density attractors.
Density attractors are local maxima of the overall density function.
DENCLUE(D, sigma, xi_c, xi)
1. Grid the data set (use r = sigma, the std. dev.).
2. Find (highly) populated cells (use a threshold = xi_c) (shown in blue); identify populated cells (+ nonempty cells).
3. Find density attractor points, C*, using hill climbing:
- Randomly pick a point, p_i.
- Compute the local density (use r = 4).
- Pick another point, p_{i+1}, close to p_i, and compute the local density at p_{i+1}.
- If LocDen(p_i) < LocDen(p_{i+1}), climb.
- Put all points within distance sigma/2 of the path p_i, p_{i+1}, ..., C* into a density-attractor cluster called C*.
4. Connect the density-attractor clusters, using a threshold, xi, on the local densities of the attractors.
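A sketch of the density function and the hill climb, using the Gaussian influence function above (the gradient step rule, step size, and names are our assumptions):

```python
import numpy as np

def density(x, data, sigma):
    """Overall density at x: sum of the Gaussian influences of all data points."""
    d2 = ((data - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum()

def hill_climb(x, data, sigma, step=0.1, max_iter=200):
    """Follow increasing local density from x toward a density attractor."""
    for _ in range(max_iter):
        w = np.exp(-((data - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
        grad = (w[:, None] * (data - x)).sum(axis=0)   # kernel-density gradient, up to a constant
        if np.linalg.norm(grad) < 1e-9:
            break                                      # at (or very near) a local maximum
        x_next = x + step * grad / np.linalg.norm(grad)
        if density(x_next, data, sigma) <= density(x, data, sigma):
            break                                      # cannot climb: x is the attractor
        x = x_next
    return x
```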
A. Hinneburg and D. A. Keim. An Efficient Approach to Clustering in Multimedia Databases with Noise. In Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining. AAAI Press, 1998. & KDD 99 Workshop.
Comparison: DENCLUE vs. DBSCAN
BIRCH (1996)
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies, by Zhang, Ramakrishnan, Livny (SIGMOD'96). http://portal.acm.org/citation.cfm?id=235968.233324
Incrementally constructs a CF (Clustering Feature) tree, a hierarchical data structure for multiphase clustering:
- Phase 1: scan the DB to build an initial in-memory CF tree (a multi-level compression of the data that tries to preserve the inherent clustering structure of the data)
- Phase 2: use an arbitrary clustering algorithm to cluster the leaf nodes of the CF-tree
Scales linearly: finds a good clustering with a single scan and improves the quality with a few additional scans.
Weaknesses: handles only numeric data, and is sensitive to the order of the data records.
BIRCH ABSTRACT
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "noise" (data points that are not part of the underlying pattern) effectively. We evaluate BIRCH's time/space efficiency, data input order sensitivity, and clustering quality through several experiments.
Clustering Feature Vector: CF = (N, LS, SS) = (5, (16,30), (54,190)) for the points (3,4), (2,6), (4,5), (4,7), (3,8), where N is the number of points, LS the linear sum, and SS the square sum (per dimension).
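Checking the example: N = 5, LS = (3+2+4+4+3, 4+6+5+7+8) = (16, 30), SS = (54, 190). A sketch (the additivity of CFs is what makes BIRCH incremental):

```python
import numpy as np

pts = np.array([(3, 4), (2, 6), (4, 5), (4, 7), (3, 8)])

def cf(points):
    """Clustering Feature: (N, LS, SS), with per-dimension linear and square sums."""
    return len(points), points.sum(axis=0), (points ** 2).sum(axis=0)

N, LS, SS = cf(pts)              # (5, [16 30], [54 190]), matching the slide
centroid = LS / N                # derived statistics come straight from the CF
variance = SS / N - centroid ** 2
# CFs are additive: merging two subclusters just adds the three components
```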
[Embedded chart data: x-y point sets and computed centers for the k-means/PAM example plots; omitted.]
BIRCH example
Branching factor B = 6, threshold L = 7.
Iteratively put points into the closest leaf until the threshold is exceeded, then split the leaf.
Internal nodes (Inodes) summarize their subtrees, and Inodes get split when the threshold is exceeded.
Once the in-memory CF tree is built, use another method to cluster the leaves together.
CURE (Clustering Using REpresentatives)
CURE: proposed by Guha, Rastogi & Shim, 1998. http://portal.acm.org/citation.cfm?id=276312
- Stops the creation of a cluster hierarchy if a level consists of k clusters.
- Uses multiple representative points to evaluate the distance between clusters.
- Adjusts well to arbitrarily shaped clusters (not necessarily distance-based).
- Avoids the single-link effect.
Drawbacks of Distance-Based Methods
Drawbacks of square-error-based clustering methods:
- Consider only one point as representative of a cluster.
- Good only for convex-shaped clusters of similar size and density, and only if k can be reasonably estimated.
CURE: The Algorithm
Very much a hybrid method (involves pieces from many others):
1. Draw a random sample s.
2. Partition the sample into p partitions of size s/p.
3. Partially cluster the partitions into s/(pq) clusters.
4. Eliminate outliers: by random sampling; if a cluster grows too slowly, eliminate it.
5. Cluster the partial clusters.
6. Label the data on disk.
CURE ABSTRACT
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers.We propose a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size.CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction.Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes and the shrinking helps to dampen the effects of outliers.To handle large databases, CURE employs a combination of random sampling and partitioning. A random sample drawn from the data set is first partitioned and each partition is partially clustered. The partial clusters are then clustered in a second pass to yield the desired clusters.Our experimental results confirm that the quality of clusters produced by CURE is much better than those found by existing algorithms.Furthermore, they demonstrate that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
Data Partitioning and Clustering: s = 50, p = 2, s/p = 25, s/(pq) = 5. [Figure: sampled points marked x.]
CURE: Shrinking Representative Points
Shrink the multiple representative points towards the gravity center by a fraction alpha.
Multiple representatives capture the shape of the cluster.
Clustering Categorical Data: ROCK. http://portal.acm.org/citation.cfm?id=351745
ROCK: RObust Clustering using linKs, by S. Guha, R. Rastogi, K. Shim (ICDE'99).
- Agglomerative hierarchical clustering.
- Uses links to measure similarity/proximity; not distance-based.
- Computational complexity: O(n^2 + n*m_m*m_a + n^2 log n).
Basic ideas: similarity function and neighbors.
Let T1 = {1,2,3}, T2 = {3,4,5}; sim(T1, T2) = |T1 ∩ T2| / |T1 ∪ T2| = 1/5 = 0.2.
ROCK ABSTRACT
Clustering, in data mining, is useful to discover distribution patterns in the underlying data.Clustering algorithms usually employ a distance metric based (e.g., euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions.In this paper, we study clustering algorithms for data with boolean and categorical attributes.We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points.We develop a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters.Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge.In addition to presenting detailed complexity results for ROCK, we also conduct an experimental study with real-life as well as synthetic data sets to demonstrate the effectiveness of our techniques.For data with categorical attributes, our findings indicate that ROCK not only generates better quality clusters than traditional algorithms, but it also exhibits good scalability properties.
ROCK: Algorithm
Links: the number of common neighbors between the two points.
Algorithm:
1. Draw a random sample.
2. Cluster with links.
3. Label the data on disk.
Example: {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}
link({1,2,3}, {1,2,4}) = 3
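A sketch reproducing both numbers, assuming the neighbor threshold is theta = 0.5 (i.e., two baskets are neighbors when their Jaccard similarity is at least 0.5); the threshold value is our inference from the example:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b)

baskets = [{1,2,3},{1,2,4},{1,2,5},{1,3,4},{1,3,5},
           {1,4,5},{2,3,4},{2,3,5},{2,4,5},{3,4,5}]
theta = 0.5                                  # neighbor threshold on Jaccard similarity

def neighbors(t):
    return [b for b in baskets if b != t and jaccard(t, b) >= theta]

def link(t1, t2):
    """link = number of common neighbors of t1 and t2."""
    return sum(1 for b in neighbors(t1) if b in neighbors(t2))

print(jaccard({1,2,3}, {3,4,5}))             # 0.2, the T1, T2 example
print(link({1,2,3}, {1,2,4}))                # 3, as on the slide
```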
CHAMELEON
CHAMELEON: hierarchical clustering using dynamic modeling, by G. Karypis, E.H. Han and V. Kumar (1999). http://portal.acm.org/citation.cfm?id=621303
Measures similarity based on a dynamic model: two clusters are merged only if the interconnectivity and closeness (proximity) between the two clusters are high relative to the internal interconnectivity of the clusters and the closeness of items within the clusters.
A two-phase algorithm:
1. Use a graph-partitioning algorithm: cluster objects into a large number of relatively small sub-clusters.
2. Use an agglomerative hierarchical clustering algorithm: find the genuine clusters by repeatedly combining these sub-clusters.
CHAMELEON ABSTRACT
Many advanced algorithms have difficulty dealing with highly variable clusters that do not follow a preconceived model.By basing its selections on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for these highly variable clusters.Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged.Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters.Another set of schemes (the Rock algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters.By considering either interconnectivity or closeness only, these algorithms can select and merge the wrong pair of clusters.Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters.Chameleon finds the clusters in the data set by using a two-phase algorithm.During the first phase, Chameleon uses a graph-partitioning algorithm to cluster the data items into several relatively small subclusters.During the second phase, it uses an algorithm to find the genuine clusters by repeatedly combining these sub-clusters.
Overall Framework of CHAMELEON
[Figure: Data Set -> Construct Sparse Graph -> Partition the Graph -> Merge Partitions -> Final Clusters.]
Grid-Based Clustering Methods
Use a multi-resolution grid data structure.
Several interesting methods:
- STING (a STatistical INformation Grid approach) by Wang, Yang and Muntz (1997)
- WaveCluster by Sheikholeslami, Chatterjee, and Zhang (VLDB'98): a multi-resolution clustering approach using wavelets
- CLIQUE: Agrawal, et al. (SIGMOD'98)
Vertical gridding
We can observe that almost all methods discussed so far suffer from the curse of cardinality (for very-large-cardinality data sets, the algorithms are too slow to finish in an average lifetime!) and/or the curse of dimensionality (points are all at ~ the same distance).
The work-arounds employed to address the curses:
Sampling: throw out most of the points in such a way that what remains is of low enough cardinality for the algorithm to finish, and in such a way that the remaining sample contains all the information of the original data set (therein is the problem: that is impossible to do in general).
Gridding: agglomerate all points in a grid cell and treat them as one point (smooth the data set to this gridding level). The problem with gridding, often, is that information is lost and the data structure that holds the grid-cell information is very complex. With vertical methods (e.g., P-trees), all the information can be retained and griddings can be constructed very efficiently on demand. Horizontal data structures can't do this.
Subspace restrictions (e.g., Principal Components, Subspace Clustering)
Gradient-based methods (e.g., the gradient tangent vector field of a response surface reduces the calculations to the number of dimensions, not the number of combinations of dimensions).
j-hi gridding: the j hi-order bits identify a grid cell and the rest identify points within a particular cell. Thus, j-hi cells are not necessarily cubical (unless all attribute bit-widths are the same).
j-lo gridding: the j lo-order bits identify points within a particular cell and the rest identify the grid cell. Thus, j-lo cells always have a nice uniform shape (cubical).
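A sketch of both cell-id computations as bit operations (bit-widths and all names are ours):

```python
def j_hi_cell(point, bitwidths, j):
    """j-hi gridding: the j hi-order bits of each coordinate give the cell id."""
    return tuple(x >> (b - j) for x, b in zip(point, bitwidths))

def j_lo_cell(point, j):
    """j-lo gridding: drop the j lo-order bits; cells are cubical with side 2**j."""
    return tuple(x >> j for x in point)

p = (0b101, 0b01, 0b110)             # a point in R(A1, A2, A3), bit-widths 3, 2, 3
print(j_hi_cell(p, (3, 2, 3), 1))    # (1, 0, 1): one hi bit per dimension
print(j_lo_cell(p, 1))               # (2, 0, 3): remaining hi bits identify the cell
```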
1-hi gridding of vector space R(A1, A2, A3) in which all bit-widths are the same (= 3), so each grid cell contains 2^2 * 2^2 * 2^2 = 64 potential points. Grid cells are identified by their Peano id (Pid); internally, the point's cell coordinates are shown, called the grid cell id (gci), and cell points are identified by coordinates within their cell (gcp).
[Diagram: enumeration of the 64 gcp values (00,00,00 through 11,11,11) for the cell with Pid = 001.]
2-hi gridding of vector space R(A1, A2, A3) in which all bit-widths are the same (= 3), so each grid cell contains 2^1 * 2^1 * 2^1 = 8 points.
[Diagram: the 8 gcp values (0,0,0 through 1,1,1) for the cell with Pid = 001.001.]
1-hi gridding of R(A1, A2, A3), bit-widths 3, 2, 3.
[Diagram: gcp enumeration for the cell with Pid = 001.]
2-hi gridding of R(A1, A2, A3), bit-widths 3, 2, 3 (each grid cell contains 2^1 * 2^0 * 2^1 = 4 potential points).
[Diagram: the 4 gcp values for the cell with Pid = 3.1.3.]
HOBBit disks and rings (HOBBit = Hi Order Bifurcation Bit): a 4-lo grid where A1, A2, A3 have bit-widths b1+1, b2+1, b3+1. HOBBit grid centers are points of the form (exactly one per grid cell):
x = (x1,b1..x1,4 1010, x2,b2..x2,4 1010, x3,b3..x3,4 1010), where the xi,j range over all binary patterns.
HOBBit disk about x, of radius 2^0: H(x, 2^0).
Note: we have switched the direction of A3.
[Diagram: the 8 points of H(x, 2^0), with coordinates of the form (x1,b1..x1,4 101a, x2,b2..x2,4 101b, x3,b3..x3,4 101c) for a, b, c in {0, 1}.]
H(x, 2^1): the HOBBit disk of radius 2^1 about a HOBBit grid center point x.
[Diagram: corner points of H(x, 2^1), with low-order bits ranging over 1000..1011 in each dimension.]
The regions of H(x, 2^1) are labeled by the dimensions in which the length is increased (e.g., in the 123-region all three dimensions are increased).
[Diagrams: the 123-, 13-, 23-, 12-, 3-, 2-, and 1-regions of H(x, 2^1); H(x, 2^0) is the 123-region of H(x, 2^0).]
Algorithm (for computing gradients):
1. Select an outlier threshold, ε_ot (points without neighbors in their ε_ot L∞-disk are outliers; that is, there is no gradient at these outlier points, so the instantaneous rate of response change is zero).
2. Create a j-lo grid with j = ot (see the previous slides, where HOBBit disks are built out from HOBBit centers x = (x1,b1..x1,ot+1 1010, ..., xn,bn..xn,ot+1 1010), with the xi,j ranging over all binary patterns).
3. Pick a point x in R. Build out alternating one-sided rings centered at x until a neighbor is found or the radius ε_ot is exceeded (in which case x is declared an outlier). If a neighbor is found at a radius r_i < ε_ot <= 2^j, then ∂f/∂x_k(x) is estimated as below.
Note: one can use L∞ HOBBit or L∞ ordinary distance.
Note: one-sided means that each successive build-out increases alternately only in the positive direction in all dimensions, then only in the negative direction in all dimensions.
Note: building out HOBBit disks from a HOBBit center automatically gives one-sided rings (a built-out ring is defined to be the built-out disk minus the previous built-out disk), as shown in the next few slides.
∂f/∂x_k(x) ≈ (RootCount D(x, r_i) - RootCount D(x, r_i)_k) / Δx_k, where D(x, r_i)_k is D(x, r_{i-1}) expanded in all dimensions except k.
Alternatively, in step 3, actually calculate the mean (or median?) of the new points encountered in D(x, r_i) (we have a P-tree mask for the set, so this is trivial) and measure the Δx_k-distance.
NOTE: one might want to go one more ring out to see if one gets the same or a similar gradient (this seems particularly important when j is odd, since the gradient then points the opposite way).
NOTE: the calculation of Δx_k can be done in various ways. Which is best?
[Figure sequence: gradient-estimation examples. Rings H(x, 2^1) are built out about a HOBBit center x = (x1,b1..x1,4 1010, x2,b2..x2,4 1010, x3,b3..x3,4 1010) until the first new point is found; then, per dimension k:
Est ∂f/∂x_1(x) = (RootCount D(x, r_i) - RootCount D(x, r_i)_1) / Δx_1 = (2-2)/(-1) = 0
(RootCount D(x, r_i) - RootCount D(x, r_i)_2) / Δx_2 = (2-1)/(-1) = -1
(RootCount D(x, r_i) - RootCount D(x, r_i)_3) / Δx_3 = (2-1)/(-1) = -1
Other placements of the new point give zero components, e.g. (2-2)/(-1) = 0 in dimensions 2 and 3.]
Intuitively, this gradient-estimation method seems to work.
Next we consider a potential accuracy improvement, in which we take the medoid of all new points as the gradient (or, more accurately, as the point to which we climb in any response-surface hill-climbing technique).
[Figure: H(x, 2^1) with the new points marked; their centroid shown.] Estimate the gradient arrowhead as being at the medoid of the new point set (or, more correctly, estimate the next hill-climb step). Note: if the original points are truly part of a strong cluster, the hill climb will be excellent.
[Figure: the same construction on a weaker cluster.] Note: if the original points are not truly part of a strong cluster, the weak hill climb will indicate that.
[Figure: H(x, 2^2) build-out; the first new point.]
To evaluate how well the formula estimates the gradient, it is important to consider all cases of the new point appearing in one of these regions (if one point appears, gradient components are additive, so it suffices to consider one point at a time).
[Figure: H(x, 2^3).] Notice that the HOBBit center moves more and more toward the true center as the grid size increases.
Grid-Based Gradients and Hill Climbing
If we are using gridding to produce the gradient vector field of a response surface, might we always vary x_i in the positive direction only? How can that be done most efficiently?
1. j-lo gridding, building out HOBBit rings from HOBBit grid centers (see the previous slides, where this approach was used).
2. j-lo gridding, building out HOBBit rings from lo-value grid points (ending in j 0-bits): x = (x1,b1..x1,j+1 0..0, ..., xn,bn..xn,j+1 0..0).
3. Ordinary j-lo gridding, building out rings from lo-value ids (ending in j zero bits).
4. Ordinary j-lo gridding, building out rings from true centers.
5. Other? (There are many other possibilities, but we will first explore 2.)
Using j-lo gridding with j = 3 and lo-value cell identifiers is shown on the next slide.
Of course, we need not use HOBBit build-out. With ordinary unit-radius build-out, the results are more exact, but the calculations may be more complex???
HOBBit j-lo rings using lo-value cell ids, x = (x1,b1..x1,j+1 0..0, ..., xn,bn..xn,j+1 0..0):
Ring(x, 2) = PDisk(x, 2) ∧ PDisk(x, 1)', where PDisk(x, i) = P_{x,b} ∧ .. ∧ P_{x,j+1} ∧ P'_j ∧ .. ∧ P'_{i+1}.
Ordinary j-lo rings using lo-value cell ids: Ring(x, 3) = PDisk(x, 3) ∧ PDisk(x, 2)'.
k-Medoids Clustering Review
Find representative objects (medoids): actual objects in the cluster (the mean seldom is).
PAM (Partitioning Around Medoids):
1. Select k representative objects arbitrarily.
2. For each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_i,h.
3. For each pair of i and h, if TC_i,h < 0, i is replaced by h; then assign each non-selected object to the most similar representative object.
4. Repeat steps 2-3 until there is no change.
CLARA (Clustering LARge Apps) draws multiple samples of the data set, applies PAM on each sample, and gives the best clustering as the output. Strength: deals with larger data sets than PAM. Weakness: Efficiency depends on the sample size. A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the sample is biased
CLARANS (Clustering Large Apps based on RANdom Search) draws a sample of neighbors dynamically. The clustering process can be presented as searching a graph where every node is a potential solution, that is, a set of k medoids. If a local optimum is found, CLARANS starts from a new randomly selected node in search of a new local optimum (Genetic-Algorithm-like). Finally the best local optimum is chosen after some stopping condition. It is more efficient and scalable than both PAM and CLARA.
A Vertical k-Medoids Clustering Algorithm
Following PAM (to illustrate the main killer idea, but it can apply much more widely):
1. Select k component P-trees. The goal here is to efficiently get one P-tree mask for each component; e.g., calculate the smallest j such that the j-lo gridding has > k cells, then agglomerate into precisely k components (by ORing the P-trees of cells with the closest means/medoids/corners (single-link)).
2. Where PAM uses "for each pair of a non-selected object h and a selected object i, calculate the total swapping cost TC_i,h; for each pair of i and h, if TC_i,h < 0, i is replaced by h; then assign each non-selected object to the most similar object", instead: find the medoid of each component C_i: calculate TV(C_i, x) for all x in C_i (create a P-tree of all points with the minimum TV so far; on a smaller TV, reset this P-tree), ending up with a P-tree, PtM_i, of tying medoids for C_i. Calculate TV(PtM_i, x) for all x in PtM_i and pick its medoid (if there are still multiple, repeat). This is the C_i-medoid! Alternatively, just pick one pre-medoid. (Note: this avoids the expense of pairwise swappings, and avoids subsampling as in CLARA and CLARANS.)
3. Put each point with its closest medoid (building P-tree component masks as you do this).
4. Repeat 2 and 3 until (some stopping condition, such as: no change in the medoid set?).
Can we cluster at step 3 without a scan (creating component P-trees)?
A Vertical k-Means Clustering Algorithm
As mentioned on the previous slide, a vertical k-means algorithm goes similarly:
1. Select k component P-trees (the goal here is to efficiently get one P-tree mask for each component; e.g., calculate the smallest j such that the j-lo gridding has > k cells, and agglomerate into precisely k components by ORing the P-trees of cells with the closest means).
2. Calculate the mean of each component, C_h, by ANDing each basic P-tree with PC_h. In dimension A_k, calculate the k-th component of the mean as (Σ_{i=b_k..0} 2^i · rc(PC_h ∧ P_{k,i})) / rc(PC_h).
3. Put each point with the closest mean (building P-tree component masks as you do this).
4. Repeat 2 and 3 until (some stopping condition, such as: no change in the mean set?).
Can we cluster at step 3 without a scan (creating component P-trees)?
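The mean formula in step 2 can be checked with ordinary bit vectors standing in for the basic P-trees (rc = root count = number of 1-bits; the toy data is ours):

```python
import numpy as np

vals = np.array([5, 3, 6, 2, 7, 1])          # attribute A_k: 3-bit values for 6 points
bits = [(vals >> i) & 1 for i in (2, 1, 0)]  # basic P-trees P_k,2 .. P_k,0 as bit vectors

mask = np.array([1, 1, 0, 1, 0, 1])          # component mask PC_h (4 points)
def rc(p):                                   # root count = number of 1-bits
    return int(p.sum())

# mean over the component: sum_i 2^i * rc(PC_h AND P_k,i) / rc(PC_h)
mean_k = sum((2 ** i) * rc(mask & b) for i, b in zip((2, 1, 0), bits)) / rc(mask)
print(mean_k)                                # 2.75 == np.mean(vals[mask == 1])
```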
Zillions of vertical hybrid clustering algorithms leap to mind (involving partitioning, hierarchical, and density methods)! Pick one!
Finding Density Attractors for a Density Clustering Algorithm
Finding density attractors (and their attractor sets, which, when agglomerated via a density threshold, constitute the generic density clustering algorithm):
1. Pick a point.
2. Build out (HOBBit or ordinary or ?) rings one at a time until the first k neighbors are found.
3. In that ring, compute the medoid points (as in the medoid-finding step of the vertical k-medoids algorithm above).
4. If the medoid increases the density, climb to it and go to 2; else declare it a density attractor and go to 1.
j-grids and the P-tree relationship
1-hi-gridding of R(A1, A2, A3). Bit-widths: 3, 2, 3 (dimension cardinalities: 8, 4, 8).
Using tree-node-like identifiers:
- cell (tree) id (ci) of the form c0.c1...cd
- point (coordinate) id (pi) of the form p1,p2,p3
[Diagram: the cell/point ids 00.0.00 through 11.1.11.]
A 1-hi-grid yields a P-tree with level-0 (cell-level) fanout of 2^3 and level-1 (point-level) fanout of 2^5, if leaves are segment-labelled (not coords).
j-grids and the P-tree relationship (cont.)
One can view a standard P-tree as nested 1-hi-griddings, with constant subtrees compressed out. R(A1, A2, A3) with bit-widths 3, 2, 3.
Gridding categorical data?
The following bioinformatics (yeast genome) data was extracted mostly from the MIPS database (Munich Information center for Protein Sequences).
The left column shows features; treat these with a hi-order bit (1 iff the gene participates). There may be more levels of hierarchy (e.g., function: some genes actually cause the function when they express in sufficient quantities, while others are transcription factors for those primary genes. Primary genes have the hi bit on; transcription factors have the second bit on).
The right column shows the number of distinct feature values; bitmap these.
Data Representation: a gene-by-feature table.
For a categorical feature, we consider each category as a separate attribute or column by bit-mapping it.
The resulting table has a total of 8039 distinct feature bit vectors (corresponding to items in MBR) for 6374 yeast genes (corresponding to transactions in MBR).
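A sketch of the bit-mapping step, with one bit column per distinct category value (the toy feature and names are ours):

```python
import numpy as np

genes = ["g1", "g2", "g3"]
func = {"g1": "metabolism", "g2": "transport", "g3": "metabolism"}

cats = sorted(set(func.values()))    # one column per distinct category value
table = np.array([[int(func[g] == c) for c in cats] for g in genes])
print(cats)     # ['metabolism', 'transport']
print(table)    # [[1 0] [0 1] [1 0]]  -- the gene-by-feature bit table
```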
STING: A Statistical Information Grid Approach
Wang, Yang and Muntz (VLDB'97).
The spatial area is divided into rectangular cells.
There are several levels of cells corresponding to different levels of resolution.
STING: A Statistical Information Grid Approach (2)
- Each cell at a high level is partitioned into a number of smaller cells at the next lower level.
- Statistical information for each cell is calculated and stored beforehand and is used to answer queries.
- Parameters of higher-level cells can be easily calculated from the parameters of lower-level cells: count, mean, s (standard deviation), min, max; type of distribution: normal, uniform, etc.
- Use a top-down approach to answer spatial data queries: start from a pre-selected layer, typically with a small number of cells; for each cell in the current level, compute the confidence interval.
STING: A Statistical Information Grid Approach (3)
- Remove the irrelevant cells from further consideration.
- When finished examining the current layer, proceed to the next lower level.
- Repeat this process until the bottom layer is reached.
Advantages: query-independent, easy to parallelize, incremental update; O(K), where K is the number of grid cells at the lowest level.
Disadvantages: all the cluster boundaries are either horizontal or vertical; no diagonal boundary is detected.
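For instance, a parent cell's count and mean follow directly from its children's (a minimal sketch; aggregating s, min, and max is analogous):

```python
def parent_stats(children):
    """children: list of (count, mean) pairs for the lower-level cells."""
    n = sum(c for c, _ in children)
    mean = sum(c * m for c, m in children) / n       # count-weighted mean
    return n, mean

print(parent_stats([(10, 2.0), (30, 4.0)]))          # (40, 3.5)
```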
WaveCluster (1998)
Sheikholeslami, Chatterjee, and Zhang (VLDB'98).
A multi-resolution clustering approach which applies a wavelet transform to the feature space. A wavelet transform is a signal-processing technique that decomposes a signal into different frequency sub-bands.
Both grid-based and density-based.
Input parameters: the number of grid cells for each dimension, the wavelet, and the number of applications of the wavelet transform.
What Is a Wavelet? (1)
WaveCluster (1998)
How to apply the wavelet transform to find clusters:
- Summarize the data by imposing a multidimensional grid structure onto the data space.
- These multidimensional spatial data objects are represented in an n-dimensional feature space.
- Apply the wavelet transform on the feature space to find the dense regions in the feature space.
- Apply the wavelet transform multiple times, which results in clusters at different scales, from fine to coarse.
What Is a Wavelet? (2)
Quantization
Transformation
WaveCluster (1998)
Why is the wavelet transformation useful for clustering?
- Unsupervised clustering: it uses hat-shaped filters to emphasize regions where points cluster, while simultaneously suppressing weaker information at their boundaries.
- Effective removal of outliers.
- Multi-resolution.
- Cost efficiency.
Major features:
- Complexity O(N)
- Detects arbitrarily shaped clusters at different scales
- Not sensitive to noise, not sensitive to input order
- Only applicable to low-dimensional data
CLIQUE (Clustering In QUEst): Agrawal, Gehrke, Gunopulos, Raghavan (SIGMOD'98). http://portal.acm.org/citation.cfm?id=276314
Automatically identifies subspaces of a high-dimensional data space that allow better clustering than the original space.
CLIQUE can be considered both density-based and grid-based:
- It partitions each dimension into the same number of equal-length intervals.
- It partitions an m-dimensional data space into non-overlapping rectangular units.
- A unit is dense if the fraction of the total data points contained in the unit exceeds an input model parameter.
- A cluster is a maximal set of connected dense units within a subspace.
CLIQUE: The Major Steps
1. Partition the data space and find the number of points that lie inside each cell of the partition.
2. Identify the subspaces that contain clusters using the Apriori principle.
3. Identify clusters: determine dense units in all subspaces of interest; determine connected dense units in all subspaces of interest.
4. Generate a minimal description for the clusters: determine maximal regions that cover each cluster of connected dense units; determine a minimal cover for each cluster.
CLIQUE ABSTRACT
Data mining applications place special requirements on clustering algorithms, including: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.
We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented, and does not presume any specific mathematical form for the data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
[Figure: the CLIQUE example; dense units in the salary (x $10,000, axis 20-60) and age (axis 30-70) dimensions, density threshold τ = 3.]
Strengths and Weaknesses of CLIQUE
Strengths:
- It automatically finds subspaces of the highest dimensionality such that high-density clusters exist in those subspaces.
- It is insensitive to the order of records in the input and does not presume some canonical data distribution.
- It scales linearly with the size of the input and has good scalability as the number of dimensions in the data increases.
Weakness:
- The accuracy of the clustering result may be degraded at the expense of the simplicity of the method.
Model-Based Clustering Methods
Attempt to optimize the fit between the data and some mathematical model.
Statistical and AI approaches.
Conceptual clustering:
- A form of clustering in machine learning.
- Produces a classification scheme for a set of unlabeled objects.
- Finds a characteristic description for each concept (class).
COBWEB (Fisher, 1987):
- A popular and simple method of incremental conceptual learning.
- Creates a hierarchical clustering in the form of a classification tree.
- Each node refers to a concept and contains a probabilistic description of that concept.
COBWEB Clustering Method: a classification tree.
More on Statistical-Based Clustering
Limitations of COBWEB:
- The assumption that the attributes are independent of each other is often too strong, because correlations may exist.
- Not suitable for clustering large database data: skewed trees and expensive probability distributions.
CLASSIT:
- An extension of COBWEB for incremental clustering of continuous data.
- Suffers from problems similar to COBWEB's.
AutoClass (Cheeseman and Stutz, 1996):
- Uses Bayesian statistical analysis to estimate the number of clusters.
- Popular in industry.
Other Model-Based Clustering Methods
Neural network approaches:
- Represent each cluster as an exemplar, acting as a prototype of the cluster.
- New objects are distributed to the cluster whose exemplar is the most similar, according to some distance measure.
Competitive learning:
- Involves a hierarchical architecture of several units (neurons).
- Neurons compete in a winner-takes-all fashion for the object currently being presented.
Model-Based Clustering Methods
Self-Organizing feature Maps (SOMs)
- Clustering is also performed by having several units compete for the current object.
- The unit whose weight vector is closest to the current object wins.
- The winner and its neighbors learn by having their weights adjusted.
- SOMs are believed to resemble processing that can occur in the brain.
- Useful for visualizing high-dimensional data in 2- or 3-D space.
Hybrid Clustering
One common approach is to combine the k-means method and hierarchical clustering: first partition the dataset into K small clusters, then merge the clusters based on similarity using a hierarchical method.
Hybrid clustering combines the partitioning and hierarchical clustering approaches. [Figure: K = 7.]
Problems of Existing Hybrid Clustering
- Must predefine the number of preliminary clusters, K.
- Unable to handle noisy data.
378.682007
260.118011
432.519989
401.90799
274.730988
269.661987
266.098999
466.01001
436.523987
297.326996
258.661011
176.785995
87.992996
264.59201
345.386993
273.285004
210.695007
255.416
311.945007
161.763
438.864014
441.990997
415.569
243.820999
129.319
147.042999
439.656006
99.763
289.109985
290.513
178.498001
267.716003
282.105011
366.289001
132.516006
138.520004
338.832001
294.417999
454.813995
167.947998
175.723007
86.389
378.95401
71.641998
340.763
127.134003
144.188995
433.039001
250.498001
433.303009
377.10199
321.985992
452.03299
227.639008
428.309998
375.893005
296.425995
105.628998
118.564003
443.565002
318.289001
268.94101
415.536987
278.402008
256.351013
360.056
430.358002
339.683014
181.393005
97.700996
270.011993
417.610992
398.066986
418.42099
453.471985
424.186005
296.593994
309.856995
341.031006
379.769989
73.056
275.959015
331.747009
250.753006
401.087006
150.406998
76.752998
169.544998
257.312988
430.457001
78.228996
120.024002
340.161987
162.809006
259.507996
273.309998
199.544006
119.684998
72.434998
302.149994
437.391998
73.427002
211.983002
345.201996
275.21701
383.062012
253.695007
427.82901
156.423004
182.039993
274.351013
462.704987
80.511002
270.144012
455.846985
269.567993
67.773003
163.164001
199.304001
344.76001
139.994003
407.27301
195.783997
304.009003
107.469002
50.537998
433.842987
158.886002
129.537003
306.761993
180.453003
398.321014
404.480988
395.395996
156.139999
218.270996
323.076996
282.971008
278.497986
84.252998
336.920013
376.751007
395.835999
319.217987
446.821014
78.745003
217.658997
137.832993
370.489014
418.985992
142.259003
250.199005
325.141998
297.856995
228.520996
59.577999
209.645004
453.653015
386.773987
308.195007
274.651001
65.334999
101.911003
175.979004
342.975006
248.554001
173.005005
105.205002
343.441986
350.524994
322.252991
78.988998
54.297001
78.147003
84.073997
73.643997
270.988007
261.151001
369.681
302.684998
276.309998
391.437988
441.592987
389.21701
301.868011
332.975006
61.547001
267.192993
256.776001
215.445999
401.992004
334.279999
157.289993
284.052002
72.517998
260.419006
250.524002
161.416
283.020996
326.403015
57.397999
407.739014
252.589996
226.102005
72.181999
78.222
143.785995
248.574997
305.502014
339.597992
82.142998
243.878998
310.395996
455.938995
262.346008
288.390991
147.794006
321.225006
271.954987
343.845001
75.432999
416.102997
360.662994
194.169998
429.386993
77.363998
403.014008
70.153
154.610992
406.69101
376.410004
97.058998
96.771004
358.548004
346.89801
252.451996
426.06601
192.048996
328.825012
293.309998
257.38501
325.351013
319.729004
420.747986
112.197998
425.375
410.444
176.154999
114.669998
401.640991
396.261993
180.768005
99.038002
438.881012
316.339996
174.324005
328.757996
144.485001
249.505997
387.660004
87.093002
463.450989
431.338989
386.653992
415.201996
434.864014
224.052994
178.468002
440.371002
282.946991
270.343994
262.238007
215.779007
383.192993
170.195999
224.410004
87.686996
97.904999
290.696991
152.660004
351.863007
171.507996
302.785004
275.352997
398.480988
326.631989
97.151001
181.389008
207.485992
366.165009
100.343002
327.608002
176.455002
68.344002
378.123993
416.915985
139.733994
281.63501
190.054001
444.847992
340.57901
63.146
378.292999
273.065002
97.813004
320.622009
185.669006
473.703003
24.483999
412.091003
205.061996
387.076996
370.088013
76.366997
151.475006
29.521999
208.225006
95.373001
160.373993
146.220993
283.746002
279.040985
181.751007
409.786011
289.123993
213.195999
161.423004
298.419006
322.709015
280.876007
382.477997
84.125999
174.212997
351.070007
117.563004
164.625
311.829987
205.925003
294.925995
317.82901
147.365997
71.588997
194.880005
351.165985
140.703003
397.027008
94.189003
215.328003
102.426003
344.346008
333.808014
290.485992
206.615997
455.115997
278.895996
76.800003
243.326996
352.631012
148.660004
342.268005
342.846985
324.191986
360.988007
358.977997
390.992004
299.390991
203.037994
294.381012
329.169006
130.296997
367.48999
144.052994
432.006012
325.104004
404.565002
320.595001
175.610992
433.359009
397.621002
73.658997
316.027008
378.713013
245.934998
390.570007
249.983994
328.346985
77.617996
288.899994
248.367996
211.802002
350.609009
365.092987
298.55899
343.928009
346.196991
344.247009
358.319
324.983002
373.936005
259.334991
151.195999
336.997009
319.910004
291.950989
247.371994
155.651993
77.438004
98.464996
154.815002
226.987
245.671005
232.845993
170.934006
149.481995
358.247986
165.417007
254.391998
244.020004
433.979004
74.668999
407.270996
197.472
181.220001
371.238007
266.029999
409.21701
382.877014
114.547997
404.726013
242.753998
93.075996
60.666
280.398987
334.97699
183.686005
435.132996
364.247986
216.214996
72.888
389.563995
157.884995
198.061005
220.746994
409.695007
365.154999
183.472
382.11499
328.618011
333.996002
87.056
397.996002
426.53299
196.431
388.653992
398.041992
207.981003
76.360001
267.61499
300.113007
148.248993
92.306999
237.987
124.499001
92.168999
98.041
70.652
355.688995
276.632996
179.190994
259.859009
105.25
162.753998
152.190002
132.507004
258.367004
266.630005
292.63501
199.921005
291.968994
441.139008
90.570999
156.906006
44.348
316.960999
393.328003
352.893005
319.295013
333.735992
391.338013
419.11499
107.491997
183.457001
100.469002
58.148998
166.438995
371.234985
302.122009
434.735992
167.791
76.970001
169.772995
88.875
166.186005
370.381012
439.600006
225.410995
263.010986
336.962006
86.060997
138.197006
412.873993
138.320999
74.719002
286.018005
298.372986
311.81601
292.959015
430.203003
404.127014
117.292999
441.625
166.602997
223.067001
283.121002
257.003998
150.242004
346.567993
442.768005
327.471985
218.453003
71.360001
387.317993
97.149002
226.992996
234.658005
337.459991
261.569
153.382004
151.813995
400.225006
113.255997
257.829987
174.755005
371.658997
410.816986
410.968994
274.214996
429.25
333.81601
408.886993
144.095001
287.806
186.449997
447.127991
169.819
408.203003
95.992996
296.994995
82.318001
147.630005
394.014008
198.876999
225.738007
377.492004
396.618011
330.614014
391.707001
309.403015
73.32
212.580002
97.509003
299.308014
86.675003
277.989014
385.246002
385.959015
169.300003
403.140991
317.556
410.09201
363.571991
84.181999
151.227005
176.699997
344.612
54.167
286.217987
335.019989
331.001007
116.712997
295.329987
97.195
259.428009
241.511993
426.632996
406.520996
63.404999
361.436005
379.993988
138.009995
354.981995
133.296005
332.371002
395.049988
395.850006
341.876007
138.417007
196.807007
70.387001
245.037994
428.351013
150.449997
349.054993
311.506012
68.877998
370.610992
166.475998
104.400002
226.651993
401.196991
213.063995
358.035004
79.586998
319.174011
393.053009
291.835999
137.279999
403.631989
174.630005
266.860992
406.165009
270.903015
163.097
322.890015
300.397003
294.972992
163.610001
111.523003
415.471985
64.246002
167.699997
170.520996
84.876999
171.779999
274.222992
276.705994
386.942993
405.697998
101.296997
84.097
252.408005
378.540985
154.889999
69.556
383.60199
401.118011
378.358002
433.216003
163.024994
40.888
142.356003
414.045013
89.620003
156.181
408.997009
346.492004
72.890999
387.877014
134.304001
192.817001
263.433014
408.945007
328.544006
401.428009
243.158997
124.241997
103.211998
243.903
108.440002
278.07901
120.735001
156.869995
178.233994
402.009003
360.279999
314.890991
71.481003
235.912994
80.758003
442.54599
170.085999
324.546997
222.315002
179.199005
158.130005
54.550999
277.29599
302.360992
156.731995
410.261993
379.932007
317.088989
235.845001
312.161987
345.752991
391.566986
351.032013
165.990997
125.924004
428.001007
342.75
321.877991
182.382004
384.548004
101.642998
410.434998
138.641998
399.598999
147.949005
165.632996
348.501007
97.224998
166.901993
343.546997
363.811005
264.178009
199.192001
359.640991
74.566002
295.598999
422.855988
290.641998
336.321991
145.863998
416.57901
232.628006
278.605988
224.481995
283.066986
308.622009
411.105988
356.59201
127.920998
382.473999
393.781006
79.410004
264.089996
93.407997
56.235001
109.217003
302.606995
157.033997
237.731995
138.462006
256.975006
436.569
376.803009
447.808014
41.754002
145.516006
62.521999
341.408997
181.770996
383.121002
148.291
429.32901
258.213013
150.091995
225.423004
370.709015
406.423004
383.51001
457.565002
260.898987
86.235001
74.495003
370.786011
385.632996
426.46701
80.503998
274.59201
347.487
163.528
152.792999
96.172997
339.686005
89.195999
163.029999
361.953003
425.589996
70.236
140.511993
425.29599
169.134003
193.645996
453.074005
343.102997
109.096001
440.221985
367.928009
377.621002
75.623001
339.471985
325.281006
212.835007
289.411987
156.709
170.779007
91.709
375.01001
80.445
142.431
38.668999
331.184998
343.359009
278.678986
289.156006
93.642998
163.300003
369.51001
416.651001
89.018997
159.796997
316.333008
184.972
307.692993
320.852997
180.628998
306.605011
401.829987
234.240997
138.399994
421.140991
189.231003
75.636002
156.195999
188.830002
359.32901
362.572998
175.464005
414.601013
408.825012
130.658997
284.432007
249.268997
132.126999
343.442993
151.384003
422.764008
166.987
416.481995
109.268997
112.372002
405.946991
423.880005
356.048004
154.311996
82.528
131.164001
258.752991
186.382004
434.843994
214.748001
71.858002
74.245003
415.927002
418.856995
89.095001
308.346008
323.938995
353.807007
178.757996
138.608994
112.773003
471.365997
202.371994
381.23999
98.403
345.128998
231.653
142.179001
465.782013
315.213989
140.988998
367.084991
163.444
88.864998
399.480988
268.270996
468.875
151.671997
432.436005
77.856003
219.389008
344.312988
357.731995
172.016998
145.636002
164.259003
331.493988
67.574997
340.217987
155.054001
104.714996
267.143005
364.661987
202.912994
288.207001
57.477001
315.885986
447.316986
451.438995
451.384003
101.379997
78.402
370.286987
290.562988
419.558014
263.458008
138.792007
331.347992
375.790985
233.820007
392.415009
207.205002
162.774002
171.600006
317.272003
93.765999
429.627991
164.975998
296.438995
247.964005
330.001007
260.790009
342.645996
122.978996
212.537003
211.024002
167.429993
140.128998
342.687988
430.868011
219.878006
209.037003
353.421997
262.878998
437.635986
412.638
396.472992
203.552994
98.125999
308.752014
380.60199
460.664001
187.367996
76.397003
227.197006
168.507004
180.578003
66.889999
367.140015
210.742004
124.014999
111.598
158.343994
146.057007
406.005005
109.607002
222.345001
322.669006
273.713013
296.993011
368.178986
439.588989
146.792007
308.028015
166.697006
329.566986
142.259995
266.548004
205.113007
413.350006
107.244003
93.380997
318.235992
129.380005
320.734009
272.390015
353.627014
79.888
123.622002
85.425003
122.546997
420.131012
422.007996
120.481003
113.037003
84.649002
201.645004
49.078999
254.069
282.923004
303.098999
108.820999
440.880005
123.660004
380.328003
392.940002
426.683014
99.589996
107.380997
392.367004
109.783997
219.029999
113.351997
458.144989
159.531998
284.105988
281.928009
302.657013
85.002998
95.623001
126.794998
253.860992
180.358002
162.854996
335.924988
372.875
325.390991
48.554001
141.563995
271.148987
133.891006
390.184998
256.958008
89.113998
351.438995
345.062988
183.013
248.979004
194.481003
357.158997
307.311005
420.488007
426.623993
85.649002
298.033997
376.772003
270.138
367.346008
180.664993
307.286011
445.929993
315.346008
75.471001
271.596008
59.764999
194.692001
261.221008
104.393997
425.105988
317.450012
273.381989
83.054001
440.296997
290.566986
106.275002
258.665985
112.412003
213.373993
433.299011
265.584015
145.755005
81.689003
299.351013
96.694
87.267998
292.890991
236.787003
349.959015
276.385986
88.633003
309.554993
311.601013
55.071999
55.862999
255.544998
335.524994
379.445007
326.516998
264.540985
275.113007
272.235992
421.766998
445.601013
66.835999
419.242004
310.093994
159.546005
258.90799
156.988007
380.343994
176.826996
334.562012
363.760986
101.518997
158.098007
254.287994
383.035004
326.621002
227.156998
351.029999
346.748993
145.774994
292.131012
86.739998
427.131989
173.904007
375.30899
429.118011
444.993011
351.681
448.040985
90.628998
173.787003
392.322998
420.082001
161.701004
96.550003
177.444
367.837006
323.821014
288.085999
231.330002
302.390991
343.704987
93.008003
274.618988
86.745003
75.527
341.399994
153.158005
346.026001
209.048996
212.494995
309.464996
132.735992
394.15799
355.828003
373.347992
29.874001
77.667
432.730011
257.178009
164.735001
426.64801
165.335007
329.498993
157.410004
203.328003
71.196999
404.273987
326.903992
190.397995
180.082001
78.414001
288.450989
148.496002
354.68399
187.488998
298.529999
162.610001
405.229004
435.479004
192.783005
163.542999
393.212006
107.244003
335.77301
228.289993