A Clustering method for network visualization and
monitoringKDD4Service, san diego, 2011
Perikles RammosEricsson oss research
Yangcheng huangEricsson pm systems
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 2
Why clustering?› Problem: Telecommunications networks
produce too much data to be easily digested and manipulated by a human user.
› Solution: Reduce the number of objects shown on-screen without greatly reducing the conveyed information and usability.
› Prerequisites: – Large datasets– Real-time processing
› Dynamic nature of the network› Instant interactivity (e.g. levels of zoom)
– Some trade-offs in visualization accuracy are tolerated. 50 billion !!
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 3
Clustering algorithms
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 1
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 6
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 1
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
0
0.5
1
1.5
2
2.5
3
x
y
Iteration 5
› K-means– Fast algorithm– Need for prior knowledge of…
› Number› Initial location
…of cluster centroids– Sensitivity to initial choices– Only convex-shaped clusters
are detectable
› DBSCAN– Concave-shape detection– Can handle noise– May not respond well to
datasets with varying densityFigures from Tan,Steinbach,Kumar,Introduction to Data Mining (online slides)http://www-users.cs.umn.edu/~kumar/dmbook
ε = 1
MinPoints=5
ε = 3
MinPoints=20
ε = 1
MinPoints=20
ε = 3
MinPoints = 5
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 4
› A grid of Cells is overlaid on the area of interest.› Resolutions Xres,Yres are automatically calculated.› For every point {x,y} in the dataset
– Physical coordinates {x,y} are converted togrid coordinates {i,j}
– Point is assigned at Cell {i,j}. (population increased by 1)› “Topographical” features will have a decisive role.› O(n) time› Density Histogram replaces the Dataset in further calculations, greatly boosting
performance, since it contains much less elements to iterate on.
Density histogram
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 5
Cell hierarchies
› Each Cell is compared with it’s neighboring Cells.› Looking for a Denser (more populated) Cell…
– If such a Cell is found, the central Cell will connect with the larger Cell, becoming it’s ‘Child’.– If no such Cell is found, this Cell is labeled as a ‘Maximum’.
› A tree-like hierarchy of cells is thus formed.› Density Histogram is now comprised of several distinct trees.› A Maximum is associated with each tree, occupying it’s root.› Each of those trees will be a Cluster!› Cells are self-organized into hierarchies.
in an expanding neighborhood.
Clusters emerge naturally.
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 6
trees – maxima - clusters› Parameter: Maximum Allowed Neighborhood– area of “tolerance”– self-adjustment optimizes clustering
› Small Clusters are aggregated in hierarchies.
3x3 11x11
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 7
Performance results› Qualitative–Clusters of arbitrary shape can be
detected.–Cell hierarchy can make
visualization more intuitive.–Resistance to sparse noise, by
rejecting isolated maxima.
› Quantitative– Linear time O(n)–Comparison with k-means› maxima as initial centroids› greatly outperforms k-means
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 8
Conclusions & future work› Suitable for real-time visualization of large-scale
telecommunication networks due to:–Qualitative properties–Quantitative properties
› Future plans:–Higher dimensions : 3D, 4D–Dynamic Clustering– Always room for improvement
› Author’s [email protected]@ericsson.com
Slide title minimum 32 pt
(32 pt makes 2 rows
Text and bullet level 1 minimum 24 pt
Bullets level 2-5minimum 20 pt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´¶·¸¹º»¼½ÀÁÂÃÄÅÆÇÈËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂăąĆćĊċČĎďĐđĒĖėĘęĚěĞğĠġĢģĪīĮįİıĶķĹĺĻļĽľŁłŃńŅņŇňŌŐőŒœŔŕŖŗŘřŚśŞşŠšŢţŤťŪūŮůŰűŲųŴŵŶŷŸŹźŻżŽžƒȘșˆˇ˘˙˚˛˜˝ẀẁẃẄẅỲỳ–—‘’‚“”„†‡•…‰‹›⁄€™−≤≥fiflĀĀĂĂĄĄĆĆĊĊČČĎĎĐĐĒĒĖĖĘĘĚĚĞĞĠĠĢĢĪĪĮĮİĶĶĹĹĻĻĽĽŃŃŅŅŇŇŌŌŐŐŔŔŖŖŘŘŚŚŞŞŢŢŤŤŪŪŮŮŰŰŲŲŴŴŶŶŹŹŻŻȘșΆΈΉΊΌΎΏΐΑΒΓΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΪΫΆΈΉΊΰαβγδεζηθικλνξορςΣΤΥΦΧΨΩΪΫΌΎΏЁЂЃЄЅІЇЈЉЊЋЌЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯЁЂЃЄЅІЇЈЉЊЋЌЎЏѢѢѲѲѴѴҐҐәǽẀẁẂẃẄẅỲỳ№
Do not add objects or text in the footer area
Clustering for Network Monitoring | Public | © Ericsson AB 2011 | 2011-07-20 | Page 10
Questions?› Experimental Setup
– Datasets: Random bounding boxes out of a synthesized, realistic, widely varied dataset, containing 273,000 points.
– 3000 random bounding boxes for O(n) testing– 500 random bounding boxes for k-means comparison– Hardware: Laptop PC, 2x1.83Ghz CPU, 4 GB RAM– Software: Windows Vista SP2, Visual Studio 2008 C#
› K-means– maxima as initial centroids : Very compatible with k-means, since a large percentage
of the points will be very close to the initial centroids– Euclidean distance– Iterations: avg=8.72, sd=5.93– Clusters: avg=27.3, sd=18.2– Maximum Allowed Neighborhood : 11x11
› Why isn’t a Large Neighborhood applied directly?– Performance: When a Denser cell is found nearby, no need to check further away.– Better Resolution: Detection of Irregular Shapes & Distributions
› Xres,Yres Resolutions calculation– – Preserving aspect ratio of bounding box– Ceiling at 100x100 cells