42
Outline Introduction Clouds with inline text Clouds with arbitrary placement Conclusions Tag-Cloud Drawing Algorithms for Cloud Visualization Owen Kaser 1 Daniel Lemire 2 1 University of New Brunswick, Saint John campus 2 Universit´ e du Qu´ ebec ` a Montr´ eal WWW2007 Workshop on Tagging and Metadata for Social Information Organization Kaser/Lemire Tag-Cloud Drawing

Tag-Cloud Drawing: Algorithms for Cloud Visualization

Embed Size (px)

DESCRIPTION

Tag clouds provide an aggregate of tag-usage statistics. They are typically sent as in-line HTML to browsers. However, display mechanisms suited for ordinary text are not ideal for tags, because font sizes may vary widely on a line. As well, the typical layout does not account for relationships that may be known between tags. This paper presents models and algorithms to improve the display of tag clouds that con- sist of in-line HTML, as well as algorithms that use nested tables to achieve a more general 2-dimensional layout in which tag relationships are considered. The first algorithms leverage prior work in typesetting and rectangle packing, whereas the second group of algorithms leverage prior work in Electronic Design Automation. Experiments show our algorithms can be efficiently implemented and perform well.

Citation preview

Page 1: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Tag-Cloud DrawingAlgorithms for Cloud Visualization

Owen Kaser1 Daniel Lemire2

1University of New Brunswick, Saint John campus

2Universite du Quebec a Montreal

WWW2007 Workshop on Tagging and Metadata for SocialInformation Organization

Kaser/Lemire Tag-Cloud Drawing

Page 2: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

1 Introduction

2 Clouds with inline text

3 Clouds with arbitrary placement

4 Conclusions

Kaser/Lemire Tag-Cloud Drawing

Page 3: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Overall goalsRelated work

Introduction

After the tagging . . . the tag cloud.

Visualize aggregate information, promote interaction.

What is a good cloud layout?

. . . and how can we compute it?

Kaser/Lemire Tag-Cloud Drawing

Page 4: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Overall goalsRelated work

(optional slide) Do. . .

Do tell me lots. Use

font FAMILYpoint size

colour

location and position. Cluster related tags.

HCI people can help here.

Kaser/Lemire Tag-Cloud Drawing

Page 5: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Overall goalsRelated work

(optional slide) Don’t. . .

pda and busy page withcloudugly

page impatientperson, clipart?

Don’t waste space. This favoursrectangles.

Don’t be ugly. Banishwhitespace blobs.

Don’t be weird. Use simpleHTML+CSS.

Don’t require horizontalscrolling. Limit width.

Don’t be tardy. Considercompute power and bandwidth.

Kaser/Lemire Tag-Cloud Drawing

Page 6: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Overall goalsRelated work

(alternative slide) Setup

Tag clouds are fixed-width rectangles.

Tags are displayed in a range of font sizes. For layout, onlytags’ bounding boxes matter.

A small gap (at least) is required between tags.

(alternative 1) Related tags should be in close proximity.

(alternative 2) Tags should be organized into lines.

Colour, font family etc. are ignored — except for their effectson bounding boxes.

Kaser/Lemire Tag-Cloud Drawing

Page 7: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Overall goalsRelated work

Related work

Hassan-Montero and Herreno-Solana : clustering, 8× 12 display(?)Millen et. al: auxiliary index, find tag in large cloudBielenberg(sp)

Kaser/Lemire Tag-Cloud Drawing

Page 8: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Inline text

Inline text uses only “inline” HTML: eg span, font, b or br.Not tables.A tag cloud with inline text consists of a sequence of rows of text.Most tag clouds on the Web are currently like this. . .. . . unlike most visualizations of topic/concept maps.

We have the power to insert 〈br〉

tag1 tag2 tag3tag4

Kaser/Lemire Tag-Cloud Drawing

Page 9: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Badness model

������������������������������

������������������������������

��������������������

��������������������

������������������������������������������������������������������������������������������

������������������������������������������������������������������������������������������

Tag 3Tag 1 Tag 2

Tag 4 T5 line 2

line 1

lines 3+

whitespace (here blue!) isbad. It’s left/right of tags+ above/below short tags.Required inter-tag spaceisn’t bad.

Line k ’s badness bk depends only on the tags on line k (and cloudwidth). Independent of spacing.Overall badness: the Lp norm of line badnesses: p

√∑k |b|p

except L∞ is “max”.

Kaser/Lemire Tag-Cloud Drawing

Page 10: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Cannot reorder tags

First problem variant: tags given some desired order: oftenalphabetic.Only ability we have is to insert br elements.Cloud (with/without br’s) will be rendered normally by thebrowser’s greedy line-breaking algorithm.Goal: insert them so as to minimize total badness.Almost like the text-justification problem for TEX. So we modifythe Knuth-Plass (K-P) algorithm [Knuth and Plass, 1982].

Kaser/Lemire Tag-Cloud Drawing

Page 11: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Knuth-Plass

K-P uses dynamic programming.It tabulates ci , the minimum total (L2) badness of justifying thefirst i words.K-P considers hyphenation + handles last line in paragraphspeciallyBut we consider varying font sizes instead; allow L1, L2 or L∞.

Kaser/Lemire Tag-Cloud Drawing

Page 12: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Implementation and Experiments

For n tags, our Knuth-Plass variant runs in O(n2) time with smallconstants: 140 tags takes less than 1 ms.Range of font sizes: ???Test data:

1 65 ZoomClouds (obtained from a REST API they publish).

2 80 tag sets obtained from Project Gutenberg e-Books. (Everye-Book is tagged by each long word occurring in it. Then takethe most frequent k tags for the e-Book.)

Kaser/Lemire Tag-Cloud Drawing

Page 13: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Badness vs Area

Within the cloud, we see only tags, (bad) whitespace,mandatory (small) inter-tag gaps.

The L1 norm combines line badnesses by summing them.

Ignoring gaps, we see minimizing badness also minimizes area.

The L2 norms could lead to more area than greedy linebreaking. Beauty vs. area trade-off (?) Not really: < 1%area increase over L1.

L∞ disastrous for area: 200% increase!

Kaser/Lemire Tag-Cloud Drawing

Page 14: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Measured Badnesses

2 custom barcharts, for L1 and L2. Show opt, greedy x Gut, Zoomx Alpha, weight.Note how well sorted-by-weight does, even with greedy linebreaking.

Kaser/Lemire Tag-Cloud Drawing

Page 15: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Reorderable tags

We gain additional flexibility to reduce badness/area if we canrearrange tags. How much? . . . experiments. . .

Sorting by weight looks promising

If inter-tag gap = 0, we have the (level version) Strip-PackingProblem (SPP) [Lodi et al., 2002].

SPP is NP-hard but good approximation algorithms areknown.

Kaser/Lemire Tag-Cloud Drawing

Page 16: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Strip-Packing Problem (SPP) - extra slide

pictures showing a bunch of boxes with tagspicture showing level-oriented and non-level oriented packings

Kaser/Lemire Tag-Cloud Drawing

Page 17: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Approximation Algorithms for SPP / Cloud Layout

Although we have non-zero inter-tag gap, we look to SPP forinspiration. See [Coffman, Jr. et al., 1980]. We find

First-Fit Decreasing Height (FFDH)

Next-Fit Decreasing Height (NFDH)

(our variant) Next-Fit Decreasing Height, Weight (NFDH).

We also tried 10 random orderings, followed by ourKnuth-Plass variant

Add pictures!

Kaser/Lemire Tag-Cloud Drawing

Page 18: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Fixed-order tagsReorderable tags

Results

PicturesFor L1 or L2, NFDHW is good.For L2, simply sorting by weight and using our K-P variant is alsogood.

Kaser/Lemire Tag-Cloud Drawing

Page 19: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Clouds with arbitrary placement

Let’s

relax level-oriented packing (full strip packing)

consider clustering related tags eg tags t1, t2 commonly onthe same e-Book lines ⇒ t1 placed near t2 in the cloud

Something seems very familiar about this problem. . .

Kaser/Lemire Tag-Cloud Drawing

Page 20: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Tag clouds and VLSI placement

(from [Cong et al., 2006])

Kaser/Lemire Tag-Cloud Drawing

Page 21: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Issue comparison

Similarities:

place oriented rectangles in the plane

cluster related rectangles

(sometimes) gaps needed between rectangles

(for “fast floorplanners”) use interactively (speed required)

(optional) slightly resize rectangles

Differences:

electrical connectivity is transitive

tags cannot be rotated

cloud “legibility gaps” differ from VLSI wiring space

clouds with 1M tags — unlikely

Kaser/Lemire Tag-Cloud Drawing

Page 22: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Our thesis

Thesis, part I: “Fast floorplanner” techniques can be adapted fortag-cloud layout.In particular, min-cut layout [Bruer, 1977] and floorplansizing [Stockmeyer, 1983] are mature.

Thesis, part II: Existing full-scale VLSI floorplanners/placerscannot be easily adapted.Part II is largely untested.

Kaser/Lemire Tag-Cloud Drawing

Page 23: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Mincut placement of tags

Key idea is recursive bipartitioning, a divide-and-conquer approach.

1 Divide tags into 2 balanced groups, keeping strongly relatedtags together.(Mincut refers to (broken) relationships from two tags beingin different groups)

2 Recursively lay out each group.Terminal propagation [Dunlop and Kernighan, 1985] considerstags’ relationships outside the current subproblem.

3 Compose the two layouts (horizontally or vertically). Tricky toenforce “nearly width w” constraint.

Kaser/Lemire Tag-Cloud Drawing

Page 24: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Slicing tree

One Tag Tag Two

is left of

Tag3is above

Mincut approach generates aslicing tree.

Leaves: individual tags

Internal nodes: recordwhether two layouts arecomposed vertically orhorizontally

Kaser/Lemire Tag-Cloud Drawing

Page 25: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

From slicing tree to HTML

To transform a slicing tree to HTML, use nested tables.1× 2 or 2× 1, depending on vertical or horizontal composition.

One Tag Tag Two

is left of

Tag3is above

<table><tr><td><table><tr><td>One Tag</td></tr><tr><td>Tag Two</td></tr></table></td>

<td>Tag3</td></tr></table>

Kaser/Lemire Tag-Cloud Drawing

Page 26: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

“Floorplan” Resizing

Gain better packings from adjusting a tag’s aspect ratio(while holding tag area near constant).

One extreme adjustment is 90◦ rotation.

That’s Bad

for tags!

Given s total shape alternatives, in O(s log s), we can find theoptimal size for each tag [Shi, 1995].

Note: Remarkably hard to play with font-size,font-family, letter-spacing, font-weight to getalternatives.

If only browsers really supported font-stretch!

Kaser/Lemire Tag-Cloud Drawing

Page 27: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Results measured

Results consider

layout area and

weighted distance =∑

tags i , j dist(i , j) ∗ weight(i , j), where

dist(i ,j) is Euclidean distance between the bounding-boxorigins of tags i and j . Weight is the strength of relationship.

We compared mincut placement against greedy-sorted-by-height,greedy-random-order and a block packer, compaSS.

Kaser/Lemire Tag-Cloud Drawing

Page 28: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Results

No. Tags Min-cut gr-sorted gr-random compaSS

Area (pixels2)

20 31 37 46 2950 63 62 85 59100 111 99 139 98200 192 165 231 170

Weighted distance (kilopixels)

20 61 124 120 6550 166 282 271 180100 296 465 482 382200 438 693 765 654

Min-cut placed even the largest clouds within 100 ms.

Kaser/Lemire Tag-Cloud Drawing

Page 29: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Ummm...isn’t that a VLSI problem?Mincut placementResizing: optional topicResultsPragmatics

Client vs. server

Who has what information? Client (browser) or server (tagdatabase)

Client: display width

Client: bounding box sizes.

Server: tags, requested font sizes

Server: semantic tag relationships

Tag relationships may be large; bandwidth problems?.

Layout may require significant processor horsepower.

Kaser/Lemire Tag-Cloud Drawing

Page 30: Tag-Cloud Drawing: Algorithms for Cloud Visualization

OutlineIntroduction

Clouds with inline textClouds with arbitrary placement

Conclusions

Conclusions and Future Work

Even simple heuristics can reduce badness and areasubstantially. (A factor of 3 in some cases.)

Simply sorting tags by weight and using K-P (or greedy) iseffective for area minimization.

Min-cut placement is fast, clusters tags well, and is not badfor area.

Future: take other “beauty metrics” (eg symmetry) fromgraph drawing and HCI tag-cloud usability studies. Do layoutfor them.

Adopt compute-intensive methods from VLSI design: createbeautiful clouds of “static content” (cloud-of-the-month).

Kaser/Lemire Tag-Cloud Drawing

Page 31: Tag-Cloud Drawing: Algorithms for Cloud Visualization

ZoomClouds, Jan. 24, 2007

Kaser/Lemire Tag-Cloud Drawing

Page 32: Tag-Cloud Drawing: Algorithms for Cloud Visualization

Animation of mincut placement

to do, can be optional

Kaser/Lemire Tag-Cloud Drawing

Page 33: Tag-Cloud Drawing: Algorithms for Cloud Visualization

Sizing of slicing floorplan

to do, can be optional

Kaser/Lemire Tag-Cloud Drawing

Page 34: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

baby

diaper

tears

bottle

beer

rioting

tear gas

gas

sports

rioting

Left Right

baby

tears

bottle

diaper sports

gas

beer

tear gas

Kaser/Lemire Tag-Cloud Drawing

Page 35: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

tears

sports

rioting

gas

beer

external

tear gas

bottle

Kaser/Lemire Tag-Cloud Drawing

Page 36: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

1

2

4 6

8

HH

V V

HH

Vrioting

teargassports

bottlediaper tears

3

5 7

H

beer gas

baby

rioting

2

63

8

7

5

4

1

baby

diaper bottle

tears

beer

gas

tear gas sports

Kaser/Lemire Tag-Cloud Drawing

Page 37: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

Kaser/Lemire Tag-Cloud Drawing

Page 38: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

Kaser/Lemire Tag-Cloud Drawing

Page 39: Tag-Cloud Drawing: Algorithms for Cloud Visualization

pict

0 5000

10000 15000 20000 25000 30000 35000 40000 45000 50000

wei

gh

t

alp

ha.

wei

gh

t

alp

ha.10

FF

DH

W

FF

DH

NF

DH

greedyoptimal

0 2000 4000 6000 8000

10000 12000 14000 16000 18000

wei

gh

t

alp

ha.

wei

gh

t

alp

ha.10

FF

DH

W

FF

DH

NF

DH

greedyoptimal

0 5000

10000 15000 20000 25000 30000 35000 40000 45000 50000

wei

gh

t

alp

ha.

wei

gh

t

alp

ha.10

FF

DH

W

FF

DH

NF

DH

greedyoptimal

0 2000 4000 6000 8000

10000 12000 14000 16000 18000

wei

gh

t

alp

ha.

wei

gh

t

alp

ha.10

FF

DH

W

FF

DH

NF

DH

greedyoptimal

Kaser/Lemire Tag-Cloud Drawing

Page 40: Tag-Cloud Drawing: Algorithms for Cloud Visualization

Bruer, M. A. (1977).Min-cut placement.Journal of Design Automation and Fault-Tolerant Computing,1(4):343–362.

Coffman, Jr., E. G., Garey, M. R., Johnson, D. S., and Tarjan,R. E. (1980).Performance bounds for level-oriented two-dimensional packingalgorithms.SIAM J. Comput., 9(4):808–826.

Cong, J., Romesis, M., and Shinnerl, J. R. (2006).Fast floorplanning by look-ahead enabled recursivebipartitioning.IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems, 25(9):1719–1732.

Kaser/Lemire Tag-Cloud Drawing

Page 41: Tag-Cloud Drawing: Algorithms for Cloud Visualization

Dunlop, A. E. and Kernighan, B. W. (1985).A procedure for placement of standard-cell VLSI circuits.IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems, 4:92–98.

Knuth, D. E. and Plass, M. F. (1982).Breaking paragraphs into lines.Software — Practice & Experience, 11(11):1119–1184.

Lodi, A., Martello, S., and Monaci, M. (2002).Two-dimensional packing problems: A survey.European Journal of Operational Research, 141(2):241–252.

Shi, W. (1995).An optimal algorithm for area minimization of slicingfloorplans.

Kaser/Lemire Tag-Cloud Drawing

Page 42: Tag-Cloud Drawing: Algorithms for Cloud Visualization

In ACM/IEEE International Conference on Computer-AidedDesign (ICCAD), pages 480–484.

Stockmeyer, L. (1983).Optimal orientation of cells in slicing floorplan designs.Information and Control, 57(2–3).

Kaser/Lemire Tag-Cloud Drawing