Page 1: Visualization Taxonomies and Techniques Text: Words, phrases, sentences, … University of Texas – Pan American CSCI 6361, Spring 2014

Visualization Taxonomies and Techniques

Text: Words, phrases, sentences, …

University of Texas – Pan American

CSCI 6361, Spring 2014



• Text is ubiquitous
  – Documents, and more generally text, are a primary information source
    • (Verbal has its place!)
  – Access to documents and text has grown exponentially in recent years due to networking infrastructure
    • WWW
    • Digital libraries
    • Social media

• Visualization to aid users in understanding and gathering information from text and document collections



• Visualization can aid in performing tasks

• For example:
  – Which documents contain text on topic XYZ?
  – Which documents are of interest to me?
  – Are there other documents that are similar to this one (so they are worthwhile)?
  – How are different words used in a document or a document collection?
  – What are the main themes and ideas in a document or a collection?
  – Which documents have an angry tone?
  – How are certain words or themes distributed through a document?
  – Identify “hidden” messages or stories in this document collection.
  – How does one set of documents differ from another set?
  – Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
  – Understand the history of changes in a document.
  – Find connections between documents.

From Stasko, 2013


Introduction: Challenges of Text Visualization

• Text is unlike the other data types seen so far, for example:
• Context and semantics
  – Context is relevant to understanding and meaning
  – Indeed, natural language understanding is a challenge of the (n + 1)th century
• Dimensionality
  – Inherently “not dimensional”, so must create a “visually realizable” visual encoding
  – Often, the first step is n-D, then 2-D or 3-D
• Modeling abstraction
  – Consider the level of “understanding” required for the task
  – Match the analysis task with appropriate tools and models


Introduction: Related topics

• Information Retrieval
  – Active search process that brings back particular/specific items (will discuss this some today, but it is not always the focus)
  – InfoVis and HCI can help some…
    • Visualization may be most useful when not sure precisely what you’re looking for when retrieving information
  – More of a browsing paradigm than a search one
  – But, this is part of the information retrieval task
    • Define information need, formulate “query”, examine/evaluate results, … repeat
• Sensemaking
  – Gaining a better understanding of the facts at hand in order to take some next steps
    • A principal focus in visual analytics
  – Visualization can help make a large document collection more understandable more rapidly
    • Which is good: “Overview, zoom and filter, details on demand”


Recall, Visualization Pipeline: Visualization Stages

• Data transformations:
  – Map raw data (idiosyncratic form) into data tables (relational descriptions including metatags)
• Text is nominal data
  – A word, or any text unit, does not map easily to any quantitative representation!
  – The “Raw data --> Data Table” mapping is a principal element of creating any visual representation
    • How do you get numbers from words, sentences, …?
  – Will see several solutions
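
One simple way to get numbers from words is to count them. The sketch below is only an illustration of the "Raw data --> Data Table" step, not one of the specific systems covered later. The tokenizer regex and the function name `term_frequency_table` are assumptions made for this example.

```python
import re
from collections import Counter

def term_frequency_table(text):
    """Map raw text to a data table of (word, count) rows, most frequent first."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties have a stable order.
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))

rows = term_frequency_table("The cat sat. The cat ran. A dog sat.")
for word, count in rows:
    print(f"{word:5s} {count}")
```

Each row is now a nominal value (the word) paired with a quantitative one (its count), which is the form a visual mapping can work with.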


[Figure: visualization pipeline (Dataset, Visual Form, Views; User - Task; mappings F and F⁻¹)]




Recall, Visualization Pipeline: Visualization Stages

• Visual mappings:
  – Transform data tables into visual structures that combine spatial substrates, marks, and graphical properties
• And … visual mapping, as well, requires at least “the usual level” of creativity


[Figure: visualization pipeline (Dataset, Visual Form, Views; User - Task; mappings F and F⁻¹)]




Understanding Text Content


• Visual representations of words, phrases, and sentences
  – Main goal is understanding, versus search
• Visual presentation is always part of text presentation
  – Standard typography uses layout, font, style, color, …
  – Electronic media, especially: pick a web page
  – “Single text content”


Single Text Content: Word Counts

• 2012 National Conventions
• NY Times:


Tag / Word Clouds


• Lots of popular interest
  – E.g., on web
• Idea is to show word/concept importance through visual means
  – Tags: User-specified metadata (descriptors) about something
  – Sometimes generalized to just reflect word frequencies
• Not a new technique
  – Milgram’s ‘76 experiment to have people label landmarks in Paris
  – Flanagan’s ‘97 “Search referral Zeitgeist”
  – Fortune’s ‘01 Money Makes the World Go Round
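
The core idea of showing importance through visual means can be sketched as a mapping from tag counts to font sizes. This is an illustrative sketch only: the point-size range, the logarithmic scaling, and the name `font_sizes` are assumptions for the example, not taken from any particular tag cloud tool.

```python
import math

def font_sizes(counts, min_pt=10.0, max_pt=48.0, log_scale=True):
    """Map each tag's (positive) count to a font size in points."""
    if not counts:
        return {}
    # Assumed choice: log scaling keeps a few very frequent tags from dwarfing the rest.
    scale = math.log if log_scale else float
    values = {tag: scale(n) for tag, n in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for tag, v in values.items():
        # If every tag has the same count, use the midpoint size for all of them.
        t = 0.5 if span == 0 else (v - lo) / span
        sizes[tag] = min_pt + t * (max_pt - min_pt)
    return sizes

sizes = font_sizes({"visualization": 40, "text": 10, "cloud": 1})
for tag, pt in sizes.items():
    print(f"{tag}: {pt:.1f}pt")
```

The least frequent tag gets `min_pt`, the most frequent gets `max_pt`, and everything else is interpolated between them.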


Tag / Word Clouds, Example: US State of the Union Speeches

• Guardian


Flickr Tag Cloud


delicious Tag Cloud


Alternate Order


Many Eyes Tag Cloud

• Word pairs


Wordle: “Beautiful Word Clouds”

• Tightly packed words
  – Horizontal, vertical or diagonal
• Size correlated with frequency
• Multiple color palettes
• User gets some control
• Layout algorithm
  – Details not published
  – Sort words by weight, in decreasing order
  – For each word:
    • Initial position randomly chosen according to distribution for target shape
    • Update position moves out radially
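
The layout steps above can be sketched as a greedy placement loop. Since the actual Wordle details are not published, everything specific here is an assumption: words are treated as unrotated axis-aligned boxes, box height equals weight, width is a crude per-character estimate, the initial position is Gaussian around the origin (standing in for the target-shape distribution), and a word keeps moving outward until it no longer overlaps any placed word.

```python
import math
import random

def overlaps(a, b):
    """Axis-aligned box intersection; boxes are (x, y, w, h) with (x, y) the lower-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def layout(weights, seed=0, step=2.0, char_w=0.6):
    """Place the heaviest word first; push each later word outward until it fits."""
    rng = random.Random(seed)
    placed = {}
    for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        h = float(weight)            # assumed: box height equals weight
        w = char_w * h * len(word)   # assumed: crude width estimate
        # Assumed initial distribution: Gaussian scatter around the origin.
        cx, cy = rng.gauss(0, 5), rng.gauss(0, 5)
        angle = rng.uniform(0, 2 * math.pi)
        r = 0.0
        while True:
            x = cx + r * math.cos(angle) - w / 2
            y = cy + r * math.sin(angle) - h / 2
            box = (x, y, w, h)
            if not any(overlaps(box, other) for other in placed.values()):
                break
            r += step  # move out radially along a fixed direction
        placed[word] = box
    return placed

boxes = layout({"text": 30, "word": 18, "cloud": 12, "tag": 8})
for name, (x, y, w, h) in boxes.items():
    print(f"{name:6s} x={x:7.1f} y={y:7.1f} w={w:5.1f} h={h:5.1f}")
```

Placing heavy words first means the largest words claim the center and smaller ones fill in around them, which is what gives this style of cloud its packed look.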


Wordle: “Beautiful Word Clouds”

• Course schedule, table of topics, and assignments



Can be many variations …

• A bit more order
• Order the words more by frequency


Mani-Wordle: User control

• Mani-Wordle
  – Start with nice default algorithm
  – Give user more control over design
    • Alter color (within a palette)
    • Pin words, redo the rest
    • Move and rotate words
  – Koh et al TVCG (InfoVis) ‘10


Tag / Word Clouds: Conclusions

• Weaknesses
  – Sub-optimal visual encoding (size vs. position)
  – Inaccurate size encoding (long words are bigger)
  – Font sizes are hard to compare
  – May not facilitate comparison (unstable layout)
  – Word frequency may not be meaningful
    • Most use words vs. stems
  – Does not show structure of the text
  – Studies have even shown they underperform (Gruen et al CHI ’06)
• Why so popular?
  – OK for “quick look”
  – Serve as social signifiers that provide a friendly atmosphere and a point of entry into a complex site
  – Act as individual and group mirrors
  – Fun, not business-like


BTW - Text Analysis Tools: voyeur

• Book
• + tools for text analysis and visualization


Visualization and Information Retrieval

• Examples so far have focused on representing a single document
  – …, or, really, a set of words, as there is no consideration of even word order, let alone sentence structure, etc.
• Principal question is how visual representations might aid text, or document, search
  – I.e., how to find the proverbial needle in a haystack, where the haystack is all the documents on the WWW or in a digital library
  – The term information retrieval refers to this search, and its history antedates computers
• IR entails:
  – Determine information need
  – Query formulation
  – Retrieval
  – Assessment of results
  – Reformulation of query or even information need
  – Repeat (until information need met)


• Provide visual representations that help during this process
  – Show the document collection visually, support browsing, …
    • Even for determining information need!
  – Show query results visually
  – Show how query terms relate to results
  – … any aspect


From Stasko, 2013


Evaluating Query Results: TileBars, Hearst, 1996

• Hearst points out that query responses do not include:
  – How strong the match is
  – How frequent each term is
  – How each term is distributed in the document
  – Overlap between terms
  – Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships



• Goal: Minimize time and effort for deciding which documents to view in detail
• Show the role of the query terms in the retrieved documents, making use of document structure
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  – Relative document length
  – Frequency of term sets in document
  – Distribution of term sets with respect to the document and each other
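
The data behind a display of this kind can be sketched as a small matrix: one row per term set, one column per segment of the document, each cell holding the number of hits. This is a simplified sketch, not Hearst's implementation. In particular, fixed-length token windows are an assumed stand-in for real document structure, and the names `tilebar` and `render` are made up for the example.

```python
import re

def tilebar(text, term_sets, segment_len=50):
    """Return one row of hit counts per term set, one column per segment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Assumed stand-in for document structure: fixed-length token windows.
    segments = [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]
    return [
        [sum(tok in terms for tok in seg) for seg in segments]
        for terms in term_sets
    ]

def render(rows, shades=" .:#"):
    """Text rendering: a darker character means more hits in that segment."""
    top = len(shades) - 1
    return ["".join(shades[min(n, top)] for n in row) for row in rows]

doc = "law court judge " * 4 + "network router packet " * 4 + "court network law " * 4
rows = tilebar(doc, [{"law", "court"}, {"network"}], segment_len=12)
for line in render(rows):
    print("|" + line + "|")
```

The number of columns reflects relative document length, the shade reflects term-set frequency, and reading down a column shows where the term sets overlap.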

From Stasko, 2013



• TileBars screen:

From Stasko, 2013


TileBars: Document representation

• Visual representation of retrieved documents

• Video: TileBars-80mb-chi96_05_m1.mpeg

From Stasko, 2013



• Clearly provides, visually, the intended information about each document
• Ease/effort/time of comparison?
  – Surely would improve with use

• … ?


Evaluating Query Results: Sparkler

• Abstract result documents more
  – Havre et al InfoVis ‘01
• Show “distance” from query in order to give user better feel for quality of match(es)
• Also shows documents in response to multiple queries
• Visualizing one query
  – Triangle: query
  – Square: document
• Distance between query and documents represents their relevance

From Stasko, 2013



• Visualizing multiple queries
• Six queries here
• Bullseye allows viewer to select quality results

From Stasko, 2013



• Test example
  – Text Retrieval Conference (TREC-3) test document collection
  – AP news stories from June 24–30, 1990
  – TREC topic: Japan Protectionist Measures
  – Sparkler found 16 of 17 relevant documents

From Stasko, 2013


Evaluating Query Results: RankSpiral

• Compare search results from different search engines
  – Spoerri InfoVis ’04 poster

From Stasko, 2013



• Color represents different search engines

From Stasko, 2013



Evaluating Query Results: ResultMaps

• Treemap-style vis for showing query results in a digital library
  – Clarkson, Desai & Foley TVCG (InfoVis) ‘09

From Stasko, 2013


Representing Multiple Documents


• Previously, have seen various techniques for comparing multiple documents that are results of query, i.e., a subset of all documents

• Also, may want to just show everything, and then let user do “manual search”, or user-directed search

• Such displays of all documents also support the type of search common in visual analytics

– Query, browse, connect, drill-down

• Will see:
  – Parallel word clouds
  – Tree layout of synonyms
  – …


Multiple Documents: Parallel Tag Clouds

• Tag clouds increase size of word as f(frequency)
• Showing multiple documents as tag clouds allows visual inspection
  – Automated and user directed, visual analytics
• Parallel Tag Clouds - name says it all
  – Video - Collins et al VAST ‘09 – different circuit courts


Multiple Documents: Do different district courts differ in cases they handle?


Multiple Documents, Counting Words: Overview & Timeline

• State of the Union Addresses
• NY Times demo


Multiple Document Word Use: DocuBurst

• Sets of synonyms grouped together
  – Uses WordNet – show words from a document in terms of their hypernym (ISA) links
  – Size – # of leaves in subtree
  – Hue – different synsets of word
  – Shade – frequency of use
• Demo, etc.




• Show patterns of words or n-grams
  – Don et al. CIKM ‘07

• Video



Combinations of words, phrases, and sentences


Multiple Sentences: SeeSoft Display

• Originally for software visualization

• One line of text on each horizontal line

• Color highlight for attributes
  – E.g., for software, how often modified, days since modification
  – E.g., for text, where a particular word appears in a sentence

• Conversations might be revealed

• Detail view in pop up window


Multiple Sentences: TextArc - Simple Single Document Visualization

• Visualize an entire book
  – Word appearances
  – Sentences
  – …
• Sentences laid out on circumference in order of appearance in spiral
• Frequently occurring words inside spiral
• Selecting a word draws lines to the sentences containing that word
  – A kind of “visual concordance”

• Significant interaction


Concordances and Word Frequencies

• From field of literary analysis

• Concordance
  – An alphabetical index of the principal words in a book or the works of an author with their immediate context
• Word of interest in center, with the text in which it appears to the left and right
• Also known as KWIC
  – Key word in context
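
A key-word-in-context listing is straightforward to sketch: find each occurrence of the word and keep a few tokens on either side. The tokenizer, the default window of three words, and the function name `kwic` are assumptions for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    # Assumed tokenizer: runs of letters and apostrophes, original case kept.
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

text = "the cat sat on the mat and the cat slept"
for left, word, right in kwic(text, "cat", width=2):
    # Right-align the left context so the keyword lines up in a center column.
    print(f"{left:>12s}  {word}  {right}")
```

Aligning the keyword in a center column is what lets the eye scan down the contexts to the left and right.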


Word Tree

• Shows context of a word or words
  – Follow word with all the phrases that follow it
• Wattenberg & Viégas TVCG (InfoVis) ‘08
• Font size shows frequency of appearance
• Continue branch until hitting unique phrase
• Clicking on phrase makes it the focus
• Ordered alphabetically, by frequency, or by first appearance
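
The structure behind a word tree can be sketched as a trie of the words that follow each occurrence of the root word, with a count at every node (the count is what font size would encode). This is a simplified sketch, not the Many Eyes implementation: it cuts branches at a fixed `depth` rather than continuing until a phrase is unique, it ignores sentence boundaries, and the names `word_tree` and `show` are made up for the example.

```python
import re

def word_tree(text, root, depth=3):
    """Nested dict: each node is {"count": n, "children": {word: node}}."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tree = {"count": 0, "children": {}}
    for i, tok in enumerate(tokens):
        if tok != root:
            continue
        tree["count"] += 1
        node = tree
        # Walk the words that follow this occurrence, up to `depth` of them.
        for nxt in tokens[i + 1:i + 1 + depth]:
            node = node["children"].setdefault(nxt, {"count": 0, "children": {}})
            node["count"] += 1
    return tree

def show(node, label, indent=0):
    """Print the tree, ordering branches by frequency and then alphabetically."""
    print("  " * indent + f"{label} ({node['count']})")
    ordered = sorted(node["children"].items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    for word, child in ordered:
        show(child, word, indent + 1)

tree = word_tree("I have a dream today. I have a plan. I have a dream again.", "have", depth=2)
show(tree, "have")
```

Changing the sort key in `show` gives the other orderings the slide mentions, such as purely alphabetical.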


Word Tree: Interaction


Word Tree: From King James Bible


Word Tree: Many Eyes


Finding Structure: Phrase Nets


• Concordances show local, repeated structure of word context
• Phrase Nets in Many Eyes, van Ham et al.
• Other types of patterns
  – Lexical: <A> at <B>, <A> and <B>, <A> (is|are|was|were) <B>
  – Syntactic: <Noun> <Verb> <Object>
• Visualize extracted patterns in a node-link view
  – Occurrences -> Node size
  – Pattern position -> Edge direction
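
Extraction for a lexical pattern such as <A> and <B> can be sketched with a regular expression that yields directed edges from A to B. This is an illustrative sketch, not the Many Eyes code. The function names `phrase_net` and `node_sizes` are made up, and only single-word connectors are handled.

```python
import re
from collections import Counter

def phrase_net(text, connector="and"):
    """Count directed edges (A, B) for every match of '<A> connector <B>'."""
    # The lookahead captures B without consuming it, so B can also serve as
    # the A of the next match (as in "a and b and c").
    pattern = re.compile(
        rf"\b(\w+)\s+{re.escape(connector)}\s+(?=(\w+))", re.IGNORECASE
    )
    edges = Counter()
    for m in pattern.finditer(text):
        edges[(m.group(1).lower(), m.group(2).lower())] += 1
    return edges

def node_sizes(edges):
    """Node size: number of pattern occurrences the word takes part in."""
    sizes = Counter()
    for (a, b), n in edges.items():
        sizes[a] += n
        sizes[b] += n
    return sizes

edges = phrase_net("salt and pepper and salt and vinegar")
for (a, b), n in edges.items():
    print(f"{a} -> {b} ({n})")
print(node_sizes(edges))
```

The edge direction records which slot of the pattern each word filled, and the node sizes feed the node-link drawing.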


Phrase Net (larger next slide)

Portrait of the Artist as a Young Man: <A> and <B>



Phrase Nets, The Bible: <A> begat <B>


Phrase Nets, Old and New Testaments: <A> of <B>


Phrase Nets: (<A> and <B>) and (<A> at <B>)



• F. Viegas, M. Wattenberg, "Tag Clouds and the Case for Vernacular Visualization", interactions, Vol. 15, No. 4, Jul-Aug 2008, pp. 49-52.
