
Analytical, Visual, and Interactive Concepts for Geo-Visual Analytics

Heidrun Schumann (a,∗), Christian Tominski (a)

(a) University of Rostock, Albert-Einstein-Straße 21, D-18059 Rostock, Germany

Abstract

Supporting the visual analysis of structured multivariate geo-spatial data is a challenging task involving many different aspects. In this paper, we describe a systematic view of this task based on Chi’s data state reference model. The analytical, visual, and interaction components of the systematic view will be instantiated with specific examples that demonstrate how their tight interconnection facilitates exploration and analysis of geo-spatial data. In particular, we address the visualization of hierarchical structures on maps applying an extended focus+context concept. Moreover, we introduce an approach to extracting association rules from geo-spatial data and visualizing them on maps.

Keywords: Visual analytics, geo-spatial data, hierarchical data, extended focus+context, association analysis

2000 MSC: 68U05, 68U35

1. Introduction

Visualizing data in a geo-spatial frame of reference is at the core of research in cartography and geo-visualization. For decades, many powerful methods have been developed in both fields. The focus of cartography is on the generation of expressive and aesthetic maps, where interactive manipulation is not a primary objective. Geo-visualization provides highly capable tools that allow users to navigate in space and time, to explore associated spatiotemporal data, and to interactively manipulate the visual representation and the underlying data. Today’s web-based mapping applications (e.g., OpenStreetMap, Google Maps) and virtual globes (e.g., NASA World Wind, Google Earth) impressively demonstrate that spatiotemporal data analysis and exploration is for everyone, not just for experts.

The newly established research field of visual analytics, defined as “the science of analytical reasoning facilitated by interactive visual interfaces” [21], goes a big step further. It aims at supporting the process of gaining insight into massive, heterogeneous, dynamic, and ambiguous data by means of analytical computations, interactive exploration, and visualization of data, models, alternatives, and state information. Although first approaches to achieving this goal have already been developed, a variety of open questions and problems remain to be solved, specifically when considering visual analytics in space and time (see [15], Chapter 4).

In this work, we consider visual analytics support for the analysis of multivariate geo-spatial data and their structural relationships. We use a human health dataset as an example of such data. The data domain contains data records with quantitative and qualitative variables. Additionally, data abstractions (e.g., aggregates or clusters) and structural information (e.g., a cluster hierarchy or links between data records) associated with the data are of interest.

∗Corresponding author. Email addresses: [email protected] (Heidrun Schumann), [email protected] (Christian Tominski)

Such data can be analyzed with regard to many different goals and questions [4, 23]. Here we focus on two aspects in particular. First, we discuss the problem of representing hierarchies on maps and demonstrate our solution based on an extended focus+context concept. Second, we introduce a novel method for representing association rules on maps, which allows not only for communicating interesting patterns, but also for reducing the volume of data to be visualized. Although our work integrates analytical and visual methods as well as appropriate interaction techniques (the fundamental functionality of visual analytics tools), it clearly has a bias towards the visual part, as documented by a number of examples of map-based visualizations.

This article is structured as follows. To establish a systematic view, we briefly describe a conceptual framework based on Chi’s data state reference model in Section 2. In Section 3, the focus is on an extended focus+context approach to geo-visual analytics. Section 4 introduces the data we are dealing with. The novel techniques for visualizing hierarchies as well as for extracting and visualizing association rules in a spatial context are described in Sections 5 and 6. We conclude with a brief summary of the introduced concepts and directions for future work in Section 7.

Preprint submitted to Journal of Visual Languages & Computing, January 29, 2011


This is an unedited preprint of Schumann & Tominski: "Analytical, Visual, and Interactive Concepts for Geo-Visual Analytics". The final version of this article can be found in Elsevier's Journal of Visual Languages and Computing. http://dx.doi.org/10.1016/j.jvlc.2011.03.002


2. Design of a Conceptual Framework

Next, we describe a conceptual framework that can serve as a basis for designing tailored visual analytics solutions for multivariate geo-spatial data. We do not propose a specific instantiation of the framework, but describe the general functionality involved. The data state reference model by Chi [6] is an excellent basis for this purpose. It describes the visualization process with all its facets in a generic manner. As Figure 1 illustrates, the model systematically describes (1) the mapping process for transforming the raw data into image data via analytical and visual abstractions and (2) the application of a broad range of operators at different levels of the data stages. The visualization process is organized as a pipeline of four stages that the data have to pass through:

1. Raw data: In our case, the input data are multivariate data elements and relationships among them as well as the geo-spatial context.

2. Analytical abstractions: Abstractions are obtained from raw data values, for instance, by calculating aggregated values or by computing statistical moments. Analytical abstractions represent the major features of a dataset.

3. Visual abstractions: Visual variables such as size, color, or shape encode data values visually. They define the specifics of a visualization technique and determine the design of a visual representation. Commonly, the geo-spatial frame of reference is given in terms of a geometrical description, which is a visual abstraction.

4. Images: The result of the rendering process is pixel data to be displayed on a visual output device.

Data are transformed and propagated through the pipeline by operators of two different classes: stage operators and transformation operators. Stage operators work within a single data stage, while transformation operators transfer data from one stage to another. Since there are four data stages, there are three different types of transformation operators (see top of Figure 1):

Figure 1: Schematic view of Chi’s data state reference model.

• Filtering operators transform raw data values into analytical abstractions. In light of very large datasets, this step is important, because the data first have to be condensed to a displayable size. In our case, abstractions can be defined in terms of data, relationships, and/or the geo-spatial frame of reference. Thus, a particular challenge is to decide which abstraction should be computed for which kind of raw data and how to relate both kinds of data.

• Mapping operators take analytical abstractions or subsets of the raw data as input and map them onto visual abstractions. This step is crucial and has to be handled carefully. In our case, visual encodings have to be found for data, structures, and geographic space, and these encodings should be visually distinguishable. For example, it should be clear to the user whether a line represents part of a boundary in the geographic space or a relation between data elements.

• Rendering operators process the visual abstractions in order to obtain the visual representation. Here, an interesting question is how to manage different rendering facilities, e.g., representing the geographic space in Google Maps, but rendering the multivariate data display with another rendering system.
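To make the three transformation operators concrete, the following Python sketch chains them for a toy dataset. It is a minimal illustration under our own assumptions, not part of Chi’s model or of any system described here: the record layout (a `region` key plus numeric attributes), the function names `filtering_operator`, `mapping_operator`, `rendering_operator`, and `pipeline`, the mean aggregation, the two-color encoding, and the text "image" are all hypothetical stand-ins. Stage operators, which work within a single stage, are omitted.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical stage types: raw records carry a region id plus numeric attributes.
RawData = List[Dict[str, object]]
AnalyticalAbstraction = Dict[str, float]  # region -> aggregated value
VisualAbstraction = Dict[str, str]        # region -> color name
Image = List[str]                         # stand-in for pixel data


def filtering_operator(raw: RawData, attribute: str) -> AnalyticalAbstraction:
    """Raw data -> analytical abstraction: mean of `attribute` per region."""
    groups: Dict[str, List[float]] = {}
    for record in raw:
        groups.setdefault(str(record["region"]), []).append(float(record[attribute]))
    return {region: mean(values) for region, values in groups.items()}


def mapping_operator(abstraction: AnalyticalAbstraction, threshold: float) -> VisualAbstraction:
    """Analytical abstraction -> visual abstraction: a two-class color encoding."""
    return {region: ("red" if value > threshold else "gray")
            for region, value in abstraction.items()}


def rendering_operator(visual: VisualAbstraction) -> Image:
    """Visual abstraction -> image: text lines instead of pixels."""
    return [f"{region}: {color}" for region, color in sorted(visual.items())]


def pipeline(raw: RawData, attribute: str, threshold: float) -> Image:
    # The three transformation operators applied in pipeline order.
    return rendering_operator(mapping_operator(filtering_operator(raw, attribute), threshold))
```

For example, `pipeline([{"region": "A", "cases": 4}, {"region": "A", "cases": 6}, {"region": "B", "cases": 1}], "cases", 3.0)` yields `["A: red", "B: gray"]`, since region A aggregates to 5.0 and region B to 1.0.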

The data state reference model distinguishes between data space and view space. Data operators, analytical operators, and filtering operators belong to the data space and provide automatic computations on data. They can be summarized as the analytical component of our conceptual framework. Visual operators, image operators, and rendering operators, on the other hand, belong to the view space; they define the visual component of the framework. The question is where to put the remaining mapping operators. Given the importance of the mapping step, one option would be to define a separate component that provides the mapping functionality. However, we argue for including the mapping operators in the visual component, because the mapping strategy determines the design of the visual representations and is therefore tightly interconnected with the operators of the view space.

All operators are characterized by their functionality as well as by the data they work on. Moreover, each operator has its specific means to enable users to interactively manipulate the associated parameters and data. Conceptually, the interaction functionality can be captured by a separate interaction component that provides uniform access to data and operators through a consistent user interface.

In summary, the conceptual framework consists of three main building blocks: the analytical component, the visual component, and the interaction component. In order to be broadly applicable, these components should provide operators for all kinds of data: multivariate data, geo-spatial data, structural relationships, and data abstractions.

In the literature, one can find various instantiations of the conceptual framework. The OECD eXplorer [14] visualizes statistical data and supports collaborative web-enabled exploration and analysis. Other approaches address the challenge of extracting and visualizing meaningful patterns from massive movement data [3]. Not only clustering, but also self-organizing maps are utilized to make sense of geo-spatial data [16]. An application area particularly related to our work is the visual analysis of epidemics [20]. All these approaches have in common that they integrate analytical, visual, and interactive means. But each of them focuses on specific aspects and data, and hence implements a different subset of the operators of the conceptual framework.

In the following, we take a closer look at the visual component. It has to include diverse operators that are able to handle various kinds of data (e.g., multivariate geo-spatial data, structural relationships, and abstractions derived from the data) and to generate different kinds of visual output (e.g., 2D and 3D presentations). This way, data can be explored with respect to various tasks and analysis goals [4]. There are many powerful techniques for visualizing the data to support users in finding answers to their questions. Choropleth maps [17], cartograms [8], and glyphs on maps [22] as well as small multiples [24] can be considered classic means to visualize multivariate spatial data. Because of the immense size of today’s datasets, more and more techniques address the visualization of data abstractions, most prominently the visualization of clusters [2].

In order to enhance the functionality of the visual component, we introduce three novel approaches. First, in the next section, we extend the well-known focus+context concept. This sets the stage for the visualization of complex data on maps. Second, in Section 5, we utilize the extended focus+context concept to visualize hierarchically organized information on maps. Third, in Section 6, we discuss the combination of analytical and visual methods using the example of extracting and visualizing association rules. Association rule analysis is a well-established data mining technique, but it is hardly applied in the context of geo-visual analytics. The goal is to extract relationships (i.e., associations) between specific data values, and in this way to crystallize patterns in large datasets.

3. Extended Focus+Context

The focus+context concept has been used in information visualization, geo-visualization, and cartography for a long time. Focus+context techniques (see [19, 7] for surveys) combine a focus representation that displays a part of a greater whole at high detail with a surrounding context representation that shows information at lower detail to provide an overview. The different levels of granularity can be achieved in two ways:

• by semantic properties of the data, e.g., showing individual values in the focus, but only average values or cluster centroids in the context, or

• by applying a combination of different rendering styles to emphasize the region of interest and de-emphasize the context.

The first case addresses the question “What has to be displayed?” and has to be handled by the analytical component. The second case focuses on the question “How does it have to be displayed?” and has to be realized by the visual component of the conceptual framework.

Current approaches usually work on the basis of a single domain of reference. In the context of geo-visual analytics, this concept has to be extended. Here, we have two domains of reference: the geographic spatial domain and the associated data domain. Consequently, there are now two foci: the map focus, which corresponds to the current geographic region of interest, and the data focus, which encompasses the data associated with that region of interest. These foci are mutually dependent: selecting a map focus defines the data focus, and vice versa.

Since we have two foci, we also consider two contexts: one for the map space and the other for the data domain. The map context consists of all geographic regions that are not focus regions. The data context includes all data outside the data focus, but at a coarser level of abstraction. However, it cannot always be taken for granted that the degree of abstraction is adequate for showing the data context on a map. Reasons for this might be that the data cannot be abstracted sufficiently or that the task at hand prohibits stronger abstraction. As a consequence, the data context might still be too complex (i.e., more than just a single data value per region) to keep visual clutter under control. Therefore, it makes sense to display the data context not for the entire map context, but only for a well-defined subset of it. We call these regions the immediate context.

In summary, full data detail is shown in the focus, coarser data abstractions are shown in the immediate context, and no data are shown for the map context (only the geographic regions are retained to maintain map coherence for better user orientation).

Because this limits the amount of information visible at a time, focus+context approaches have to support the flexible specification of the focus and the context by various interactive and automatic means. When using map representations, an intuitive way of specifying the map focus is the selection of regions of interest on the map. This means that the map focus is selected first, and then the corresponding data focus is derived by searching the data domain for data associated with the map focus. This might require additional operations if the granularity of the map does not match the level of abstraction of the data domain. In such cases, further aggregation of data records or automatic enlargement of the map focus can be applied to resolve inconsistencies between map granularity and data abstraction. Conversely, the user may first select data items of interest as the data focus. The map focus can then be derived directly from the selected data records, since each record is associated with a concrete geo-spatial reference.

Figure 2: Different options to determine the immediate context: (a) Interactive Selection, (b) Neighborhood, (c) History, (d) Analytical Calculation.

These two ways of defining the focus (i.e., selecting regions and deriving data vs. selecting data and deriving regions) correspond to direct lookup and inverse lookup, which are two elementary tasks of exploratory analysis of spatiotemporal data as defined in [4].
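Both lookup directions can be sketched as two small Python functions over an index of data records. This is an illustration under simplifying assumptions: each record carries exactly one region id at the same granularity as the map (so the aggregation or focus enlargement discussed above is not needed), and the record layout, the sample records, and the names `direct_lookup` and `inverse_lookup` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical data domain: record id -> (region, diagnosis, number of sick people).
Records = Dict[int, Tuple[str, str, int]]


def direct_lookup(records: Records, map_focus: Set[str]) -> Set[int]:
    """Regions selected first: derive the data focus (ids of records in those regions)."""
    return {rid for rid, (region, _, _) in records.items() if region in map_focus}


def inverse_lookup(records: Records, data_focus: Set[int]) -> Set[str]:
    """Data selected first: derive the map focus from each record's geo-reference."""
    return {records[rid][0] for rid in data_focus}


# Invented sample records for illustration only.
records: Records = {
    1: ("Rostock", "J10", 12),
    2: ("Stralsund", "J10", 3),
    3: ("Rostock", "A09", 5),
}
print(direct_lookup(records, {"Rostock"}))  # {1, 3}
print(inverse_lookup(records, {2}))         # {'Stralsund'}
```

Composing the two (selecting regions, deriving the data focus, and mapping it back) returns the original map focus whenever every selected region has at least one record.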

Once the foci have been specified, the map regions of the immediate context have to be found. As illustrated in Figure 2, there are several options for determining the immediate context with respect to a user-selected focus:

• Interactive Selection – The user manually selects regions for the immediate context.

• Neighborhood – The system determines a k-neighborhood of the focus and automatically sets it as the immediate context.

• History – The immediate context is constructed from regions that have previously been focus regions during the course of the user’s interactive exploration.

• Analytical Calculation – The immediate context is defined based on calculations in the data domain. For instance, similarity measures can be computed, and if the similarity value relative to the data focus is above a certain threshold, the corresponding map region is part of the immediate context. It would also be possible to cluster the data. Then, all regions whose data fall into the same cluster as the data of the focus are part of the immediate context.
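Two of the automatic options can be sketched in a few lines of Python. For the Neighborhood option, the sketch assumes a hypothetical region adjacency graph and runs a breadth-first search up to k hops, returning hop distances that could later drive attenuation. For the Analytical Calculation option, it assumes hypothetical per-region data profiles compared by cosine similarity; cosine similarity, the threshold, and all names here are our illustrative choices, not prescribed by the concept.

```python
import math
from collections import deque
from typing import Dict, List, Set

# Hypothetical region adjacency graph: region -> neighboring regions.
Adjacency = Dict[str, List[str]]


def k_neighborhood(adjacency: Adjacency, focus: Set[str], k: int) -> Dict[str, int]:
    """Neighborhood option: breadth-first search from all focus regions.

    Returns region -> hop distance for every region 1..k hops away from the focus.
    """
    dist = {region: 0 for region in focus}
    queue = deque(focus)
    while queue:
        region = queue.popleft()
        if dist[region] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(region, []):
            if neighbor not in dist:
                dist[neighbor] = dist[region] + 1
                queue.append(neighbor)
    return {region: d for region, d in dist.items() if d > 0}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_context(profiles: Dict[str, List[float]], focus: str,
                       threshold: float) -> Set[str]:
    """Analytical option: regions whose data profile is similar to that of the focus."""
    return {region for region, profile in profiles.items()
            if region != focus and cosine(profile, profiles[focus]) > threshold}


# Invented chain of four regions A - B - C - D.
adjacency: Adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(k_neighborhood(adjacency, {"A"}, 2))  # {'B': 1, 'C': 2}
```

The History option would simply keep an ordered list of former focus regions, and the Interactive Selection option needs no computation at all.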

After the regions of the immediate context have been determined, the corresponding data context has to be extracted. To this end, appropriate analytical methods generate data abstractions at a coarser level of detail.

Finally, the representation styles of the focus and the context need to be chosen. The goal is to communicate that the focus bears relevant detail information. For the map focus, this can be achieved by accentuating the focus region. A variety of visual modifications can be used to accentuate the focus, as long as they do not conflict with the actual visualization of the data. Examples of accentuating the map focus are drawing wider contour lines or using a dedicated highlighting color.

For the data focus, a suitable visualization technique must be selected, and the resulting visual representation is positioned within the map focus. We illustrate this point in more detail in Section 5.

The depiction of the context should make clear that the information displayed there is not as relevant as that of the focus. The implication for the map regions of the immediate context is that their visual attributes have to be chosen so that they are less prominent than those used to draw the map focus. Moreover, if the immediate context supports the notion of distance to the map focus, the visual representation should convey this fact. In Figure 2, we illustrate this by varying the color in the immediate context to attenuate the regions with decreasing proximity to the focus region. Similar visual effects can be achieved by modifying the width of contour lines or by alpha-blending entire context regions. Note that proximity can have different meanings, including recency of the interactive selection, distance in the neighborhood graph, or similarity of regions according to computations of an analytical method. The remaining map context is usually represented in the most abstract fashion, for example, by drawing only the boundaries of the associated regions.
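One possible style rule for attenuation by proximity is sketched below, assuming proximity is available as a hop distance (0 for the focus, 1 to k for the immediate context, and no distance for the remaining map context). The function name `region_style`, the linear fade, the parameter `min_alpha`, and the concrete outline widths are hypothetical choices that only illustrate the idea of decreasing prominence.

```python
from typing import Dict, Optional


def region_style(distance: Optional[int], k: int, min_alpha: float = 0.2) -> Dict[str, object]:
    """Hypothetical style rule for one map region.

    distance: 0 for the map focus, 1..k for the immediate context,
              None for the remaining map context.
    """
    if distance is None:
        # Map context: only the boundary is drawn, no data.
        return {"fill_alpha": 0.0, "outline_width": 0.5, "show_data": False}
    if distance == 0:
        # Map focus: accentuated by a wide contour and full opacity.
        return {"fill_alpha": 1.0, "outline_width": 3.0, "show_data": True}
    # Immediate context: opacity fades linearly from 1.0 (focus) to min_alpha at distance k.
    alpha = 1.0 - (1.0 - min_alpha) * distance / k
    return {"fill_alpha": alpha, "outline_width": 1.0, "show_data": True}
```

With k = 2, a direct neighbor of the focus is drawn at opacity 0.6 and a region two hops away at 0.2. Recency or similarity values could be plugged in instead of hop distance after rescaling them to the range 1 to k.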

For the presentation of the data context, the general goal is to avoid overloading the picture. This can be accomplished by semantic means (e.g., data abstraction or filtering) or graphical means (e.g., graphical simplification or de-accentuation by reducing opacity). Figure 5 in Section 5 demonstrates this with an example.

Taken together, the extended focus+context concept integrates visual, analytical, and interactive means to assist users in conducting geo-visual analytics. By focusing on relevant aspects and omitting less relevant ones, exploration and analysis can be made scalable with regard to the size and complexity of geo-spatial data. The concept supports multiple focus regions and can be combined with other existing cartographic display techniques (e.g., cartographic lenses). It is generally applicable and not restricted to a specific use case.

4. Structured Spatiotemporal Data

In the next two sections, we detail the discussion of representing data in their geo-spatial frame of reference. First, in Section 5, we describe a focus+context-based visualization of hierarchies on maps. Second, in Section 6, we include analytical methods in our considerations and introduce a visualization of data abstractions derived from association analysis. In this section, we describe the data that will be used in those sections.

Our examples are based on the scenario of visually analyzing public health data. Each data entry contains a time stamp, a geo-reference, and a diagnosis identifier (three independent variables), as well as the number of people suffering from that diagnosis at that location at that time (one dependent variable). More formally, this corresponds to a mapping $f\colon \mathit{Time} \times \mathit{Space} \times \mathit{Diagnoses} \to \mathbb{N}$.

In practice, we are interested in higher-level aggregations of the data, say, the number of people suffering from problems with the respiratory tract (aggregation of diagnoses) in February (aggregation in time) at the level of counties (aggregation in space). Hierarchies, irrespective of being natural or artificial, are an effective means to structure data along different levels of aggregation [9]. Our data contain three hierarchical structures (see Figure 3):

Figure 3: Data model structured along three different hierarchies.

• the hierarchically structured dimension of time (i.e., days, weeks, months, quarters, years),

• a hierarchically organized set of geographic regions (i.e., municipalities, counties, state), and

• the hierarchical classification of diseases based on the ICD-10¹.

These three hierarchies allow us to flexibly access the data and combine different levels of abstraction, and thus they are the primary means to support interactive information drill-down via focus+context mechanisms.
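Aggregating along the three hierarchies amounts to a roll-up of the mapping f: each base-level key is mapped to the desired level of every hierarchy, and the counts are summed. The sketch below is a hypothetical illustration: the municipality and county names, the sample counts, and the helper names are invented; only the fact that ICD-10 codes J00 to J99 form the chapter on diseases of the respiratory system is taken from the classification itself.

```python
from collections import Counter
from datetime import date
from typing import Callable, Dict, Hashable, Tuple

# Base-level data: f(day, municipality, ICD-10 code) = number of sick people.
BaseKey = Tuple[date, str, str]


def roll_up(data: Dict[BaseKey, int],
            time_level: Callable[[date], Hashable],
            space_level: Callable[[str], Hashable],
            diagnosis_level: Callable[[str], Hashable]) -> Counter:
    """Aggregate counts to the requested level of each of the three hierarchies."""
    result: Counter = Counter()
    for (day, municipality, code), n in data.items():
        result[(time_level(day), space_level(municipality), diagnosis_level(code))] += n
    return result


# Hypothetical hierarchy levels for illustration.
county_of = {"M1": "County C1", "M2": "County C1", "M3": "County C2"}


def month(day: date) -> Tuple[int, int]:
    return (day.year, day.month)


def county(municipality: str) -> str:
    return county_of[municipality]


def respiratory(code: str) -> str:
    # ICD-10 chapter X (J00-J99): diseases of the respiratory system.
    return "J00-J99" if code.startswith("J") else "other"


data = {
    (date(2011, 2, 1), "M1", "J10"): 3,
    (date(2011, 2, 14), "M2", "J11"): 4,
    (date(2011, 3, 1), "M3", "J10"): 5,
}
print(roll_up(data, month, county, respiratory))
```

Here the two February records from municipalities M1 and M2 collapse into one entry for County C1 with a count of 7. Drill-down corresponds to calling `roll_up` again with finer level functions.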

5. Representation of Hierarchies on Maps

As with our data, hierarchical relationships are inherent in datasets of a variety of applications. The relationships can be explicit (e.g., the hierarchical structure of time) or implicit in the data, in which case they have to be extracted by appropriate means (e.g., hierarchical clustering). Because of their widespread use, hierarchies and associated data values are often the subject of analytical investigations. By looking at the hierarchies in their geo-spatial context, we hope to find answers to a number of analytical questions in terms of the structural properties of a hierarchy as well as in terms of the data associated with the hierarchy:

• What pattern does the hierarchical structure (e.g., data abstraction or categorization) exhibit for a specific region?

• Are the hierarchies’ structural properties similar across the regions of the map?

¹ http://www.who.int/classifications/icd/

Figure 4: Magic Eye View showing the ICD-10 classification hierarchy with 43k nodes: (a) hierarchy layout in 2D, (b) layout projected onto a hemisphere in 3D, (c) folding of subtrees.

• How are the data values distributed in the hierarchy of a particular region?

• Is there a significant difference in the value distribution when comparing a specific region with its neighbors?

• Do certain patterns emerge when focusing on the data along a path through the map?

As a concrete example, consider an analyst who wants to find out how the total number of people who were sick in February is distributed across different diagnoses, and whether there are any significant differences depending on where the people live. To satisfy the analyst’s needs, we have to display the hierarchical categorization of diseases (ICD-10) and the corresponding number of people afflicted in February as well as the geo-spatial context, all at an appropriate level of aggregation. From a conceptual point of view, this corresponds to the visualization of multiple attribute hierarchies (i.e., hierarchies where each node is attributed with one data value) on a geographic map.

Recent work has addressed techniques to display hierarchical relations on maps [11]. We extend this work and present an alternative display. Our approach uses an explicit 3D representation of hierarchy layouts, rather than an implicit 2D embedding of layouts into the regions of the map, and it makes use of the extended focus+context concept.

In order to integrate visual representations of hierarchies into a map, a number of requirements have to be met:

• The visual mapping of the hierarchy layout has to be compact.

• The hierarchy should be clearly distinguishable from the map.

• The relation between a hierarchy and the map region it is associated with must be visualized.

• Different levels of map granularity and data abstraction need to be considered.

• Appropriate interaction techniques are required, including data selection, view manipulation, and information drill-down.

The combination of the extended focus+context concept with the Magic Eye View (MEV) approach [18] fulfills these requirements. In order to create a compact visual representation of the hierarchy, we first compute a layout of the hierarchy nodes in two dimensions. To this end, we adapt the Walker layout described in [5] to use polar coordinates rather than Cartesian coordinates. As illustrated in Figure 4(a), this results in a layout where the hierarchy’s root node is located in the center and all other nodes are arranged on concentric circles around the center, with the deepest hierarchy nodes being located on the outermost circle. This layout is then projected onto a three-dimensional hemisphere, as depicted in Figure 4(b).
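The layout idea can be approximated as follows. This is explicitly not the Walker algorithm adapted in [5]: as a simpler stand-in, each subtree receives an angular wedge proportional to its number of leaves, and the radius is the node depth divided by the tree height, so the root sits in the center and the deepest nodes lie on the outermost circle. The projection onto the hemisphere (normalized radius mapped linearly to the polar angle, so the root lands on the pole and the deepest nodes on the rim) is likewise our assumption, since the exact projection is not specified here. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # node -> children (leaves may be omitted)


def leaves(tree: Tree, node: str) -> int:
    children = tree.get(node, [])
    return 1 if not children else sum(leaves(tree, c) for c in children)


def height(tree: Tree, node: str) -> int:
    children = tree.get(node, [])
    return 0 if not children else 1 + max(height(tree, c) for c in children)


def radial_layout(tree: Tree, root: str) -> Dict[str, Tuple[float, float]]:
    """Polar layout: node -> (radius, angle), radius = depth / tree height.

    Each subtree gets an angular wedge proportional to its number of leaves.
    """
    max_depth = height(tree, root) or 1  # avoid division by zero for a single node
    layout: Dict[str, Tuple[float, float]] = {}

    def place(node: str, depth: int, start: float, end: float) -> None:
        layout[node] = (depth / max_depth, (start + end) / 2)
        children = tree.get(node, [])
        total = sum(leaves(tree, c) for c in children)
        angle = start
        for child in children:
            span = (end - start) * leaves(tree, child) / total
            place(child, depth + 1, angle, angle + span)
            angle += span

    place(root, 0, 0.0, 2 * math.pi)
    return layout


def to_hemisphere(radius: float, angle: float) -> Tuple[float, float, float]:
    """Assumed projection: radius 0 -> pole (top), radius 1 -> rim (equator)."""
    phi = radius * math.pi / 2  # polar angle measured from the pole
    return (math.sin(phi) * math.cos(angle), math.sin(phi) * math.sin(angle), math.cos(phi))
```

For the tree `{"root": ["a", "b"], "a": ["a1", "a2"]}`, node `b` is placed on the middle circle (radius 0.5) and `a1`, `a2` on the outer circle; subtree `a` receives two thirds of the full angle because it contains two of the three leaves.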

This compact representation is easily distinguishable from the map background and allows us to place several such hemispheres into the map display, much like floating balloons anchored at individual map regions. However, Figure 4(b) also clearly shows that large structures such as the ICD-10 tree with its 43k nodes need additional effort to make them comprehensible. Therefore, we limit the number of nodes shown per MEV, as illustrated in Figure 4(c), as well as the overall number of MEVs embedded into the map display. This reduction of the information load is steered by the foci and contexts that are interactively selected by the user or automatically determined by analytical means (see Section 3).

Figure 5 illustrates our approach. For the focus region, a larger MEV is shown with a high elevation above the map. Around the focus, the immediate context shows increasingly smaller MEVs at increasingly lower elevation. Additional MEVs are displayed for regions that have been in the focus earlier during the visual exploration. Size and elevation are purely graphical means to realize the focus+context concept. In terms of analytical means, our implementation utilizes automatic folding of subtrees based on the classic filter fisheye approach [10]. For the focus, the most relevant nodes are shown, while for the MEVs in the context, more and larger subtrees are automatically folded, retaining just essential structural information. Moreover, it is quite easy to visualize attributes associated with nodes and edges of the hierarchy. Node attributes could be encoded with node color or size, and edge attributes could be visualized by varying edge width. The latter is shown in Figure 5, where the complexity of subtrees (estimated by Strahler numbers [12]) is indicated by edges of different widths.

Figure 5: Illustration of the combination of the extended focus+context concept with Magic Eye Views.
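Both analytical ingredients mentioned above can be sketched briefly in Python. Strahler numbers follow the standard definition. The folding uses a Furnas-style degree-of-interest function, DOI(x | focus) = API(x) − D(x, focus), where the a-priori importance API(x) is taken as minus the node depth (the classic choice for trees) and D is the path length in the tree. The tree encoding, the function names, and the threshold semantics are hypothetical and not taken from the MEV implementation.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # node -> children


def strahler(tree: Tree, node: str) -> int:
    """Strahler number: leaves get 1; an inner node gets the maximum order of its
    children, plus 1 if that maximum is attained by at least two children."""
    children = tree.get(node, [])
    if not children:
        return 1
    orders = [strahler(tree, c) for c in children]
    top = max(orders)
    return top + 1 if orders.count(top) >= 2 else top


def path_to_root(parent: Dict[str, str], node: str) -> List[str]:
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def fisheye_visible(tree: Tree, root: str, focus: str, threshold: int) -> Set[str]:
    """Filter fisheye: keep nodes with DOI(x | focus) = -depth(x) - D(x, focus)
    at or above the threshold; all other nodes are folded away."""
    parent = {c: p for p, children in tree.items() for c in children}
    focus_path = path_to_root(parent, focus)
    visible = set()
    for node in [root, *parent]:
        path = path_to_root(parent, node)
        lca = next(n for n in path if n in focus_path)  # lowest common ancestor
        distance = path.index(lca) + focus_path.index(lca)
        doi = -(len(path) - 1) - distance
        if doi >= threshold:
            visible.add(node)
    return visible
```

For the tree `{"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}` with focus `a1`, a threshold of −2 keeps only the path `r`, `a`, `a1`, while −4 additionally unfolds `a2` and `b`. Lowering the threshold for the focus MEV and raising it for context MEVs would reproduce the "more folding in the context" behavior described above; the edge to each child could be drawn with a width proportional to `strahler(tree, child)`.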

A number of interaction techniques allow users to adapt the visual representation, including switching to different levels of spatial granularity, selecting a different focus, accessing the data of a different time period, and manually adjusting the level of folding of hierarchy nodes in the MEVs. Naturally, the map and the MEVs can be rotated and zoomed interactively. Optionally, the rotation of MEVs can be linked, meaning that rotating one MEV automatically rotates all other MEVs as well. Another useful option is to unlink map and MEV rotation. This way, the map can be viewed from different perspectives, but the view on the data (i.e., the nodes facing the user) remains constant.

The approach we presented here is generally applicable to visualizing hierarchically structured data that are associated with the regions of a map. It can also be used to complement the visual analysis of data abstractions, for example, to represent hierarchies generated through hierarchical clustering (see [25]), where individual regions may yield different cluster hierarchies.

While clustering is widely applied to support geo-visual analytics, there are alternative data abstractions whose usefulness in the context of visual analytics has not yet been sufficiently explored. Association rules represent one such data abstraction; they are described in the next section.

6. Extraction and Visualization of Association Rules

The extraction of association rules from data is a well-established data mining technique. The goal is to detect relationships or associations between specific data values of categorical or classified variables, and in doing so, to enable analysts to uncover hidden patterns in large datasets. Ultimately, we want to arrive at statements such as “When a large number of people suffer from diagnosis A in region B at time C, it is likely that diagnosis X occurs in region Y at time Z.”

6.1. Basics of Association Rule Analysis

Let us briefly review the basic notation of association rule analysis [1, 13]. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ distinct literals, called items. Let $D = \{T_1, T_2, \ldots, T_m\}$ be a set of $m$ transactions, called the database. Each transaction $T_i \in D$ has a unique transaction ID $i$ and contains a subset of the items, i.e., $T_i \subseteq I$. Let $X$ and $Y$ be itemsets with $X, Y \subseteq I$ and $X \cap Y = \emptyset$. Then, an association rule is an implication of the form $X \Rightarrow Y$, where $X$ and $Y$ are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

In order to find interesting rules among all possible rules, it is necessary to define measures of significance. Support and confidence are most commonly used for this purpose. The support $\mathrm{supp}(X)$ of an itemset $X$ is defined as the proportion of transactions in $D$ that contain $X$, that is,

$$\mathrm{supp}(X) = \frac{|\{T \in D \mid X \subseteq T\}|}{|D|}.$$

The support of a rule $X \Rightarrow Y$ is defined as $\mathrm{supp}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y)$. Using the support, the confidence of a rule can be formalized as

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}.$$

This formalization gives an estimate of the conditional probability of finding $Y$ in a transaction of $D$, given that the transaction contains $X$.
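As a small worked example, both definitions can be evaluated directly on a toy database. The items A, B, C and the four transactions are invented for illustration, and the function names `supp` and `conf` simply mirror the notation above; returning 0.0 for an antecedent that never occurs is our convention, since the ratio is undefined in that case.

```python
from typing import FrozenSet, List, Set

Transactions = List[Set[str]]


def supp(itemset: FrozenSet[str], transactions: Transactions) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conf(lhs: FrozenSet[str], rhs: FrozenSet[str], transactions: Transactions) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X); 0.0 if X never occurs (our convention)."""
    s_lhs = supp(lhs, transactions)
    return supp(lhs | rhs, transactions) / s_lhs if s_lhs else 0.0


D = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
X, Y = frozenset({"A"}), frozenset({"B"})
print(supp(X | Y, D))  # 0.5
print(conf(X, Y, D))   # 2/3, printed as 0.6666666666666666
```

Here two of four transactions contain both A and B, so supp(A ⇒ B) = 0.5, and three contain A, so conf(A ⇒ B) = 0.5 / 0.75 ≈ 0.67. The rule C ⇒ A, in contrast, has confidence 1.0, because every transaction containing C also contains A.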

Using the two significance measures support and confidence as well as user-specified thresholds for both, association rule mining is usually conducted in two phases:

1. Find frequent itemsets, i.e., find all itemsets in $D$ whose support is above the minimum support threshold.

2. Use the frequent itemsets and the minimum confidence threshold to form rules.

The problem with the first step is that one has to search the power set $\mathcal{P}(I)$, which has $2^n - 1$ non-empty elements. Thanks to the downward-closure property of support [1] (every subset of a frequent itemset is itself frequent), efficient algorithms can be applied to mine for frequent itemsets. The second step is a rather straightforward filtering process.
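The two phases can be sketched in the style of the classic Apriori algorithm, which exploits downward closure by only generating size-k candidates whose (k−1)-subsets are all frequent. This is a compact, unoptimized illustration (support is recomputed by scanning all transactions), not the implementation used in this work; the names `apriori` and `rules` and the toy database are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_supp: float) -> Dict[Itemset, float]:
    """Phase 1: level-wise search for all itemsets with support >= min_supp."""
    n = len(transactions)

    def supp(s: Itemset) -> float:
        return sum(1 for t in transactions if s <= t) / n

    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        supports = {c: supp(c) for c in candidates}
        level = {c: s for c, s in supports.items() if s >= min_supp}
        frequent.update(level)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


def rules(frequent: Dict[Itemset, float],
          min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Phase 2: split each frequent itemset into LHS => RHS and keep confident rules."""
    result = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = s / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_conf:
                    result.append((lhs, itemset - lhs, confidence))
    return result


D = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]
print(rules(apriori(D, 0.5), 0.9))  # only C => A with confidence 1.0
```

With a minimum support of 0.5, the frequent itemsets are {A}, {B}, {C}, {A, B}, and {A, C}; the candidate {A, B, C} is pruned without counting because its subset {B, C} is not frequent. The second phase then only needs lookups in the dictionary of frequent itemsets.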

6.2. Determining Association Rules

These basic steps are commonly agreed on in association rule mining. However, finding expressive association rules is not an easy endeavor. In particular, one should be aware that conclusions can only be drawn with regard to the given literals (or items). For the human health data we consider here, no ad-hoc specification of the literals is possible: our data are of multivariate quantitative nature, but literals have to be qualitative.

We suggest an interactive approach to deriving literals from the data. Users specify different parameters and select the statistical measures and transformation functions to be computed. In this way, they interactively generate sensible literals, which can later be used to find frequent itemsets and, eventually, to mine association rules that comply with the task at hand.

A data object of our human health dataset is given by a tuple (t, r, d, n), where t, r, and d are independent variables, with t representing a point in time, r a geographic region, and d a diagnosis, and n is a dependent variable holding the number of people suffering from d in region r at time t. While time, regions, and diagnoses can easily be interpreted as qualitative values from their respective aggregation hierarchies (e.g., time: 3rd quarter, region: city of Rostock, diagnosis: influenza), the overall tuple cannot, because the measured number of people is a quantitative value.

In order to determine qualitative literals (t, r, d, q), where q is a qualitative value, appropriate statistical measures are computed first, such as the minimum and maximum number of cases, the mean, the variance, and others. Then, a set of qualitative literals is constructed. In this initial study, we used five options for q:

• q = THRES – A literal (t, r, d, THRES) is constructed if, for the tuple (t, r, d, n), the number of sick people n exceeds a certain threshold, that is, n is significantly large.

• q = MIN/MAX – A literal (t, r, d, MIN/MAX) is constructed if the number of sick people reaches a minimum/maximum within a short time window around t.

• q = INC/DEC – A literal (t, r, d, INC/DEC) is constructed if the number of sick people steadily increases/decreases during a short time period starting at t.
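The five options can be sketched for the case-count series n(t) of a single region and diagnosis. This Python sketch is an illustration under our own assumptions, not the authors' implementation: the parameters `threshold`, `window` (half-width of the MIN/MAX window), and `trend_len` (number of steps for INC/DEC) are hypothetical stand-ins for the user-specified values, and "steadily" is interpreted as strictly monotone from step to step.

```python
from typing import List, Tuple

Literal = Tuple[int, str, str, str]  # (t, r, d, q)


def make_literals(series: List[int], region: str, diagnosis: str,
                  threshold: float, window: int, trend_len: int) -> List[Literal]:
    """Derive qualitative literals (t, r, d, q) from the counts n(t) of one (r, d) pair.

    threshold, window and trend_len are hypothetical, user-chosen parameters.
    """
    lits: List[Literal] = []
    for t, n in enumerate(series):
        # THRES: the count exceeds the user-specified threshold.
        if n > threshold:
            lits.append((t, region, diagnosis, "THRES"))
        # MIN/MAX: extreme value within [t - window, t + window]; flat windows are skipped.
        win = series[max(0, t - window): t + window + 1]
        if max(win) != min(win):
            if n == max(win):
                lits.append((t, region, diagnosis, "MAX"))
            elif n == min(win):
                lits.append((t, region, diagnosis, "MIN"))
        # INC/DEC: strictly monotone over the trend_len steps that start at t.
        if trend_len > 0 and t + trend_len < len(series):
            seg = series[t: t + trend_len + 1]
            steps = [b - a for a, b in zip(seg, seg[1:])]
            if all(s > 0 for s in steps):
                lits.append((t, region, diagnosis, "INC"))
            elif all(s < 0 for s in steps):
                lits.append((t, region, diagnosis, "DEC"))
    return lits


# Hypothetical weekly influenza counts for one region.
counts = [3, 5, 9, 14, 8, 4, 2]
for lit in make_literals(counts, "Rostock", "influenza",
                         threshold=10, window=1, trend_len=2):
    print(lit)
```

The windows are truncated at the ends of the series, so boundary time steps can be flagged as extrema; a real tool would need an explicit policy for this.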

While THRES resorts to basic thresholding, the other options capture a notion of short-term trends in the data. These literals are then used to construct itemsets. The next steps follow the general procedure for association rule mining: frequent itemsets are determined based on the minimum support threshold, and the rules formed from them are retained if they comply with the minimum confidence threshold. We experimented with itemsets of one or two literals only. This is sufficient to arrive at association rules such as “If during a period of five days influenza reaches a peak in the city of Stralsund as well as in the county of Nordvorpommern, then there is a maximum in the city of Rostock one week later with a probability of 96.6 percent.”
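The paper does not detail how temporal offsets such as "one week later" enter the transactions. One hypothetical encoding, sketched below in Python under our own assumptions, builds one transaction per time step from the literals at t and, tagged with an offset, the literals at t + lag; rules are then restricted to antecedents of one or two "now" literals and a single "later" consequent. All names, the lag, and the toy literals are illustrative, and for brevity the sketch enumerates antecedents exhaustively instead of pruning via downward closure.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Literal = Tuple[int, str, str, str]  # (t, r, d, q)
Item = Tuple[str, str, str, int]     # (r, d, q, offset): 0 = at t, lag = at t + lag
Rule = Tuple[FrozenSet[Item], Item, float, float]


def lagged_transactions(lits: List[Literal], lag: int) -> List[FrozenSet[Item]]:
    """Hypothetical encoding: one transaction per t with literals at t and at t + lag."""
    by_time: Dict[int, Set[Tuple[str, str, str]]] = defaultdict(set)
    for t, r, d, q in lits:
        by_time[t].add((r, d, q))
    if not by_time:
        return []
    db = []
    for t in range(min(by_time), max(by_time) - lag + 1):
        now = {(r, d, q, 0) for r, d, q in by_time.get(t, set())}
        later = {(r, d, q, lag) for r, d, q in by_time.get(t + lag, set())}
        db.append(frozenset(now | later))
    return db


def lagged_rules(db: List[FrozenSet[Item]], lag: int, min_supp: float,
                 min_conf: float, max_lhs: int = 2) -> List[Rule]:
    """Rules X => y: X has 1..max_lhs items at t, y is one item at t + lag (lag >= 1)."""
    n = len(db)
    now_items = sorted({i for tx in db for i in tx if i[3] == 0})
    later_items = sorted({i for tx in db for i in tx if i[3] == lag})
    rules: List[Rule] = []
    for k in range(1, max_lhs + 1):
        for lhs in combinations(now_items, k):
            x = frozenset(lhs)
            cover = [tx for tx in db if x <= tx]  # transactions containing X
            for y in later_items:
                hits = sum(1 for tx in cover if y in tx)
                if cover and hits / n >= min_supp and hits / len(cover) >= min_conf:
                    rules.append((x, y, hits / n, hits / len(cover)))
    return rules


# Illustrative literals; t counts days, so lag=7 means "one week later".
lits = [
    (0, "Stralsund", "influenza", "MAX"),
    (0, "Nordvorpommern", "influenza", "MAX"),
    (7, "Rostock", "influenza", "MAX"),
    (2, "Stralsund", "influenza", "INC"),
    (14, "Rostock", "influenza", "DEC"),
]
db = lagged_transactions(lits, lag=7)
for x, y, s, c in lagged_rules(db, lag=7, min_supp=0.1, min_conf=0.9):
    print(sorted(x), "=>", y, f"supp={s:.3f} conf={c:.2f}")
```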

Let us once more emphasize that the construction of literals is an interactive process, because the different diagnoses require individual thresholds and trend windows of individual length, which usually have to be determined by domain experts. Using different parameter values (e.g., for thresholds or for the minimal support) as well as different options for constructing literals also leads to different rules, and thus to different results and insights. This is why an interactive specification is indispensable in order to support application-specific and task-dependent association rules. Also note that our list of options to construct literals is not exhaustive. Alternative transformations could take spatial proximity into account to construct further literals that are particularly useful with regard to spatial analysis.

6.3. Visualizing Association Rules

Figure 6: Visualization of association rules using arrows on a map.

Figure 7: Small multiples for visualizing association rules. (a) Single-rule thumbnails reduce occlusion problems. (b) n-to-m rules at different levels of abstraction.

While the association rules are determined by the analytical component of our conceptual framework, their visualization on a map has to be realized by the visual component. This is a novel challenge, because we have to represent association rules in their spatial frame of reference, rather than plain data values. It makes sense to utilize multiple views to present the various aspects involved. The association rules are represented by a table view (see bottom of Figure 6). Each row of the table represents one rule. The first two columns contain the literals of the left-hand side of the rule; columns three and four contain those of the right-hand side. The last two columns of the table show the values for support and confidence. Antecedent, consequent, and significance measures are indicated to the user by differently colored column headers. This table representation provides an initial overview of the given rules. The rule table provides all the functionality typically included in such table views: scrolling, sorting, as well as selecting and highlighting rules.
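The six-column layout of the rule table maps naturally onto a simple row model. The following Python sketch is a hypothetical data model, not the prototype's code; the column labels, the literal strings, and the limit of one or two literals per side (suggested by the two columns per side) are assumptions, and the values in the usage example are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical column labels mirroring the six-column layout of the rule table.
HEADER = ["LHS 1", "LHS 2", "RHS 1", "RHS 2", "Support", "Confidence"]


@dataclass
class RuleRow:
    lhs: Tuple[str, ...]  # one or two antecedent literals
    rhs: Tuple[str, ...]  # one or two consequent literals
    support: float
    confidence: float

    def __post_init__(self) -> None:
        if not (1 <= len(self.lhs) <= 2 and 1 <= len(self.rhs) <= 2):
            raise ValueError("the table has two columns each for LHS and RHS")

    def cells(self) -> List[str]:
        """Return the six table cells, padding unused literal columns with ''."""
        def pad(xs: Tuple[str, ...]) -> List[str]:
            return list(xs) + [""] * (2 - len(xs))
        return pad(self.lhs) + pad(self.rhs) + [f"{self.support:.3f}",
                                                f"{self.confidence:.3f}"]


def sort_rows(rows: List[RuleRow], column: str, descending: bool = True) -> List[RuleRow]:
    """Sort by a significance column, e.g. 'support' or 'confidence'."""
    return sorted(rows, key=lambda r: getattr(r, column), reverse=descending)


rows = [
    RuleRow(("Stralsund MAX",), ("Rostock MAX",), 0.10, 0.80),
    RuleRow(("Stralsund MAX", "Nordvorpommern MAX"), ("Rostock MAX",), 0.05, 0.95),
]
print(HEADER)
for row in sort_rows(rows, "confidence"):
    print(row.cells())
```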

The visualization of the rules selected within the table view integrates different visual means with a map display:

• arrow plots (see Figure 6),

• small multiples (see Figure 7), and

• color coding (see Figure 8).

The arrow plot shows the implications LHS ⇒ RHS as straight arrows superimposed on the map (see top of Figure 6). Arrow plots communicate the spatial relationships of rules well, and each arrow encodes rule support and confidence by arrow width and saturation, respectively.
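The mapping from rule measures to arrow attributes can be sketched as follows. This Python sketch assumes one region per side, region centroids given as hypothetical map coordinates, widths scaled linearly against the largest displayed support, and confidence used directly as HSV saturation; these scaling choices are assumptions, not the prototype's documented behavior. It relies only on the standard-library `colorsys` module.

```python
import colorsys
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Arrow:
    start: Point
    end: Point  # equals start for a self-reflection (drawn as a circular arrow)
    width: float
    rgb: Tuple[float, float, float]


def arrows_for_rules(rules: List[Tuple[str, str, float, float]],
                     centroids: Dict[str, Point], hue: float = 0.6,
                     w_min: float = 1.0, w_max: float = 8.0) -> List[Arrow]:
    """rules: (lhs_region, rhs_region, support, confidence) tuples."""
    if not rules:
        return []
    s_max = max(s for _, _, s, _ in rules) or 1.0  # avoid division by zero
    arrows = []
    for lhs, rhs, supp, conf in rules:
        width = w_min + (w_max - w_min) * supp / s_max  # support -> width
        sat = min(1.0, max(0.0, conf))                  # confidence -> saturation
        rgb = colorsys.hsv_to_rgb(hue, sat, 1.0)
        arrows.append(Arrow(centroids[lhs], centroids[rhs], width, rgb))
    return arrows


# Hypothetical centroids in map coordinates and illustrative rules.
centroids = {"Stralsund": (10.0, 20.0), "Rostock": (0.0, 0.0)}
rules = [("Stralsund", "Rostock", 0.2, 0.95), ("Rostock", "Stralsund", 0.1, 0.5)]
for a in arrows_for_rules(rules, centroids):
    print(a)
```

Scaling widths against the largest displayed support keeps arrows comparable within one view, but it means the same rule can be drawn with a different width after the selection changes.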

However, the number of rules that can be presented on a single map is limited. Moreover, rules whose antecedent and consequent refer to the same region (self-reflections, drawn as circular arrows) and rules with multiple literals in the LHS and/or RHS (n-to-m arrows) increase clutter and thus the cognitive effort required during the analysis. Rules that contain geographic regions from different levels of granularity (e.g., state vs. counties) cannot be represented on a single map, because a map usually shows only the regions of one selected level of granularity.

Therefore, our prototype provides different arrangements of small multiples [24]. Figure 7(a) shows small multiples in which each map visualizes a single rule. Here, arrow-arrow and arrow-map occlusions are minimized to clear the view on a per-rule basis, at the cost of reducing each map display to a small thumbnail. Although details cannot be recognized in these small images, the general dependencies between regions can be communicated. Consider, for instance, the fourth and fifth thumbnails in the top row. We can infer that both rules originate from the same region, but the rules' destinations as well as their significance differ. This might indicate a stronger relation between the regions highlighted in the fourth thumbnail.

Figure 8: Color coding the frequency of regions in antecedents (left) and consequents (right) of association rules.

Figure 7(b) shows an arrangement that is suitable for representing rules with n-to-m dependencies and dependencies across different levels of spatial granularity. Each row of the arrangement illustrates one rule as follows. The left side shows n map thumbnails, each of which highlights a region of the antecedent, and the right side shows m map thumbnails highlighting regions of the consequent. Different hues are used to indicate THRES, MIN, and so forth. The length of the central arrow encodes the rule's support, and its saturation reflects the confidence associated with the rule. For both arrangements, sorting the small multiples according to significance measures offers additional assistance to the user.
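The per-row encoding of Figure 7(b) can be summarized as a small layout specification. The following Python sketch is hypothetical: the hue palette, the maximum arrow length, and the linear scaling against the largest support are our own assumptions, since the paper does not give these values; rows are sorted by a chosen significance measure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical hue per literal type (HSV hue in [0, 1)); the paper does not list its palette.
HUES: Dict[str, float] = {"THRES": 0.0, "MIN": 0.6, "MAX": 0.95, "INC": 0.3, "DEC": 0.75}

LiteralRef = Tuple[str, str]  # (region, q), e.g. ("Rostock", "MAX")
RuleSpec = Tuple[List[LiteralRef], List[LiteralRef], float, float]  # (lhs, rhs, supp, conf)


@dataclass
class RuleRowLayout:
    left: List[Tuple[str, float]]   # n antecedent thumbnails: (region, hue)
    right: List[Tuple[str, float]]  # m consequent thumbnails: (region, hue)
    arrow_length: float             # encodes support
    arrow_saturation: float         # encodes confidence


def layout_rows(rules: List[RuleSpec], max_len: float = 120.0,
                sort_by: str = "confidence") -> List[RuleRowLayout]:
    """One row per rule, sorted by 'support' or 'confidence' (descending)."""
    if not rules:
        return []
    key = {"support": 2, "confidence": 3}[sort_by]
    s_max = max(r[2] for r in rules) or 1.0
    rows = []
    for lhs, rhs, supp, conf in sorted(rules, key=lambda r: r[key], reverse=True):
        rows.append(RuleRowLayout(
            left=[(region, HUES[q]) for region, q in lhs],
            right=[(region, HUES[q]) for region, q in rhs],
            arrow_length=max_len * supp / s_max,
            arrow_saturation=min(1.0, max(0.0, conf)),
        ))
    return rows


rules = [
    ([("Stralsund", "MAX")], [("Rostock", "MAX")], 0.10, 0.80),
    ([("Stralsund", "MAX"), ("Nordvorpommern", "MAX")], [("Rostock", "MAX")], 0.05, 0.95),
]
for row in layout_rows(rules):
    print(row)
```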

In addition to sorting according to rule significance, we can also use color coding to emphasize regions that are particularly relevant with regard to their frequency in itemsets. Figure 8 shows the color-coded frequency of the regions in the antecedents (left) and in the consequents (right). One fact that can be derived from the figure is that the city of Rostock (saturated blue region in the center) occurs more frequently in consequents than any other region. Another derivable fact is that the island of Rügen (top-right-most region) rarely occurs in rules at all. Such insights can be quite useful for adjusting the construction of qualitative literals from the data. The color coding could also be extended to find correlations. For example, when the user marks one region, all rules that include this region in the antecedent could be selected automatically, and the regions belonging to the consequents of these rules could visualize the rules' confidence by varying color saturation.
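The frequency-based color coding can be sketched by counting how often each region occurs on each side of the rules and normalizing the counts to a saturation value. This Python sketch is illustrative only: it assumes rules have already been reduced to lists of region names, and the linear normalization by the maximum count is our own choice. Regions that occur in no rule simply receive no entry and could be drawn unsaturated.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def region_saturation(rules: Iterable[Tuple[List[str], List[str]]]
                      ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Map each region to a saturation in [0, 1] for antecedents and consequents.

    Each region counts at most once per rule side; counts are divided by the
    largest count, so the most frequent region gets full saturation.
    """
    ante: Counter = Counter()
    cons: Counter = Counter()
    for lhs_regions, rhs_regions in rules:
        ante.update(set(lhs_regions))
        cons.update(set(rhs_regions))

    def normalize(counts: Counter) -> Dict[str, float]:
        top = max(counts.values(), default=0)
        return {r: c / top for r, c in counts.items()} if top else {}

    return normalize(ante), normalize(cons)


# Illustrative rules reduced to (antecedent regions, consequent regions).
rules = [
    (["Stralsund", "Nordvorpommern"], ["Rostock"]),
    (["Stralsund"], ["Rostock"]),
    (["Rostock"], ["Stralsund"]),
]
ante_sat, cons_sat = region_saturation(rules)
print(ante_sat)
print(cons_sat)
```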

Association analysis is a well-established method for knowledge extraction, so it makes sense to consider it in a geo-visual analytics framework as well. Our concept shows that, in addition to the analytical extraction of association rules, the interactive definition of appropriate literals as well as the visualization of rules and their corresponding significance measures have to be considered. We have developed a first approach in this direction and believe that further research on this topic is worthwhile.


7. Conclusion and Future Work

In this article, we discussed several aspects of geo-visual analytics. From talks and discussions with prospective users of visual analytics, we recognized the need to clarify the process of generating interactive visual representations of data. Therefore, we started with a conceptual view. We used Chi's data state reference model for this purpose because it describes data states and operators in a generic way. The set of operators is conceptually not limited, and thus any functionality of visual analytics frameworks can be smoothly related to this model. Moreover, the operators can be connected arbitrarily, and in this way a tight connection of analytical and visual methods can be achieved within a common interaction layer.

We introduced a number of novel concepts (operators of the conceptual framework, if you will) that are related to analysis, visualization, and interaction. The extended focus+context concept is quite useful when larger geo-spatial datasets have to be explored. It combines a visualization strategy with interactive and automatic means to steer the foci and contexts. We also addressed a new visualization challenge, namely the integration of structural information (hierarchical structures in our case) with a map display. For this purpose, we combined the extended focus+context concept with the Magic Eye View technique. Finally, we shed some light on association rule analysis as an alternative to the more commonly used clustering. Association rule analysis has not received much attention in previous work on geo-visual analytics. With our work, we hope to have made a first step that illustrates the potential of this novel combination of analytical and visual means.

For all techniques described here, we see that a whole ensemble of tightly connected operators is involved. While each operator has its specific functionality, the ensemble always follows the same goal: “support analytical reasoning facilitated by interactive visual interfaces” [21].

For future development, it will be useful to investigate new combinations of operators and to increase the reusability and exchangeability of operators, conceptually but also implementation-wise. We have seen that the interactive focus+context concept can be combined with a hierarchy visualization to enable the exploration and analysis of hierarchically structured data in a geo-analytics scenario. But it is not yet possible to simply plug our concept into other frameworks and tools. The same holds for analytical operators. While existing frameworks are often based on clustering, there are other analytical means such as association analysis, principal component analysis, trend analysis, and others. But it is not yet clear to us how to combine these methods flexibly in order to assist analysts in extracting meaningful information from large and heterogeneous geo-referenced data. Finally, considering alternative analysis methods or novel combinations of them will lead to a need for new visual representations and new ways of interacting with them. Therefore, it makes sense to investigate concepts that allow us to flexibly plug in and combine diverse operators, much in the spirit of Chi's model.

Acknowledgements

We gratefully acknowledge conceptual and implementation contributions by Lars Kornelsen, Matthias Kreuseler, Arne Klaassen, Thomas Nocke, Uta Rennau, and Petra Schulze-Wollgast. This work has been partly conducted in the context of the EU coordination project “VisMaster”.

References

[1] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining Association Rules Between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 1993.

[2] G. Andrienko, N. Andrienko, S. Rinzivillo, M. Nanni, D. Pedreschi, and F. Giannotti. Interactive Visual Clustering of Large Collections of Trajectories. In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST), pages 3–10, 2009.

[3] Gennady Andrienko and Natalia Andrienko. A General Framework for Using Aggregation in Visual Exploration of Movement Data. The Cartographic Journal, 47(1):22–40, 2010.

[4] N. Andrienko and G. Andrienko. Exploratory Analysis of Spatial and Temporal Data. Springer, Berlin, Germany, 2006.

[5] Christoph Buchheim, Michael Jünger, and Sebastian Leipert. Improving Walker's Algorithm to Run in Linear Time. In Proceedings of the International Symposium on Graph Drawing (GD), pages 344–353. Springer, 2002.

[6] Ed H. Chi. A Taxonomy of Visualization Techniques Using the Data State Reference Model. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis), pages 69–76, Washington, DC, USA, 2000. IEEE Computer Society.

[7] Andy Cockburn, Amy Karlson, and Benjamin B. Bederson. A Review of Overview+Detail, Zooming, and Focus+Context Interfaces. ACM Computing Surveys, 41:2:1–2:31, January 2009.

[8] Danny Dorling, Anna Barford, and Mark Newman. Worldmapper: The World as You've Never Seen it Before. IEEE Transactions on Visualization and Computer Graphics, 12(5):757–764, 2006.

[9] Niklas Elmqvist and Jean-Daniel Fekete. Hierarchical Aggregation for Information Visualization: Overview, Techniques, and Design Guidelines. IEEE Transactions on Visualization and Computer Graphics, 16(3):439–454, 2010.

[10] G. W. Furnas. Generalized Fisheye Views. In CHI '86: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 16–23, New York, NY, USA, 1986. ACM.

[11] Steffen Hadlak, Christian Tominski, and Heidrun Schumann. Visualization of Attributed Hierarchical Structures in a Spatio-Temporal Context. International Journal of Geographical Information Science, 24(10), 2010.

[12] Ivan Herman, Guy Melançon, and M. Scott Marshall. Graph Visualization and Navigation in Information Visualization: A Survey. IEEE Transactions on Visualization and Computer Graphics, 6(1), 2000.

[13] Jochen Hipp, Ulrich Güntzer, and Gholamreza Nakhaeizadeh. Algorithms for Association Rule Mining – A General Survey and Comparison. SIGKDD Explorations Newsletter, 2(1):58–64, 2000.

[14] Mikael Jern. Collaborative Web-Enabled Geoanalytics Applied to OECD Regional Data. In Proceedings of the 6th International Conference on Cooperative Design, Visualization, and Engineering (CDVE), pages 32–43, Berlin, Heidelberg, 2009. Springer.


[15] Daniel Keim, Jörn Kohlhammer, Geoffrey Ellis, and Florian Mansmann, editors. Mastering The Information Age – Solving Problems with Visual Analytics. Eurographics Association, 2010.

[16] E. L. Koua and M.-J. Kraak. An Integrated Exploratory Geovisualization Environment Based on Self-Organizing Map. In P. Agarwal and A. Skupin, editors, Self-Organising Maps: Applications in Geographic Information Science. John Wiley & Sons, 2008.

[17] Menno-Jan Kraak and Ferjan Ormeling. Cartography: Visualization of Spatial Data. Longman Singapore Publishers, Singapore, 1996.

[18] Matthias Kreuseler and Heidrun Schumann. Information Visualization Using a New Focus+Context Technique in Combination With Dynamic Clustering of Information Space. In Proceedings of the Workshop on New Paradigms in Information Visualization and Manipulation (NPIVM), pages 1–5, New York, NY, USA, 1999. ACM.

[19] Ying K. Leung and Mark D. Apperley. A Review and Taxonomy of Distortion-Oriented Presentation Techniques. ACM Transactions on Computer-Human Interaction, 1(2):126–160, 1994.

[20] Anthony C. Robinson. A Design Framework for Exploratory Geovisualization in Epidemiology. Information Visualization, 6(3):197–214, 2007.

[21] J. J. Thomas and K. A. Cook. Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE Press, 2005.

[22] Christian Tominski, Petra Schulze-Wollgast, and Heidrun Schumann. 3D Information Visualization for Time Dependent Data on Maps. In Proceedings of the International Conference Information Visualisation (IV), pages 175–181. IEEE Computer Society, 2005.

[23] Christian Tominski, Petra Schulze-Wollgast, and Heidrun Schumann. Visual Methods for Analyzing Human Health Data. In Nilmini Wickramasinghe and Eliezer Geisler, editors, Encyclopedia of Healthcare Information Systems, pages 1357–1364. Information Science Reference, 2008.

[24] Edward R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, 1983.

[25] Jarke J. Van Wijk and Edward R. Van Selow. Cluster and Calendar Based Visualization of Time Series Data. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis), pages 4–9, Los Alamitos, CA, USA, 1999.
