
Visual Analytics of Anomalous User Behaviors: A Survey · 2019-05-22 · Visual Analytics of Anomalous User Behaviors: A Survey Yang Shi1, Yuyin Liu2, Hanghang Tong 3, Jingrui He


Visual Analytics of Anomalous User Behaviors: A Survey

Yang Shi¹, Yuyin Liu², Hanghang Tong³, Jingrui He³, Gang Yan¹, Nan Cao¹

¹Tongji University, China; ²Imperial College London, United Kingdom

³University of Illinois at Urbana-Champaign, United States

Abstract—The increasing accessibility of data provides substantial opportunities for understanding user behaviors. Unearthing anomalies in user behaviors is of particular importance as it helps signal harmful incidents such as network intrusions, terrorist activities, and financial frauds. Many visual analytics methods have been proposed to help understand user behavior-related data in various application domains. In this work, we survey the state of the art in visual analytics of anomalous user behaviors and classify them into four categories: social interaction, travel, network communication, and transaction. We further examine the research works in each category in terms of data types, anomaly detection techniques, visualization techniques, and interaction methods. Finally, we discuss findings and potential research directions.

I. INTRODUCTION

The increasing accessibility of data collected from various sources provides potential opportunities for understanding user behaviors. Identifying anomalies in user behaviors is of particular interest in many application domains such as cybersecurity, urban planning, and social media. For instance, detecting rumors and tracking their spreading patterns alerts people to the risk of being influenced by misinformation, which is especially critical during political elections.

Detecting anomalous user behaviors is a challenging task because the boundary between abnormal and normal data cannot be clearly defined. Even when equipped with domain knowledge, analysts may find that the results of automatic machine learning approaches lack the contextual information needed to support decision-making; for example, analysts have limited means to explore who did what, when, where, and why (the 5W's), and how. To address this issue, visualization integrates human knowledge into information processing tasks. It presents anomalous patterns intuitively to decision makers and establishes a human-machine dialog as they interact with the data set. Our work aims to summarize the state of the art in visual analytics of anomalous user behaviors, with the purpose of highlighting current research trends as well as future directions.

In this survey, we contribute a taxonomy of visual analytics of anomalous user behaviors. The overview of the analytical pipeline is summarized in Figure 1.

• We categorize user behaviors into four types, namely social interaction, travel, network communication, and transaction, based on the data collected from specific data sources. We extract four common data types from these four behaviors: text, network, spatiotemporal information, and multidimensional data.

• We review how research works use visualization techniques combined with interaction methods to analyze anomalous user behaviors. We extract six visualization techniques: sequence visualization, graph visualization, text visualization, geographic visualization, chart visualization, and glyph visualization. We also summarize five interaction methods: tracking & monitoring, exploration & navigation, pattern discovery, knowledge externalization, and refinement & identification.

The remainder of this survey is organized as follows. First, we describe related surveys in Section II. Then, we present the terminology, methodology, and taxonomy used in this survey in Section III. Sections IV, V, VI, and VII analyze the four user behaviors, respectively, using the taxonomy explained in Section III. The analysis of each behavior follows the general visual analytics pipeline: we start by identifying data types and anomaly detection techniques, and then discuss visualization techniques and interaction methods. Finally, we discuss findings and trends acquired from surveying papers in Section VIII and conclude our work in Section IX.

II. RELATED SURVEYS

In this section, we discuss surveys related to the visual analysis of anomalous user behaviors. Several survey papers in the literature focus on analyzing user behaviors. Jin et al. [2] categorize user behaviors in online social networks into four types: connectivity and interaction, traffic activity, mobile social behavior, and malicious behavior. Jiang et al. [3] classify anomalous behaviors in web applications (e.g., Hotmail, Facebook, Amazon) into four categories: traditional spam, fake reviews, social spam, and link farming. Surveys on the visualization of user behavior data cover application domains such as urban computing [4], social media [5], [6], finance [7], and network security [8], [9]. In the field of anomaly detection, Chandola et al. introduce categories of anomaly detection (AD) techniques [1]. The surveys in [10] and [11] examine techniques used in intrusion detection systems and for detecting graph-based anomalies, respectively. Recent work by Chalapathy and Chawla [12] presents a structured overview of research approaches in deep learning-based anomaly detection. Our survey covers a wider range of application domains than existing surveys. To the best of our knowledge, it is the first survey that explores anomalous user behaviors from the perspective of visual analytics.

arXiv:1905.06720v2 [cs.HC] 21 May 2019


Fig. 1. Taxonomy of this survey, addressing the data type, anomaly detection techniques [1], visualization techniques, and interaction methods in the visual analysis of anomalous user behaviors.

III. TERMINOLOGY, METHODOLOGY, AND TAXONOMY

In this section, we first explain the terminology used in this survey and describe our methodology for selecting papers suitable for the topic of the survey. Next, we introduce the taxonomy of anomalous user behaviors with respect to common data types, anomaly detection techniques, visualization techniques, and interaction methods.

A. Terminology

The survey aims to summarize visualization works that focus on anomalous user behaviors. Here, user behaviors can be derived directly or indirectly from user actions. For example, posting a tweet is a behavior directly related to user actions, whereas a cyber-attack is carried out by nodes in a network but indirectly manipulated by the perpetrator. Investigating user behavior focuses on tracking, collecting, and assessing patterns caused by users, as opposed to information about devices and events [13], [14]. Anomalous user behaviors are analyzed and identified using anomaly detection techniques. According to Chandola et al. [1], anomalies are “patterns in data that do not conform to a well-defined notion of normal behavior”. As we collect research works from a diverse set of domains such as social media, finance, and cybersecurity, the scope of anomaly detection in our survey is broader than the scope identified in specific domains. For example, Chen et al. [5] identify data outside the normal ranges of attributes as anomalies in social media, while in the field of cybersecurity, anomalies refer to malware, insider threats, and targeted attacks [13], [14]. In our work, anomalies include frauds, spam, intrusions, sudden increases in the volume of data, and periodic patterns of users. In short, as long as the detected results express “interestingness of real-life relevance” [1], we consider the visualization works to be within the scope of anomaly detection.

B. Methodology

The range of publications of interest is constrained by three conditions: user behaviors, anomaly detection, and visual analytics/visualization. We started from a core set of relevant research works known to us in advance, and followed references from their “Related Work” sections as well as papers that cite the previously identified papers. We also conducted a keyword search for papers published in visualization conferences and journals. Examples of keywords are “anomaly”, “anomalous”, “outlier”, “abnormal”, “unusual”, and “rare”. The research papers were checked to confirm that they are indeed associated with the concept of anomaly in [1]. Their association with user behaviors was expected to be demonstrated in the case study sections of the publications. While investigating research works, we found that the range of pertinent papers is relatively narrow. To address the potential shortage of references, our survey also covers publications that incorporate anomaly detection as one of their visual analytic approaches, in addition to those that solely address anomaly detection; e.g., we include [15] in our collection although the authors' ultimate goal is predictive analysis of event evolution.

We also keep our exploration spectrum balanced in terms of application domains. We noticed that the number of publications related to travel and network communication exceeds that of the other behaviors. The prevalence of travel probably results from the long history of visualizing spatiotemporal data (in 1869, Charles Minard produced a map illustrating Napoleon's march to Moscow) and continuous study ever since. As for cybersecurity, the establishment of a dedicated venue, the IEEE Symposium on Visualization for Cyber Security (VizSec), encourages researchers to devote effort to this field. Therefore, we allocated comparatively more time to searching for research works on the other user behaviors. We hope to capture possibly interesting relationships across user behaviors by maintaining a broad scope of investigation.

C. Taxonomy

Based on a literature review of more than 150 papers relevant to visual analytics of anomalous user behaviors, we summarize four user behaviors: social interaction, travel, network communication, and transaction. For each of the four user behaviors, we attempt to identify common data types, anomaly detection techniques, visualization, and


interaction methods. The different categories are highlighted in the overall pipeline of visual analytics in Figure 1. The selected papers are summarized in Figure 2, with colors indicating each category.

User Behaviors. User behaviors are seen in a variety of application domains. Based on the data collected from specific data sources, we classify user behaviors into four categories: social interaction, travel, network communication, and transaction. Social interaction describes the communication of ideas and thoughts between people. Its data is collected from publicly accessible social platforms or private telecommunication platforms. Travel is the physical movement of users between places and therefore carries geographic information. Its data is collected from the Global Positioning System (GPS), mobile phones, base stations, etc. Network communication is the sending and receiving of information between machines via networks. Its data is collected from server logs. Transaction refers to monetary flows in buying and selling, whose data is collected from system logs.

We also categorize anomalous user behaviors into egocentric and collective behaviors. The categorization is inspired by the concepts of point and collective anomalies [1]. Note that our survey focuses on anomalous user behaviors, which constitute a subset of anomalies. Egocentric behavior refers to a user behavior that distinguishes itself from the rest of the data in anomaly detection. Collective behavior is a set of user behaviors that appears anomalous as a whole; when analyzed separately, the individual behaviors in the set may appear normal. As egocentric and collective behaviors emphasize different aspects, they call for different visualization designs, which are discussed when analyzing visualization techniques in the following sections.

Data Types. A variety of data can be extracted from user behaviors across different domains. By analyzing multiple attributes of these data, we summarize four common data types: text, network, spatiotemporal information, and multidimensional data [3], [5]. Each data type is briefly explained as follows. Text provides semantic information about the identities and backgrounds of objects. A network, also called a graph, consists of a set of nodes interlinked by a set of edges. A formal definition of a graph can be found in [16]. Spatiotemporal information captures spatial and temporal attributes of data. Multidimensional data uses multiple attributes to describe the properties of objects. The data types of each user behavior are explained in detail in the following sections.

Anomaly Detection Techniques. The categorization of anomaly detection techniques used in this survey is borrowed from the survey by Chandola et al. [1]. The six categories are classification-based, nearest neighbor-based, clustering-based, statistical, information theoretic, and spectral anomaly detection techniques. Classification-based anomaly detection techniques build models in a training phase and distinguish anomalies from normal data instances in a testing phase. In the training phase, classifiers are learned from a set of training data instances. In the testing phase, test instances are classified into one of the classes: normal or anomalous. Nearest neighbor-based techniques compute anomaly scores from distance or relative density measures within a neighborhood. Distance-based variants, for example, compute the anomaly score of an instance as its distance to its k-th nearest neighbor. Clustering-based techniques group similar data instances into clusters and separate normal instances from anomalous ones. Statistical techniques assume a probability distribution for the data instances: outliers are found in regions of low probability, whereas normal instances occur with high probability. Statistical techniques can be further divided into parametric and non-parametric techniques, depending on whether a model structure is assumed a priori. Information theoretic techniques analyze the information content of the data using measures such as entropy, relative entropy, and Kolmogorov complexity. Spectral techniques aim to find an approximation of the data by decomposing it into a combination of attributes (components). The data can then be embedded into a lower-dimensional subspace in which anomalous instances can be distinguished from normal instances. A detailed explanation of the categories and sub-categories can be found in [1].
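The distance-based variant of nearest neighbor-based detection can be made concrete with a short sketch. The Python below is an illustrative assumption, not code from [1] or from any surveyed system: it uses plain Euclidean distance, a hypothetical function name `knn_anomaly_scores`, and toy two-dimensional points standing in for user feature vectors.

```python
import math
from typing import List, Sequence


def knn_anomaly_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from its neighbors, i.e. it is
    more likely to be anomalous under the distance-based view.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point (p itself is excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # distance to the k-th nearest neighbor
    return scores


# Toy data (hypothetical): four clustered users and one far-away user.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_anomaly_scores(data, k=2)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
print(most_anomalous)  # 4
```

Because the raw score grows with the scale of the features, features are usually normalized first; density-based variants such as the local outlier factor instead compare a point's local density with that of its neighbors.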

We focus our discussion on visualization works that apply anomaly detection techniques. A small proportion of visual analytic tools manage to detect anomalies through carefully designed visualizations in which anomalous data instances can be visually distinguished from normal ones [17], [18], [19], [20], [21]. These designs encode attributes and/or frequency using easily recognizable visual channels such as hue, glyph height, and node size [22], [21], [23]. We exclude these papers from the discussion of anomaly detection techniques.

Visualization Techniques. We categorize the visualization techniques that have been applied to anomalous user behaviors into sequence, graph, text, geographic, chart, and glyph visualizations. Sequence visualization illustrates relations between successive events with temporal information. Anomalous sequences include spreading patterns of rumors, sudden changes in the volume of posts, and unusual business processes. Common visual representations are timeline visualization, flow visualization, and parallel coordinates. Graph visualization shows structural patterns composed of nodes and edges. Anomalous graphs indicate unusual communication patterns within groups or communities, financial frauds conducted between employees and clients, and unauthorized network traces directed from sources to destinations. Typical graph visualizations are node-link diagrams, circular-based designs (i.e., a network topology map inside an outer ring), trees, and matrices. Text visualization focuses on textual data. Anomalous text is indicated by specific keywords, topics, and sentiments extracted or abstracted from texts. Word clouds are a common visualization technique for text. Text can also be combined with other visualization techniques, such as flow visualization, to present more contextual information. Geographic visualization depicts mobility patterns of people or vehicles in


Fig. 2. The selected papers regarding visualization and visual analytics of anomalous behaviors. DTs: text, network, spatiotemporal information, and multidimensional data. ADs: classification-based, nearest neighbor-based, clustering-based, statistical, information theoretic, and spectral anomaly detection techniques. VTs: sequence, graph, text, geographic, chart, and glyph visualizations. ITs: tracking & monitoring, exploration & navigation, knowledge externalization, pattern discovery, and refinement & identification.

geographic space. Mobility patterns include discrete as well as continuous patterns. Discrete patterns describe distribution and co-occurrence, while continuous patterns depict the trajectories of users as they move from one point to another. Abnormal mobility patterns include hot spots, travel in the direction opposite to most others, and movements that are uncommon compared with history. Heat maps and flow/bubble projections on a geographic map are used most often for the visual analysis of mobility patterns. Chart visualization and glyph visualization represent the attributes of a multidimensional data item using a chart (e.g., x- and y-axes, color of objects) and the features of an icon (color, size, shape), respectively. Examples of anomalies include users who only reply on a discussion board but never initiate a post, and users who send an unusual number of emails at a certain time. Typical visualization techniques include 2D/3D scatter plots, bubble charts, bar charts, Gantt charts, etc.

Interaction Methods. Interaction plays an important role in visual analytics. Based on an analysis of the interaction methods [24] used in research on detecting anomalous user behaviors, we summarize five categories of interaction tasks: tracking & monitoring, exploration & navigation, knowledge externalization, pattern discovery, and refinement & identification. Analysts may mark data of interest via clicking, hovering, or brushing for tracking & monitoring. Analysts may observe data via panning, zooming, or drill-down/roll-up functions for exploration & navigation. Analysts may adjust attributes of data (e.g., color, size, range) to reveal interesting patterns (pattern discovery). Analysts may collect, save, and extract the current visualization (e.g., take a snapshot) for knowledge externalization. Analysts may label data with known identities (i.e., abnormal or normal data items) for refinement & identification of results.

IV. SOCIAL INTERACTION

Social interaction describes the communication of ideas and thoughts between people. It can be further classified into private and public interaction. Private social interaction behaviors include sending and/or receiving emails, making phone calls, and sending text messages between acquaintances on a regular basis. Examples of anomalous interaction are the communication of fraudsters [25], [26] and criminals [27], [28], the emailing patterns of core contributors in a working group [29], [30], and spam [31]. Public social interaction behaviors are associated with posting, sharing, and replying to content on publicly accessible social platforms. Writing reviews on e-commerce platforms and editing articles in Wikipedia also count as public social interaction. Anomalies related to this interaction include the diffusion of rumors [32], [33], social bots [34], [35], and events [36], [37], [38], [39].

We observe a few differences between private and public social interactions. The linkage between senders and receivers is less explicit in public interaction than in private one-on-one conversations. Much more information is accessible on public platforms than in private settings, leading to larger volumes of data collected on public behaviors. These differences are also reflected in the design principles of visual analytics tools, which will be discussed in Section IV-C.

A. Data Types

Text data such as keywords, hashtags, and email contents helps analysts comprehend social interaction behavior, as it


provides information including sentiment, categories, and clusters of text under a certain topic. Gloor et al. [30], [40] filter emails by keywords that are known to be related to crime patterns. For example, “bonus” means the most important thing, and “investigation” refers to what is coming up for criminals. TargetVue [34] incorporates content features to detect social bot accounts. Mentioning a topic for which sudden changes in the number of relevant tags are observed is regarded as anomalous behavior. Echeverria et al. [41] discover a bot network on Twitter by solely mining the textual features of tweets. They found that the tweets of the botnet were taken directly from “Star Wars” novels. Beagle [28] allows analysts to filter content using a filter set as well as to construct filters from keywords found useful during the investigation of scamming activities.

As social interaction concerns passing, sharing, and exchanging information, network data arise naturally when conversations are held between users. Follower relationships in social media, back-and-forth communication via emails, and amendments made by one user in Wikipedia in response to the edits of another user are considered network data. Gloor et al. [30] identify the team leader, practice leader, and practice coordinator from visualizations of social email networks. These anomalous users are placed in the center of the social network and connected to multiple nodes. Fu et al. [29] explore small-scale email networks, where a node represents an email address and an edge between two nodes indicates an email exchange. Analysts are able to identify distinct email networks for specific research groups, as little communication occurs across groups. FluxFlow [32] derives user networks when exploring the process of anomalous information spreading. The in-degree and out-degree of a Twitter user are extracted from the user's interaction graph; these measures signal the user's influence.
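Extracting in-degree and out-degree from an interaction graph, as described for FluxFlow, reduces to counting edge endpoints. The sketch below assumes a hypothetical edge list in which each directed edge (u, v) means that user u sent a message to user v; FluxFlow's actual graph construction may differ, and the function name `degree_counts` is illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_counts(edges: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {user: (in_degree, out_degree)} for a directed interaction graph.

    Each edge (u, v) is read as "u sent a message to v". Repeated edges are
    counted every time, so the degrees reflect interaction volume.
    """
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    users = set(indeg) | set(outdeg)
    # Counter returns 0 for users that never appear on one side.
    return {u: (indeg[u], outdeg[u]) for u in users}


# Hypothetical reply/retweet edges between four accounts.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a"), ("b", "c")]
deg = degree_counts(edges)
print(deg["hub"])  # (3, 1)
```

A user with a high in-degree relative to out-degree, such as `hub` here, receives many interactions while initiating few, which is the kind of asymmetry the degree measures are meant to surface.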

Temporal information can be found in the timestamps of microblogs, the time and date of emails and calls, and the days when a user appears on a forum. The locations of geo-located microblogs, the locations of calls, and the terrorist network of a country are spatial data. Temporal data facilitates the analysis of communication evolution, whereas spatial data explains where the behavior occurs. Elzen et al. [25], [26] detect communication bursts using dynamic network visualization. An important part of their work is the temporal analysis of events (e.g., mobile phone calls), where trends opposite to global trends, periodic repetition, and a sudden block between homogeneous behaviors are considered abnormal. CloudLines [42] regards sudden changes in the number of specific keywords within a period as anomalies. The keywords are collected from tweets, which arrive in data streams at non-uniform time intervals. Some visualization works combine temporal and spatial analysis in event detection. ScatterBlogs [37], [43] detects events containing geographic information, such as power outages and disasters, from microblogs and simultaneously represents messages related to the events on a map.
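A sudden change in keyword counts within a period can be illustrated with a simple baseline detector. The sketch below is a swapped-in technique (a trailing-window z-score rule), not the method of CloudLines or of Elzen et al.; the function name `flag_bursts`, the window length, the threshold, and the toy counts are assumptions chosen for illustration.

```python
import statistics
from typing import List, Sequence


def flag_bursts(counts: Sequence[int], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indices of time bins whose count jumps far above the recent past.

    Bin t is flagged when counts[t] exceeds mean + threshold * stdev of the
    preceding `window` bins. Bins without a full history are never flagged.
    """
    flagged = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = statistics.mean(history)
        sigma = statistics.pstdev(history)
        # A floor of 1.0 keeps a perfectly flat history from flagging tiny changes.
        if counts[t] > mu + threshold * max(sigma, 1.0):
            flagged.append(t)
    return flagged


# Hypothetical hourly counts of one keyword; hour 7 spikes.
hourly = [4, 5, 3, 4, 5, 4, 5, 40, 6, 5]
print(flag_bursts(hourly))  # [7]
```

Note that after the spike enters the trailing window it inflates the mean and deviation, so the bins right after a burst are harder to flag; this is one reason streaming tools often use smoother density estimates instead of a fixed window.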

Multidimensional data for detecting anomalous user behaviors include the length of a tweet, the number of posts/emails, and average rating scores on e-commerce platforms. Multidimensional data not only offers comprehensive descriptions of social interaction, but also helps characterize how anomalous behaviors are. Webga and Lu [44] detect anomalous ratings by incorporating multidimensional data into the analysis. The multidimensional data includes the scores given by every user at the corresponding time. Rating frauds are discovered by measuring differences in average ratings and in the number of rating activities between two time windows. Cao et al. [34] detect anomalous users in social media by carefully selecting communication features. To investigate the interaction aspect of a social account, they measure features such as whether users tend to communicate within a group or spread information in public, and whether users receive responses from others. FraudVis [45] selects ten features based on the rank of their anomaly scores to investigate which features contribute most to frauds on the Internet. The activity count within different time periods, for instance, is one of the features, evaluating the number of views on a video website.
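The two-window comparison described for Webga and Lu [44] can be sketched as follows. Their exact measures and thresholds are not given here, so the function names, the decision rule (both the average and the volume must shift), and the threshold values `max_avg_shift` and `max_volume_ratio` are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def window_shift(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float]:
    """Compare two time windows of ratings for one item.

    Returns (change in average rating, ratio of rating counts after/before).
    """
    if not before or not after:
        raise ValueError("both windows need at least one rating")
    return mean(after) - mean(before), len(after) / len(before)


def looks_suspicious(before: Sequence[float], after: Sequence[float],
                     max_avg_shift: float = 1.0, max_volume_ratio: float = 3.0) -> bool:
    # Hypothetical rule: flag the item when the average moves by more than
    # max_avg_shift stars AND rating volume grows by more than max_volume_ratio.
    avg_shift, volume_ratio = window_shift(before, after)
    return abs(avg_shift) > max_avg_shift and volume_ratio > max_volume_ratio


# Toy example: a burst of 5-star ratings after a period of mediocre ones.
week1 = [3, 2, 3, 3]
week2 = [5] * 14 + [3, 3]
print(looks_suspicious(week1, week2))  # True
```

Requiring both conditions is a design choice that trades recall for precision: a genuine quality change may shift the average without a volume spike, while a promotion campaign may raise volume without changing the average.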

B. Anomaly Detection Techniques

Classification-based techniques are used more often to discover abnormal social interaction than to analyze the other three user behaviors. The “Star Wars” botnet [41] is retrieved with a naïve Bayes classifier based solely on textual features. This basic technique is effective because the tweets posted by the botnet are quoted from the “Star Wars” novels. ScatterBlogs2 [38] proposes a supervised, Support Vector Machine (SVM) classification-based approach to train classifiers as user-adjustable filters. A random forest algorithm, i.e., a rule-based classification technique, detects misinformation spread by social bots in a supervised manner [35]. RumourLens [33] analyzes the impact of rumors during the information diffusion process. It performs iterative expansion of a query set and iterative refinement of a classifier (the ReQ-ReC retriever and classifier) [49]. The output is a ranked list of tweet clusters that appear to be rumors, which users can refine. FluxFlow [32] utilizes one-class conditional random fields (OCCRF) to perform sequential anomaly detection. The OCCRF model accounts for the highly dynamic and one-class nature of anomalies, and computes an anomaly score by measuring dissimilarity from unlabeled training samples. The dissimilarity is derived from the difference between the posterior probability of a normal label and that of an abnormal label.
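To show why a classifier that looks only at words can catch a botnet that copies its tweets from novels, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is a generic textbook sketch, not the classifier used in [41]; the class name `NaiveBayesText`, the whitespace tokenizer, and the toy training tweets are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayesText:
    """Multinomial naive Bayes over bag-of-words features (add-one smoothing)."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesText":
        self.class_counts = Counter(labels)  # number of documents per class
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_posterior(self, doc: str, label: str) -> float:
        """Unnormalized log P(label | doc) = log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Toy training tweets (hypothetical): "bot" tweets quote fiction, "human" ones do not.
train_docs = [
    "the force is strong with this one",
    "a jedi uses the force for knowledge",
    "traffic is terrible this morning",
    "great coffee with friends this morning",
]
train_labels = ["bot", "bot", "human", "human"]
clf = NaiveBayesText().fit(train_docs, train_labels)
print(clf.predict("may the force be with you"))      # bot
print(clf.predict("terrible traffic this morning"))  # human
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a single unseen class-word pair from driving a posterior to zero.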

Page 6: Visual Analytics of Anomalous User Behaviors: A Survey · 2019-05-22 · Visual Analytics of Anomalous User Behaviors: A Survey Yang Shi1, Yuyin Liu2, Hanghang Tong 3, Jingrui He

Fig. 3. Visualizations of anomalous social interaction behaviors. (a) TargetVue [34] uses circle-based glyph visualization to encode individual users’ temporal posting/reposting behaviors, anomalousness of their behaviors, and correlation between suspicious users. (b) LeadLine [46] visualizes event episodes using horizontal pulse-shaped timeline visualization. (c) FluxFlow [32] shows anomalous information spreading on social media using packed circle timeline visualization. (d) Chae et al. [47] present public behavior responses to disaster events in microblogs using a heat map and hexagons on a map. (e) MobiVis [48] visualizes the calling behavior of a network consisting of university staff and students using a node-link diagram.

Nearest neighbor-based techniques calculate anomaly scores from distance or density. Metrics such as density, betweenness centrality, and group degree centrality in networks serve as the ranking criteria of homogeneity/risk for Collaborative Innovation Networks [50]. MobiVis [48] incorporates semantic information of phone calls and geographic proximity into a heterogeneous graph. Through importance filtering based on variables such as the node degree of a neighborhood, important nodes and edges can be pruned through interaction with the ontology graph. TargetVue [34] employs a time-adaptive local outlier factor model to quantify sudden changes of posting or emailing behaviors. A user can be represented as a time-series vector in a multidimensional feature space. Each user is given an anomaly score computed from features that distinguish the user from others and from his/her own history. Kernel density estimation (KDE) is used for computing continuous distributions. It adapts the estimation by allowing the kernel scale to vary based on the distance from the point to the kth nearest neighbor in a data set. CloudLines [42] applies a logarithmic distortion of time that amplifies recent events. A kernel density estimator and a truncation function help focus on recent events that appear dense in time series. KDE is also used in [39] to inspect spatiotemporal regularities of topics. Point patterns are related to continuous regions by comparative kernel density analysis.
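The variable-bandwidth idea can be sketched in one dimension. In this sketch, each sample's Gaussian kernel uses the distance to that sample's k-th nearest neighbor as its bandwidth, so kernels are narrow in dense regions and wide in sparse ones. This is a generic sample-point adaptive estimator written under that assumption, not the exact estimator of TargetVue [34]. The value of `k` and the sample values are illustrative.

```python
import math

def knn_bandwidths(samples, k):
    """Bandwidth of each sample = distance to its k-th nearest other sample."""
    hs = []
    for i, x in enumerate(samples):
        dists = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        hs.append(max(dists[k - 1], 1e-9))  # guard against zero bandwidth
    return hs

def variable_kde(x, samples, k=3):
    """Sample-point adaptive Gaussian KDE evaluated at x (needs len(samples) > k)."""
    hs = knn_bandwidths(samples, k)
    total = 0.0
    for xi, h in zip(samples, hs):
        u = (x - xi) / h
        total += math.exp(-0.5 * u * u) / (h * math.sqrt(2 * math.pi))
    return total / len(samples)
```

Because each kernel integrates to one, the estimate remains a valid density. Points in a tight cluster receive high density, and isolated events receive low density.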

Statistical techniques are used in event detection, where anomalousness is quantified by measuring differences from models constructed from historical behaviors. TwitInfo [36] finds peaks in time-series events by considering the exponentially weighted moving average and variance in a time window. The algorithm starts a new window when it encounters a significant increase in counts relative to the historical mean. Iterative non-parametric regression based on Loess smoothing decomposes the time series of interest into three components: trend, seasonal, and remainder. Z-scores of the remainder values serve as abnormality ratings. This method was first used in ScatterBlogs [43]. It was later applied to identify unusual topics in selected regions [47] and used as part of predictive analytics based on topic trends in historical time series [15].
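A simplified, TwitInfo-style peak detector can be sketched as follows. It tracks an exponentially weighted moving average (EWMA) of counts and of their mean deviation. It flags a bin when the count exceeds the mean by more than `tau` deviations. The smoothing factor `alpha`, the multiplier `tau`, and the `min_dev` floor are assumed values chosen for illustration. This sketch omits the window bookkeeping described for [36].

```python
def detect_peaks(counts, alpha=0.125, tau=2.0, min_dev=1.0):
    """Return indices of bins whose count jumps well above the EWMA of
    earlier bins. min_dev keeps the test meaningful on flat histories."""
    if not counts:
        return []
    mean, meandev = float(counts[0]), 0.0
    peaks = []
    for i, c in enumerate(counts[1:], start=1):
        # Compare with the history before folding in the current bin.
        if c - mean > tau * max(meandev, min_dev):
            peaks.append(i)
        meandev = alpha * abs(c - mean) + (1 - alpha) * meandev
        mean = alpha * c + (1 - alpha) * mean
    return peaks
```

On a series such as `[10, 12, 11, 10, 12, 11, 10, 60, 58, 12, 11]`, only the sudden burst at indices 7 and 8 is reported. The small fluctuations around 10 to 12 are not.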

EventRiver [51] applies a clustering-based approach based on temporal locality in the analysis of streaming texts. The clusters are related in content regardless of time spans. ScatterBlogs [37] uses the Lloyd clustering technique to distinguish unusual events from general message clusters originating from high densities in time and space. Episogram [52] selects appropriate features for clustering and generates clusters that are always centered at the positions with the highest densities in the data space. FraudVis [45] employs the CopyCatch algorithm, a graph-based clustering approach, to explore fraud groups who suddenly follow a user in social media on a single day.
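Lloyd's algorithm, the standard iteration behind k-means, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain Python version with caller-supplied initial centroids. How ScatterBlogs [37] builds its space-time features and chooses the number of clusters is not specified here, so the points are hypothetical.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def lloyd_kmeans(points, init, iters=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in init]
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new
    return centroids, assign(points, centroids)
```

In a setting such as ScatterBlogs, each point might be a message's (x, y, time) coordinate. Small clusters far from the dominant ones would then be candidates for unusual events.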

Spectral techniques are used to detect interesting network structures in editing histories [53] and rating frauds in e-commerce systems [44]. Brandes et al. [53] abstract weighted attributes on nodes and edges from users and relationships between users, respectively. A weighted graph is projected into a controversy space where the collaboration or competition structure of two user groups is easily identified. Webga and Lu [44] adopt a dimension reduction algorithm, singular value decomposition (SVD), to detect fake ratings that are written to boost the popularity of selected items in e-commerce stores. Once the suspicion level rises above a threshold, alerts are sent to the visualization.
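A generic SVD-based projection illustrates the spectral idea. Each user row of a user-item rating matrix is projected onto the top-k singular directions. Users whose projections lie far from the median projection are flagged. This is a sketch under that assumption, not the procedure of Webga and Lu [44]. The matrix, `k`, and `threshold` are hypothetical.

```python
import numpy as np

def svd_projection(R, k=2):
    """Project each row (user) of rating matrix R onto its top-k singular directions."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] * s[:k]  # same as R @ Vt[:k].T

def flag_outliers(R, k=2, threshold=1.0):
    """Return indices of users whose projection is far from the median projection."""
    P = svd_projection(R, k)
    center = np.median(P, axis=0)
    dist = np.linalg.norm(P - center, axis=1)
    return np.flatnonzero(dist > threshold)

# Hypothetical matrix: 8 ordinary users and 2 users who push items 0 and 1.
R = np.array([[3, 3, 3, 3]] * 8 + [[5, 5, 1, 1]] * 2, dtype=float)
print(flag_outliers(R))  # [8 9]
```

The sign ambiguity of singular vectors does not affect the result, because only distances between projections are used.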

C. Visualization Techniques

Egocentric Behaviors. Egocentric Social Interaction behaviors describe the role of a user in his/her interaction with others. Examples of anomalous egocentric behaviors are users who only reply in a discussion board or who send an unusual number of emails at a certain time. We observe that glyph, text, and graph visualizations are the favored visual representations for egocentric behaviors.

Anomalous user behaviors can be identified via glyphs whose appearance differs from that of normal ones. Episogram [52] uses arrow-based and arc-based timelines to show posting and reposting activities, respectively. The two timelines can be aggregated to obtain overall tweeting behaviors. Users who always repost immediately after a message is posted appear as arcs that always start from one end. TargetVue [34] (Figure 3 (a)) tackles the challenge of discovering social bots in Twitter. The circle-based glyph visualization facilitates investigation in terms of topics, sentiments, temporal dynamics of communication and its impacts, and relationships among accounts. Specifically, individual users’ temporal posting/reposting behaviors, the anomalousness of their behaviors, and the correlation between suspicious users are encoded by a behavior glyph, a feature glyph, and a relation glyph, respectively.

Text visualization can be used to describe egocentric communication patterns in emails [18], [54]. PostHistory [18] shows the evolution of emailing patterns. It consists of two views. One is a calendar view that reveals the intensity of messages exchanged with each contact. The other animates how email addresses evolve over time. Analysts can change the positions of addresses through vertical, circular, or alphabetical arrangement. Social Network Fragment [18] represents social networks in a graph where nodes are replaced by the colored names of individuals. The larger the font of a name, the more strongly that individual is tied to others. Viégas et al. [54] study changes in relationships implied by changes of keywords in email contents. The sizes of texts convey the frequency and distinctiveness of keywords, so anomalies such as changes in relationships (e.g., from peer to boss) can be inferred.

In addition to glyph and text visualization, graph visualization, especially node-link visualization, helps detect anomalous individual behaviors from social interaction. Li et al. [55] explore email patterns in two graphical modes: cliques and email flows. A spam bot is detected in the email flow panel when only edges originating from one node are visualized. Gloor et al. [50], [30], [40] investigate communication patterns of working groups in node-link visualization, and study the evolution of social structures over time through animation. Networks are drawn in personalized mode or subject mode to identify core contributors in groups and important messages, respectively [50]. The visualization tool TeCFlow [30], [40] detects the hidden communication structure in the Enron email corpus. By emphasizing the roles of influencers, gatekeepers, and leaders, the hierarchical social networks uncover how Enron employees engaged in collusion and fraud. Semantic node-link views enable investigation in terms of email addresses, keywords, or time. Shao et al. [35] evaluate the extent to which an account resembles social bots, based on the diffusion patterns of its tweets. In the “Hoaxy” platform, a node-link diagram represents the social networks, with brighter hues indicating higher anomaly scores.

Collective Behaviors. Collective Social Interaction behaviors are derived from users acting in a group or acting in response to each other. Anomalous collective social behaviors include the temporal development of tweets, the reaction of people to special incidents, and separate group patterns of communication. Sequence, geographic, and graph visualizations are often used for collective behaviors.

Sequence visualization represents the evolution of collective behaviors in various forms, such as parallel coordinates and pulses/bubbles arranged along a timeline. Viégas et al. [23] visualize the revision history of Wikipedia pages in modified parallel coordinates. Each revised version of an article is represented by a vertical axis, with the axis’ length indicating the length of the article. The vertical axis is divided into parts, each corresponding to the revisions made by one author. By linking the axes together, this modified form of parallel coordinates shows the competition and mass-deletion histories of articles. RumorLens [33] shows how people move between different states of interaction with a rumor. The main view shows a Sankey diagram. The numbers of people exposed to the rumor and to the associated correction are illustrated by the lengths of colored segments along an axis (blue for rumors and red for corrections). By linking states between axes that correspond to time epochs, analysts can understand the influence of rumors and corrections.

Pulses and bubbles arranged in temporal sequence illustrate the anomalies of collective social interaction behaviors. Major changes in the temporal development of texts are detected by highlighting unusual shapes of timelines. TwitInfo [36], one of the earliest visualizations to investigate the emergence of events, visualizes bursts of events in a line chart. The highlighted and labeled peaks suggest events that trigger heated discussion on Twitter. CloudLines, LeadLine, and EventRiver [42], [46], [51] detect events by relating the volume of text data extracted from online news within a period of time to the temporal density of keywords. Horizontal pulse-shaped timeline visualization represents event episodes, with the sizes of pulses indicating the importance of events. LeadLine (Figure 3 (b)) and EventRiver [46], [51] arrange the vertical positions of events according to the similarity of topics. FluxFlow [32] (Figure 3 (c)) discovers temporal trends and the impacts of users in the information spreading process (e.g., of rumors). The main view consists of packed circles arranged along a timeline. A user’s influence (i.e., the number of followers) and anomaly score are encoded by the size and color of a circle, respectively. A user can be analyzed from three perspectives simultaneously: tweet volume, sequence, and distribution of anomalous accounts. A complementary tree visualization shows the correlation of user accounts in the diffusion process.

Geographic visualization is used to reveal events that contain spatial as well as temporal references. With geographic details, anomalies can be detected from spatial intensities obtained from a collection of social interaction behaviors. Lee et al. [56] introduce one of the earliest works applying spatiotemporal analysis to social media, where flows of people are represented as arrows on a map. ScatterBlogs [37], [43] employs geographic visualization for anomaly detection of topics and events as well as their spatial and temporal marks. ScatterBlogs2 [38] uses dots on a map to portray geo-located microblog posts. It differs from its previous version in that it has two settings: a classifier creation environment and a monitoring environment. In the classifier creation environment, analysts create task-tailored filters based on messages of well-understood events. In the monitoring environment, they obtain the contexts of interesting events from a filter orchestration view and a time slider. Thom et al. [37] extract terms from messages and cluster topics as tag clouds on a zoomable map. Anomalous events are labeled and positioned on the map according to their detected locations. The “Star Wars” botnet was discovered by accident when Echeverria et al. [41] observed sharp latitudinal and longitudinal boundaries in the positions of some tweets. This unusual spatial distribution indicated that the tweets were generated by bots.

Heat maps, one type of geographic visualization, are effective at illustrating geographically marked microblog messages. Pozdnoukhov et al. [39] compute heat maps from streaming tweets. The density of heat maps indicates the spatial variability of a population’s response to various stimuli such as large-scale sporting, political, or cultural events. The difference in density between two heat maps reveals the temporal evolution of events. Chae et al. [47] (Figure 3 (d)) collect a large volume of real-time microblog messages and mine public behavior responses to disasters. A heat map and hexagons on a map identify spatiotemporal differences between crisis and normal situations.

Graph visualization, including node-link and circular-based visualization, uncovers anomalous structures of social interaction. Perer and Shneiderman [27] emphasize the need to examine social networks systematically in SocialAction. The visualization tool is designed accordingly to encourage interaction with clustered node-link visualization. Because nodes and subgroups are colored according to their anomalousness rank, analysts can quickly direct their attention to the most anomalous networks. Fu et al. [29] examine small-world email networks using several visualizations. For example, stacked displays of graphs on a spherical surface visualize communication patterns between different groups. A hierarchical drawing emphasizes important nodes by placing them high in the hierarchy. MobiVis [48] (Figure 3 (e)) visualizes the calling behavior of a network consisting of university staff and students using a node-link diagram. The goal is to investigate information exchanges and implicit social relationships. The researchers design a “behavior ring” for one or more users, which arranges events in a radial form around a node. Analysts study structural interaction from the correlation between nodes and temporal interaction from the rings.

Circular-based representation shows collective social interaction behaviors in a packed visualization. Elzen et al. [25], [26] combine the circular hierarchical edge bundle view and the massive sequence view (MSV) to detect unexpected suspicious communication patterns. The novelty of this visualization tool is that it incorporates node reordering strategies in MSV. The reordering techniques take into account closure, proximity, and similarity to ensure that outliers stand out from mass data. Webga and Lu [44] project nodes (i.e., users) into a circular layout to discover rating frauds from the temporal relationship between users and items. The combination of a singular value decomposition diagram, a reordered matrix representation, and the temporal view reveals interesting group patterns: groups of items that share a similar rating history and groups of users with similar behaviors.

D. Interaction Methods

Visual analytics of social interaction behaviors uses tracking & monitoring as one of the first steps of exploratory analysis. TwitInfo [36] tracks bursts of events in time series by highlighting the event peaks in a line chart. These peaks suggest events that trigger heated discussion on Twitter. Koven et al. [28] let analysts multi-select summaries of email contents in the main panel to keep track of important keywords regarding scamming activities. FluxFlow [32] monitors information diffusion using multiple coordinated views. When analysts select a point in the tree view, the diffusion pattern generated by the user’s reposting behavior is shown in the thread view. This interaction is usually achieved in tools with multiple coordinated views [25], [18], [48], [34], [57], [58].

Exploration & navigation allows analysts to focus flexibly on different subranges of data. Viégas et al. [54] design a scroll bar that allows analysts to review email conversations in different periods of time. TargetVue [34] enables analysts to zoom and pan in the global and inspection views to locate anomalous areas. Exploration in Episogram [52] is not limited to zooming. Analysts can select a user of interest and aggregate all users who perform the same posting/reposting activity. In this way, both an individual’s details and the general trend are obtained. MobiVis [48] designs a “behavior ring”, from which analysts select different levels of granularity to arrange calling events in a radial form around a node. The length of each petal corresponds to the duration of the selected events.

Pattern discovery is achieved through various forms of interaction, such as filtering. Gloor et al. [50] visualize email data to discern the structure of networks and identify core contributors. Emails are presented according to the type of link (i.e., “To”/“From”/“Cc”) in the email network. ScatterBlogs2 [38] supports the generation of task-tailored filters in the classifier creation environment. In the monitoring setting, analysts can orchestrate the filters to detect anomalous users. Sorting visual objects also uncovers interesting patterns. CloudLines [42] visualizes online news events in timelines on either a linear or a logarithmic scale. The tool allows analysts to reconfigure visual objects via click and drag. Webga and Lu [44] detect rating frauds in the projection view, which contains two orthogonal axes inside a circle. Analysts can choose any two dimensions and the mapping method to dig out outlier patterns. Changing the encoding scheme is also useful. Chae et al. [47] present events detected from microblog messages with a heat map, scatterplots, and hexagons on a map. TargetVue [34] encodes users’ actions over time, the anomalousness of their behaviors, and their correlations into three glyph designs, so that analysts acquire various perspectives on the social accounts.

Analysts may want to save the results of analysis for future study. For example, documents of interest can be saved in the evidence box of the EventRiver [51] visualization tool. This function supports hypothesis evaluation and evidence exchange. Koven et al. [28] allow analysts to share tags created during the analysis of email contents. Web-based visualizations tend to support knowledge externalization more flexibly than stand-alone tools. After analyzing the extent to which accounts behave like social bots in the Hoaxy platform (https://hoaxy.iuni.iu.edu/) [35], analysts can save the results as CSV files for sharing.

Refinement & identification is conducted after analysts have obtained a basic understanding of social interaction behaviors. LeadLine [46] automatically associates events with corresponding time-sensitive keywords. Analysts can then annotate the events manually to provide accurate labels. EventRiver [51] offers two labeling strategies: representative event labeling and outlier labeling. Representative labeling is for events that contribute to the biggest cluster of a story, whereas outlier labeling labels the outlier events in a story. Koven et al. [28] emphasize tagging abilities in the discovery of anomalies. Analysts can label an account as a scammer, victim, service, or other category. These tags can be used for creating filters as well as for calculating statistics about scamming activities.

V. TRAVEL

Travel refers to the physical movement of users between places, which carries geographic information. Analysis of travel behaviors is meaningful for traffic monitoring, urban safety, and urban planning [59]. Travel behavior data can be collected from mobile phones and base stations, the Global Positioning System (GPS), maritime search and rescue events, and medical records. Anomalous travel behaviors differ from the expected patterns indicated by individual historical records or by the activities of the crowd. Examples include irregular driving directions [60], [59], hotspots (e.g., crowded neighborhoods) [59], [61], [62], and characteristic travel patterns associated with groups of travelers [20], [63]. These anomalous behaviors can reveal potentially harmful events such as disease outbreaks and terrorist attacks.

A. Data Types

Spatiotemporal data is essential to describe when and where users physically move. Spatial data consists of latitudes and longitudes, trajectories, pickup/drop-off locations, locations of base stations, etc. Temporal data includes timestamps of indoor activities, estimated times of arrival, and pickup/drop-off dates and times. Analysis of travel behavior usually combines both spatial and temporal data. Pu et al. [64] explore the mobility patterns of different user groups from mobile phone data collected at each base station and from handoff data (i.e., successive calls with different base station IDs). Spatiotemporal data related to communication includes the start time of calls, call duration, the city of the other party, and the location and direction of base stations. TelCoVis [61] explores the co-occurrence of people using telco data, a type of all-in-one mobile phone data containing activity records of calls, messages, and Internet usage. The data for each type of activity consists of timestamps, base station IDs, and the corresponding latitudes and longitudes. Kim et al. [65] create a visualization that helps comprehend flow patterns by analyzing the spatial distribution of non-directional discrete events over time.

Multidimensional data enriches the analysis of travel behavior. A combination of attributes provides a detailed description of travelers or vehicles: distance traveled, speed of cars, tip and toll amounts for taxi trips, and the frequency of residents’ indoor activities. Pu et al. [64] aggregate multidimensional data associated with base stations and mobile phone users. In addition to spatiotemporal details, the data includes the total number of phone calls made by each user at each station and at all stations. Malik et al. [66] evaluate the potential risks of Coast Guard search and rescue (SAR) operations to better plan response actions that mitigate risks. The SAR data consists of two components: response cases and response sorties. The multidimensional data of each component contains the numbers of lives saved, lost, and assisted. Voila [59] extracts multidimensional features to detect abnormal incoming and outgoing taxi flows in a cell (a region is segmented into multiple cells). Examples of these features are the numbers of vehicles that flow in and out from one cell to another. The inflows and outflows of multiple cells thus constitute multidimensional data.
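Per-cell inflow and outflow features of the kind described for Voila [59] can be sketched by mapping trip endpoints onto a square grid and counting. The trip format `(ox, oy, dx, dy)`, the square grid, and `cell_size` are hypothetical choices for illustration. Voila's actual region segmentation may differ.

```python
from collections import Counter

def cell_of(x, y, cell_size):
    """Map a coordinate to the (column, row) index of a square grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def flow_features(trips, cell_size=1.0):
    """Count inflows, outflows, and cell-to-cell flows from (ox, oy, dx, dy) trips."""
    inflow, outflow, od = Counter(), Counter(), Counter()
    for ox, oy, dx, dy in trips:
        o = cell_of(ox, oy, cell_size)
        d = cell_of(dx, dy, cell_size)
        outflow[o] += 1
        inflow[d] += 1
        od[(o, d)] += 1  # flow from origin cell o to destination cell d
    return inflow, outflow, od
```

Stacking these counts for every cell and time slot yields the multidimensional feature vectors on which an anomaly score could then be computed.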

Text associated with travel behavior is mainly used for identification and categorization. Examples include user IDs, textual messages, roam type, and toll type. Pu et al. [64] collect the mobile phone ID, International Mobile Equipment Identity, city ID, roam city, roam type, and toll type to describe the properties of mobile phones. These details help explain the nature of mobile phone users, i.e., travelers. Beecham et al. [63] categorize people into different groups in order to summarize group-cycling behaviors. Cyclists under the cycle hire scheme are classified according to age, sex, full postcode, whether they cycle more with others or on an individual basis, and spatiotemporal information. Liao et al. [67] study residents’ indoor activities. These activities include not only long-term activities such as sleeping, relaxing, and watching TV, but also short-term ones such as entering the home.

Network data refers to trajectories between origins and destinations. Network data is mainly used to complement spatiotemporal analysis. Ko et al. [68] assess frequently delayed flight journeys by analyzing pairs of origin and destination airports. By aggregating the amount of delay for each flight journey (i.e., network), analysts detect anomalous airports and flights where delays are prevalent. Beecham et al. [63] study group-cycle journeys that link starting points and destinations.

B. Anomaly Detection Techniques

Statistical techniques are the most commonly used anomaly detection techniques for travel behaviors. A data-driven approach [70] applies self-organizing maps and Gaussian mixture models to describe the normal behaviors of vessels. By comparing data with rules and signatures, unusual traveling patterns are detected through visual analytics. A box-plot method [56] checks whether geographical regularity deviates strongly from normal conditions. Two visualization works [71], [69] apply cumulative sum (CUSUM) algorithms after kernel density estimation to better identify outliers in time series. One of them [71] computes kernel density estimates both for the event category and for all categories, and obtains the expected number of events within a given area. Outbreaks in the temporal domain can then be detected with the cumulative summation algorithm for the given location. Applying CUSUM after kernel density estimation enables analysts to quickly spot spatial areas worth investigating and then analyze historical time series to look for unusual trends. The other work [69] also utilizes the CUSUM algorithm to trace uncommon development patterns.
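The CUSUM step can be illustrated with a standard one-sided (upper) CUSUM. The statistic accumulates positive deviations of each observation from an expected mean minus a slack term. It raises an alarm when the running sum exceeds a threshold. In [71], the expected values come from kernel density estimates. Here, a fixed `target_mean` is assumed instead. The `slack` and `threshold` values are illustrative only.

```python
def cusum(series, target_mean, slack=0.5, threshold=4.0):
    """One-sided (upper) CUSUM: return indices where the cumulative
    positive deviation from target_mean exceeds threshold."""
    s = 0.0
    alarms = []
    for t, x in enumerate(series):
        # Deviations smaller than the slack are absorbed; the sum never drops below 0.
        s = max(0.0, s + (x - target_mean - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0  # reset after an alarm
    return alarms
```

For the series `[1, 0, 1, 0, 1, 3, 3, 3, 3]` with `target_mean=0.5`, the sum stays at zero during the normal phase. It then grows by 2 per step once counts rise to 3, and it triggers an alarm at index 7. A single isolated spike would often be absorbed, which is why CUSUM suits sustained shifts such as outbreaks.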

Clustering-based techniques are employed to reduce computational complexity and visual clutter for large-scale databases. Andrienko et al. [72] use k-means clustering to analyze spatiotemporal phenomena described by multiple spatial time series. The clustering approach groups spatial objects by the similarity of their corresponding time series, so that spatially unusual events can be detected. It is used in conjunction with statistical methods that model the time series such that residuals are randomly distributed over time. High deviations from the expected time-series values are seen as anomalies. K-means clustering is also used to detect anomalous mobility patterns around base stations [64] and anomalous group-cycling behavior [63]. This clustering approach requires the number of output clusters to be specified before computation. Lin et al. [73] propose VizTree and Diff-Tree to mine anomalous patterns by comparing time series (e.g., of yoga postures) with normal references. They use bottom-up hierarchical clustering to produce a nested hierarchy of similar groups of objects based on a pairwise distance matrix. TelCoVis [61] applies a biclustering technique to binary matrices, where 1 indicates co-occurrence of human mobility and 0 indicates its absence. Thus, origins and destinations of human mobility can be bundled into coordinated sets as biclusters.

Nearest neighbor-based anomaly detection techniques compute continuous distributions from which detections and anomaly scores are derived. KDE computes the spatial and/or temporal distribution of discrete events, which is particularly useful for detecting hotspots in density-based visualizations. Malik et al. [66] employ a modified variable KDE technique to identify spatial hotspots of U.S. Coast Guard search and rescue cases. Kim et al. [65] compute continuous spatiotemporal distributions of discrete events by applying KDE to two-dimensional data, without using trajectory information. Local outlier factor (LOF), a density-based nearest neighbor technique, is used to calculate the degree of anomaly of residents’ daily indoor activities. Duration, number of occurrences, and start time are selected as the properties from which outliers are computed.
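The standard local outlier factor can be computed directly from pairwise distances. For each point, the sketch finds its k-nearest-neighbor distance and computes a local reachability density. The LOF is the average ratio of the neighbors' densities to the point's own density. This is the generic LOF, not any specific variant used in the surveyed tools. The 2-D points in the usage check stand in for hypothetical activity features (e.g., duration and start time), which would normally be scaled first.

```python
import math

def lof_scores(points, k=2):
    """Local outlier factor for a small list of points.
    Scores near 1 indicate inliers; scores well above 1 indicate outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    kdist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # k-distance of point i
        kdist.append(kd)
        neigh.append([j for d, j in others if d <= kd])  # k-neighborhood, ties kept
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]
```

For four points on a unit square plus one distant point, the square's corners score exactly 1. The distant point scores about 13 because its neighborhood is far denser than its own surroundings.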

C. Visualization Techniques

Egocentric Behaviors. Egocentric Travel Behavior is the physical movement of an individual in geographic space. One example of an anomaly associated with egocentric travel behavior is an unexpected increase in time spent on indoor activities. Chart visualization is commonly used to represent egocentric travel behaviors.

VizTree [73] uses suffix tree visualization to indicate abnormal parts of a time series by comparing them with reference (i.e., normal) patterns. Anomaly detection is achieved by transforming a time series into a symbolic representation and visualizing it as a modified suffix tree. Weaver et al. [20] explore individual hotel visitors in a calendar view, a map view, and an arc diagram. The calendar view shows total visits on each day, with squares and circles indicating weekends and weekdays, respectively. A multi-layer map view describes paths from residences to hotels relative to railroads and rivers. By synthesizing temporal and spatial patterns observed in multiple views, analysts discover circuitous routes taken by salesmen, cooperation between traveling merchants, the effects of weather and seasonal variations, etc. Liao et al. [67] study resident behaviors recorded by smart home visual systems. A heat Gantt chart view shows the start time, duration, and number of occurrences of different activities on a daily basis. By combining the heat Gantt chart with other views, analysts detect activities that deviate from daily routines through comparison of different daily records.
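The symbolic transformation can be sketched with a SAX-style discretization. The sketch z-normalizes the series, averages it over equal-length segments (piecewise aggregate approximation), and maps each segment mean to a letter using breakpoints that split the standard normal distribution into equiprobable regions. The three-letter alphabet, the segment count, and the assumption that the series length is divisible by `n_segments` are illustrative choices. This is not presented as VizTree's exact configuration.

```python
from statistics import NormalDist, mean, pstdev

def sax(series, n_segments, alphabet="abc"):
    """Convert a numeric series into a SAX-style symbolic word."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise aggregate approximation (assumes len(series) % n_segments == 0).
    seg = len(z) // n_segments
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    # Breakpoints that split N(0, 1) into len(alphabet) equiprobable regions.
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # Each segment's letter index = number of breakpoints below its mean.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa)
```

For example, a rising step series `[0]*4 + [5]*4 + [10]*4` becomes `"abc"`. Comparing how often such words occur in a test series versus a normal reference is what allows rare subsequences to stand out in a tree of symbol sequences.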

Geographic visualization is also used for egocentric travel behaviors. A transit map displays GPS traces [60] of moving taxis in basic mode, monitoring mode, and tagging mode. Taxis are represented by glyphs on the map, colored according to whether the taxi is carrying passengers. Examples of anomalous travel behaviors in this setting include a taxi driving in an irregular direction or at high speed, and a crowded neighborhood.

Collective Behaviors. When a group of users moves together in time and space, we say their travel behaviors are collective. Abnormal travel behaviors can be identified from regions crowded with people. Because most visualization tools for collective travel behaviors employ geographic visualization, we analyze these tools using finer categories of geographic visualization: flow maps, heat maps, and bubble/dot maps.

Flow maps represent trajectories by linking origins and destinations on a map. Andrienko et al. [72] propose a framework for spatiotemporal analysis and modeling. Anomalies are found in temporal line charts displaying model residuals. Spatial flows between cells are represented by directed half-arrows whose widths are proportional to the total counts of moving objects. The flows are laid over Voronoi maps. Trajectories of cycling patterns are shown as flows on a London city map [63]. The straight and curved ends of a flow represent origin and destination, respectively. Group journeys are colored red on the map, whereas non-group journeys are colored blue. One of the findings is that female cyclists are more likely to make late evening journeys when cycling in groups. Kim et al. [65] (Figure 4 (a)) extract, represent, and analyze flow maps and heat maps of spatiotemporal data without using trajectory information. The flow map visualizes origin and destination via the directions of arrows, and differences in flows are encoded in heat maps. Hot spots can be found with this visualization.

Fig. 4. Visualizations of anomalous travel behaviors. (a) Kim et al. [65] show origin and destination via directions of arrows in a flow map. (b) Ferreira et al. [62] investigate anomalous taxi trips in New York City in multiple coordinated views of a dot map and a line chart. (c) Voila [59] displays unusual traffic flows involving a focal region using a heat map. (d) Von et al. [69] visualize different types of spatiotemporal patterns by parallel coordinates. (e) Wu et al. [61] design a contour-based treemap to illustrate spatial and temporal characteristics of human mobility at a specific place.

Heat maps display spatial densities of collective travel behaviors. Maciejewski et al. [71] develop an interactive visual environment to identify hot spots in spatiotemporal data for crime analysis and syndromic surveillance. Bivariate and multivariate heat maps help detect spatiotemporal hot spots by combining height maps, colors, and contours. To analyze the risks of Coast Guard search and rescue (SAR) operations, Malik et al. [66] identify potential hot spots using heat maps. Risks of stations are indicated by the intensity of colors. The red heat map shows the time taken by stations to deploy an asset to a SAR incident, while the green heat map indicates the SAR coverage. Ferreira et al. [62] (Figure 4 (b)) investigate anomalous taxi trips in New York City in multiple coordinated views of a dot map and chart visualizations. Dots on the map indicate pickup and dropoff sites in the region. During Hurricanes Sandy and Irene, there are virtually no dots, but traffic returns to normal in the following days. Voila [59] (Figure 4 (c)) explores taxi trips to detect sudden changes in traffic patterns. It offers an anomaly detection mode giving visual cues of regional anomalies, and a context mode providing information on volume difference, traffic flow, and expected patterns at different times. Unusual traffic flows between a focal region and two other places are highlighted by the deep red color of heat maps. Feedback from analysts can update the anomaly score and thus change the color of heat maps for the selected region.
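Density heat maps of this kind are commonly built with kernel density estimation (KDE): each event contributes a smooth bump, and hot spots are the grid cells where the summed density peaks. The sketch below uses a Gaussian kernel evaluated on a regular grid; the bandwidth, the grid, the event coordinates, and the function names are illustrative assumptions, and the cited systems may use other kernels or bandwidth selection.

```python
import math

def kde_grid(points, xs, ys, bandwidth):
    """Gaussian KDE of 2-D event locations evaluated at every (x, y) grid node."""
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            s = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                    for px, py in points)
            row.append(norm * s)
        grid.append(row)
    return grid  # grid[i][j] is the density at (xs[j], ys[i])

def hottest_cell(grid, xs, ys):
    """Return the grid node with the highest estimated density (a candidate hot spot)."""
    _, i, j = max((v, i, j) for i, row in enumerate(grid) for j, v in enumerate(row))
    return xs[j], ys[i]

events = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (4.0, 4.0)]  # hypothetical pickups
xs = ys = [0.0, 1.0, 2.0, 3.0, 4.0]
grid = kde_grid(events, xs, ys, bandwidth=0.5)
print(hottest_cell(grid, xs, ys))  # → (1.0, 1.0)
```

The bandwidth controls how far each event spreads: a small value highlights tight clusters, while a large value merges nearby clusters into broader hot spots.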

We also analyze other visualization techniques for travel behaviors, including sequence and graph visualization. Von et al. [69] (Figure 4 (d)) categorize spatiotemporal patterns into different types of locations, such as home, work, and tennis. The main view, the Dynamic Categorical Data View, is a variant of parallel coordinates that shows the evolution of all types of data. Each axis of the parallel coordinates indicates a point in time. When analysts select a type of data, related geographic information is plotted in the linked map, where arrows indicate the physical movement of people. In TelCoVis [61], Wu et al. design a contour-based treemap to illustrate the spatial and temporal characteristics of human mobility. By combining it with heat maps, matrices, and parallel coordinates, analysts gain insights into the co-occurrence of human mobility and the correlations of co-occurrence.

D. Interaction Methods

Analysts track and monitor data to look for anomalies. Uninteresting and expected patterns can be unmarked [73], which improves the efficiency of detection processes and reduces false positives. TelCoVis [61] emphasizes the correlation between spatial and temporal data for exploring the co-occurrence of human mobility. When analysts hover over a sector in the contour-based treemap, all sectors corresponding to the same region are highlighted. Moreover, analysts can mark the region for exploration. Analysts can track a set of features of categoric data [69], including location, movement pattern, group membership, and group changes. The selected data instances are highlighted in both the linked map view and the categoric view.

Interactions associated with exploration & navigation piece together separate fragments of data. Panning and altering views via scrollbars facilitate the detection of non-trivial patterns in large time series databases [73]. When exploring travel behaviors, analysts should be able to switch between high-level overviews and details. Different levels of aggregation in time [66], [62], [72] and space [62], [68], [59], [63] are seen in a variety of visualization tools.

Unusual travel patterns are uncovered by filtering, configuration, and encoding into various visual forms. The anomaly grading view in SHVis [67] presents anomaly scores of selected activities. Analysts click on different days and drag date intervals to compare activities across different periods of time. To analyze maritime operations and assess risks associated with the allocation of resources [66], analysts generate combinations of filters that can be applied to spatial regions and temporal plots. In addition, analysts can evaluate the effects of opening or closing a station and determine which station is suitable for closing. Visualizations can be altered in color and in form to reveal anomalous patterns. Andrienko [72] builds a framework for spatiotemporal analysis with a rich set of embedded interactive exploration functions. Analysts can change the color scheme and assign colors to clusters on maps and line charts. In TelCoVis [61], analysts can choose the parameters to be mapped in the parallel coordinates, and adjust smoothing parameters as well as the time period for the contour-based treemap.

Externalization of results records analysts’ important discoveries. Voila [59] includes a snapshot panel for analysts to conveniently capture the overall and detailed map views. Ferreira et al. [62] explore taxi trips using TaxiVis, which supports exporting query results as CSV files, the same file type as their input source. The visual analytics framework [72] models spatiotemporal data; the model description files can be stored externally along with the group membership of places and statistical details.

As analysts gradually develop basic knowledge, they recognize suspicious areas and integrate domain knowledge into anomaly detection. After a link is marked as anomalous, it is placed on top of the visualization while the other links become transparent [68]. In Voila [59], analysts incorporate their judgments about whether a region is anomalous. This feedback is taken into account when the anomaly scores of all regions in the space are recalculated.

VI. NETWORK COMMUNICATION

Network communication is the sending and receiving of information between machines via networks. Examination of network communication has practical significance for national defense [74] and commercial enterprises [75]. Network communication behaviors include routing, network traffic, and port activities. Anomaly detection associated with network communication is usually concerned with cyber security, i.e., protecting computers and systems against malicious activities. Anomalies are indicated by alarms and suspicious patterns that deviate from expectation. Investigation into these signals reveals attacks such as BGP routing instability [76], [77], virus outbreaks [78], port scans [79], [80], [81], and intrusions into systems [82], [83].

A. Data Types

The identified connections between sources and destinations are treated as network data. Network data is important for detecting anomalous network communication, as it is the foundation for analyzing information exchange between machines. For example, the network connections between autonomous systems (ASes) [84] and those between subnets and hosts [85] can be analyzed. VisFlowConnect [78] focuses on network traffic between an internal domain sender and an internal/external domain receiver. Liao et al. [75] represent enterprise networks consisting of hosts, users, and applications as host-user-application connectivity graphs, from which the similarity of users by applications can be assessed. VisAlert [86], [74] considers large-scale attack patterns between alerts and local networks. Analysts can obtain an overview of intrusion attempts and general situations by inspecting networks formed by alerts and a topology map of local network nodes.

Multidimensional data contains multiple numeric attributes that describe context information in network communication. Attack frequency, flow rates (i.e., number of packets and bytes over a fixed period), and system load are examples of multidimensional data describing network communication behavior. Teoh et al. [76] use intensity, categorical, and counting measures to describe routing behaviors. Each measure has its corresponding degree of abnormality, and the anomaly threshold is calculated from the anomaly degrees of multiple measures. SpiralView [87] presents a connection as a list of events described in terms of time, source host, application, and destination host. The details of a connection are described using multidimensional data, which are incorporated into the description of alarms. MVSec [80] uses multidimensional data including the number of connections, flow counts, and flow bytes. These statistics are combined with temporal features to explain each unit of network security data.

Spatiotemporal data of network communication is mainly associated with the addresses of receivers and/or senders and the temporal information of activities. It provides details of timestamps and IP addresses. Investigation of spatiotemporal data is helpful for traffic monitoring, as can be seen in [88], which deals with timestamps ranging from milliseconds to years together with IP addresses aggregated from IP prefixes up to continents. Erbacher et al. [21] explore time and the difference in IP addresses between the external domain and the monitored system: the greater the difference between addresses, the more suspicious the network communication is. SpiralView [87] is interested in how alarms evolve in time, with the purpose of detecting periodic patterns. By inspecting alarms of the same attack severity level, alarms can be segmented based on their temporal distribution to better understand network behaviors. VisTracer [77] visualizes destination ASes of traceroutes against time to assess spatiotemporal patterns of anomalies.

Text data provides low-level details about connections in cyber networks. It can be encoded into visualizations for high-level exploration, or act as evidence for confirming hypotheses about anomalousness. Text data includes textual logs and categories of events. Erbacher et al. [21] represent textual log information using glyphs; the textual logs contain time, locations, and types of connection. Teoh et al. [82] project connections with known classes (i.e., normal, probe, DOS, U2R, and R2L) into regions in a visualization panel. Suspicious data is found separated from normal data, facilitating further investigation.

B. Anomaly Detection Techniques

Statistical anomaly detection techniques are widely used in the detection of abnormal network communication. Detection techniques for cyber attacks are categorized into signature-based (matching suspicious behaviors with known attack patterns based on existing statistical models or rules) and anomaly-based (comparing behaviors against a “normal” baseline) [10], both of which can be described using statistical methods.

We describe visualization works that incorporate statistical methods below. Teoh et al. [76] investigate BGP routing instability with a signature-based detection and a statistics-based algorithm. Signatures based on bursts of sequences within a time window are matched with data. The statistics-based approach raises an alarm when current behaviors deviate from expected patterns obtained from history. VIAssist [90] highlights data instances that meet the criteria of attacks in its catalog and discovers unexpected patterns through interactive exploration of visualizations. Mansmann [88] applies a signature-based algorithm to detect botnet propagation, while significant traffic changes are visualized in a readily noticeable form. VisTracer [91], [77] compares anomalies with existing scenarios of BGP hijacking. Unknown suspicious attacks are found by adapting an online change-point detection algorithm and comparing path similarity. MVSec [80] uncovers overall network state details by visualizing several statistical time series, including network traffic and the number of distinct active IPs over time. Suspicious patterns are analyzed in terms of what, when, and where from statistics (e.g., time interval, flow counts, flow bytes). Tao et al. [89] detect point anomalies with a Gaussian model-based technique for labeled data and with a histogram-based technique for unlabeled data. Correlation analysis and propagation of anomaly scores are performed to detect collective anomalies.
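Two of the statistical ideas above can be sketched compactly: raising an alarm when a value deviates from a baseline learned from history (a Gaussian model, scored by z-score), and scoring unlabeled values by how rare their histogram bin is. The threshold k = 3, the bin width, the toy counts, and the function names are assumptions for illustration; they are not the parameters or code of Teoh et al. [76] or Tao et al. [89].

```python
import math
import statistics

def gaussian_alarms(history, current, k=3.0):
    """Flag current values whose z-score against the historical baseline exceeds k."""
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    return [x for x in current if abs(x - mu) / sigma > k]

def histogram_scores(values, bin_width):
    """Score each value by -log(relative frequency of its bin); rare bins score high."""
    bins = [math.floor(v / bin_width) for v in values]
    counts = {}
    for b in bins:
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return [-math.log(counts[b] / n) for b in bins]

history = [100, 104, 98, 101, 97, 103, 99, 102]  # hypothetical per-minute counts
print(gaussian_alarms(history, [101, 180, 95]))  # → [180]

scores = histogram_scores([1, 2, 3, 2, 1, 50], bin_width=10)
print(scores.index(max(scores)))  # → 5
```

The Gaussian score needs a history of (assumed) normal behavior, whereas the histogram score only needs the batch itself, which is why the latter suits unlabeled data.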

Classification-based methods are used in intrusion detection [82], [87]. Teoh et al. [82] utilize a user-directed drawing program, PaintingClass, to classify each object and predict its category. Attacks in unlabeled data are found by comparing the positions of normal instances and unlabeled data. SpiralView [87] models user behaviors using Bayesian networks and raises anomalies for deviations from usual behaviors.

Nearest neighbor-based techniques based on similarity are applied in [75], which transforms relations among hosts, users, and applications into network connectivity graphs, bipartite graphs, multidimensional scaling, and similarity graphs. The inter-graph similarity is evaluated in a top-down manner, and node similarity is analyzed based on the dynamics of node degrees. LongLine [92] uses the local outlier factor to facilitate the comparison of temporal patterns of anomalous system behaviors. The tool employs a frequency-based model that identifies files and addresses in audit logs as individual entities. Each entity is described by a feature vector constructed from its extended bag of system calls model.
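The local outlier factor (LOF) mentioned for LongLine [92] compares the local density of a point with the densities of its k nearest neighbors; scores near 1 indicate a point as dense as its neighborhood, and scores well above 1 indicate a point in a sparser region. The sketch below is a plain implementation of the standard LOF definition for small feature vectors; the value of k and the toy points are assumptions, and it is not LongLine's code.

```python
import math

def lof_scores(points, k):
    """Local outlier factor of each point (Euclidean distance; assumes no duplicate points)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # indices of the k nearest neighbors of each point, excluding the point itself
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]  # distance to the k-th neighbor

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbor j
        return max(k_dist[j], dist[i][j])

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # LOF: mean neighbor density divided by the point's own density
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]  # hypothetical entity features
scores = lof_scores(pts, k=2)
print(max(range(len(pts)), key=scores.__getitem__))  # → 4
```

Because LOF is relative to the neighborhood, it can flag an entity that is unusual only compared with similar entities, which a single global threshold would miss.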

TVi [93] uses a spectral technique to direct users to time periods of anomalous activities. The tool derives a scalable metric (entropy of IP addresses and ports) and conducts dimension reduction using principal component analysis (PCA). NStreamAware [83] applies the DBSCAN algorithm to cluster timelines, which achieves event detection in streaming data. Potentially important temporal segments are further assessed by analysts through interactive exploration.
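Entropy of traffic features per time window is the kind of compact metric described for TVi [93]: traffic spread over many ports (as in a scan) raises the entropy of destination ports, while traffic concentrated on a few ports keeps it low. The sketch computes Shannon entropy per window; the window labels and port lists are hypothetical, and the PCA step applied on top of such metrics is not shown.

```python
import math
from collections import Counter

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

windows = {
    "09:00": [80, 443, 80, 443, 80, 443, 80, 443],  # hypothetical web traffic
    "09:01": [21, 22, 23, 25, 53, 80, 110, 143],    # hypothetical scan-like spread
}
for label, ports in windows.items():
    print(label, shannon_entropy(ports))
```

The first window yields 1.0 bit (two equally frequent ports) and the second 3.0 bits (eight distinct ports), so a sudden rise of this metric over time is a simple cue for scanning.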

C. Visualization Techniques

Egocentric Behaviors. An egocentric network communication behavior triggers alarms due to suspicious network properties of the connection between source host(s) and destination host(s). Examples of egocentric anomalous network communication behaviors are the hijacking of network traces by another AS, a port scan, and an unusually high volume of traffic on a machine. Glyph and graph visualizations are used to represent egocentric behaviors.

Erbacher et al. [21] developed one of the earliest visualizations, displaying IP addresses of alarms in a glyph-based radial form. Line glyphs surrounding a central node represent different types of connection (e.g., parallel lines indicate initial connection requests). The difference in IP addresses between the external domain and the monitored system is encoded in the length of line glyphs. A suspicious connection is colored red due to unexpected user activity such as expired timeouts. Teoh et al. [76] investigate Border Gateway Protocol (BGP) routing instability. Near-real-time monitoring of Internet routing is pictured as temporal line charts and glyphs, where a suspicious event detected from statistics is illustrated with a large circle in a high position and a spike in the timeline.
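Encoding the difference in IP addresses as glyph length requires choosing a distance between addresses. The sketch below assumes one plausible choice, the number of bits remaining after the longest common prefix, so that hosts in the same subnet are close and addresses in unrelated blocks are far apart. This metric, the linear scaling, and the function names are assumptions, not necessarily what Erbacher et al. [21] use.

```python
import ipaddress

def prefix_distance(a, b):
    """Bits after the longest common prefix of two IPv4 addresses (0 to 32).

    Equal addresses give 0; addresses that already differ in the first bit give 32.
    """
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return diff.bit_length()

def glyph_length(external, monitored, max_len=100.0):
    """Scale the prefix distance linearly to a line-glyph length in [0, max_len]."""
    return max_len * prefix_distance(external, monitored) / 32

print(prefix_distance("10.0.0.1", "10.0.0.9"))  # → 4
print(glyph_length("10.0.0.1", "192.168.1.1"))  # → 100.0
```

A prefix-based distance follows the hierarchical allocation of address space better than the raw numeric difference, which can make two neighboring /8 blocks look very close.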

Graph visualization, especially the matrix, is used to detect anomalous egocentric network communication. Goodall et al. [81] develop a matrix showing the network activity of hosts over time. Communication between hosts is superimposed on the matrix, complemented by multiple linked views detailing port activity and raw packets. NVisionIP [85] detects traces of abnormal network behaviors at multiple levels of an entire class-B IP network. NVisionIP consists of a galaxy view in a matrix, a small multiples view, and a machine view with bar charts. Spikes in traffic volume are seen as changes in node colors in the matrix. Simple scanning attacks are discovered as clusters in the matrix, where the x- and y-axis stand for subnets and hosts, respectively. VisTracer [77] (Figure 5 (a)) tackles large traceroute data sets to distinguish legitimate routing changes from spam campaigns. Time and destination ASes are represented by the x- and y-axis in a matrix layout. Rectangular glyphs in the matrix layout are anomalies. Two nearly identical anomaly patterns at the same x-position in the matrix indicate routing anomalies in two ASes.

Fig. 5. Visualizations of anomalous network communication behaviors. (a) VisTracer [77] visualizes routing anomalies in traceroutes using a matrix. (b) Tao et al. [89] design a high-order correlation graph to show collective anomalies. (c) MVSec [80] mines correlation of events attributed by what, when, and where in a dandelion metaphor using a circular-based design. (d) SpiralView [87] analyzes how alarms evolve in time and detects suspicious patterns using a radar chart.

Collective Behaviors. Collective network communication behaviors involve more than one exchange of information between two machines or among multiple machines. Anomalous behaviors include botnet infection and periodic attacks, which are represented in graph and sequence visualizations.

Tree visualization, a type of graph visualization, helps identify anomalous network communication behaviors. Teoh et al. [84] examine the routing behavior of BGP data. Each IP address is mapped to one pixel in a quadtree visualization to detect anomalous origin AS changes. An event is represented by a line connecting the affected IP prefix and ASes. Anomalies are revealed as areas with concentrated lines, since events that take similar paths multiple times are suspicious. Teoh et al. [82] detect intruders by allowing analysts to explore activity logs in an interactive decision tree layout. Complementary to this view, a three-dimensional scatter diagram pinpoints unlabeled anomalies when a high-density cluster lies in areas of sparse training data. Mansmann et al. [88] aggregate IP addresses according to prefix, autonomous system, country, and continent in treemaps based on two layout algorithms. This visualization helps monitor large-scale network data. Segments in treemaps are colored to indicate sharp changes in the number of incoming connections.
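A quadtree layout that maps each IP address to one pixel can be obtained by reading the 32 address bits two at a time, with each pair choosing one of four quadrants at the next level. Every address then lands on a 65536 by 65536 grid, and every even-length prefix occupies a square block, which is what lets events on the same prefix cluster visually. The sketch follows this generic Z-order scheme; the quadrant ordering used by Teoh et al. [84] is not stated in the text and is an assumption here, as are the function names.

```python
import ipaddress

def ip_to_pixel(addr):
    """Map an IPv4 address to (x, y) on a 65536 x 65536 grid via a 16-level quadtree.

    Address bits are read two at a time, most significant first; each pair picks
    one of four quadrants at the next level.
    """
    value = int(ipaddress.IPv4Address(addr))
    x = y = 0
    for level in range(16):
        pair = (value >> (30 - 2 * level)) & 0b11
        x = (x << 1) | (pair & 1)   # low bit of the pair: left or right half
        y = (y << 1) | (pair >> 1)  # high bit of the pair: top or bottom half
    return x, y

def same_block(a, b, prefix_len):
    """True if two addresses share the square block of an even-length prefix."""
    shift = 16 - prefix_len // 2
    (xa, ya), (xb, yb) = ip_to_pixel(a), ip_to_pixel(b)
    return (xa >> shift, ya >> shift) == (xb >> shift, yb >> shift)

print(ip_to_pixel("128.0.0.0"))                     # → (0, 32768)
print(same_block("10.0.0.0", "10.255.255.255", 8))  # → True
```

An odd prefix length covers a rectangle of two adjacent blocks rather than a square, which is why same_block only handles even lengths.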

Node-link diagrams visualize the structures of collective network communication. Tao et al. [89] (Figure 5 (b)) design a high-order correlation graph to show collective anomalies. When applied to software analysis, malicious attacks due to software vulnerabilities are identified as collective anomalies. In this case, a node represents a line of code, an event represents an execution, and a correlation link represents data flow. NIVA [94], [95] coordinates a 3D node-link view with glyph design and circular histograms. It differs from other visualizations in that it builds attack severity into interaction, inspired by the “haptic” concept. For example, when dragging nodes in the three-dimensional view, users can feel the force of “push” and “pull” motions computed based on attack frequency.

Circular-based visualization is also used to demonstrate collective network communication behaviors. VisAlert [86], [74] identifies critical attacks on hosts by analyzing the “what, when, where” information of alerts. The alerts are allocated on segments of rings according to the severity of attacks. The “when” attribute is mapped such that the innermost ring represents the most recent activities. Inside the ring, a network topology map depicts the network under scrutiny. FloVis [79] observes interactions between host pairs on either side of the monitored border. A bundle diagram displays connections between entities in a radial tree layout. Scanning activities can be detected by examining bundles directed from 9000 consecutively numbered ports to the internal host. MVSec [80] presents four coordinated views to discover anomalies and retrieve the stories behind subtle events. The event radar view (Figure 5 (c)) mines correlation of events attributed by what, when, and where in a dandelion metaphor in a ring. Seeds (i.e., subnets) spread from the center of the dandelion stalk, which represents the only entrance to the network. Antennas (i.e., hosts) extend from the seeds, giving a two-layer hierarchical structure. The seriousness of botnet infection, for instance, is indicated by the number of colored nodes in the dandelion metaphor.

Sequence visualization uncovers abnormal trends of collective network communication. While NVisionIP [85] focuses on activities on machines, its complementary tool VisFlowConnect [78] explores network flows between machines using parallel coordinates. VisFlowConnect investigates the relationship between senders and receivers; a cluster of lines originating from an external host sender indicates a virus outbreak. SpiralView [87] (Figure 5 (d)) analyzes how alarms evolve in time and detects suspicious patterns (e.g., alarms appearing every day at the same time). The alarms are scattered dots in a radar chart, which is useful for identifying periodic patterns of intrusions. The alarms are arranged from the center outward so that recent events are allocated more space. NStreamAware [83] analyzes a condensed heterogeneous data stream and uses a sliding slice to provide a summary of the selected period of time. The tool supports omitting and merging normal ranges so that suspicious port activities, attack patterns, and routing behaviors are revealed.
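A radial time layout exposes periodicity when angle encodes time of day and radius encodes the day: alarms recurring at the same hour then line up along one spoke. Placing the most recent day on the outermost ring also gives it the longest circumference, i.e., more space, as described above. The sketch is a generic construction under these assumptions (one turn per 24 hours, midnight at the top, one ring per day) and does not reproduce SpiralView's [87] exact layout; the function name and timestamps are hypothetical.

```python
import math
from datetime import datetime

def radial_position(ts, start, n_days):
    """Place an alarm timestamp on a radial time chart.

    Angle encodes time of day (one full turn = 24 h, midnight at the top);
    radius grows with the day index, so the latest of n_days days is outermost.
    """
    day = (ts.date() - start.date()).days
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    angle = 2 * math.pi * seconds / 86400
    radius = (day + 1) / n_days
    return radius * math.sin(angle), radius * math.cos(angle)

start = datetime(2019, 5, 1)
alarms = [datetime(2019, 5, d, 3, 0) for d in (1, 2, 3)]  # same hour, three days
for ts in alarms:
    x, y = radial_position(ts, start, n_days=3)
    print(ts.date(), round(math.atan2(x, y), 4), round(math.hypot(x, y), 4))
```

All three alarms share the angle of 03:00 and differ only in radius, which is the visual signature of a daily periodic pattern.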

D. Interaction Methods

Detection of anomalous network communication requires tracking & monitoring. Teoh et al. [76] direct analysts’ attention to anomalies by highlighting the background in gray. In the TVi [93] visual querying system, analysts select an item in the anomaly list, and the associated time range is then highlighted in the timeline visualization. In NStreamAware [83], analysts can click the star icon to store the real-time sliding slice under investigation. The events marked with star icons are added to the same view, and analysts can determine suspicious patterns from flagged and labeled events in the starred time slices. MVSec [80] has four coordinated views. Interaction in one view is linked to visualization in another, which helps dig out hidden network attacks that are hard to recognize.

Interesting network communication behaviors are found by exploring visual elements at the same scale or at multiple levels of granularity. VisAlert [86], [74] enables panning and zooming of the topology map in the ring. Analysts can also configure projections onto rings by collapsing and expanding alert groupings. Tao et al. [89] employ the direct-walk technique (i.e., a series of mouse clicks) for exploring anomalies. When an analyst notices a suspicious node, he/she clicks another node that contributes to the anomaly of the suspicious node. That is, the analyst progressively examines how additional nodes affect the suspicious node. Mansmann et al. [96], [88] aggregate IP addresses according to prefix, autonomous system, country, and continent in treemaps. Drill-down and roll-up functions can be applied to nodes at the same level of detail.

Interactive methods are used to unveil suspicious patterns in data. The filter dialogue in NVisionIP [85] restricts which data flows are visualized. Analysts visualize network traffic according to filters based on combinations of IP address, port, protocol, and display type. The visual analytics tool FloVis [79] has a bundle diagram that describes network flows between a source and a destination. Analysts can loosen the bundles to find suspicious attack patterns. Additionally, analysts can choose to linearly distort points on the circle of the bundle diagram. Mansmann et al. [96], [88] color data in treemaps on a linear or logarithmic scale. Coloring on the logarithmic scale makes the visualization resistant to the randomness of data. Teoh et al. [82] use a painting program to help categorize the same type of anomalies into one group. Analysts interactively arrange data instances through drawing, partitioning, and appropriate coloring.
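The choice between linear and logarithmic color scales matters for heavy-tailed traffic counts: on a linear scale a single large value pushes almost every other cell to the bottom of the color range, whereas a log scale keeps smaller values distinguishable. The normalization below is a generic sketch; using log1p so that zero counts are allowed is an assumption, not a detail of the cited treemaps, and the counts are hypothetical.

```python
import math

def normalize(counts, scale="linear"):
    """Map non-negative counts to [0, 1] color intensities on a linear or log scale."""
    if scale == "log":
        counts = [math.log1p(c) for c in counts]  # log(1 + c) tolerates zero counts
    top = max(counts)
    return [c / top if top else 0.0 for c in counts]

traffic = [1, 10, 100, 10000]  # hypothetical connections per prefix
print([round(v, 3) for v in normalize(traffic)])
print([round(v, 3) for v in normalize(traffic, "log")])
```

With these counts, the prefix with 10 connections receives an intensity of 0.001 on the linear scale but about 0.26 on the log scale.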

Analysts may keep a record of results for further analysis. The intrusion detection tool NIVA [95] allows analysts to export results in ASCII format. VIAssist [90] is designed for collaborative working environments. Its report builder allows analysts to drag and drop graphical objects from the current display, and the results with annotations can then be saved as PowerPoint or PDF files. MVSec [80] simplifies analysts’ operations by offering frequently used configuration files for anomaly detection. Analysts can export their configurations as a new configuration file.

VIAssist [90] has an expression builder and an E-Diary to fulfill the refinement & identification task. Analysts can formulate a hypothesis about a suspicious activity as an expression. A catalog of expressions collects knowledge, i.e., hypotheses made by analysts during analysis. The E-Diary helps document hypotheses, which encourages sharing annotations with colleagues and communicating hypotheses within a group. Analysts can annotate suspicious patterns in SpiralView [87] for long-term analysis and policy assessment. The annotations can explain the anomalies and the actions applied to the system.

VII. TRANSACTION

Transaction refers to monetary flows in buying and selling. The goal is to connect financial sources to companies or individuals. In a broad sense, stock market deals [97], credit card transactions [98], and business processes [99], [100] fall under this category. Frauds are the typical type of anomaly associated with transactions, as people may be lured by monetary benefits to perform illegal transactions. Clients may collude with employees of financial institutions in activities such as money laundering, unauthorized transactions, and embezzlement [101]. Other anomalies include unexpected business processes [102], [100] and high-default groups in a network of guaranteed loans [103].

A. Data Types

Spatiotemporal data describes details of location, timestamps of transactions, and time series of events. Spatiotemporal analysis is critical in financial analysis, and thus the detection of anomalous transactions often incorporates analysis of geographic locations and time series. Attributes including time of transaction [100], [98], how often a customer executes operations [104], and geographic regions [105], [101] provide a foundation for first-step analysis. For example, the Event Tunnel [100] conducts temporal correlation to link seemingly isolated events, and thus business patterns and fraud patterns involving more than one individual [97] can be uncovered. Huang et al. [97] perform spatial correlation in addition to temporal and spectral (i.e., frequency-based) correlation to identify suspected traders and attack plans.

Multidimensional data is often used in conjunction with spatiotemporal data to detect anomalous transactions. By probing into time series along with details of the amount of money transferred [101], [98], [105], the number of transactions within a period of time [99], [105], and the number of activities that are new to the user [106], analysts can gain an overall picture of the histories of financial transactions. VisImpact [105] is an example that uses both multidimensional and spatiotemporal information: it correlates purchase quarter (i.e., temporal details), fraud amount, and fraud count to reveal relationships among important factors. Legg [106] identifies insider threats in an organization by inspecting multidimensional data, including the number of times that a user performs particular tasks and the number of these activities that are new to this user and to any user in the same position.

Network data describes relationships among entities involved in transactions. A network can consist of links between traders [97] in trading networks, between entities such as people, companies, and banks [107], and between enterprises that take loan guarantees [103]. For example, Niu et al. [103] consider high-default groups as communities in networks. A community whose members interact with each other more frequently than with outsiders can trigger serious financial losses. Didimo et al. [107] analyze categorical networks that contain different types of entities to discover financial crimes. Indices such as node centrality (e.g., betweenness) and node degree are measured to indicate anomalousness.
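Node degree and betweenness centrality, the indices mentioned for Didimo et al. [107], can be computed directly on an entity graph. Betweenness counts how many shortest paths between other node pairs pass through a node, so an intermediary that links otherwise separate parties scores high. The sketch implements degree and Brandes' algorithm for unweighted, undirected graphs stored as adjacency lists; the toy graph and node names are hypothetical.

```python
from collections import deque

def degree(adj):
    """Number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: c / 2 for v, c in bc.items()}

# hypothetical entity graph: four individuals linked only through one company
adj = {
    "shell_co": ["p1", "p2", "p3", "p4"],
    "p1": ["shell_co"], "p2": ["shell_co"],
    "p3": ["shell_co"], "p4": ["shell_co"],
}
print(degree(adj)["shell_co"], betweenness(adj)["shell_co"])  # → 4 6.0
```

The company lies on all six shortest paths between the four individuals, while each individual has betweenness 0.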

When analyzing transaction behavior, categories derived from text help describe the relationship between a payer and a payee [108], [109], label different types of activities conducted by employees [106], and identify the type of state changes in a business process [100]. Text data is used to distinguish between senders, intermediaries, and receivers in financial transactions, and to build profiles for analyzing their potentially suspicious behaviors. For example, WireVis [108], [109] extracts pre-defined keywords from a set of transactions and relates keywords that appear in the same transaction. The keyword-to-account relationship is analyzed based on the number of times the keywords appear in that transaction. Jigsaw [110] helps identify linkages between people or companies relevant to financial frauds, such as fictitious suppliers’ invoices and systematic deletion of suppliers’ invoices. These linkages are found through keyword/sentence summaries of transactions, sentiment, and word clouds of a document.
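The keyword-to-account relationship described for WireVis [108], [109] is essentially a pair of co-occurrence counts: keyword occurrences per account, and keyword pairs appearing in the same transaction. A minimal sketch, assuming each transaction is an (account ID, free-text memo) record, that the keyword list is given in lowercase, and that tokenization is plain whitespace splitting; the function name and data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def keyword_stats(transactions, keywords):
    """Count keyword occurrences per account and keyword pairs per transaction."""
    per_account = defaultdict(Counter)  # account -> keyword -> number of occurrences
    pairs = Counter()                   # (kw1, kw2) -> transactions containing both
    for account, memo in transactions:
        words = memo.lower().split()
        found = sorted(k for k in keywords if k in words)
        for k in found:
            per_account[account][k] += words.count(k)
        for pair in combinations(found, 2):
            pairs[pair] += 1
    return per_account, pairs

txs = [("ACC1", "Invoice consulting fee"), ("ACC1", "consulting invoice"), ("ACC2", "gift")]
per_account, pairs = keyword_stats(txs, ["invoice", "consulting", "gift"])
print(per_account["ACC1"]["invoice"], pairs[("consulting", "invoice")])  # → 2 2
```

Sorting the found keywords makes each pair key order-independent, so ("consulting", "invoice") is counted the same way regardless of word order in the memo.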

B. Anomaly Detection Techniques

Statistical methods applied to the detection of suspicious transactions build normal profiles of customers and then evaluate new transactions against known anomalies in historical data. Huang et al. [97] match suspected patterns in spatial, temporal, and spectral (i.e., frequency) domains with similar patterns seen in historical databases, which act as anomalous signatures. Leite et al. [104] first build customer profiles from the frequency, amount, and location of transfers in their histories. New transactions are then evaluated against the profiles to see whether they are anomalous. The visualization tool EVA [101] generates customer profiles and provides different statistical measures for new transactions. The statistical profiles combine histograms and rules specified by experts to provide references. Sudden behavior changes compared to the profiles are identified as suspicious, and anomalies are highlighted if their anomaly scores exceed a threshold.

The application of clustering-based techniques rests on the assumption that anomalous financial communities share common features within a group. WireVis [108], [109] implements the k-d tree algorithm to detect suspicious behaviors in wire transactions. It treats accounts as points in k-dimensional space, where k is the number of attributes, and groups the accounts using a centroid-based clustering technique. Schaefer et al. [112] cluster entries based on the similarity of temporal event patterns so that analysts can identify suspicious patterns in a packed visualization. An event pattern refers to an event sequence or event episode that displays interesting properties. Didimo et al. [107] apply hierarchical clustering by finding k-cores in a graph, which is effective for discovering relevant groups in networks. This graph-based clustering defines clusters as cohesive structures in which every node is connected to at least k other nodes of the same cluster. Clustering based on graph structure is also used in Network Explorer [113]: communities in the financial network are identified as clusters formed from undifferentiated nodes and edges. Two clustering algorithms are employed, one processing large-scale networks on the server side and the other processing smaller networks on the client side.
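The k-core used by Didimo et al. [107] is the maximal subgraph in which every node has at least k neighbors inside the subgraph. It can be found by repeatedly deleting nodes whose remaining degree is below k; since the (k+1)-core always lies inside the k-core, varying k yields a nested, hierarchical grouping. The sketch below works on an undirected adjacency dictionary; the edge list and function names are hypothetical.

```python
from collections import defaultdict

def from_edges(edges):
    """Build a symmetric adjacency dict {node: set(neighbors)} from undirected edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def k_core(adj, k):
    """Return the node set of the k-core by iteratively peeling nodes of degree < k."""
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:  # a node may be queued more than once
            continue
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                deg[w] -= 1
                if deg[w] < k:
                    stack.append(w)
    return alive

# hypothetical transaction graph: a tightly knit 4-clique plus a short tail
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f")]
print(sorted(k_core(from_edges(edges), 3)))  # → ['a', 'b', 'c', 'd']
```

Peeling with a stack visits each edge a bounded number of times, so the decomposition stays practical on large transaction networks.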

Page 17: Visual Analytics of Anomalous User Behaviors: A Survey · 2019-05-22 · Visual Analytics of Anomalous User Behaviors: A Survey Yang Shi1, Yuyin Liu2, Hanghang Tong 3, Jingrui He

Fig. 6. Visualizations of anomalous transaction behaviors. (a) Argyriou et al. [111] use a multi-layer radial drawing to describe activities between employees and clients. (b) Niu et al. [103] assess the risk of guaranteed loans by visualizing networks of small and medium enterprise groups using a node-link visualization. (c) Leite et al. [101] design user-friendly views of chart visualizations and parallel coordinates to help identify anomalous transactions.

Classification-based, nearest neighbor-based, information theoretic, and spectral techniques are discussed below. Olszewski [98] uses a threshold-type binary classification technique to determine whether an account in self-organizing maps (SOM) is fraudulent. The threshold is computed by measuring the dissimilarity between the centroid of the SOM grid and the maximal value in the matrix, and accounts with values higher than the threshold are anomalous. A decision tree [111] is generated from the patterns suggested by auditors. To detect internal frauds conducted by employees, each employee is assigned an anomaly value. This value, which indicates the severity of anomalousness, is obtained by evaluating the event series of an employee against fraud patterns. Network structure can also reveal anomalies. Two risk indices [114] based on neighborhood structure, i.e., pattern centrality and transaction pattern centrality, are computed by assigning weights to each edge associated with a taxpayer in a transaction network. Niu et al. [103] employ an information theoretic approach to uncover risky guarantee patterns and detect high-default groups for loan risk management. Specifically, the proxy for information flow is the probability flow of random walks in directed weighted networks. PCA is utilized to identify insider threats [106] due to its effectiveness in detecting users who exhibit irregular variances across the set of derived features. An interactive PCA view helps analysts comprehend the relationships between the PCA space and the original higher-dimensional space.
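The "probability flow of random walks" used by Niu et al. [103] as an information-flow proxy can be sketched with power iteration. The sketch assumes a teleportation rate of 0.15 (a common PageRank-style convention, not a value from [103]) and only computes the stationary visit rates; the community detection built on top of these flows is not reproduced here.

```python
def random_walk_flow(weights, teleport=0.15, iterations=200):
    """Stationary visit rates of a random walk on a directed weighted graph.

    weights maps each node to {successor: edge weight}. At each step the
    walker follows an out-edge with probability proportional to its weight,
    except that with probability `teleport` (or always, at a node without
    out-edges) it jumps to a uniformly random node.
    """
    nodes = sorted(set(weights) | {v for out in weights.values() for v in out})
    n = len(nodes)
    p = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        nxt = {u: 0.0 for u in nodes}
        jump = 0.0
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                jump += p[u]  # dangling node: all of its mass teleports
                continue
            jump += teleport * p[u]
            for v, w in out.items():
                nxt[v] += (1 - teleport) * p[u] * w / total
        for u in nodes:
            nxt[u] += jump / n
        p = nxt
    return p


# Hypothetical guarantee network: firms a and c both back firm b.
flow = random_walk_flow({"a": {"b": 1.0}, "c": {"b": 1.0}, "b": {"a": 1.0}})
print(max(flow, key=flow.get))  # b
```

The flow along a single edge u to v then follows as `p[u] * (1 - teleport) * w / total`, which is the quantity a flow-based community method would aggregate within and between groups.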

C. Visualization Techniques

Egocentric Behaviors. An egocentric transaction is a buying or selling behavior conducted by an individual. An anomalous egocentric transaction can be an unauthorized transaction or a deal of exceptionally high value. Detection of these behaviors mainly uses sequence visualization.

VisImpact [99], [105] organizes transaction attributes by allocating them onto three parts (axes) of a ring: the left semicircle, the bisector, and the right semicircle. Each axis stands for an attribute of interest (e.g., region, client, fraud amount, fraud count). Suntinger et al. [100] display events as nodes in a cylindrical tunnel. The top view of the cylinder represents historical events, which are laid out such that more recent events are in the outer ring. Details of events are encoded by the color and size of the glyphs in the Event Tunnel. Anomalous betting behaviors of a user are discovered by temporally correlating the user's account history events with known suspicious account profiles. Argyriou et al. [115] study the temporal relationship of transactions between a client-employee pair in a radar chart. The nodes in the radar chart represent transactions, which are positioned according to the time of action, pre-defined periodicity, and ordering of timelines. Events/transactions related to the same client along the radius of the radar chart are considered suspicious, as this pattern suggests that the employee falsifies the client's invoices.
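The recency-to-radius mapping of the Event Tunnel's top view can be sketched as a simple polar layout. The sketch assumes that the radius grows linearly with an event's recency rank and that each event category gets its own fixed angle; the angle rule is a hypothetical choice for separating event types and is not described in [100].

```python
import math


def tunnel_top_view(events, max_radius=1.0):
    """Lay out events on concentric rings so newer events lie further out.

    events: list of (timestamp, category) tuples. The radius grows with the
    event's recency rank; each category gets a fixed angle (hypothetical).
    """
    categories = sorted({c for _, c in events})
    angle = {c: 2 * math.pi * i / len(categories) for i, c in enumerate(categories)}
    order = sorted(range(len(events)), key=lambda i: events[i][0])  # oldest first
    positions = [None] * len(events)
    for rank, i in enumerate(order):
        r = max_radius * (rank + 1) / len(events)
        theta = angle[events[i][1]]
        positions[i] = (r * math.cos(theta), r * math.sin(theta))
    return positions


# Hypothetical account history: (time, event type).
pos = tunnel_top_view([(3, "bet"), (1, "bet"), (2, "deposit")])
```

Glyph color and size, which the Event Tunnel uses for event details, would be layered on top of these positions.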

Graph and text visualizations are also used to reveal suspicious egocentric transaction behaviors. Argyriou et al. [111] (Figure 6 (a)) use a multi-layer radial drawing to describe activities between employees and clients. Each layer represents a pattern that is suspicious in a different aspect (e.g., actions, systems, periodicity), and heat maps in the side view measure anomalousness. When an employee is found to perform events similar to fraud patterns, a suspicious egocentric behavior is identified. Jigsaw [110] mines relationships between entities in text documents. Its parallel coordinates view reveals the correlation of selected attributes (e.g., company, person). Combined with a heat map for sentiment/similarity analysis, a cluster view for grouping similar documents, and a document view for details, it allows anomalous behaviors to be detected from unique text entities. Following that work, Kang et al. [116] study applications of Jigsaw in various situations including financial transactions, where an employee's egocentric behavior of creating fictitious supplier invoices was discovered.
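A common way to derive relationships between text entities, and the one assumed in this sketch, is document co-occurrence: two entities are related when they are mentioned in the same document. The sketch assumes entity extraction has already happened; the entity names are hypothetical, and Jigsaw's own extraction and weighting are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count how often each pair of entities appears in the same document.

    documents: iterable of collections of entity names, one per document.
    Returns a Counter keyed by alphabetically sorted (entity, entity) pairs.
    """
    counts = Counter()
    for entities in documents:
        for a, b in combinations(sorted(set(entities)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical entities extracted from three invoices.
docs = [{"Acme", "J. Doe"}, {"Acme", "J. Doe", "Bolt Ltd"}, {"Bolt Ltd"}]
links = cooccurrence(docs)
print(links.most_common(1))  # [(('Acme', 'J. Doe'), 2)]
```

An entity that co-occurs with an unusual set of partners (for example, an employee linked to a supplier that appears nowhere else) is the kind of unique entity an analyst would then inspect in the document view.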

Collective Behaviors. A collective transaction behavior involves several parties in transactions and businesses. Collective transaction behaviors include a series of wire transfers and periodic transactions.

Graph visualization is popular for uncovering collective transaction anomalies. Huang et al. [97] develop a two-stage approach to inspect stock market security. First, market performance is evaluated using three-dimensional treemaps, with the heights of blocks indicating the current prices of stocks. Second, trading networks are compared against suspicious patterns in the historical database. Network structures (subgraphs) are regarded as collective anomalies in transactions. Several visual analytics tools [107], [114], [103], [113] develop categorical node-link visualizations in which analysts can merge, split, and define new subgraph structures, cluster nodes in a top-down or bottom-up manner, and adjust node sizes by a chosen measure. Users edit networks interactively to discover communities, which signal suspicious financial transactions. Didimo et al. [107] detect financial activity networks such as money laundering by representing the entities involved in transactions as nodes. The entities include banks, companies, persons, bank accounts, transactions, and report filings. Edges between nodes represent semantic connections. For instance, two disjoint clusters that indicate fraudulent patterns are revealed after clustering. The depth level of a cluster reflects how critical the illegal activity is. Niu et al. [103] (Figure 6 (b)) assess the risk of guaranteed loans by visualizing networks of small and medium enterprise groups that back each other to enhance their financial security. Anomalies, i.e., high-default groups, are identified as communities in the network using a node-link visualization. A complementary treemap supports navigation of labels/categories and presents default rates.
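The treemap that accompanies the guarantee network needs, for each community, a size and a default rate. The sketch below computes both from a community assignment and a set of defaulted firms; the firm identifiers and the 0.5 cut-off for a "high-default group" are hypothetical, and how [103] derives communities is not reproduced.

```python
from collections import defaultdict


def community_default_rates(membership, defaulted):
    """Summarize each community as (size, default rate) for a treemap.

    membership: dict firm -> community id.
    defaulted: set of firms that defaulted on a guaranteed loan.
    """
    size = defaultdict(int)
    bad = defaultdict(int)
    for firm, community in membership.items():
        size[community] += 1
        if firm in defaulted:
            bad[community] += 1
    return {c: (size[c], bad[c] / size[c]) for c in size}


def high_default_groups(rates, min_rate=0.5):
    """Communities whose default rate reaches the (assumed) cut-off."""
    return sorted(c for c, (_, rate) in rates.items() if rate >= min_rate)


rates = community_default_rates(
    {"f1": "A", "f2": "A", "f3": "A", "f4": "B"}, defaulted={"f1", "f2"}
)
print(high_default_groups(rates))  # ['A']
```

In a treemap, the size would set the rectangle area and the rate its color.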

Chart and sequence visualizations are also used to detect collective transaction behaviors. WireVis [108], [109] uses multiple coordinated chart visualizations to analyze suspicious wire transfers from a payer to a payee via a chain of intermediaries. The overall trends of activities and individual transactions are represented by strings and beads in an x-y plot of transaction value against time. Suspicious transactions include those relevant to a keyword that appears only in the second half of the year, and a transaction of much higher value than the others. Leite et al. [101] (Figure 6 (c)) design user-friendly views of chart visualizations and parallel coordinates to help identify anomalous connections between amounts and suspicious transactions. If the anomaly scores of transactions deviate from the normal range, the days that contain at least one suspicious transaction are highlighted in red.
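The day-level highlighting rule in the last sentence can be sketched directly: a day is marked when at least one of its transactions has a score outside the normal range. The sketch assumes that per-transaction anomaly scores are already computed and that the normal range is given as a closed interval; both are simplifications.

```python
from datetime import date
from typing import Iterable, List, Tuple


def suspicious_days(
    transactions: Iterable[Tuple[date, float]], low: float, high: float
) -> List[date]:
    """Days containing at least one transaction whose score leaves [low, high]."""
    flagged = set()
    for day, score in transactions:
        if not (low <= score <= high):
            flagged.add(day)
    return sorted(flagged)


tx = [(date(2019, 1, 1), 0.2), (date(2019, 1, 1), 0.95), (date(2019, 1, 2), 0.3)]
print(suspicious_days(tx, low=0.0, high=0.8))  # [datetime.date(2019, 1, 1)]
```

The returned days are what a calendar or timeline view would paint red.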

D. Interaction Methods

Analysts track suspicious data by highlighting and correlating relevant data. The visual analytics tool EVA [101] computes overall anomaly scores and sub-scores according to different criteria. If the overall score of a transaction exceeds a threshold, the transaction is highlighted in red in the parallel coordinates view. In addition, a selection in another coordinated chart highlights the associated transactions and grays out the others in the parallel coordinates view. When analysts click a node of interest, relevant data that were not originally visualized are displayed [107]. This helps analysts discover interesting features that are not apparent from a single view and identify different relationships between data instances. A similar operation is seen in [111], where selecting one node adds related employees (i.e., nodes) to the visualization. Thus, frauds carried out by two or more employees can be tracked.
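The two highlighting rules described for EVA, threshold-based red highlighting and linked selection with graying out, can be combined in a small coloring function. The weighted sum of sub-scores is an assumption made for illustration; the text does not state how EVA aggregates its sub-scores. Field names such as `"account"` and `"scores"` are hypothetical.

```python
def color_transactions(transactions, weights, threshold, selected=None):
    """Assign a display color to each transaction in a parallel-coordinates view.

    transactions: list of dicts with "id", "account", and per-criterion "scores".
    weights: criterion -> weight used to combine sub-scores (an assumption).
    selected: accounts brushed in a coordinated chart, or None for no selection.
    """
    colors = {}
    for tx in transactions:
        overall = sum(w * tx["scores"][name] for name, w in weights.items())
        if overall > threshold:
            colors[tx["id"]] = "red"  # above threshold: always emphasized
        elif selected is None or tx["account"] in selected:
            colors[tx["id"]] = "default"  # part of the linked selection
        else:
            colors[tx["id"]] = "gray"  # de-emphasized by the brush
    return colors


txs = [
    {"id": 1, "account": "A", "scores": {"amount": 0.9, "time": 0.9}},
    {"id": 2, "account": "A", "scores": {"amount": 0.1, "time": 0.1}},
    {"id": 3, "account": "B", "scores": {"amount": 0.2, "time": 0.2}},
]
colors = color_transactions(txs, {"amount": 0.5, "time": 0.5}, 0.7, selected={"A"})
```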

VisImpact [99], [105] supports simultaneous browsing and navigation of multiple nodes. Details of a single node representing a transaction record can be obtained using the drill-down function. In WireVis [108], [109], the transactions of an account can be aggregated by day, week, or month. Zooming is enabled in the heat map and temporal chart views, and analysts can also drill down to individuals and compare their records against each other. Network Explorer [113] includes an overview mode and an egocentric mode, which detect important clusters and individual nodes, respectively. In the overview mode, analysts can navigate to a cluster and compute sub-communities on demand. In the egocentric mode, analysts navigate nodes using a direct walk from a starting point.
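Day, week, and month aggregation of an account's transactions reduces to choosing a bucket key per date. The sketch assumes that transactions are (date, amount) pairs and that weeks follow the ISO 8601 calendar; WireVis may bucket weeks differently.

```python
from collections import defaultdict
from datetime import date


def aggregate(transactions, granularity):
    """Sum transaction amounts per day, ISO week, or month."""

    def key(d):
        if granularity == "day":
            return d.isoformat()
        if granularity == "week":
            year, week, _ = d.isocalendar()
            return f"{year}-W{week:02d}"
        if granularity == "month":
            return f"{d.year}-{d.month:02d}"
        raise ValueError(f"unknown granularity: {granularity}")

    totals = defaultdict(float)
    for d, amount in transactions:
        totals[key(d)] += amount
    return dict(totals)


tx = [(date(2019, 1, 1), 100.0), (date(2019, 1, 3), 50.0), (date(2019, 2, 1), 20.0)]
print(aggregate(tx, "month"))  # {'2019-01': 150.0, '2019-02': 20.0}
```

Switching the granularity argument corresponds to the zoom levels in the temporal view.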

Pattern discovery is often used to help identify anomalous behaviors. Filtering in WireVis [108], [109] is conducted using a set of keywords and criteria such as word counts. Analysts can select reasonably sized subsets for re-clustering to generate clusters that exhibit interesting features. Furthermore, the color scheme of the heat map is chosen according to the characteristic (e.g., sequential or diverging) of the measurement. Jigsaw [110] lets analysts take part in defining clusters of text documents, removing false positives, adjusting the number of words shown, and reordering the entity list. Dragging, merging, and splitting visual elements are common in node-link visualizations [107], [114], [103], [113]. To discover tax evasion behaviors [114], analysts can merge and split the node-link representation. A selection of subgraphs is ranked according to criteria such as the total amount of economic transactions or the risk index. Additionally, analysts can define and draw suspicious graph patterns using pre-defined operators.
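Ranking candidate subgraphs by a user-chosen criterion can be sketched with a table of scoring functions. The sketch assumes each subgraph carries its edges as (payer, payee, amount) triples and a precomputed `risk_index`; the risk indices of [114] are not recomputed here, and all identifiers are hypothetical.

```python
def total_amount(subgraph):
    """Total value of the economic transactions inside a subgraph."""
    return sum(amount for _, _, amount in subgraph["edges"])


CRITERIA = {
    "total_amount": total_amount,
    "risk_index": lambda sg: sg["risk_index"],  # assumed to be precomputed
}


def rank_subgraphs(subgraphs, criterion="total_amount"):
    """Return subgraphs ordered from most to least suspicious by the criterion."""
    return sorted(subgraphs, key=CRITERIA[criterion], reverse=True)


candidates = [
    {"id": "sg1", "edges": [("t1", "t2", 100.0), ("t2", "t3", 50.0)], "risk_index": 0.2},
    {"id": "sg2", "edges": [("t4", "t5", 120.0)], "risk_index": 0.9},
]
print([sg["id"] for sg in rank_subgraphs(candidates)])  # ['sg1', 'sg2']
```

Keeping the criteria in a dictionary lets an interface expose them as a drop-down without changing the ranking code.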

A few visualization tools support exporting analyzed results. The Event Tunnel [100] contains a snapshot management console that captures the current state and configuration. Argyriou et al. [115], [111] implement an export function in their visual analytics tools for detecting occupational frauds. The rankings of anomalousness can be exported to separate log files, and the visualizations containing suspicious transaction patterns can be stored for post-analysis.

Visual analytics brings domain knowledge into the process of anomaly detection. Analysts can reassign labels of the “structure hole spanner” during interactive exploration [103]. A structure hole spanner interlinks different communities in a network, and these labels can be modified through merging and splitting operations. High-default groups are found to be associated with these labels. In TAXNET [114], analysts can define graph patterns based on their understanding of tax evasion frauds. Textual labels are attached to the graphs to describe rules for nodes (i.e., taxpayers) or edges (i.e., relationships).
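A crude way to propose structure hole spanner candidates, assumed here purely for illustration, is to flag nodes that link into at least two communities other than their own. This heuristic is not the definition used in [103]. Because it reads the community assignment directly, re-running it after analysts merge or split communities updates the labels, mirroring the interaction described above.

```python
from collections import defaultdict


def candidate_spanners(edges, community, min_foreign=2):
    """Nodes linking into at least `min_foreign` communities besides their own.

    edges: undirected (u, v) pairs; community: dict node -> community id.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    result = set()
    for node, nbrs in neighbours.items():
        foreign = {community[m] for m in nbrs} - {community[node]}
        if len(foreign) >= min_foreign:
            result.add(node)
    return result


# Hypothetical guarantee links: firm s bridges groups A and B.
edges = [("a1", "a2"), ("b1", "b2"), ("s", "a1"), ("s", "b1")]
community = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "s": "S"}
print(candidate_spanners(edges, community))  # {'s'}
```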

VIII. DISCUSSION AND OUTLOOK

In this section, we first summarize trends in research interest in the data visualization community regarding anomalous user behaviors. We then discuss our findings regarding data types, anomaly detection techniques, visualization techniques, and interaction methods across different user behaviors.

A. Visual Analytics of Anomalous User Behaviors

Visual analytics of private social interaction behaviors related to emailing received substantial attention in the 2000s but has declined significantly since then. Recent research works [117], [26] are more interested in the social network structures found in emailing and calling behaviors. A clear trend worth noting is the popularity of analyzing public social interaction behaviors related to posting on social media since 2010. The volume of social media data ensures wide coverage of people's behaviors, both anomalous and normal. Real-world applications are attractive from the perspective of social science and possibly other fields. We have seen many visualization tools that address event detection from massive information, information spreading, and identification of social bots. However, to the best of our knowledge, only a few visualization works [45] focus on secretive or collusive anomalous behaviors, compared to the machine learning approaches [3] that detect suspicious behaviors. Specifically, we have not seen visual analytics methods for detecting social Sybil attacks (i.e., astroturfing) [118] or private information inference [119] related to posting behavior. We hope to see more effort devoted to discovering anomalous behaviors conducted in a collusive, secretive manner.

As for network communication, research interest remains relatively strong, though the classical works [78], [85] that analyze this behavior were mostly published in the 2000s. Visual analytics of network communication focuses on aggregating different levels of data as well as on real-time monitoring. Data aggregation is often used to monitor high-level network structures and, at the same time, to visualize anomalies in an interface of limited space. As audit logs and network traffic provide detailed and systematic information, attacks are often traceable to individual machines even when malicious activities originate from more than one device. In addition, the preference for real-time or near-real-time monitoring in intrusion detection [120] is evident in the many visualizations that analyze streaming data. This results from the need for timely detection of malicious attacks. As computing capabilities advance, we expect to see more visualization tools that can handle streaming data.

Travel receives continuous attention from researchers given that more data is available for analysis (mobile phones [69], geo-located messages [121], maritime search and rescue events [70]). Though the visualization techniques used for analyzing travel behaviors are similar (i.e., geographic visualization), a rich set of interaction methods is implemented to detect and comprehend anomalies [69], [62]. By analyzing patterns in user-specified spatial and temporal ranges, analysts move back and forth between multiple levels of granularity and gradually develop their understanding during interactive exploration. As more and more sensors become available in daily life, we hope to see finer segmentation of groups of people that offers an accurate description of travel patterns.

Visualization works on anomalous transaction behaviors modernize traditional visual methods in the financial field. For example, EVA [101] integrates human decisions in fraud analysis into the existing alert system. In recent years, we have seen an increasing number of visualization tools designed for detecting suspicious users involved in financial transactions. However, judging by the average number of citations per user behavior, the overall research interest in financial transactions is lower than that in, for example, travel behaviors. Privacy issues can largely limit the resources available for research. That said, we hope to see more in-depth collaboration between academic researchers and financial institutions to combat transaction fraud by recognizing fraudsters' behaviors.

B. Data Types

Multidimensional data are applied to anomaly detection across all four behaviors. They offer a variety of features for detecting anomalous behaviors and are often used in conjunction with other data types. Text is an important data type for detecting abnormal social interaction behaviors, whereas it is a complement in the analysis of other user behaviors. Text provides information about the identities and backgrounds of the objects involved, which is used to categorize those objects. Networks are used frequently in the analysis of network communication as well as social interaction behaviors: links exist in cyber networks between sources and destinations, and in social networks between senders and receivers. Spatiotemporal information enriches the skeleton of the analysis by incorporating contextual information about users' travel behaviors. Detection of anomalous transaction and social interaction behaviors often incorporates temporal analysis.

Analysis based on data types helps reveal overlapping areas between user behaviors, which signals opportunities to borrow analytics approaches from other behaviors. For example, exploration of rating behaviors in online e-commerce stores is similar to that of network security problems. Sensitivity to time-critical behaviors in anomaly detection is emphasized in [44], in which streaming data is processed. Networks between sources and destinations are found in network communication, whilst networks between users and items are important for discovering rating frauds. We see a trend of incorporating multiple types of data. Since anomaly detection problems often involve unknown, ill-defined anomalies, using all four data types can create a relatively thorough picture for investigation.

C. Anomaly Detection Techniques

Statistical techniques are the most widely used. The principle behind statistical techniques is more intuitive than that of the other techniques: data that are not described by the known distribution are anomalous. For example, a majority of network communication behaviors are studied using statistical techniques. Detection techniques for cyber attacks are classified into signature-based and anomaly-based [10], both of which can be applied with statistical approaches. Clustering-based techniques are often used in studying travel and transaction behaviors. Clustering is often employed to tackle the large-scale databases associated with travel behaviors. Clustering methods in transactions divide customers into groups based on the assumption that abnormal behaviors are found outside the clusters. Nearest neighbor-based techniques are applied to detecting anomalous social interaction behaviors. For example, in graphs composed of senders and/or receivers associated with emailing and calling, anomaly scores are computed from distances or densities.
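A distance-based nearest neighbor score can be sketched as the distance from each point to its k-th nearest neighbor. The sketch assumes users are described by numeric feature vectors (for example, hypothetical counts of messages sent and received) compared with Euclidean distance, rather than by distances inside the sender/receiver graph; it needs more than k points.

```python
import math


def knn_scores(points, k=2):
    """Anomaly score of each point: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical (sent, received) counts; the last user is far from the rest.
users = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_scores(users, k=2)
print(scores.index(max(scores)))  # 4
```

A density-based variant would instead compare each point's neighbour distances with those of its neighbours, which tolerates clusters of different spread.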

We expect to see more visualization tools employ anomaly detection techniques such as machine learning approaches in the future. The effectiveness of machine learning methods in visualization is well recognized [122]. Though the time interval between the release of a detection technique and its implementation in visualization might be long (e.g., a five-year interval before FraudVis [45] applied the CopyCatch [123] algorithm), we believe machine learning techniques are of great value for anomaly detection in visualization. Recently, Chalapathy and Chawla [12] surveyed deep learning techniques for anomaly detection. For example, Malhotra et al. [124] develop a Long Short Term Memory Networks based Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) that is able to uncover anomalies in predictable, unpredictable, periodic, and aperiodic time series, both long and short. Anomalies in multivariate time-series data are uncovered using a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED) [125], which can capture temporal dynamics and encode the inter-correlations between different pairs of time series.

D. Visualization Techniques

Among graph visualizations, node-link diagrams are mostly used in social interaction, transaction, and network communication. Node-link diagrams are advantageous in their traceability from one node to another. They are capable of tracking down abnormal individuals from email and call records, individual machines in malicious cyber attacks, and a pair of employee and client in financial frauds. Text visualization is favored in the analysis of public social interaction behaviors such as posting. These visualization tools are usually equipped with views showing text data to enable interactive exploration and confirmation of suspicious events or users. For example, to complement the inspection of microblogs, original messages and keywords are often shown in a table format or tag clouds [126], [34]. Detection of anomalous transaction behaviors also uses sequence visualization such as parallel coordinates. Variations in the relationship between subsequent events can be tracked by changes in the linkage between two successive axes, which suggest that suspicious transactions occurred. Variants of parallel coordinates include radar charts and Sankey diagrams. To illustrate social interaction behaviors, changes in the heights and sizes of bubbles in timeline visualizations are used to encode sudden and/or important changes in the volume of keywords. Geographic visualization is often used to represent travel behaviors as it has the advantage of illustrating two-dimensional physical movement. Flow and bubble projections on a map show differences in traveling directions and spatial densities of distribution. Heat maps are popular for showing the spatial densities of humans and vehicles, as they minimize the visual occlusion that may occur in flow/bubble projections on maps. Chart visualization is effective in illustrating well-understood anomalies as long as the dimensions of the displays are selected properly.
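The occlusion argument for heat maps follows from how they are built: individual positions are binned into grid cells and only the per-cell count is drawn. The sketch assumes planar coordinates within a bounding box of positive extent (treating longitude/latitude as planar is only reasonable for small regions) and silently drops points outside the box.

```python
def density_grid(points, bounds, nx, ny):
    """Bin (x, y) locations into an ny-by-nx grid of counts for a heat map.

    bounds: (xmin, ymin, xmax, ymax) with xmax > xmin and ymax > ymin.
    Points on the upper edges fall into the last row or column.
    """
    xmin, ymin, xmax, ymax = bounds
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # outside the mapped region
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        grid[row][col] += 1
    return grid


grid = density_grid([(1, 1), (2, 2), (9, 9), (10, 10), (11, 5)], (0, 0, 10, 10), 2, 2)
print(grid)  # [[2, 0], [0, 2]]
```

However many people pass through a cell, it occupies the same screen area, which is why the encoding does not occlude.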

We also found that the number of visualization works that address egocentric behavior forms is much smaller than the number studying collective behavior forms. Glyph visualization is suited to visualizing egocentric behaviors, as differences in individuals' roles can be identified more efficiently. Visualizations of collective behaviors take a variety of representations. To explain, we use an example from social media where the same user behavior raises problems viewed from egocentric and collective perspectives, respectively. Both FluxFlow [32] and Episogram [52] analyze retweeting behaviors on Twitter. FluxFlow emphasizes the information diffusion process and visualizes the temporal evolution of a group of retweeted microblogs using packed colored circles. Episogram, on the other hand, determines whether a Twitter account is anomalous by comparing its individual retweeting patterns with those of others. Each user is represented as a glyph, a representation typical of visualizations of the egocentric behavior form.

The trends in applying visualization techniques to detect anomalous user behaviors are summarized as follows. Node-link diagrams have long been a popular choice for visualizing anomalous user behaviors. They remain favored because they effectively present an overall structure as well as detailed information when combined with rich interaction techniques. Circular designs are gaining attention from researchers for their ability to show connections in a packed visualization, where hierarchical structure is displayed using bundles and a tree layout inside the ring. Also, circular designs usually represent larger-scale structures than those (e.g., stars, cliques) shown in node-link diagrams.

We observed an increasing use of heat maps compared to flow/bubble/3D projections on a map. The reason may be that flows, bubbles, and 3D maps result in visual occlusion, which can only be resolved with appropriate interaction techniques. The rising use of heat maps, by contrast, can be explained by their potential to visualize large-scale data with geographic references. A heat map encodes some degree of geographical information while, at the same time, variables such as user density and anomaly degree can be encoded on the map without occlusion. Interest in applying chart visualization has decreased in recent years. Chart visualization is restricted to a few variables, which makes it ineffective for anomaly detection when multiple variables must be analyzed.

E. Interaction Methods

Exploration & navigation has been the most popular interaction task in visual analytics of anomalous user behaviors. Most visualization tools let users gain a high-level summary of large-scale data first and then drill down to anomalies on request. The second most popular interaction task is tracking & monitoring. As the surveyed papers are related to anomaly detection, keeping track of suspicious spots is important during interactive exploration. Analysts also highlight data of interest to show its correlations across coordinated views, which helps form a picture of where anomalies originate. Pattern discovery is also frequently used. During this process, the visual representation of the data changes accordingly, and these updates to their knowledge drive analysts to construct hypotheses about anomalies.

We observe trends in the use of interaction tasks across different user behaviors. Visualization works that study travel behavior often incorporate exploration & navigation in map visualization, because panning on a map is common when tracking physical movement [59], [72]. Pattern discovery illustrates more than one abnormal feature of anomalies by changing the color spectrum and representing traveling patterns in various forms on a map [47], [62]. Filtering by keywords is seen in social interaction [44], [34], [27], [127], where textual contents are important for determining anomalies. Knowledge externalization is usually seen in network communication [90], [80] and transactions [108], [111]. This interaction task enables the processed results to be exported for further analysis and validation with domain experts.

We increasingly see visualization tools involve refinement & identification in rendering visualizations. This type of interaction goes beyond the definition of interaction methods in [24] because adjustments to anomaly detection algorithms are allowed (e.g., the Filter technique). Several research works allow analysts to adjust parameters when constructing queries [62], [38], change anomaly thresholds [44], [69], and provide feedback on anomalies [59]. The visual representation is modified because the underlying computation changes, rather than because the visual encoding is adjusted. These works facilitate visual analytics by involving human perception and interpretation in the computation process of anomaly detection, which is a deeper level of human-computer interaction than those identified in [24].

IX. CONCLUSION

In this work, we present a survey of visual analytics of anomalous user behaviors. We analyze the related state of the art according to the proposed taxonomies. Our survey reveals trends and preferences in data types, anomaly detection techniques, visualization techniques, and interaction methods. Based on these findings, we also highlight potential research directions. We believe our work sheds light on understanding and analyzing anomalous user behaviors using visual analytics approaches.

X. ACKNOWLEDGMENTS

Nan Cao is the corresponding author. This research was sponsored in part by the Fundamental Research Funds for the Central Universities in China.

REFERENCES

[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: Asurvey,” ACM Comput. Surv., vol. 41, pp. 15:1–15:58, 2009.

[2] L. Jin, Y. Chen, T. Wang, P. Hui, and A. V. Vasilakos, “Understandinguser behavior in online social networks: A survey,” IEEE Communica-tions Magazine, vol. 51, no. 9, pp. 144–150, 2013.

[3] M. Jiang, P. Cui, and C. Faloutsos, “Suspicious behavior detection:Current trends and future directions,” IEEE Intelligent Systems, vol. 31,no. 1, pp. 31–39, 2016.

[4] Y. Zheng, W. Wu, Y. Chen, H. Qu, and L. M. Ni, “Visual analyticsin urban computing: An overview,” IEEE Transactions on Big Data,vol. 2, no. 3, pp. 276–296, 2016.

[5] S. Chen, L. Lin, and X. Yuan, “Social media visual analytics,” inComputer Graphics Forum, vol. 36, no. 3. Wiley Online Library,2017, pp. 563–587.

[6] Y. Wu, N. Cao, D. Gotz, Y.-P. Tan, and D. A. Keim, “A survey on visualanalytics of social media data,” IEEE Transactions on Multimedia,vol. 18, no. 11, pp. 2135–2148, 2016.

[7] S. Ko, I. Cho, S. Afzal, C. Yau, J. Chae, A. Malik, K. Beck, Y. Jang,W. Ribarsky, and D. S. Ebert, “A survey on visual analysis approachesfor financial data,” in Computer Graphics Forum, vol. 35, no. 3. WileyOnline Library, 2016, pp. 599–617.

[8] H. Shiravi, A. Shiravi, and A. A. Ghorbani, “A survey of visualizationsystems for network security,” IEEE Transactions on visualization andcomputer graphics, vol. 18, no. 8, pp. 1313–1329, 2012.

[9] V. Lavigne and D. Gouin, “Visual analytics for cyber security andintelligence,” The Journal of Defense Modeling and Simulation, vol. 11,no. 2, pp. 175–199, 2014.

[10] A. Patcha and J.-M. Park, “An overview of anomaly detection tech-niques: Existing solutions and latest technological trends,” ComputerNetworks, vol. 51, pp. 3448–3470, 2007.

[11] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detectionand description: a survey,” Data Mining and Knowledge Discovery,vol. 29, pp. 626–688, 2014.

[12] R. Chalapathy and S. Chawla, “Deep learning for anomaly detection: A survey,” arXiv preprint arXiv:1901.03407, 2019.

[13] A. Litan and M. Nicolett, “Market guide for user behavior analytics,” URL: https://www.gartner.com/doc/2831117/market-guide-user-behavior-analytics, 2014, accessed 2019-01-15.

[14] M. Rouse and M. Bacon, “user behavior analytics (UBA) searchsecurity,” URL: https://searchsecurity.techtarget.com/definition/user-behavior-analytics-UBA, 2017, accessed 2019-01-15.

[15] H. Yeon, S. Kim, and Y. Jang, “Predictive visual analytics of event evolution for user-created context,” Journal of Visualization, vol. 20, no. 3, pp. 471–486, 2017.

[16] R. Balakrishnan and K. Ranganathan, A textbook of graph theory. Springer Science & Business Media, 2012.

[17] K. C. Cox, S. G. Eick, G. J. Wills, and R. J. Brachman, “Brief application description; visual data mining: Recognizing telephone calling fraud,” Data Mining and Knowledge Discovery, vol. 1, no. 2, pp. 225–231, 1997.

[18] F. B. Viégas, D. Boyd, D. H. Nguyen, J. Potter, and J. Donath, “Digital artifacts for remembering and storytelling: Posthistory and social network fragments,” in System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on. IEEE, 2004, pp. 10–pp.

[19] P. Gatalsky, N. Andrienko, and G. Andrienko, “Interactive analysis of event data using space-time cube,” in Information Visualisation, 2004. IV 2004. Proceedings. Eighth International Conference on. IEEE, 2004, pp. 145–152.

[20] C. Weaver, D. Fyfe, A. Robinson, D. Holdsworth, D. Peuquet, and A. M. MacEachren, “Visual exploration and analysis of historic hotel visits,” Information Visualization, vol. 6, no. 1, pp. 89–103, 2007.

[21] R. F. Erbacher, K. L. Walker, and D. A. Frincke, “Intrusion and misuse detection in large-scale systems,” IEEE Computer Graphics and Applications, vol. 22, no. 1, pp. 38–47, 2002.

[22] R. Xiong and J. Donath, “Peoplegarden: creating data portraits for users,” in Proceedings of the 12th annual ACM symposium on User interface software and technology. ACM, 1999, pp. 37–44.

[23] F. B. Viégas, M. Wattenberg, and K. Dave, “Studying cooperation and conflict between authors with history flow visualizations,” in Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2004, pp. 575–582.

[24] J. S. Yi, Y.-a. Kang, J. T. Stasko, J. A. Jacko et al., “Toward a deeper understanding of the role of interaction in information visualization,” IEEE Transactions on Visualization & Computer Graphics, no. 6, 2007.

[25] S. van den Elzen, D. Holten, J. Blaas, and J. J. van Wijk, “Reordering massive sequence views: Enabling temporal and structural analysis of dynamic networks,” in Visualization Symposium (PacificVis), 2013 IEEE Pacific. IEEE, 2013, pp. 33–40.


[26] ——, “Dynamic network visualization with extended massive sequence views,” IEEE Transactions on Visualization & Computer Graphics, no. 8, pp. 1087–1099, 2014.

[27] A. Perer and B. Shneiderman, “Balancing systematic and flexible exploration of social networks,” IEEE transactions on visualization and computer graphics, vol. 12, no. 5, pp. 693–700, 2006.

[28] J. Koven, C. Felix, H. Siadati, M. Jakobsson, and E. Bertini, “Lessons learned developing a visual analytics solution for investigative analysis of scamming activities,” IEEE transactions on visualization and computer graphics, 2018.

[29] X. Fu, S.-H. Hong, N. S. Nikolov, X. Shen, Y. Wu, and K. Xu, “Visualization and analysis of email networks,” in Visualization, 2007. APVIS’07. 2007 6th International Asia-Pacific Symposium on. IEEE, 2007, pp. 1–8.

[30] P. A. Gloor and Y. Zhao, “Tecflow: a temporal communication flow visualizer for social networks analysis,” in ACM CSCW Workshop on Social Networks, vol. 6, 2004.

[31] C. Muelder and K.-L. Ma, “Visualization of sanitized email logs for spam analysis,” in Visualization, 2007. APVIS’07. 2007 6th International Asia-Pacific Symposium on. IEEE, 2007, pp. 9–16.

[32] J. Zhao, N. Cao, Z. Wen, Y. Song, Y.-R. Lin, and C. Collins, “#fluxflow: Visual analysis of anomalous information spreading on social media,” IEEE transactions on visualization and computer graphics, vol. 20, no. 12, pp. 1773–1782, 2014.

[33] P. Resnick, S. Carton, S. Park, Y. Shen, and N. Zeffer, “Rumorlens: A system for analyzing the impact of rumors and corrections in social media,” in Proc. Computational Journalism Conference, 2014.

[34] N. Cao, C. Shi, S. Lin, J. Lu, Y.-R. Lin, and C.-Y. Lin, “Targetvue: Visual analysis of anomalous user behaviors in online communication systems,” IEEE transactions on visualization and computer graphics, vol. 22, no. 1, pp. 280–289, 2016.

[35] C. Shao, G. L. Ciampaglia, O. Varol, K. Yang, A. Flammini, and F. Menczer, “The spread of low-credibility content by social bots,” arXiv preprint arXiv:1707.07592, 2017.

[36] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller, “Twitinfo: aggregating and visualizing microblogs for event exploration,” in Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2011, pp. 227–236.

[37] D. Thom, H. Bosch, S. Koch, M. Wörner, and T. Ertl, “Spatiotemporal anomaly detection through visual analysis of geolocated twitter messages,” in Visualization Symposium (PacificVis), 2012 IEEE Pacific. IEEE, 2012, pp. 41–48.

[38] H. Bosch, D. Thom, F. Heimerl, E. Püttmann, S. Koch, R. Krüger, M. Wörner, and T. Ertl, “Scatterblogs2: Real-time monitoring of microblog messages through user-guided filtering,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2022–2031, 2013.

[39] A. Pozdnoukhov and C. Kaiser, “Space-time dynamics of topics in streaming text,” in Proceedings of the 3rd ACM SIGSPATIAL international workshop on location-based social networks. ACM, 2011, pp. 1–8.

[40] P. A. Gloor, S. Niepel, and Y. Li, “Identifying potential suspects by temporal link analysis,” University of Cologne, 2006.

[41] J. Echeverria and S. Zhou, “Discovery, retrieval, and analysis of the ‘star wars’ botnet in twitter,” in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, 2017, pp. 1–8.

[42] M. Krstajic, E. Bertini, and D. Keim, “Cloudlines: Compact display of event episodes in multiple time-series,” IEEE transactions on visualization and computer graphics, vol. 17, no. 12, pp. 2432–2439, 2011.

[43] J. Chae, D. Thom, H. Bosch, Y. Jang, R. Maciejewski, D. S. Ebert, and T. Ertl, “Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on. IEEE, 2012, pp. 143–152.

[44] K. Webga and A. Lu, “Discovery of rating fraud with real-time streaming visual analytics,” in Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on. IEEE, 2015, pp. 1–8.

[45] J. Sun, Q. Zhu, Z. Liu, X. Liu, J. Lee, Z. Su, L. Shi, L. Huang, and W. Xu, “Fraudvis: Understanding unsupervised fraud detection algorithms,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE. IEEE, 2018, pp. 170–174.

[46] W. Dou, X. Wang, D. Skau, W. Ribarsky, and M. X. Zhou, “Leadline: Interactive visual analysis of text data through event identification and exploration,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on. IEEE, 2012, pp. 93–102.

[47] J. Chae, D. Thom, Y. Jang, S. Kim, T. Ertl, and D. S. Ebert, “Public behavior response analysis in disaster events utilizing visual analytics of microblog data,” Computers & Graphics, vol. 38, pp. 51–60, 2014.

[48] Z. Shen and K.-L. Ma, “Mobivis: A visualization system for exploring mobile data,” in Visualization Symposium, 2008. PacificVIS’08. IEEE Pacific. IEEE, 2008, pp. 175–182.

[49] C. Li, Y. Wang, P. Resnick, and Q. Mei, “Req-rec: High recall retrieval with query pooling and interactive classification,” in Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 2014, pp. 163–172.

[50] P. A. Gloor, R. Laubacher, S. B. Dynes, and Y. Zhao, “Visualization of communication patterns in collaborative innovation networks: analysis of some w3c working groups,” in Proceedings of the twelfth international conference on Information and knowledge management. ACM, 2003, pp. 56–60.

[51] D. Luo, J. Yang, M. Krstajic, W. Ribarsky, and D. Keim, “Eventriver: Visually exploring text collections with temporal references,” IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 1, pp. 93–105, 2012.

[52] N. Cao, Y.-R. Lin, F. Du, and D. Wang, “Episogram: Visual summarization of egocentric social interactions,” IEEE computer graphics and applications, vol. 36, no. 5, pp. 72–81, 2016.

[53] U. Brandes, P. Kenis, J. Lerner, and D. Van Raaij, “Network analysis of collaboration structure in wikipedia,” in Proceedings of the 18th international conference on World wide web. ACM, 2009, pp. 731–740.

[54] F. B. Viégas, S. Golder, and J. Donath, “Visualizing email content: portraying relationships from conversational histories,” in Proceedings of the SIGCHI conference on Human Factors in computing systems. ACM, 2006, pp. 979–988.

[55] W.-J. Li, S. Hershkop, and S. J. Stolfo, “Email archive analysis through graphical visualization,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. ACM, 2004, pp. 128–132.

[56] R. Lee and K. Sumiya, “Measuring geographical regularities of crowd behaviors for twitter-based geo-social event detection,” in Proceedings of the 2nd ACM SIGSPATIAL international workshop on location based social networks. ACM, 2010, pp. 1–10.

[57] F. Morstatter, S. Kumar, H. Liu, and R. Maciejewski, “Understanding twitter data with tweetxplorer,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 1482–1485.

[58] F. B. Viégas and M. Smith, “Newsgroup crowds and authorlines: Visualizing the activity of individuals in conversational cyberspaces,” in System Sciences, 2004. Proceedings of the 37th Annual Hawaii International Conference on. IEEE, 2004, pp. 10–pp.

[59] N. Cao, C. Lin, Q. Zhu, Y.-R. Lin, X. Teng, and X. Wen, “Voila: Visual anomaly detection and monitoring with streaming spatiotemporal data,” IEEE transactions on visualization and computer graphics, vol. 24, no. 1, pp. 23–33, 2018.

[60] Z. Liao, Y. Yu, and B. Chen, “Anomaly detection in gps data based on visual analytics,” in Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on. IEEE, 2010, pp. 51–58.

[61] W. Wu, J. Xu, H. Zeng, Y. Zheng, H. Qu, B. Ni, M. Yuan, and L. M. Ni, “Telcovis: Visual exploration of co-occurrence in urban human mobility based on telco data,” IEEE transactions on visualization and computer graphics, vol. 22, no. 1, pp. 935–944, 2016.

[62] N. Ferreira, J. Poco, H. T. Vo, J. Freire, and C. T. Silva, “Visual exploration of big spatio-temporal urban data: A study of new york city taxi trips,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2149–2158, 2013.

[63] R. Beecham and J. Wood, “Characterising group-cycling journeys using interactive graphics,” Transportation Research Part C: Emerging Technologies, vol. 47, pp. 194–206, 2014.

[64] J. Pu, P. Xu, H. Qu, W. Cui, S. Liu, and L. Ni, “Visual analysis of people’s mobility pattern from mobile phone data,” in Proceedings of the 2011 Visual Information Communication-International Symposium. ACM, 2011, p. 13.

[65] S. Kim, S. Jeong, I. Woo, Y. Jang, R. Maciejewski, and D. S. Ebert, “Data flow analysis and visualization for spatiotemporal statistical data without trajectory information,” IEEE transactions on visualization and computer graphics, vol. 24, no. 3, pp. 1287–1300, 2018.

[66] A. Malik, R. Maciejewski, B. Maule, and D. S. Ebert, “A visual analytics process for maritime resource allocation and risk assessment,” in Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on. IEEE, 2011, pp. 221–230.

[67] Z. Liao, L. Kong, X. Wang, Y. Zhao, F. Zhou, Z. Liao, and X. Fan, “A visual analytics approach for detecting and understanding anomalous resident behaviors in smart healthcare,” Applied Sciences, vol. 7, no. 3, p. 254, 2017.

[68] S. Ko, S. Afzal, S. Walton, Y. Yang, J. Chae, A. Malik, Y. Jang, M. Chen, and D. Ebert, “Analyzing high-dimensional multivariate network links with integrated anomaly detection, highlighting and exploration,” in Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on. IEEE, 2014, pp. 83–92.


[69] T. Von Landesberger, S. Bremm, N. Andrienko, G. Andrienko, and M. Tekusova, “Visual analytics methods for categoric spatio-temporal data,” in Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on. IEEE, 2012, pp. 183–192.

[70] M. Riveiro and G. Falkman, “Interactive visualization of normal behavioral models and expert rules for maritime anomaly detection,” in 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization. IEEE, 2009, pp. 459–466.

[71] R. Maciejewski, S. Rudolph, R. Hafen, A. Abusalah, M. Yakout, M. Ouzzani, W. S. Cleveland, S. J. Grannis, and D. S. Ebert, “A visual analytics approach to understanding spatiotemporal hotspots,” IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 2, pp. 205–220, 2010.

[72] N. Andrienko and G. Andrienko, “A visual analytics framework for spatio-temporal analysis and modelling,” Data Mining and Knowledge Discovery, vol. 27, no. 1, pp. 55–83, 2013.

[73] J. Lin, E. Keogh, and S. Lonardi, “Visualizing and discovering non-trivial patterns in large time series databases,” Information visualization, vol. 4, no. 2, pp. 61–82, 2005.

[74] S. Foresti, J. Agutter, Y. Livnat, S. Moon, and R. F. Erbacher, “Visual correlation of network alerts,” IEEE Computer Graphics and Applications, vol. 26, pp. 48–59, 2006.

[75] Q. Liao, A. Striegel, and N. Chawla, “Visualizing graph dynamics and similarity for enterprise network security and management,” in Proceedings of the seventh international symposium on visualization for cyber security. ACM, 2010, pp. 34–45.

[76] S. T. Teoh, K. Zhang, S.-M. Tseng, K.-L. Ma, and S. F. Wu, “Combining visual and automated data mining for near-real-time anomaly detection and analysis in bgp,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. ACM, 2004, pp. 35–44.

[77] F. Fischer, J. Fuchs, P.-A. Vervier, F. Mansmann, and O. Thonnard, “Vistracer: a visual analytics tool to investigate routing anomalies in traceroutes,” in Proceedings of the ninth international symposium on visualization for cyber security. ACM, 2012, pp. 80–87.

[78] X. Yin, W. Yurcik, M. Treaster, Y. Li, and K. Lakkaraju, “Visflowconnect: netflow visualizations of link relationships for security situational awareness,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. ACM, 2004, pp. 26–34.

[79] T. Taylor, D. Paterson, J. Glanfield, C. Gates, S. Brooks, and J. McHugh, “Flovis: Flow visualization system,” in Conference For Homeland Security, 2009. CATCH’09. Cybersecurity Applications & Technology. IEEE, 2009, pp. 186–198.

[80] Y. Zhao, X. Liang, X. Fan, Y. Wang, M. Yang, and F. Zhou, “Mvsec: multi-perspective and deductive visual analytics on heterogeneous network security data,” Journal of Visualization, vol. 17, no. 3, pp. 181–196, 2014.

[81] J. R. Goodall, W. G. Lutters, P. Rheingans, and A. Komlodi, “Preserving the big picture: Visual network traffic analysis with tnv,” in Visualization for Computer Security, 2005. (VizSEC 05). IEEE Workshop on. IEEE, 2005, pp. 47–54.

[82] S. T. Teoh, K.-L. Ma, S. F. Wu, and T. J. Jankun-Kelly, “Detecting flaws and intruders with visual data analysis,” IEEE Computer Graphics and Applications, vol. 24, pp. 27–35, 2004.

[83] F. Fischer and D. A. Keim, “Nstreamaware: Real-time visual analytics for data streams to enhance situational awareness,” in Proceedings of the Eleventh Workshop on Visualization for Cyber Security. ACM, 2014, pp. 65–72.

[84] S. T. Teoh, K. L. Ma, S. F. Wu, and X. Zhao, “Case study: Interactive visualization for internet security,” in Proceedings of the conference on Visualization’02. IEEE Computer Society, 2002, pp. 505–508.

[85] K. Lakkaraju, W. Yurcik, and A. J. Lee, “Nvisionip: netflow visualizations of system state for security situational awareness,” in Proceedings of the 2004 ACM workshop on Visualization and data mining for computer security. ACM, 2004, pp. 65–72.

[86] Y. Livnat, J. Agutter, S. Moon, R. F. Erbacher, and S. Foresti, “A visualization paradigm for network intrusion detection,” in Information Assurance Workshop, 2005. IAW’05. Proceedings from the Sixth Annual IEEE SMC. IEEE, 2005, pp. 92–99.

[87] E. Bertini, P. Hertzog, and D. Lalanne, “Spiralview: towards security policies assessment through visual correlation of network resources with evolution of alarms,” in Visual Analytics Science and Technology, 2007. VAST 2007. IEEE Symposium on. IEEE, 2007, pp. 139–146.

[88] F. Mansmann, “Visual analysis of network traffic: interactive monitoring, detection, and interpretation of security threats,” 2008.

[89] J. Tao, L. Shi, Z. Zhuang, C. Huang, R. Yu, P. Su, C. Wang, and Y. Chen, “Visual analysis of collective anomalies through high-order correlation graph,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE. IEEE, 2018, pp. 150–159.

[90] A. D. D’Amico, J. R. Goodall, D. R. Tesone, and J. K. Kopylec, “Visual discovery in computer network defense,” IEEE Computer Graphics and Applications, vol. 27, no. 5, 2007.

[91] C. Zheng, L. Ji, D. Pei, J. Wang, and P. Francis, “A light-weight distributed scheme for detecting ip prefix hijacks in real-time,” in ACM SIGCOMM Computer Communication Review, vol. 37, no. 4. ACM, 2007, pp. 277–288.

[92] S. Yoo, J. Jo, B. Kim, and J. Seo, “Longline: Visual analytics system for large-scale audit logs,” Visual Informatics, vol. 2, no. 1, pp. 82–97, 2018.

[93] A. Boschetti, L. Salgarelli, C. Muelder, and K.-L. Ma, “Tvi: a visual querying system for network monitoring and anomaly detection,” in Proceedings of the 8th international symposium on visualization for cyber security. ACM, 2011, p. 1.

[94] K. Nyarko, T. Capers, C. Scott, and K. Ladeji-Osias, “Network intrusion visualization with niva, an intrusion detection visual analyzer with haptic integration,” in Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2002. HAPTICS 2002. Proceedings. 10th Symposium on. IEEE, 2002, pp. 277–284.

[95] C. Scott, K. Nyarko, T. Capers, and J. Ladeji-Osias, “Network intrusion visualization with niva, an intrusion detection visual and haptic analyzer,” Information Visualization, vol. 2, no. 2, pp. 82–94, 2003.

[96] F. Mansmann and S. Vinnik, “Interactive exploration of data traffic with hierarchical network maps,” IEEE transactions on visualization and computer graphics, vol. 12, no. 6, pp. 1440–1449, 2006.

[97] M. L. Huang, J. Liang, and Q. V. Nguyen, “A visualization approach for frauds detection in financial market,” in Information Visualisation, 2009 13th International Conference. IEEE, 2009, pp. 197–202.

[98] D. Olszewski, “Fraud detection using self-organizing map visualizing the user profiles,” Knowledge-Based Systems, vol. 70, pp. 324–334, 2014.

[99] M. C. Hao, D. A. Keim, U. Dayal, and J. Schneidewind, “Visimpact: business impact visualization,” in Visualization and Data Analysis 2005, vol. 5669. International Society for Optics and Photonics, 2005, pp. 238–250.

[100] M. Suntinger, H. Obweger, J. Schiefer, and M. E. Groller, “The event tunnel: Interactive visualization of complex event streams for business process pattern analysis,” in Visualization Symposium, 2008. PacificVIS’08. IEEE Pacific. IEEE, 2008, pp. 111–118.

[101] R. A. Leite, T. Gschwandtner, S. Miksch, S. Kriglstein, M. Pohl, E. Gstrein, and J. Kuntner, “Eva: Visual analytics to identify fraudulent events,” IEEE Transactions on Visualization & Computer Graphics, no. 1, pp. 1–1, 2018.

[102] M. C. Hao, D. A. Keim, and U. Dayal, “Visbiz: A simplified visualization of business operation,” in Proceedings of the conference on Visualization’04. IEEE Computer Society, 2004, pp. 598–1.

[103] Z. Niu, D. Cheng, L. Zhang, and J. Zhang, “Visual analytics for networked-guarantee loans risk management,” in Pacific Visualization Symposium (PacificVis), 2018 IEEE. IEEE, 2018, pp. 160–169.

[104] R. A. Leite, T. Gschwandtner, S. Miksch, E. Gstrein, and J. Kuntner, “Visual analytics for fraud detection: focusing on profile analysis,” in Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Posters. Eurographics Association, 2016, pp. 45–47.

[105] M. C. Hao, D. A. Keim, U. Dayal, and J. Schneidewind, “Business process impact visualization and anomaly detection,” Information Visualization, vol. 5, no. 1, pp. 15–27, 2006.

[106] P. A. Legg, “Visualizing the insider threat: challenges and tools for identifying malicious user activity,” in Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on. IEEE, 2015, pp. 1–7.

[107] W. Didimo, G. Liotta, F. Montecchiani, and P. Palladino, “An advanced network visualization system for financial crime detection,” in Visualization Symposium (PacificVis), 2011 IEEE Pacific. IEEE, 2011, pp. 203–210.

[108] R. Chang, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, “Wirevis: Visualization of categorical, time-varying data from financial transactions,” in Visual Analytics Science and Technology, 2007. VAST 2007. IEEE Symposium on. IEEE, 2007, pp. 155–162.

[109] R. Chang, A. Lee, M. Ghoniem, R. Kosara, W. Ribarsky, J. Yang, E. Suma, C. Ziemkiewicz, D. Kern, and A. Sudjianto, “Scalable and interactive visual analysis of financial wire transactions for fraud detection,” Information visualization, vol. 7, no. 1, pp. 63–76, 2008.

[110] C. Görg, Z. Liu, J. Kihm, J. Choo, H. Park, and J. Stasko, “Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 10, pp. 1646–1663, 2013.

[111] E. N. Argyriou, A. Symvonis, and V. Vassiliou, “A fraud detection visualization system utilizing radial drawings and heat-maps,” in Information Visualization Theory and Applications (IVAPP), 2014 International Conference on. IEEE, 2014, pp. 153–160.

[112] M. Schaefer, F. Wanner, F. Mansmann, C. Scheible, V. Stennett, A. T. Hasselrot, and D. A. Keim, “Visual pattern discovery in timed event data,” in Visualization and Data Analysis 2011, vol. 7868. International Society for Optics and Photonics, 2011, p. 78680K.


[113] J. A. Guerra-Gomez, A. Wilson, J. Liu, D. Davies, P. Jarvis, and E. Bier, “Network explorer: Design, implementation, and real world deployment of a large network visualization tool,” in Proceedings of the International Working Conference on Advanced Visual Interfaces. ACM, 2016, pp. 108–111.

[114] W. Didimo, L. Giamminonni, G. Liotta, F. Montecchiani, and D. Pagliuca, “A visual analytics system to support tax evasion discovery,” Decision Support Systems, vol. 110, pp. 71–83, 2018.

[115] E. N. Argyriou, A. A. Sotiraki, and A. Symvonis, “Occupational fraud detection through visualization,” in Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on. IEEE, 2013, pp. 4–6.

[116] Y.-a. Kang and J. Stasko, “Examining the use of a visual analytics system for sensemaking tasks: Case studies with domain experts,” IEEE Transactions on Visualization & Computer Graphics, no. 12, pp. 2869–2878, 2012.

[117] D. Redondo, A. Sallaberry, D. Ienco, F. Zaidi, and P. Poncelet, “Layer-centered approach for multigraphs visualization,” in Information Visualisation (iV), 2015 19th International Conference on. IEEE, 2015, pp. 50–55.

[118] H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, “Sybillimit: A near-optimal social network defense against sybil attacks,” in 2008 IEEE Symposium on Security and Privacy (sp 2008). IEEE, 2008, pp. 3–17.

[119] R. Heatherly, M. Kantarcioglu, and B. Thuraisingham, “Preventing private information inference attacks on social networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 8, pp. 1849–1862, 2013.

[120] B. Mukherjee, L. T. Heberlein, and K. L. Levitt, “Network intrusion detection,” IEEE Network, vol. 8, pp. 26–41, 1994.

[121] J. Thomas and J. Kielman, “Challenges for visual analytics,” Information Visualization, vol. 8, no. 4, pp. 309–314, 2009.

[122] S. Liu, X. Wang, M. Liu, and J. Zhu, “Towards better analysis of machine learning models: A visual analytics perspective,” Visual Informatics, vol. 1, no. 1, pp. 48–56, 2017.

[123] A. Beutel, W. Xu, V. Guruswami, C. Palow, and C. Faloutsos, “Copycatch: stopping group attacks by spotting lockstep behavior in social networks,” in Proceedings of the 22nd international conference on World Wide Web. ACM, 2013, pp. 119–130.

[124] P. Malhotra, A. Ramakrishnan, G. Anand, L. Vig, P. Agarwal, and G. Shroff, “Lstm-based encoder-decoder for multi-sensor anomaly detection,” arXiv preprint arXiv:1607.00148, 2016.

[125] C. Zhang, D. Song, Y. Chen, X. Feng, C. Lumezanu, W. Cheng, J. Ni, B. Zong, H. Chen, and N. V. Chawla, “A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data,” arXiv preprint arXiv:1811.08055, 2018.

[126] Y. Sun, Y. Tao, G. Yang, and H. Lin, “Visitpedia: Wiki article visit log visualization for event exploration,” in Computer-Aided Design and Computer Graphics (CAD/Graphics), 2013 International Conference on. IEEE, 2013, pp. 282–289.

[127] M. E. Joorabchi, J.-D. Yim, and C. D. Shaw, “Emailtime: Visual analytics of emails,” in Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on. IEEE, 2010, pp. 233–234.