
Collaborative visualization of engineering processes using tabletop augmented reality

Suyang Dong a,*, Amir H. Behzadan b, Feng Chen a, Vineet R. Kamat a

a Department of Civil and Environmental Engineering, University of Michigan, 2340 G.G. Brown, 2350 Hayward, Ann Arbor, MI 48109-2125, USA
b Department of Civil, Environmental, and Construction Engineering, University of Central Florida, 4000 Central Florida Blvd., Orlando, FL 32816-2450, USA


Article history: Received 27 June 2012; Received in revised form 30 August 2012; Accepted 23 September 2012; Available online 9 November 2012

Keywords: Tabletop augmented reality; Visualization; Simulation; Planar tracking library; Validation; Collaboration


3D computer visualization has emerged as an advanced problem-solving tool for engineering education and practice. For example, in civil engineering, the integration of 3D/4D CAD models in the construction process helps to minimize the misinterpretation of the spatial, temporal, and logical aspects of construction planning information. Yet despite the advances made in visualization, the lack of collaborative problem-solving abilities leaves outstanding challenges that need to be addressed before 3D visualization can become widely accepted in the classroom and in professional practice. The ability to smoothly and naturally interact in a shared workspace characterizes a collaborative learning process. This paper introduces tabletop Augmented Reality to accommodate the need to collaboratively visualize computer-generated models. A new software program named ARVita is developed to validate this idea, where multiple users wearing Head-Mounted Displays and sitting around a table can all observe and interact with dynamic visual simulations of engineering processes. The applications of collaborative visualization using Augmented Reality are reviewed, the technical implementation is covered, and the program's underlying tracking libraries are presented.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

During the past several years, engineering systems have been rapidly growing in terms of complexity, scale, uncertainty, and interdisciplinarity. In this regard, construction systems and projects are no exception. Most construction projects involve parallel, assembly-like complex processes that interact in a dynamic environment. The circumstances in which these processes take place often become more complicated due to unforeseen conditions, deviations from project plans, change orders, and legal issues.

Continuous planning and control of construction thus occurs throughout the lifecycle of construction projects. Construction planning and control can take place at the project level and/or the operation level [1]. At the construction project level, a planned facility is broken down into activities, each of which maps to a physical project component (e.g., Floor 6 Plumbing) or to a major time-consuming process (e.g., order and install compressor). At the operation level, planning and control are concerned with the technological methods, number and type of resources, and logical strategies that are needed to accomplish an activity or a group of related activities (e.g., install ventilation ducts and connect to compressor).


Since projects must ultimately be constructed, success in projects is entirely dependent on the success of construction at the operation level. Proper operations planning is therefore a necessity and a challenging task that can be substantially improved by using techniques such as Discrete Event Simulation (DES), which is a powerful objective function evaluator that is well suited for the design of construction operations [2]. DES entails the creation of models that represent how construction operations will be performed. Once the models are created, the modeled operations can be simulated to yield numerical measurements, which typically include time, cost, and production rates. The measurements usually indicate important parts of the operations with potential for improvements that may result in cost or time savings [3].

1.1. Description of the problem

The developer of a computer simulation model often has misconceptions about how an actual operation takes place in the field [4]. The process of recognizing and correcting such errors is termed validation, which determines whether simulation models accurately represent the real-world system under study [5]. Validation exercises are typically carried out by consulting field experts or decision makers who are intimately proficient at the operations of the actual system being modeled and simulated.

The participating field experts often do not have the means, the training, and/or the time to validate simulation models based solely on numerical and statistical output typically produced by simulation software [6]. Potential practitioners are therefore always skeptical about simulation analyses and have little confidence in their results. This lack of credibility is a major deterrent hindering the widespread use of simulation as an operations planning tool in the construction industry [7–9].

Operations-level visualization explicitly represents the interaction between equipment, labor, materials, and space (e.g., laying bricks, lifting columns). This visualization approach is especially powerful when there is a need for model developers to communicate the simulated processes to field experts by highlighting operational details such as the maneuverability of trucks and backhoes in excavation areas and the deployment of cranes and materials in steel erection. This visualization paradigm can thus help model developers and field experts in validating operational concepts, checking for design interferences, and estimating overall constructability [10].

Besides its usefulness in validating simulated processes, operations-level visualization is also helpful for educating engineering students about construction processes. According to a student survey (Fig. 1), more than 90% of those who responded indicated that they learn better when the instructor uses 3D representations or visualization to teach engineering concepts and theories. Although instructional methods that take advantage of visualization techniques have been around for several decades, the methods still rely on traditional media and tools. For example, students who take a course in construction planning may use drawings, scheduling bar charts, sand table models, and, more recently, 3D CAD models. However, none of these techniques is capable of effectively conveying information on every aspect of a project. For instance, 2D or 3D models do not reflect temporal progress, while scheduling bar charts do not demonstrate the corresponding spatial layout.

Fig. 1. A survey of undergraduate civil, environmental, and construction engineering students revealed that a large percentage of students support the prospect of reforming current instructional methods.

There are two major categories of 3D visualization that can be used to reconstruct engineering operations for validation, education, and training purposes, and to facilitate the study of environments in which such operations take place: Augmented Reality (AR) and Virtual Reality (VR). AR is the superimposition of computer-generated information over a user's view of the real world. By presenting contextual information in a textual or graphical format, the user's view of the real world is enhanced or augmented beyond the normal experience [11]. The addition of such contextual information spatially located relative to the user can assist in the performance of several scientific and engineering tasks. For this reason, AR-enabling technologies have been researched in an increasing number of studies during recent years. AR is different from VR, a visualization technology that has been around for several decades. As shown in Fig. 2, unlike VR, AR does not completely replace the real world; rather, the real world is supplemented with relevant synthetic information, and thus real and virtual objects coexist in an augmented space [12].

Fig. 2. AR supplements the real world with relevant synthetic information.

1.2. Main research contributions

One of the distinct advantages of AR is that it can enable the communication and discussion of a validation analysis or educational training exercise in a collaborative environment, where the field experts or students can quickly appreciate the visual simulation displayed and are able to interactively participate in a discussion that helps understand, validate, and improve the simulated engineering processes.

From the point of view of collaboration, however, there are still outstanding challenges that need to be addressed in the context of realizing the full benefits of graphical visualization in engineering education and practice. These challenges and the associated knowledge gaps have been the major motivation for the presented research. The developed ideas are implemented in ARVita (acronym for Augmented Reality Vitascope), in which multiple users wearing HMDs can observe and interact with dynamic simulated construction activities overlaid or "placed" on the surface of a table. Users such as model developers, field experts, or students can work across the table face-to-face, shift the focus of the shared workspace interactively, and jointly analyze and validate dynamic engineering operations. In the following section, a summary of previous efforts to incorporate AR visualization techniques into construction and civil engineering education and practice is presented. In each case, the methodological benefits and limitations are also described in an effort to highlight the significance of the presented work.

2. Importance of the research

2.1. Virtual reality visualization of engineering operations

Visualization of simulated operations can be of significant help in the validation of discrete-event process models [4]. This is especially true in construction, where field experts and decision makers are supposed to participate in the validation procedure but are not necessarily familiar with simulation principles and have great difficulty in interpreting numerical outcomes. The design and analysis of construction operations using simulation is of value only if the insights gleaned are used in making decisions and increasing understanding (i.e., they are credible). Visual simulation at the operations level can be an effective means of achieving this [5].

The potential of VR visualization in animating simulated construction and civil engineering operations has been investigated before. Kamat and Martinez [2] designed the VITASCOPE (acronym for VIsualizaTion of Simulated Construction OPErations) visualization system for articulating operations-level construction activities using a comprehensive and extensible authoring language for depicting modeled processes along a simulated project timeline. The VITASCOPE authoring language is an abstract layer that makes the visualization engine (VE) independent of any particular driving process (e.g., a specific simulation system). Incoming operation events and data generated by simulation models, hardware controls, or real-time sensors can be adapted to conform to the VITASCOPE syntax and fed into the dynamic VE [10]. The VE interprets the instruction sets and renders the depicted activities sequentially in the virtual environment, thus making VITASCOPE capable of visualizing simulated construction operations in smooth, continuous, and animated 3D virtual worlds.

2.2. Collaborative validation through AR visualization

Despite its efficacy in graphically validating simulated construction processes, additional challenges need to be overcome to make VITASCOPE user-friendly in a collaborative environment. This work is critical because efficient validation requires model developers and decision makers to interactively communicate details of the simulation model and the actual operation. Moreover, collaborative learning, in which individuals are the cornerstones of the learning process, has proven to be one of the most effective instructional methods. This is also evident from Fig. 1, where, on average, 76% of students surveyed named collaborative learning as their most preferred method of learning.

However, as far as collaborative communication is concerned, the convenience of traditional paper-based discussion is somewhat lost in computer-based VR environments, where users' discussion is restricted to the scale of the screen. On the other hand, table-based media are a natural collaboration platform that allows people to promptly exchange ideas. One kind of table-based medium is tabletop AR, which by definition better supports the prospect of collaborative learning and discussion compared to VR.

Several research studies have been conducted to compare the effectiveness of AR and VR in collaborative environments. For example, users made significantly fewer mistakes in inspection and assembly tasks [13,14], gained stronger spatial cognition and memory [15], and thus experienced less mental workload [16] within a collaborative AR environment compared with VR.

Group discussion cultivates face-to-face conversation, where there is a dynamic and easy interchange of focus between the shared workspace and speakers' interpersonal space. The shared workspace is the common task area between collaborators, while the interpersonal space is the common communication space. The former is usually a subset of the latter [17]. Practitioners can use a variety of non-verbal cues to quickly shift the focus of the shared workspace accordingly, and thus work more efficiently.

3. Related research in collaborative AR

The collaborative features of AR visualization have been previously explored. In this context, researchers have investigated the potential of AR-based education and training in three areas: localized collaborative AR, remote collaborative AR, and industrial collaborative AR.

3.1. Localized collaborative AR

Some early works in localized collaborative AR are found in [17–19]. The TRANSVISION system developed by Rekimoto [18] is a pioneering work in collaborative AR, in which multiple participants use palmtop handheld displays to share computer-generated graphics on a table. Collaborative Web Space [17] is an interface for people in the same location to view and interact with virtual world wide web pages floating around them in real space. The Studierstube system [19] mainly targets presentations and education; each viewer wears a magnetically tracked see-through HMD and walks around to observe 3D scientific data.

Other related works, such as collaborative AR games and task-oriented collaboration, then followed this trend. The Art of Defense [20] is a typical AR board game, in which gamers use handheld devices to play social games with physical game pieces on a tabletop. Nilsson et al. [21] conducted a comparison experiment on cross-organizational collaboration in dynamic emergency response tasks; actors held positive attitudes toward AR and would like to use it for real tasks. Besides the traditional Head-Mounted Display (HMD) and Hand-Held Display (HHD), a number of other AR media exist (e.g., projection tables and multi-touch tables). The augmented urban planning workbench [22] is a multi-layered luminous table for hybrid presentations such as 2D drawings, 3D physical models, and digital simulation overlaid onto the table; the system was used in a graduate course supporting the urban design process. Multi-Touch Mixed Reality [23] allows designers to interact with a multi-touch tabletop interface with 2D models, while 3D models are projected onto their 2D counterparts.

3.2. Remote collaborative AR

In remote collaborative AR systems, avatars are the most necessary elements of the visualization environment. WearCom [17] enables a user to see remote collaborators as virtual avatars in multi-party face-to-face AR conferencing. Minatani et al. [24] recreate each participant's facial appearance in real time and represent each participant's upper body and hands above the table as a deformed billboard for face-to-face tabletop remote collaboration. Stafford et al. [25] invented an interaction metaphor, termed "god-like," for improving the communication of situational and navigational information between outdoor and indoor AR users; the gestures of indoor users are captured by video-based tracking and shown as "god-like" guidance to the outdoor users.

3.3. Industrial collaborative AR

Industrial collaborative AR is mainly used in product design and factory planning. The MagicMeeting system [26] is used in concrete test cases in which experts from the automotive industry meet to discuss the design of car parts; the collaboration is powered by a tangible AR interface. Fata Morgana [27], on the other hand, also demonstrates car design, but uses a life-sized model in a BMW showroom. At Siemens Corporate Research, a fully implemented system called CyliCon [28] enables users to move around the environment and visualize as-built reconstruction on real sites and in industrial drawings. The Roivis project is another successful example of factory design and planning, at Volkswagen Group Research [29]; this project puts strict demands on system accuracy, e.g., interfering edge analysis and aggregation verification.

AR has also been widely studied in construction in areas including, but not limited to, operations visualization, computer-aided operations, project schedule supervision, and component inspection. However, there are few examples in the collaborative AR domain. For instance, Wang and Dunston [30] developed an AR face-to-face design review prototype and conducted test cases for collaboratively performing error detection. Hammad et al. [31] applied distributed AR for visualizing collaborative construction tasks (e.g., crane operations) to check spatial and engineering constraints on outdoor jobsites.

3.4. Scientific and technical challenges

To the authors' best knowledge, none of the previous work in this domain allows users to validate simulated processes and learn from the results by collaboratively observing dynamic operations animations. Several scientific and technical challenges must be addressed to realize such collaborative validation. (1) Since VITASCOPE and ARVita share the same usefulness in validating simulated processes, it is plausible to loosely couple the visualization of simulated processes and the tracking of planar markers, so that ARVita can reuse the visualization functionality powered by VITASCOPE and focus on building robust tracking libraries and interaction features. (2) Given the existence of a variety of planar marker tracking libraries, ARVita should be scalable enough to accommodate different schools of tracking libraries, and make changes in the tracking libraries transparent to the rest of the software structure. (3) Despite their fast tracking speed, fiducial markers' sensitivity to video coverage gives rise to the necessity of building a natural marker tracking library that is robust to partial occlusion. (4) In the presence of critical events happening in visual simulations, ARVita should grant users the ability to switch to a synchronized workspace to avoid any misunderstanding in communication caused by the distortion of time or viewpoint.

The rest of this paper describes and discusses the design requirements and technical implementation of ARVita and is organized as follows: (i) the software scheme and its realization in OpenSceneGraph (OSG), reflecting the design requirements of modularity (loose coupling between the visualization module and the tracking module) and scalability (tracking libraries) (Section 4); (ii) synchronization among multiple views for analyzing critical events (Section 5); and (iii) the tracking libraries, which include both fiducial and natural planar markers (Section 6).

4. Technical implementation of ARVita

4.1. Model-view-controller software architecture of ARVita

The software architecture of ARVita conforms to the classical Model-View-Controller (MVC) pattern (Fig. 3), which helps to separate the visualization module powered by VITASCOPE, the tracking libraries, and the interaction functionality.

The VITASCOPE visualization engine exposes a list of APIs (Application Programming Interfaces) that grant developers full control of the underlying animation process (e.g., opening and closing files, starting and pausing the animation, and altering the animation speed and timestamp). These APIs are wrapped up as a scene node and affiliated with the Model component, which is responsible for initializing, archiving, and updating the VITASCOPE scene node.
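To make the wrapping concrete, the sketch below shows one way the VITASCOPE engine could be hidden behind an OSG scene node whose update callback advances the animation every frame. This is a minimal sketch under assumptions: only vitaProcessTraceFile() is named in the paper (Section 4.2.1); the class name, the vitaOpenTraceFile() helper, and both signatures are illustrative.

```cpp
// Minimal sketch of wrapping the VITASCOPE engine as an OSG scene node (Model).
// The VitascopeNode class and the vita* signatures shown here are assumptions;
// only the function name vitaProcessTraceFile() comes from the paper.
#include <osg/Group>
#include <osg/NodeCallback>
#include <osg/NodeVisitor>
#include <string>

extern "C" {
    void vitaOpenTraceFile(const char* path);     // assumed helper
    void vitaProcessTraceFile(double frameTime);  // assumed signature
}

class VitascopeNode : public osg::Group   // Model component
{
public:
    explicit VitascopeNode(const std::string& traceFile)
    {
        vitaOpenTraceFile(traceFile.c_str());
        setUpdateCallback(new AnimationUpdate);   // advance animation every frame
    }

private:
    struct AnimationUpdate : public osg::NodeCallback
    {
        virtual void operator()(osg::Node* node, osg::NodeVisitor* nv)
        {
            // Drive the simulated operations using the current frame timestamp.
            vitaProcessTraceFile(nv->getFrameStamp()->getSimulationTime());
            traverse(node, nv);                   // continue scene-graph traversal
        }
    };
};
```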


Fig. 3. The software architecture of ARVita conforms to the model-view-controller pattern. The arrow indicates a ‘belongs to’ relationship.


A Controller enables interactions by communicating users' input commands to the VITASCOPE scene node affiliated with the Model. The communication channel is powered by FLTK (acronym for Fast Light Toolkit), which translates and dispatches mouse/keyboard messages to the Model and the View.

The View contains the tracking libraries and is also responsible for correctly setting up the projection and ModelView matrices of the OpenGL cameras. It can thus align the updated Model content with the planar marker laid on the table. First, the camera projection matrix is populated at the start of the program based on the camera calibration result; this ensures that the OpenGL virtual camera and the real camera share a consistent view volume. Second, the ModelView matrix (the pose of the OpenGL camera) is updated every frame based on the marker tracking results so that the CAD models are transformed to the correct stance relative to the camera.
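A minimal sketch of these two steps is given below, assuming intrinsics from an offline camera calibration and a per-frame marker pose delivered by the tracker; the helper names are illustrative and not part of ARVita.

```cpp
// Sketch: configuring the OSG camera so the virtual and real cameras share a
// consistent view volume, and updating the view (ModelView) matrix per frame.
// fy is the calibrated vertical focal length in pixels.
#include <osg/Camera>
#include <osg/Matrixd>
#include <osg/Math>
#include <cmath>

void configureProjection(osg::Camera* cam, double fy, double imageW, double imageH)
{
    // Vertical field of view recovered from the calibrated focal length.
    const double fovyDeg = 2.0 * osg::RadiansToDegrees(std::atan(imageH / (2.0 * fy)));
    cam->setProjectionMatrixAsPerspective(fovyDeg, imageW / imageH, 0.01, 100.0);
}

void updateModelView(osg::Camera* cam, const osg::Matrixd& worldToCamera)
{
    // The tracker reports the marker (world) pose in camera coordinates each
    // frame; use it directly as the view part of the ModelView matrix.
    cam->setViewMatrix(worldToCamera);
}
```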

4.2. Implementation of model-view-controller using OpenSceneGraph

OSG is chosen for the implementation of the MVC pattern described above. OSG uses a directed acyclic graph (tree) to express the scene, holding geometry, state, and transformation nodes. The graph is traversed at each frame for updating, drawing, and culling purposes [32]. More importantly, its update and event callback mechanism makes it convenient for driving the simulated construction operations and performing the tracking procedure (Fig. 5). For example, tracking and transformation logic can be predefined as callbacks and then executed upon tree traversal at every frame.

4.2.1. Model

The VITASCOPE scene node (the core logic of the Model) resides at the bottom of the tree (Fig. 4). The vitaProcessTraceFile() function is called every frame to update the animation logic. Above the scene node is a coordinate transformation node. Since all of the tracking algorithms used in ARVita assume the Z axis is up and use a right-handed coordinate system, this transformation converts VITASCOPE's Y-up, right-handed coordinate system to ARVita's default system, so that the jobsite model is laid horizontally above the marker.
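As a small illustration of the coordinate conversion described above, the node can apply a rotation of +90 degrees about the X axis, which sends VITASCOPE's +Y (up) to ARVita's +Z (up) while preserving the right-handed system; the function name is illustrative.

```cpp
// Sketch: transform node converting VITASCOPE's Y-up, right-handed frame into
// ARVita's Z-up, right-handed frame (+90 deg about X sends +Y to +Z, +Z to -Y).
#include <osg/MatrixTransform>
#include <osg/Math>

osg::ref_ptr<osg::MatrixTransform> makeYUpToZUp(osg::Node* vitascopeSceneNode)
{
    osg::ref_ptr<osg::MatrixTransform> yUpToZUp = new osg::MatrixTransform;
    yUpToZUp->setMatrix(osg::Matrix::rotate(osg::PI_2, osg::Vec3d(1.0, 0.0, 0.0)));
    yUpToZUp->addChild(vitascopeSceneNode);  // the VITASCOPE scene node sits below it
    return yUpToZUp;
}
```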

4.2.2. View

The core of the View is the FLTK_OSGViewer node, which inherits methods from both the FLTK window class and the osgViewer class and thus functions as the glue between FLTK and OSG. Under its hood are the ModelView transformation node and the video stream display nodes. The ModelView matrix is updated from frame to frame by the tracking event callbacks. This approach follows the example of osgART (OSG ARToolkit), which uses the 'Tracker and Marker' updating mechanism to bundle a tracking procedure (e.g., ARToolkit) and OSG together. Both the Tracker and the Marker are attached as event callbacks to their respective nodes in the graph (Fig. 5).

The Tracker reads updated video frames and stores the detected physical marker descriptor in the Marker. The Marker then calculates the camera's pose in the world coordinate system based on the descriptor and updates the ModelView transformation node. ARVita complies with this 'Tracker and Marker' mechanism because it is an abstract layer that separates the tracking and rendering logic. For example, as will be shown later, this mechanism is versatile in accommodating new tracking procedures, making changes in the Tracker transparent to the rest of the software structure.
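The schematic below sketches this wiring as OSG callbacks; the Tracker and Marker types are illustrative stand-ins for the pattern, not the actual osgART class definitions.

```cpp
// Schematic of the 'Tracker and Marker' update mechanism as an event callback.
// Tracker/Marker here are illustrative stand-ins, not osgART's own classes.
#include <osg/MatrixTransform>
#include <osg/NodeCallback>
#include <osg/NodeVisitor>

struct Marker {                       // holds the latest detection result
    bool valid = false;
    osg::Matrixd pose;                // marker pose in camera coordinates
};

struct Tracker {                      // wraps a concrete library (ARToolkit, KEG, ...)
    virtual void update(Marker& marker) = 0;  // read a frame, refresh the marker
    virtual ~Tracker() {}
};

// Callback attached to the ModelView transform node: on each traversal it runs
// the tracker and, if the marker was found, pushes the pose into the node.
class TrackerCallback : public osg::NodeCallback {
public:
    TrackerCallback(Tracker* tracker, Marker* marker)
        : _tracker(tracker), _marker(marker) {}

    virtual void operator()(osg::Node* node, osg::NodeVisitor* nv)
    {
        _tracker->update(*_marker);
        if (_marker->valid)
            static_cast<osg::MatrixTransform*>(node)->setMatrix(_marker->pose);
        traverse(node, nv);
    }
private:
    Tracker* _tracker;
    Marker*  _marker;
};
```

In practice such a callback would be attached to the ModelView transform node with setEventCallback(), so it executes during the event traversal of every frame, as suggested by Fig. 5.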

The video resource is pasted as a dynamic texture on the background and in the eagle window. Despite the stability of the trackers used in ARVita, all of them require the majority of the marker, or even the entire marker, to be covered in the video frame. For example, with ARToolkit, the CAD models immediately disappear as soon as a tiny corner of the marker is lost. This limitation is much more severe when the animated jobsite covers the majority of the screen, which makes it very difficult to keep the marker within the camera's view volume. The eagle window is thus valuable for mitigating these flaws, and it can be toggled on and off by the user. When the user moves the camera to look for a vantage point, the eagle window can be toggled on so that the user is aware of the visibility of the marker. When the camera is static and the user is paying attention to the animation, the eagle window can be toggled off so that it does not affect the observation.

4.2.3. Controller

FLTK plays the role of the Controller in ARVita; it translates and dispatches a user's interaction/input messages to the system. The motivation behind ARVita is to allow multiple users to observe the animation from different perspectives and promptly shift the focus of the shared working space in a natural manner. These natural interactions include rotating the marker to find vantage points and pointing at the model to attract others' attention (Fig. 6).


Fig. 4. The realization of the model-view-controller model with OpenSceneGraph.


Given that the scale of the model may prevent users from getting close enough to interesting regions, ARVita provides users with basic zooming and panning functionalities.

The focus of the shared working space can be switched not only spatially, but also temporally. Users can choose to observe the animation at different speeds, or jump instantaneously along the timeline (Fig. 7). The ARVita Controller wraps the existing VITASCOPE APIs, such as vitaExecuteViewRatioChange() and vitaExecuteTimeJump(), in a user-friendly interface, as most media players do, with fast-forward buttons, a progress bar, etc.
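The sketch below shows how such controls could be wired in FLTK; only the two vita* function names come from the text, while their signatures, the widget layout, and everything else here are assumptions.

```cpp
// Sketch: FLTK controls forwarding user input to the VITASCOPE APIs named in
// the text. The argument types of the vita* functions are assumptions.
#include <FL/Fl.H>
#include <FL/Fl_Window.H>
#include <FL/Fl_Button.H>
#include <FL/Fl_Hor_Slider.H>

extern "C" {
    void vitaExecuteViewRatioChange(double speedRatio); // assumed signature
    void vitaExecuteTimeJump(double timestamp);         // assumed signature
}

static void onFastForward(Fl_Widget*, void*)
{
    vitaExecuteViewRatioChange(2.0);   // play the animation at twice the speed
}

static void onSeek(Fl_Widget* w, void*)
{
    Fl_Hor_Slider* bar = static_cast<Fl_Hor_Slider*>(w);
    vitaExecuteTimeJump(bar->value()); // jump along the simulation timeline
}

int main(int argc, char** argv)
{
    Fl_Window win(320, 80, "ARVita controls (sketch)");
    Fl_Button ff(10, 10, 120, 25, "Fast forward");
    ff.callback(onFastForward);
    Fl_Hor_Slider progress(10, 45, 300, 25, "");
    progress.callback(onSeek);
    win.end();
    win.show(argc, argv);
    return Fl::run();
}
```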

5. Synchronized view mode in ARVita

5.1. Implementation of synchronized view

ARVita allows users to launch several instances simultaneously, and each user can thus individually explore, interact with, and validate a visual simulation. However, if users wish to collaborate on validating a visual simulation, a synchronized view mode should be turned on. This can happen, for example, when a user identifies a critical event happening during a certain time period and wants to draw the attention of partners by switching to the synchronized view mode and asking everyone to look at the same part of the model. This ensures that all of the participants share a consistent view of the ongoing visual simulation and eliminates any potential misunderstanding caused by chronological or spatial misalignment. In the synchronized mode, when one user interacts with the model (rotating the marker, zooming, or dragging the progress bar), all of these spatial or temporal updates are reflected in all of the other users' augmented spaces, so that a consistent dynamic model is shared among all users.

The osgViewer::CompositeViewer class is the key to upgrading the individual view mode to the synchronized view mode. A CompositeViewer is a container of multiple views; it keeps the views correctly synchronized and thread-safe. Each view plays the same role as the FLTK_OSGViewer does in Fig. 4, and independently maintains its own video, tracker, and marker resources and its own ModelView matrix. However, these views share the same VITASCOPE scene node (Fig. 8) for two reasons: (1) to synchronize the animation across different views, and (2) to save memory by maintaining only one copy of each scene node.
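A condensed sketch of this arrangement, assuming osgViewer::CompositeViewer and omitting the per-view video and tracking setup, is shown below.

```cpp
// Sketch: two synchronized views sharing one VITASCOPE scene node under
// osgViewer::CompositeViewer. Tracker, video, and window details are omitted.
#include <osgViewer/CompositeViewer>
#include <osgViewer/View>
#include <osg/Group>

int main()
{
    osg::ref_ptr<osg::Group> sharedVitascopeScene = new osg::Group; // one copy of the model

    osgViewer::CompositeViewer viewer;
    for (int i = 0; i < 2; ++i)
    {
        osg::ref_ptr<osgViewer::View> view = new osgViewer::View;
        view->setUpViewInWindow(50 + 400 * i, 50, 400, 300);
        view->setSceneData(sharedVitascopeScene.get()); // all views point at the same node
        // Each view would also own its video stream, Tracker/Marker callbacks,
        // and ModelView matrix (not shown).
        viewer.addView(view.get());
    }
    return viewer.run();
}
```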

5.2. Limitations of synchronized views on a single computer

The current version of ARVita supports running multiple synchronized views on a single computer, which indirectly limits the maximum number of participants. As more "viewers" join, the computer quickly gets overloaded by maintaining too many video resources and tracking procedures. The authors are currently pursuing an alternate distributed computing approach to overcome this limitation. As a generic architecture for distributed computer simulation systems, HLA (acronym for High Level Architecture) can not only integrate heterogeneous simulation software and data sources, but also communicate between computers and even across platforms. HLA thus presents itself as a promising solution for a distributed ARVita.


Fig. 6. Two users are observing the animation lying on the table.

Fig. 7. Steel erection activity at different timestamps.

Fig. 5. osgART’s tracker and marker updating mechanism.



Fig. 8. All of the views possess their own video, tracker, and marker objects, but point to the same VITASCOPE scene node.

Fig. 9. The taxonomy of tracking methods.


6. Planar tracking methods for collaborative AR

As noted earlier, the 'Tracker and Marker' mechanism makes ARVita versatile in accommodating different tracking procedures. It currently comes with two available trackers. The first one, ARToolkit [33], is a widely used fiducial marker tracking library. The second one, KEG [34], was developed at the University of Michigan and is a natural marker tracking library. This section will articulate the importance of tracking in AR, and describe mainstream tracking approaches and those used in ARVita. It will also highlight the KEG tracking algorithm.

6.1. Principle of tracking

Tracking, which is also referred to as registration in the AR community, is the procedure of aligning real and virtual objects properly in an augmented world to create the illusion that they coexist across time and space. To be more specific, since real objects are shown by the physical camera and virtual objects are shown by the OpenGL virtual camera, the coexistence of heterogeneous objects implies that these two cameras share the same pose. Pose is the translation and rotation of the camera relative to the origin of the world coordinate system. A physical camera's pose needs to be tracked continuously and used to alter the ModelView matrix of the OpenGL camera accordingly. Tracking is acknowledged as the fundamental challenge in the AR community and, as such, has been well studied for decades [12].
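For illustration, if a tracker reports the physical camera's pose as a rotation and translation in the world (marker) frame, the corresponding OSG view matrix is simply the inverse of that camera-to-world transform; the helper below is a sketch, not ARVita code.

```cpp
// Sketch: converting a tracked camera pose (rotation R and translation t of the
// camera expressed in world/marker coordinates) into the OSG view matrix.
#include <osg/Matrixd>
#include <osg/Quat>
#include <osg/Vec3d>

osg::Matrixd viewMatrixFromPose(const osg::Quat& R_cameraInWorld,
                                const osg::Vec3d& t_cameraInWorld)
{
    // Camera-to-world transform composed from the tracked pose ...
    osg::Matrixd cameraToWorld = osg::Matrixd::rotate(R_cameraInWorld) *
                                 osg::Matrixd::translate(t_cameraInWorld);
    // ... and the view (ModelView) matrix is its inverse: world-to-camera.
    return osg::Matrixd::inverse(cameraToWorld);
}
```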

6.2. Taxonomy of tracking methods

There are a variety of tracking methodologies depending on the application. For example, in an outdoor environment, a GPS and an electronic compass are usually employed together to track a camera's position and orientation [35]. However, in an indoor environment, where the GPS signal is blocked, visual tracking methods are usually preferred (Fig. 9). Based on visual tracking libraries' assumptions about the environment, they can be classified into known and unknown environment tracking methods. SLAM (acronym for simultaneous localization and mapping) [36] is the representative of unknown environment tracking methods and imposes few assumptions about the environment. The other school of tracking methods assumes a known environment and can be further divided into non-planar and planar environments. The former works with 3D structures that have known visual features (usually CAD models) [37]. Although both SLAM and non-planar tracking require a high computational budget, they are promising candidates for a future version of ARVita, as their loose restrictions on the environment could grant users more flexibility when observing animations.

The planar marker tracking method is simpler than the previous tracking methods, but is sufficient given the application context of ARVita, where the working space is usually a large flat table laid with markers; users can seat themselves around the table and search for vantage points. The planar marker branch consists of the fiducial marker and natural marker tracking methods, and the tracker options in ARVita include both.

6.3. Trackers available in ARVita

6.3.1. The fiducial marker tracking method, ARToolkit as an example

The fiducial marker tracking method is efficient and fast. This is because the fiducial marker is usually composed as an artificial picture that contains easy-to-extract visual features, such as a set of black and white patterns. The extraction of these patterns (straight lines, sharp corners, and circles, for example) is fast and reliable. ARToolkit is one of the oldest fiducial marker tracking libraries and is widely considered a benchmark in the AR community. ARToolkit's fiducial marker is a logo bounded by a thick black frame (Fig. 4). The four corners of the frame are used to compute the camera pose, and the center logo is used to interpret the identity of the marker. Because of its simplicity and fast tracking speed, ARToolkit has been popular for over a decade in numerous applications. However, it also suffers from the common shortcomings of a fiducial marker: it requires all four corners to be visible to the camera so that the camera pose can be computed. This can cause frustration when the user navigates through the animated jobsite only to find the animation graphics blinking on and off due to loss of tracking. This disadvantage motivated the authors to look into natural markers, which are more flexible about the coverage of the marker.
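A condensed per-frame detection sketch using the classic ARToolkit C API is shown below; camera-parameter and pattern initialization (arInitCparam(), arLoadPatt(), ...) is assumed to have been done elsewhere, and patt_id, patt_width, and patt_center are placeholders set up there.

```cpp
// Condensed per-frame sketch using the classic ARToolkit C API; camera and
// pattern initialization is assumed to have been done elsewhere.
#include <AR/ar.h>
#include <AR/video.h>

extern int    patt_id;          // pattern handle from arLoadPatt() (assumed)
extern double patt_width;       // physical marker width in millimeters
extern double patt_center[2];   // marker center offset

bool detectMarkerPose(double patt_trans[3][4])
{
    ARUint8* dataPtr = arVideoGetImage();             // grab the current frame
    if (!dataPtr) return false;

    ARMarkerInfo* marker_info;
    int           marker_num;
    if (arDetectMarker(dataPtr, 100, &marker_info, &marker_num) < 0)
        return false;                                 // detection failed

    // Keep the most confident detection of our pattern.
    int best = -1;
    for (int i = 0; i < marker_num; ++i)
        if (marker_info[i].id == patt_id &&
            (best < 0 || marker_info[i].cf > marker_info[best].cf))
            best = i;
    if (best < 0) return false;                       // marker not visible

    // Compute the marker-to-camera transform used to drive the ModelView matrix.
    arGetTransMat(&marker_info[best], patt_center, patt_width, patt_trans);
    return true;
}
```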

Fig. 10. The natural marker tracking library is flexible on marker visibility.

6.3.2. The natural marker tracking method, KEG as an example

Besides the advantage of partial coverage (Fig. 10), the natural marker offers the advantage of not depending on special predefined visual features like those found in the fiducial marker. In other words, the features can be points, corners, edges, and blobs that appear in a natural image. The extraction of these features is vital in establishing correspondence between the image observed by the camera and the marker image, and in estimating the camera's pose. Robust estimation usually requires the establishment of ample correspondences, which is a challenging issue.

There are two schools of tracking natural markers, depending on whether they treat observed images independently or consecutively. The former is referred to as the detection-based method, such as SIFT (Scale Invariant Feature Transform) [38] or FERNs [39]. The latter is referred to as the tracking-based method, such as Kanade–Lucas–Tomasi (KLT) [40,41]. Each school has its own pros and cons. A robust detection-based method often demands a high computational budget and can hardly meet real-time frame rates. On the other hand, accumulated errors in the tracking-based method can be carried forward along consecutive frames and thus compromise the tracking quality. The KEG tracker used in ARVita inherits merits from both the detection-based and tracking-based methods. The following sections briefly cover these two methods and how they are combined by the proposed global appearance and geometric constraints.

6.3.2.1. Detection-based method. The correspondence between the marker image and the captured image can be expressed as a transformation matrix, called a homography, that maps points on the observed image to their corresponding points on the marker image:

s p' = H p    (1)

where H is a 3 × 3 matrix that represents the homography and s is a scale factor. p'(x', y') and p(x, y) are a pair of corresponding points on the marker and observed images, expressed in homogeneous coordinates.



Table 1. Comparison between two natural marker approaches.

Approach: Detection-based | Tracking-based
Principle: Identify matching feature points on each new frame | Follow the existing feature points from frame to frame
Relation between consecutive frames: Independent | Current frame is correlated with the previous one
Pros: Failure of estimation in one frame will not affect the next one at all | Fast
Cons: Time-consuming | Error of estimation in one frame will be carried forward, and the accumulated error will eventually lead to loss of track

Fig. 11. The algorithm flowchart of the KEG tracker.


Furthermore, H encodes the camera's pose, i.e., rotation and translation, because (x, y) can also be transformed to (x', y') through the rotation (R), translation (T), and camera calibration (K) matrices. In other words, if K can be found through camera calibration, R and T can be decomposed from H.

Notice that Eq. (1) can be rewritten as p'_i × Hp_i = 0, and thus H can be solved given multiple matching pairs <p'_i, p_i>. Because most existing algorithms for finding matching points, like SIFT and FERNs, only inspect the local appearance, mismatching is inevitable in some cases. To account for the existence of outliers (mismatches), RANSAC (RANdom SAmple Consensus) [42] is used to estimate the parameters of H. In the KEG tracker, one of RANSAC's variants [43] is used.
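KEG's own implementation is not reproduced here; as a generic illustration of robust homography estimation from putative matches, OpenCV's RANSAC-based estimator can be used (the API shown is OpenCV's, not KEG's).

```cpp
// Generic illustration of robust homography estimation from putative feature
// matches using RANSAC (OpenCV API shown; this is not the KEG source code).
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

cv::Mat estimateHomographyRansac(const std::vector<cv::Point2f>& observedPts,
                                 const std::vector<cv::Point2f>& markerPts)
{
    // Mask marks which correspondences RANSAC accepted as inliers.
    std::vector<unsigned char> inlierMask;
    // H maps observed-image points to marker-image points, matching Eq. (1).
    cv::Mat H = cv::findHomography(observedPts, markerPts, cv::RANSAC,
                                   3.0 /* reprojection threshold, pixels */,
                                   inlierMask);
    return H;  // empty if estimation failed
}
```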

6.3.2.2. Tracking-based method. A tracking-based method is based on the premise that the difference between two consecutive observed images is small, and thus the feature points can be tracked by checking the points' local neighborhood patches. This approach clearly costs less than the detection-based method, in which prior knowledge is discarded and the current captured image is compared with the marker image from scratch. Once the correspondence is established at the initial stage by using one of the detection methods mentioned in Section 6.3.2.1, the feature points can be tracked in the following observed images, and the initial detection cost is amortized over the tracking stage. The tracking algorithm used in KEG is Kanade–Lucas–Tomasi (KLT).
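Again as a generic illustration rather than KEG source code, OpenCV's pyramidal Lucas–Kanade implementation tracks a set of feature points from one frame to the next:

```cpp
// Generic illustration of KLT feature tracking between consecutive frames
// (OpenCV's pyramidal Lucas-Kanade implementation; not the KEG source code).
#include <opencv2/video/tracking.hpp>
#include <opencv2/core.hpp>
#include <vector>

void trackFeaturesKLT(const cv::Mat& prevGray, const cv::Mat& currGray,
                      const std::vector<cv::Point2f>& prevPts,
                      std::vector<cv::Point2f>& currPts,
                      std::vector<unsigned char>& status)
{
    std::vector<float> err;
    // For each feature in the previous frame, search a small neighborhood in
    // the current frame; 'status' is 1 where the point was successfully tracked.
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err,
                             cv::Size(21, 21) /* search window */,
                             3 /* pyramid levels */);
}
```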

6.3.2.3. Global appearance and geometric constraints. Table 1 summarizes the profiles of the detection-based and tracking-based methods. The framework of the KEG tracker integrates these two methods and enhances the overall performance by using global appearance and geometric constraints. In Fig. 11, black boxes indicate that feature points on the marker image are identified only once, when the program starts, and grey boxes and dashed lines indicate that the detection-based method is employed only at the initial phase or when track is lost. Once matching feature points are found, the path is shifted to the tracking-based method (solid lines), where the known feature points are tracked by KLT. The homography estimated using RANSAC is referred to as the coarse H and may still contain systematic and random errors.

The first enhancement introduced by KEG is called the global appearance constraint, which refines the coarse H under ESM [44], a second-order approximation method. The resulting refined H can thus rectify the captured image so that it closely resembles the appearance of the marker image. The second enhancement is called the global geometric constraint. Since the refined H is assumed to be theoretically free of systematic and random errors, the updated feature points x_updated in Eq. (2), which are to be tracked at the next frame, do not inherit the accumulated errors. Here x_marker refers to the feature points found on the marker image. These two enhancements not only boost the tracking accuracy and stability, but also increase the tracking speed, which is attributed to the global refinement reducing the iterations required by the KLT algorithm.
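The exact form of Eq. (2) is not reproduced above. Under the convention of Eq. (1), where H maps observed-image points to marker-image points, the global geometric constraint amounts to re-seeding the points to be tracked from the marker features through the refined homography; a plausible form, offered here only as an assumption consistent with that convention, is:

```latex
% Plausible reconstruction of Eq. (2), assuming the Eq. (1) convention
% (H maps observed-image points to marker-image points):
x_{\mathrm{updated}} = H_{\mathrm{refined}}^{-1}\, x_{\mathrm{marker}} \qquad (2)
```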

6.3.2.4. The introduction of AprilTag. Besides tracking the pose of the camera, another necessity is to recognize the identity of the natural marker. Currently, KEG takes advantage of the coding system in the AprilTag [45] tracking library to identify the information associated with a certain marker. Even though AprilTag itself is part of the fiducial tracking family, it does not need to be fully covered by the camera once the identity of the marker is confirmed (Fig. 10).

7. Conclusion and future work

3D visualization is one of the most effective tools for exposing engineering students to the latest trends of emerging technology advancements in the classroom. Statistics also indicate that students learn better when their instructors use 3D representations or visualization to teach engineering concepts and theories. Even though 3D visualization methods in the form of 3D/4D CAD models have existed for decades, they have not yet largely replaced traditional media and tools. One reason for this is that traditional visualization tools offer less opportunity for collaboration [19].

As an effort to enable collaborative learning through 3D visualization, we introduce a software program termed ARVita for collaboratively visualizing dynamic 3D simulated construction operations. Users sitting across a table can have a face-to-face discussion about 3D animations "laid on" the table surface. Interaction functionalities are designed to assist users in smoothly shifting the focus of the shared workspace. A live demo of ARVita can be found on the authors' website <http://pathfinder.engin.umich.edu/videos.htm>.

The next generation of ARVita will focus on offering students more flexibility and space when they observe a 3D animation model. Two thrusts of improvement can be made. The first is to enable the tracking library to function in an unknown environment (a.k.a. SLAM), so that a user's observation range is no longer limited by the visibility of the marker; in other words, any flat table with an ample number of features could serve as the tracking plane. The second area for improvement relates to the efforts being made to make ARVita comply with the rules of HLA. This compliance will allow ARVita to be distributed and synchronized across computers. When this happens, students can run multiple instances of ARVita on their own computers but collaborate on a synchronized model. The current version of the ARVita software and its source code can be found on the authors' website <http://pathfinder.engin.umich.edu/software.htm>.

References

[1] Halpin DW, Riggs LS. Planning and analysis of construction operations; 1992.
[2] Kamat VR, Martinez JC. Visualizing simulated construction operations in 3D. J Comput Civil Eng 2001;15:329–37.
[3] Martinez J. STROBOSCOPE: state and resource based simulation of construction processes. Ann Arbor: University of Michigan; 1996.
[4] Law A, Kelton DW. Simulation modeling and analysis; 2000.
[5] Rohrer MW. Seeing is believing: the importance of visualization in manufacturing simulation. In: Proceedings of the 2000 winter simulation conference, San Diego; 2000. p. 1211–6.
[6] Henriksen JO. General-purpose concurrent and post-processed animation with Proof. In: Proceedings of the 1999 winter simulation conference; 1999. p. 176–81.
[7] Huang R-Y, Grigoriadis AM, Halpin DW. Simulation of cable-stayed bridges using DISCO. In: Proceedings of the 1994 winter simulation conference; 1994. p. 1130–6.
[8] Tucker SN, Lawrence PJ, Rahilly M. Discrete-event simulation in analysis of construction processes. CIDAC simulation paper; 1998.
[9] Halpin DW, Martinez LH. Real world applications of construction process simulation. In: Proceedings of the 1999 winter simulation conference. San Diego, CA: Society for Computer Simulation; 1999. p. 956–62.
[10] Kamat VR, Martinez JC. Automated generation of dynamic, operations level virtual construction scenarios. Electron J Inf Technol Construct (ITcon) 2003;8:65–84.
[11] Behzadan AH, Kamat VR. Visualization of construction graphics in outdoor augmented reality. In: Proceedings of the 2005 winter simulation conference. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE); 2005.
[12] Azuma R. A survey of augmented reality. Teleoper Virtual Environ 1997:355–85.
[13] Baird KM. Evaluating the effectiveness of augmented reality and wearable computing for a manufacturing assembly task; 1999.
[14] Wang X, Dunston PS. Comparative effectiveness of mixed reality-based virtual environments in collaborative design. IEEE Trans Syst Man Cybern 2011;41:284–96.
[15] Biocca F, Tang A, Lamas D, Gregg J, Brady R, Gai P. How do users organize virtual tools around their body in immersive virtual and augmented environment? An exploratory study of egocentric spatial mapping of virtual tools in the mobile infosphere. East Lansing: Media Interface and Network Design Labs, Michigan State University; 2001.
[16] Tang A, Owen C, Biocca F, Mou W. Comparative effectiveness of augmented reality in object assembly. In: Proceedings of the SIGCHI conference on human factors in computing systems; 2003. p. 73–80.
[17] Billinghurst M, Kato H. Collaborative mixed reality. In: Proceedings of the 1999 international symposium on mixed reality; 1999.
[18] Rekimoto J. TransVision: a hand-held augmented reality system for collaborative design. In: Proceedings of 1996 virtual systems and multimedia, Gifu, Japan; 1996.
[19] Szalavári Z, Schmalstieg D, Fuhrmann A, Gervautz M. 'Studierstube': an environment for collaboration in augmented reality; 1997.
[20] Huynh TD, Raveendran K, Xu Y, Spreen K, MacIntyre B. Art of Defense: a collaborative handheld augmented reality board game. In: Proceedings of the 2009 ACM SIGGRAPH symposium on video games; 2009. p. 135–42.
[21] Nilsson S, Johansson B, Jonsson A. Using AR to support cross-organisational collaboration in dynamic tasks. In: Proceedings of the 2009 IEEE and ACM international symposium on mixed and augmented reality, Orlando, FL; 2009. p. 3–12.
[22] Ishii H, Underkoffler J, Chak D, Piper B. Augmented urban planning workbench: overlaying drawings, physical models and digital simulation. In: Proceedings of the international symposium on mixed and augmented reality; 2002.
[23] Wei D, Zhou ZS, Xie D. MTMR: a conceptual interior design framework integrating mixed reality with the multi-touch tabletop interface. In: Proceedings of the 2010 IEEE and ACM international symposium on mixed and augmented reality, Seoul, Korea; 2010. p. 279–80.
[24] Minatani S, Kitahara I, Kameda Y, Ohta Y. Face-to-face tabletop remote collaboration in mixed reality. In: Proceedings of the 2007 IEEE conference on computer vision and pattern recognition, Nara, Japan; 2007. p. 43–6.
[25] Stafford A, Piekarski W, Thomas BH. Implementation of god-like interaction techniques for supporting collaboration between outdoor AR and indoor tabletop users. In: Proceedings of the 2006 IEEE and ACM international symposium on mixed and augmented reality, Santa Barbara, CA; 2006. p. 165–72.
[26] Regenbrecht HT, Wagner MT, Baratoff G. MagicMeeting: a collaborative tangible augmented reality system. Virtual Reality: Springer; 2006.
[27] Klinker G, Dutoit AH, Bauer M. Fata Morgana – a presentation system for product design. In: Proceedings of the 2002 IEEE and ACM international symposium on mixed and augmented reality, Darmstadt, Germany; 2002. p. 76–85.
[28] Navab N. Industrial augmented reality (IAR): challenges in design and commercialization of killer apps. In: Proceedings of the 2003 IEEE and ACM international symposium on mixed and augmented reality, Tokyo, Japan; 2003. p. 2–6.
[29] Pentenrieder K, Bade C, Doil F, Meier P. Augmented reality-based factory planning – an application tailored to industrial needs. In: Proceedings of the 2007 IEEE and ACM international symposium on mixed and augmented reality, Nara, Japan; 2007. p. 1–9.
[30] Wang X, Dunston P. User perspectives on mixed reality tabletop visualization for face-to-face collaborative design review. J Autom Construct 2008;17:399–412.
[31] Hammad A, Wang H, Mudur SP. Distributed augmented reality for visualizing collaborative construction tasks. J Comput Civil Eng 2009;23:418–27.
[32] Martz P. OpenSceneGraph quick start guide; 2007.
[33] Kato H, Billinghurst M. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In: Proceedings of the 1999 IEEE and ACM international workshop on augmented reality (IWAR 99); 1999.
[34] Feng C, Kamat VR. KEG plane tracker for AEC automation applications. In: The 2012 international symposium on automation and robotics in construction, mining and petroleum; 2012.
[35] Feiner S, MacIntyre B, Höllerer T. A touring machine: prototyping 3D mobile augmented reality systems for exploring the urban environment. In: Proceedings of the 1997 international symposium on wearable computers; 1997. p. 74–81.
[36] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: Proceedings of the 2007 IEEE and ACM international symposium on mixed and augmented reality, Nara, Japan; 2007.
[37] Drummond T, Cipolla R. Real-time visual tracking of complex structures. IEEE Trans Pattern Anal Machine Intell 2002;24:932–46.
[38] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision; 2004.
[39] Ozuysal M, Fua P, Lepetit V. Fast keypoint recognition in ten lines of code. In: Proceedings of the 2007 IEEE conference on computer vision and pattern recognition, Minneapolis, MN; 2007. p. 1–8.
[40] Lucas BD, Kanade T. An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th international joint conference on artificial intelligence; 1981.
[41] Shi J, Tomasi C. Good features to track. In: Proceedings of the 1994 IEEE conference on computer vision and pattern recognition; 1994. p. 593–600.
[42] Fischler M, Bolles R. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981;24:381–95.
[43] Simon G, Fitzgibbon AW, Zisserman A. Markerless tracking using planar structures in the scene. In: Proceedings of the 2000 international symposium on augmented reality; 2000. p. 120–8.
[44] Benhimane S, Malis E. Real-time image-based tracking of planes using efficient second-order minimization. In: Proceedings of the 2004 IEEE/RSJ international conference on intelligent robots and systems; 2004. p. 943–8.
[45] Olson E. AprilTag: a robust and flexible visual fiducial system. In: Proceedings of the IEEE international conference on robotics and automation (ICRA); 2011.