

Towards a Smart Control Room for Crisis Response Using Visual Perception of Users

Joris Ijsselmuiden, Florian van de Camp, Alexander Schick, Michael Voit, Rainer Stiefelhagen
{iss, ca, sci, vt, stiefe}@iitb.fraunhofer.de

Fraunhofer IITB, Karlsruhe

INTRODUCTION
Due to the ever-increasing challenges and complexity of crisis response scenarios, there is a high demand for new human-machine interaction approaches. We aim to build a smart crisis response control room in which vision-based perception of users facilitates innovative user interfaces and supports teamwork. Our control room is equipped with several cameras and has a videowall as the main interaction device. Using real-time computer vision, we can track and identify the users in the room and estimate their head orientations and pointing gestures. In the near future, the room will also be equipped with speech recognition. In order to build a useful smart control room for crisis response, we are currently focusing on situation modeling for such rooms and investigating the target crisis response scenarios.

Our smart control room laboratory, containing videowall and cameras

GOALS
• Develop new ways of interacting with computers and support interaction between humans using tracking, identification, head pose, gestures, speech, and situation/user modeling [1,2]
• Conduct user studies to find multimodal system setups that improve computer-supported cooperative work in a crisis response control room
• Improve expressive power, ease of use, intuitiveness, speed, reliability, adaptability, and cooperation while reducing physical and mental workload
• Create intelligent, context-dependent user interfaces through situation modeling and user modeling
• Challenges in crisis response control rooms include team-based operation, limits to mental workload, high cost of failure, time pressure, dense and complex information, and the user acceptance problem

PERCEPTION
• Tracking and identification [3]
• Head pose and visual focus of attention [4] (see the sketch below)
• Gestures and body pose [5]
• Speech recognition (future work)
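As a rough illustration of how visual focus of attention can be deduced from head pose (cf. [4]), the sketch below assigns each person the target whose direction deviates least from their estimated head orientation. The function name, the angular threshold, and the target list are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def focus_of_attention(head_pos, head_dir, targets, max_angle_deg=30.0):
    """Pick the attention target closest to the estimated head orientation.

    head_pos -- 3D head position from the person tracker
    head_dir -- unit vector of the estimated head orientation
    targets  -- dict mapping target name to 3D position
                (other people, videowall regions, situation table, ...)
    Returns the most plausible target, or None if the head is not
    oriented towards any target within max_angle_deg.
    """
    best, best_angle = None, max_angle_deg
    for name, pos in targets.items():
        to_target = np.asarray(pos, dtype=float) - np.asarray(head_pos, dtype=float)
        to_target /= np.linalg.norm(to_target)
        cosine = np.clip(np.dot(head_dir, to_target), -1.0, 1.0)
        angle = np.degrees(np.arccos(cosine))
        if angle < best_angle:
            best, best_angle = name, angle
    return best
```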

A camera image and its corresponding segmentation and 3D voxel representation
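The 3D voxel representation shown above is obtained by fusing foreground segmentations from the calibrated cameras. The following is a minimal visual-hull (space carving) sketch of that idea; the grid layout, the voting threshold, and the calibration format are placeholder assumptions rather than the actual system parameters.

```python
import numpy as np

def carve_voxels(silhouettes, projections, grid, min_views=3):
    """Mark a voxel as occupied if its projection falls on the foreground
    mask in at least min_views cameras (visual hull / space carving).

    silhouettes -- list of binary foreground masks, one per camera
    projections -- list of 3x4 camera projection matrices (calibrated)
    grid        -- (N, 3) array of voxel centers in world coordinates
    """
    votes = np.zeros(len(grid), dtype=int)
    homogeneous = np.hstack([grid, np.ones((len(grid), 1))])
    for mask, P in zip(silhouettes, projections):
        pix = homogeneous @ P.T                      # project voxel centers
        u = (pix[:, 0] / pix[:, 2]).astype(int)
        v = (pix[:, 1] / pix[:, 2]).astype(int)
        inside = (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
        votes[inside] += mask[v[inside], u[inside]] > 0
    return grid[votes >= min_views]                  # occupied voxel centers
```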

INTERACTION
1. Identities are obtained through face recognition (in operation)
2. User models are used to generate personal user interfaces that respect the user's preferences, current tasks, and specialized knowledge (future work)
3. Using person tracking, interfaces are displayed close to the user (in operation)
4. Objects on the videowall are manipulated by pointing gestures and by directing one's visual attention (in operation; see the sketch after this list)
5. This can be combined with speech recognition and a range of different hand gestures (future work)
6. Head pose is employed to analyze the interaction of the team, for example to determine who has been talking to whom (in operation)
7. User-specific information can be displayed on the videowall at the user's current focus of attention, and we can make people aware of what they have not seen yet (future work)
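For item 4, a pointing gesture can be turned into a cursor position by intersecting the pointing ray with the videowall plane. The sketch below assumes a planar, calibrated wall and a shoulder-to-hand ray; the coordinate frame and parameter names are illustrative, not the interface actually used in the room.

```python
import numpy as np

def point_on_wall(shoulder, hand, wall_point, wall_normal):
    """Intersect the pointing ray (from shoulder through hand) with the
    videowall plane and return the hit point in world coordinates,
    or None if the user is pointing away from or parallel to the wall.
    """
    origin = np.asarray(hand, dtype=float)
    direction = origin - np.asarray(shoulder, dtype=float)
    direction /= np.linalg.norm(direction)
    denom = np.dot(wall_normal, direction)
    if abs(denom) < 1e-6:                      # ray parallel to the wall
        return None
    t = np.dot(wall_normal, np.asarray(wall_point, dtype=float) - origin) / denom
    return origin + t * direction if t > 0 else None
```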

Person tracking, head pose estimation, and gesture recognition

SYSTEM ARCHITECTURE
• All components run in parallel and in real time
• We use several computers, with multithreading and GPU programming, to obtain sufficient computational power
• Our custom-built middleware takes care of network communication
• A centralized situation model (blackboard) is kept, describing the situation in the room and the objects and users in it (see the sketch below)
• All perceptual components can read from and write to this situation model, and a logic engine uses it to deduce higher-level facts about the situation [6]
• In the near future, our control room laboratory will be extended with some of the following: speech recognition, standard workstations, a digital situation table [7], tablet PCs, sound, and synthesized speech
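To make the blackboard idea concrete, here is a minimal sketch of a centralized situation model that perceptual components write into and a higher-level rule reads from. The class, the entity identifiers, and the example rule are illustrative assumptions; they do not reflect the actual middleware interface or the logic engine described in [6].

```python
from collections import defaultdict

class SituationModel:
    """Centralized blackboard: perceptual components write facts about
    users and objects; higher-level components read and enrich them."""

    def __init__(self):
        self.entities = defaultdict(dict)   # entity id -> attribute dict

    def write(self, entity, **attributes):  # e.g. position, head pose, identity
        self.entities[entity].update(attributes)

    def read(self, entity):
        return dict(self.entities[entity])

def infer_addressee(model, speaker, persons):
    """Illustrative higher-level rule: the speaker is addressing whoever
    their visual focus of attention currently rests on."""
    focus = model.read(speaker).get("focus_of_attention")
    return focus if focus in persons else None

# Example usage with hypothetical entity ids
model = SituationModel()
model.write("user_1", position=(1.2, 0.0, 1.7), focus_of_attention="user_2",
            speaking=True)
model.write("user_2", position=(2.5, 0.0, 1.7))
print(infer_addressee(model, "user_1", {"user_1", "user_2"}))  # -> user_2
```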

Example interaction with the videowall and a digital situation table in operation

This work is supported by the FhG Internal Programs under Grant No. 692 026 (Fraunhofer Attract). It is a collaboration between the Fraunhofer Institute for Information and Data Processing, Business Unit Interactive Analysis and Diagnosis, and the University of Karlsruhe (TH), Faculty of Computer Science, in the framework of the five-year Fraunhofer internal project "Computer Vision for Human-Computer Interaction – Interaction in and with attentive rooms".

REFERENCES
1. Project webpage (2009). www.iitb.fraunhofer.de/?20718
2. Stiefelhagen, Bernardin, Ekenel, Voit (2008). Tracking Identities and Attention in Smart Environments – Contributions and Progress in the CHIL Project. IEEE International Conference on Face and Gesture Recognition
3. Bernardin, Van de Camp, Stiefelhagen (2007). Automatic Person Detection and Tracking using Fuzzy Controlled Active Cameras. IEEE International Conference on Computer Vision and Pattern Recognition
4. Voit, Stiefelhagen (2008). Deducing the Visual Focus of Attention from Head Pose Estimation in Dynamic Multi-view Meeting Scenarios. 10th International Conference on Multimodal Interfaces
5. Nickel, Stiefelhagen (2007). Visual Recognition of Pointing Gestures for Human-Robot Interaction. Image and Vision Computing
6. Brdiczka, Crowley, Curín, Kleindienst (2009). Situation Modeling. In Waibel, Stiefelhagen (Eds.), Computers in the Human Interaction Loop
7. Bader, Meissner, Tschnerney (2008). Digital Map Table with Fovea-Tablett®: Smart Furniture for Emergency Operation Centers. 5th International Conference on Information Systems for Crisis Response and Management