CAMEO: Meeting Understanding

Prof. Manuela M. Veloso, Prof. Takeo Kanade

Dr. Paul E. Rybski, Dr. Fernando de la Torre,

Dr. Brett Browning, Raju Patil, Carlos Vallespi,

Scott Lenser, Betsy Ricker, Francesco Tamburrino,

Colin McMillen, Sonia Chernova

CALO: Physical Awareness

Computer Science Department / The Robotics Institute

School of Computer Science

Carnegie Mellon University

CAMEO: Camera Assisted Meeting Event Observer

• Robust multi-person PA capture device

• Contributions
  • Mosaic generation
  • Person tracking
  • Face recognition
  • Activity recognition
  • Logging/modeling

Must effectively operate in unstructured environments.

Each camera is hand-calibrated only once to compensate for radial distortion
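The compensation step can be illustrated with a minimal sketch. It assumes a one-coefficient polynomial radial model on normalized image coordinates; the slides do not say which model CAMEO uses, and the coefficient `k1` and both function names are hypothetical.

```python
def distort(xu, yu, k1):
    """Apply one-coefficient radial distortion to normalized image coordinates."""
    r2 = xu * xu + yu * yu
    scale = 1.0 + k1 * r2
    return xu * scale, yu * scale


def undistort(xd, yd, k1, iterations=20):
    """Invert the model by fixed-point iteration.

    Each pass divides the distorted point by the scale factor evaluated
    at the current estimate of the undistorted point.
    """
    xu, yu = xd, yd  # start from the distorted point
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        scale = 1.0 + k1 * r2
        xu, yu = xd / scale, yd / scale
    return xu, yu
```

Because the coefficient is fixed per camera, a calibration like this only has to be estimated once and can then be applied to every frame before the mosaic is built.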

Video Mosaic

Person Tracking : Mean Shift Based Color Tracking

Register New Person: Person ID, Face histogram, Face center (x,y), Face width, Face height

• Additional filtering based on shape and color templates

“Omega” head and shoulder template
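A rough sketch of the registration record and the mean-shift update follows. It is not CAMEO's implementation: it assumes a single-channel image with values 0 to 255 standing in for a color channel such as hue, a face box that lies fully inside the image, and hypothetical function and field names. The shape and template filtering mentioned above is not shown.

```python
def face_histogram(image, cx, cy, half_w, half_h, bins=8):
    """Normalized histogram of quantized pixel values inside the face box."""
    step = 256 // bins
    counts = [0.0] * bins
    for y in range(cy - half_h, cy + half_h + 1):
        for x in range(cx - half_w, cx + half_w + 1):
            counts[min(image[y][x] // step, bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def register_person(person_id, image, cx, cy, width, height):
    """Build the record kept for a newly registered person."""
    return {
        "person_id": person_id,
        "face_histogram": face_histogram(image, cx, cy, width // 2, height // 2),
        "face_center": (cx, cy),
        "face_width": width,
        "face_height": height,
    }


def back_project(image, histogram, bins=8):
    """Replace each pixel with its probability under the face histogram."""
    step = 256 // bins
    return [[histogram[min(p // step, bins - 1)] for p in row] for row in image]


def mean_shift(weights, cx, cy, half_w, half_h, max_iter=20):
    """Move a window to the weighted centroid of its pixels until it stops moving."""
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for y in range(max(0, cy - half_h), min(rows, cy + half_h + 1)):
            for x in range(max(0, cx - half_w), min(cols, cx + half_w + 1)):
                w = weights[y][x]
                total += w
                sum_x += w * x
                sum_y += w * y
        if total == 0.0:
            break  # no face-colored pixels under the window
        nx, ny = round(sum_x / total), round(sum_y / total)
        if (nx, ny) == (cx, cy):
            break  # converged
        cx, cy = nx, ny
    return cx, cy
```

In each new frame the tracker would back-project the stored histogram and start `mean_shift` from the person's last known face center.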

Henry Schneiderman. “Feature-Centric Evaluation for Efficient Cascaded Object Detection.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.

Henry Schneiderman. “Learning a Restricted Bayesian Network for Object Detection.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2004.

Face Recognition: Training

1) Capture visual data
2) Normalize for geometry and illumination
3) Cluster the most discriminating face examples
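As an illustration of step 2, the sketch below normalizes a grayscale face patch for illumination by shifting it to zero mean and scaling it to unit variance. This is one common choice, assumed here for illustration; the slides do not specify the normalization CAMEO applies, and geometric normalization and clustering are not shown.

```python
from statistics import fmean, pstdev


def normalize_illumination(patch):
    """Shift a grayscale face patch to zero mean and scale it to unit variance."""
    pixels = [p for row in patch for p in row]
    mean = fmean(pixels)
    std = pstdev(pixels)
    if std == 0.0:
        # Flat patch: there is no contrast to rescale.
        return [[0.0 for _ in row] for row in patch]
    return [[(p - mean) / std for p in row] for row in patch]
```

Two captures of the same face that differ only by a global brightness gain map to the same normalized patch, so the later stages compare appearance rather than lighting.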

Multiple face discrimination

Real-time performance

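$$KL(B) = \sum_{i=1}^{\text{classes}} \sum_{j \neq i} \sum_{r_i \in C_i} \sum_{r_j \in C_j} \Big[ (\mu_i^{r_i} - \mu_j^{r_j})^T B \big( (B^T \Sigma_i^{r_i} B)^{-1} + (B^T \Sigma_j^{r_j} B)^{-1} \big) B^T (\mu_i^{r_i} - \mu_j^{r_j}) + \mathrm{tr}\big( (B^T \Sigma_i^{r_i} B)^{-1} B^T \Sigma_j^{r_j} B + (B^T \Sigma_j^{r_j} B)^{-1} B^T \Sigma_i^{r_i} B \big) \Big]$$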

Face Recognition: Non-Linear Oriented Discriminant Analysis

Find transformation matrix B that MAXIMIZES the Kullback-Leibler divergence between clusters among classes

$$J(B) = \sum_{i=1}^{\text{classes}} \mathrm{tr}\big( (B^T \Sigma_i B)^{-1} (B^T A_i B) \big)$$

Use Iterative Majorization to approximate B

Project clustered images into a lower-dimensional subspace to speed recognition

Research challenge: Face subspace is multi-modal
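To make the objective concrete, the sketch below evaluates J(B) in the special case where B is a single column b, so each trace term reduces to the scalar ratio (b^T A_i b) / (b^T Σ_i b). The matrices A_i are treated as given inputs because the slides do not define them, and the function names are hypothetical. The iterative majorization used to optimize B is not shown.

```python
def quad_form(b, M):
    """Return b^T M b for a vector b and a square matrix M given as nested lists."""
    n = len(b)
    return sum(b[r] * M[r][c] * b[c] for r in range(n) for c in range(n))


def oda_objective_1d(b, sigmas, a_mats):
    """J(b) = sum_i (b^T A_i b) / (b^T Sigma_i b), the one-column case of J(B)."""
    return sum(quad_form(b, A) / quad_form(b, S) for S, A in zip(sigmas, a_mats))
```

The value is unchanged when b is rescaled, so only the direction of the projection matters in this one-dimensional case.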

Face Recognition: Results

• Each new face is projected into subspace and compared against the trained examples

• Closest match, via Mahalanobis distance, determines class membership

• 95% recognition rate with training database of 11 subjects
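The matching step above can be sketched as follows. The sketch assumes a two-dimensional subspace so that the covariance inverse has a closed form, and it models each trained class by a mean and covariance; the subspace dimension, the model layout, and all names are assumptions made for illustration.

```python
def project(B, x):
    """Project a flattened face vector x into the subspace: y = B^T x.

    B is stored as a list of columns, each the same length as x.
    """
    return [sum(bk * xk for bk, xk in zip(col, x)) for col in B]


def mahalanobis_sq_2d(y, mean, cov):
    """Squared Mahalanobis distance in 2-D using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = y[0] - mean[0], y[1] - mean[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def classify(y, models):
    """Return the label whose (mean, covariance) model is closest to y."""
    return min(models, key=lambda label: mahalanobis_sq_2d(y, *models[label]))
```

Unlike a Euclidean comparison, the Mahalanobis distance accounts for how widely each class spreads along each direction, so a face can be assigned to a class whose mean is farther away in raw distance but whose variation makes the face more plausible.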

Inferring Activity from Observation

Person action sequences can be represented as a simple finite state machine.

Face tracker captures the (x,y) positions of faces in the image over time.

Global meeting state is inferred from aggregate of person activity.

Inferred state from classifier

• State transitions are encoded as a dynamic Bayesian network in an HMM structure.

• Current person state is a function of observed human activity and previous state.
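The dependence of the current state on the observation and the previous state can be sketched as an HMM forward filter. The two states, the "low"/"high" observation symbols (a thresholded vertical face position), and every probability below are hypothetical values chosen for illustration, not parameters from CAMEO.

```python
STATES = ("sitting", "standing")

# P(next state | previous state): people tend to stay in their current state.
TRANSITION = {
    "sitting": {"sitting": 0.9, "standing": 0.1},
    "standing": {"sitting": 0.1, "standing": 0.9},
}

# P(observation | state): a face low in the image suggests sitting.
EMISSION = {
    "sitting": {"low": 0.8, "high": 0.2},
    "standing": {"low": 0.2, "high": 0.8},
}


def forward_step(belief, observation):
    """One filtering update: predict with the transitions, then weight by the emission."""
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES}
    unnormalized = {s: predicted[s] * EMISSION[s][observation] for s in STATES}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}


def filter_states(observations):
    """Return the most probable state after each observation, starting from a uniform prior."""
    belief = {s: 1.0 / len(STATES) for s in STATES}
    decoded = []
    for obs in observations:
        belief = forward_step(belief, obs)
        decoded.append(max(belief, key=belief.get))
    return decoded
```

With these numbers a single contradictory observation does not flip the inferred state, which is the practical benefit of conditioning on the previous state rather than classifying each frame independently.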

Logging/Replay/Towards Learning

Tracked person data is recorded for off-line activity analysis and learning of dynamics.

The recorded logs can be replayed back through CAMEO.

Model-based simulation generates high-level state descriptions of group activities.

Data-based simulation generates low-level “frame-by-frame” individual person activity state descriptions.

bring [carlos, computer]
bring [carlos, cameo]
set_up [carlos, computer]
use [carlos, computer]
set_up [carlos, cameo]
use [carlos, cameo]
give_demo [carlos, face_recognition]
ask_question [fernando, face_recognition]
answer_question [carlos, fernando, face_recognition]
give_demo [raju, tracking]
ask_question [carlos, tracking]
answer_question [raju, carlos, tracking]
give_demo [carlos, face_detection]
ask_question [raju, face_detection]
answer_question [carlos, raju, face_detection]
ask_question [fernando, face_detection]
answer_question [raju, fernando, face_detection]
remove [carlos, computer]
remove [carlos, cameo]
leave [jon]
leave [raju]
leave [fernando]
leave [carlos]
leave [daniel]
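For off-line analysis, a high-level log in the format shown above could be read back with a small parser. This is a sketch that assumes every event has the form `action [arg, arg, ...]`; the function name and the returned structure are hypothetical, and the actual CAMEO log format may carry more fields.

```python
import re

# An action name followed by a bracketed, comma-separated argument list.
EVENT_PATTERN = re.compile(r"(\w+)\s*\[([^\]]*)\]")


def parse_events(text):
    """Extract (action, [arguments]) pairs from a log of bracketed events."""
    return [
        (action, [arg.strip() for arg in args.split(",")])
        for action, args in EVENT_PATTERN.findall(text)
    ]
```

Because the pattern keys on the brackets rather than on line breaks, it reads events whether they are stored one per line or run together.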