View
4
Download
0
Category
Preview:
Citation preview
Outer loopA Metropolis-Hasting algorithm for latent sub-event parsingDynamics: splitting/merging/relabeling
Learning Social Affordance for Human-Robot InteractionTianmin Shu1 M. S. Ryoo2 Song-Chun Zhu1
1Center for Vision, Cognition, Learning, and Autonomy, UCLA 2Indiana University, Bloomington
Objective:Learning explainable knowledge from the noisy observation of humaninteractions in RGB-D videos to enable human-robot interactions.Key idea:Beyond traditional object and scene affordances, we propose a weaklysupervised learning of social affordances for HRI.Contributions:• First formulation and hierarchical representation of social affordance• Weakly supervised learning from noisy skeleton input• Efficient motion synthesis based on learned hierarchical affordances
IntroductionGoal:Obtain the optimal joint selection and grouping and interaction parsingresults by maximizing the joint probability.Algorithm:
Learning
Model
...
c
s1 sK
...
s2
p(c)
t 2 T1J t
p(Z)
J t
p(Z)
......
Joints Selectionand Grouping Interaction Parsing
...
t 2 T2
Zs2
Zs1
Shake HandsCurrent frame Affordance
For one instance of category c:
Inner loop A Gibbs sampling for our modified CRP
Initialization Skeleton clustering for initial sub-event parsing
Motion SynthesisGoal: Given the initial 10 frames (25 fps), synthesize the motion of an agentgiven the motion of the other agent and the interaction type.Algorithm:At time ,1) Estimate the current sub-event by DP2) Predict the ending time and the corresponding joint positions3) Obtain the joint positions at through interpolation
Experiment
Hand Over a CupThrow and CatchHigh-Five#50 #80 #110 #30 #80 #130 #15 #45 #75 #15 #40 #65 #20 #70 #120Shake Hands Pull Up
GT
Our
sHM
M
Exp 2: User study (14 subjects)Q1: Successful? Q2: Natural? Q3: Human vs. robot?From 1 (worst) to 5 (best)
0
0.2
0.4
0.6
0.8
1
GTOurs
Exp 1: Average joint distance in meters (compared with GT skeletons)
Examples of discovered latent sub-events and their sub-goals
Synthesis Examples
Frequencies of highscores (4 or 5) for Q3
AcknowledgmentThis research has beensponsored by grants DARPASIMPLEX project N66001-15-C-4035 and ONR MURI projectN00014-16-1-2007.
Scantovisitourprojectwebsite
p(G|Zc) /Y
k
p({J t}t2Tk |Zsk , sk, c)
| {z }likelihood
· p(c)|{z}interaction prior
·KY
k=2
p(sk|sk�1
, c)
| {z }sub-event transition
·KY
k=1
p(sk|c)
| {z }sub-event prior
For N training examples of category c ( ) :G = {Gn}n=1,...,N
p(Zc) =Y
s2Sp(Zs|c)
p({J t}t2T |Zs, s, c) = g({J t}t2T , Zs, s) m({J t}t2T , Z
s, s)
p(G, Zc) = p(Zc)QN
n p(Gn|Zc)
G = {Gn}n=1,...,N
Zc
p(zsai|Zs�ai) =
8>>><
>>>:
��
M � 1 + �if zsai > 0,Mzs
ai= 0
�Ms
zai
M � 1 + �if zsai > 0,Mzs
ai> 0
1� � if zsai = 0
t+ 5
t
t0
Human agentSynthesized agent
t+ 5 t0t
s2s1
UCLA Human-Human-Object Interaction Dataset• Five types of interactions; on average, 23.6 instances per interaction
performed by totally 8 actors. Each lasts 2-7 s presented at 10-15 fps.• RGB-D videos, skeletons and annotations are available:
http://www.stat.ucla.edu/~tianmin.shu/SocialAffordance
Pull Up
1 2 3 1 4 55 61 2 1 2
Hand Over a CupThrow and CatchHigh-FiveShake Hands
zsai ⇠ p(G|Z 0c)p(z
sai|Zs
�ai)
p(G,Zc) = p(G|Zc)p(Zc)
Recommended