
Task-Oriented Planning for Manipulating Articulated Mechanisms under Model Uncertainty

Venkatraman Narayanan† and Maxim Likhachev†

Abstract— Personal robots need to manipulate a variety of articulated mechanisms as part of day-to-day tasks. These tasks are often specific, goal-driven, and permit very little bootstrap time for learning the articulation type. In this work, we address the problem of purposefully manipulating an articulated object, with uncertainty in the type of articulation. To this end, we provide two primary contributions: first, an efficient planning algorithm that, given a set of candidate articulation models, is able to correctly identify the underlying model and simultaneously complete a task; and second, a representation for articulated objects called the Generalized Kinematic Graph (GK-Graph), that allows for modeling complex mechanisms whose articulation varies as a function of the state space. Finally, we provide a practical method to auto-generate candidate articulation models from RGB-D data and present extensive results on the PR2 robot to demonstrate the utility of our representation and the efficiency of our planner.

I. INTRODUCTION

A typical domestic task such as preparing a meal would require a robot to open cabinets in the kitchen, fetch plates from drawers, move items on the countertop, and open the sink tap. Many of these subtasks require the robot to understand the kinematic structure of the object being manipulated. Since these are commonly encountered tasks, it is natural to expect manufacturers of personal robots to preload them with a set of typical articulation models to accelerate planning. For instance, when the robot is required to open a door, it might already deduce from the instruction that the mechanism is a revolute joint, and only needs to reason about whether the door needs to be pushed or pulled, without having to learn the articulation from scratch.

Much of the recent work on articulated objects deals exclusively with learning the articulation type given observations of the object over time [7, 14], and taking actions that enable quick identification [2, 11]. While it would be useful to identify the complete articulation structure by interacting with the object, it would be inefficient for personal robots to fully characterize a model when a specific task needs to be accomplished. On the other hand, the planning process would be more efficient when bootstrapped with a set of candidate models.

This paper provides several contributions towards task-oriented manipulation of articulated objects under uncertainty in their model types. As a first contribution, we develop an efficient planner that can reason with uncertainty over a set of candidate models, to achieve a specific task. While the

This research was sponsored by ARL, under the Robotics CTA program grant W911NF-10-2-0016.
†Robotics Institute, Carnegie Mellon University, Pittsburgh, USA. {venkatraman,maxim} at cs.cmu.edu

full planning problem with uncertainty in the articulation model is an intractable POMDP (also noted in [2]), we show that we can reduce it to a manageable belief MDP under certain assumptions. This belief MDP can then be solved in real time using an MDP solver, making our approach practically attractive. The particular planner we use for solving the belief MDP is based on LAO* [4], an efficient heuristic search technique for solving MDPs. To briefly summarize, LAO* interleaves forward exploration of the state space using a heuristic, such as the one used by A* search [5], and dynamic programming backup updates, à la value iteration. The efficiency of the algorithm arises from carefully ordering the sequence of dynamic updates, and by ignoring parts of the state space that would never be reached from the start state under an optimal policy.

The second contribution of the paper is in extending the representation of articulated objects in order to allow for efficient planning under uncertainty in the mechanism type. Consider again the task of opening a door. While it is tempting to dismiss it as a simple revolute mechanism, it has its own complexities. More often than not, the door handle, which by itself is another revolute mechanism, controls the articulation of the door. With the door handle unturned, the door remains rigidly fixed, while turning the handle immediately activates the other revolute degree of freedom. To plan for manipulation of such objects, it is imperative that the representation used by the planner captures such complexities. The representation that we provide builds upon a commonly used representation in the literature, the kinematic graph [14, 3]. The edges in the graph model the articulation between rigid components of an object, which are represented by the vertices of the graph. Consequently, it is sufficient to learn the parameters of the kinematic graph to understand the articulation of the object. We extend the kinematic graph to a Generalized Kinematic Graph (GK-Graph), which allows for state-dependent variation in the edge parameters of the graph. Using this representation, it now becomes possible to capture non-trivial mechanisms, such as the kinematic relation between the door and the wall, with dependency on the state of the handle.

Finally, even with a database of articulation models, it may be necessary to select relevant models for a particular task given only the sensor data. Further, once the candidate models are selected, they need to be converted into the representation used by the planner. For this, we present a practical method to auto-generate the candidate representations to be input to the planner, by extracting kinematically relevant visual features from sensor data, and then using a graph


segmentation algorithm to build the GK-Graph.

In addition to the contributions discussed thus far, this work, to our knowledge, is the first to formulate the problem of manipulating unknown articulated objects as a planning-under-uncertainty problem. We demonstrate the end-to-end pipeline on the PR2 robot, a mobile manipulation platform, and provide experimental results that validate our claims.

II. RELATED WORK

Recently, there has been widespread interest in articulated objects in the robotics community. Katz et al. [7] identify the kinematic structure of planar articulated objects by tracking features on the object. They cluster feature trajectories from a moving object into rigid components and identify joint relationships between each pair by studying the rigid body transform of one with respect to the other. The same group extended their work to handle identification of 3D articulated objects in [8]. Sturm et al. [14] presented a comprehensive probabilistic framework for identifying the articulation of an object from an observed data sequence. They formulate the problem as finding the most likely kinematic graph [3] that explains the observations. This is done by maximizing the posterior likelihood over possible kinematic graphs, while using the Bayesian Information Criterion to restrict the model complexity of the fit.

In [9], Katz et al. use a relational representation for actions and formulate a relational Markov Decision Process (MDP) to compute a policy that would minimize the number of actions needed to discover the degrees of freedom of an articulated object. Barragan et al. [2] formulate the problem of identifying the articulation of an object as that of requiring the entropy in the belief over the possible model types to be less than a particular value. They use a Bayesian filter which can assimilate information from a variety of sensing sources to maintain the belief over the articulation model types, and choose actions that reduce the entropy in the belief. Additionally, their formulation allows for modeling articulated objects which are not just serial mechanisms. On a similar note, Otte et al. [11] formulate the problem of identifying an object's degrees of freedom as that of minimizing entropy in the current belief about the degrees of freedom.

Our work is fundamentally different from the two camps of work described above in that we are interested in purposefully manipulating an articulated object for a specific task, and not in learning the kinematic graph structure, or minimizing the number of actions required to identify the mechanism type. The proposed formulation as a planning problem allows the robot to learn only as much as is required about the object's kinematic structure to complete a desired task, in contrast to existing learning formulations that ignore the task at hand. However, there are similarities in the representation we use for our planning problem with those used in the literature. While Sturm et al. [14] use the kinematic graph as a graphical model to infer the articulation structure and parameters, we use an extension of it as a representation for our planner. In [2], a general formulation

[Figure: a GK-Graph G over vertices v1–v9, with edge annotations n_prism = (1, 0, 0), n_rev = (0, 0, 1), c_rev = (0, 0, 0)]

Fig. 1: Illustration depicting a GK-Graph containing 4 rigid edges (solid lines), 2 revolute edges (dotted lines) and 3 prismatic edges (dashed lines). Each vertex in the model represents the 6-DoF position and orientation of a local coordinate frame centered at that vertex. In the figure, n_prism is the prismatic axis direction, n_rev is the revolute axis direction and c_rev is a point on the revolute axis.

is used that allows for mechanisms to switch parameters over time. However, this generalization is not grounded in the perceptual capabilities of a robot. The Generalized Kinematic Graph that we use for our planner representation is an extension of the kinematic graph that provides the generalization described in [2], while at the same time it can be grounded in the perceptual capabilities, such as by using fiducial markers in the same vein as [14].

In another class of work, Jain and Kemp [6] propose the use of shared haptic data to identify specific instances of an object type while manipulating it. They demonstrate their method on the task of identifying a specific door instance, such as a cabinet door or a refrigerator door, by using a data-driven approach. The same authors, collaborating with Sturm et al. [15], present a feedback approach for learning the articulation model based on a robot's end-effector trajectory, and then predicting the trajectory based on the most likely model estimated by the learning framework. This approach is adopted in works by the same group [12] for operating doors and drawers. Again, our work differs from these methods as our goal is not identification of articulation models, but that of planning to complete a given task as quickly as possible under uncertainty over model types.

III. GENERALIZED KINEMATIC GRAPH

The Generalized Kinematic Graph (GK-Graph) is a directed graph that represents the kinematic relationship between features of interest or rigid components of an object. It is based on the kinematic graph [3], but extends it in that it allows for modeling mechanisms where the relationship between two rigid components can change as a function of their poses. Vertices v_i in the GK-Graph correspond to 6-DoF poses of features on the articulated object, and the edges parameterize the kinematic relationship between a pair of vertices. An edge e from v_i to v_j is a tuple ⟨m, β⟩, where m is the kinematic relation type (such as prismatic or revolute), and β is a vector of parameters needed to describe m. For example, if v_1 were a feature point on the wall and v_2 a feature point on the door, then the edge from v_1 to v_2 has m = revolute and β = {n_axis, c_point, θ_min, θ_max}. All the parameters specified by β, in this case the axis of rotation, a point on the axis, and the rotation limits, are in the local


[Figure: four GK-Graph states G1–G4 over vertices v1–v8, connected by transition forces F = −1, F = 1, and F = 3 applied at v7]

Fig. 2: GK-Graph states for a 1-D drawer. Solid lines in the GK-Graph represent rigid edges, while the dashed lines represent prismatic edges. The transition forces are exerted at vertex v7, with positive forces corresponding to opening the drawer, and negative ones closing the drawer.

coordinate frame of v_1. Fig. 1 shows another example of a GK-Graph.

The GK-Graph differs from the kinematic graph because we allow the edge tuple ⟨m, β⟩ to be a function of the current coordinates of the vertices f(V), where V = {v_1, v_2, …, v_k}. This makes the GK-Graph a complete state for all mechanism models whose articulation varies only as a function of its configuration. That is, if we knew f(V), then a future state of the object can be computed given the current one and the action applied. This addition increases the expressiveness of the GK-Graph, and allows it to capture complex articulated mechanisms. For example, the positions of the vertices that represent a door handle affect whether the relationship between the door and the wall is a rigid or a revolute one.
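To make the definition concrete, the wall-door-handle example can be sketched as a small data structure (a Python sketch with hypothetical names such as `GKGraph` and `Edge`; the paper does not prescribe an implementation). The edge function plays the role of f(V), returning the edge tuples ⟨m, β⟩ as a function of the current vertex poses:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# A 6-DoF pose: x, y, z, roll, pitch, yaw (convention chosen for this sketch).
Pose = Tuple[float, float, float, float, float, float]

@dataclass
class Edge:
    model: str    # kinematic relation type m: "rigid", "prismatic", "revolute"
    params: dict  # parameter vector beta, e.g. {"axis": ..., "point": ..., "limits": ...}

@dataclass
class GKGraph:
    vertices: Dict[int, Pose]
    # edge_fn is f(V): maps current vertex poses to the edge tuples <m, beta>,
    # which is what distinguishes the GK-Graph from a plain kinematic graph.
    edge_fn: Callable[[Dict[int, Pose]], Dict[Tuple[int, int], Edge]]

    def edges(self) -> Dict[Tuple[int, int], Edge]:
        return self.edge_fn(self.vertices)

def door_edges(v: Dict[int, Pose]) -> Dict[Tuple[int, int], Edge]:
    # Vertex 0: wall, vertex 1: door, vertex 2: handle. The wall-door edge is
    # rigid until the handle frame has rolled past ~0.5 rad (hypothetical threshold).
    handle_turned = abs(v[2][3]) > 0.5
    door_edge = (Edge("revolute", {"axis": (0, 0, 1), "point": (0, 0, 0)})
                 if handle_turned else Edge("rigid", {}))
    return {(0, 1): door_edge,
            (1, 2): Edge("revolute", {"axis": (1, 0, 0), "point": (0, 0, 0)})}

g = GKGraph(vertices={0: (0, 0, 0, 0, 0, 0),
                      1: (0.8, 0, 0, 0, 0, 0),
                      2: (0.75, 0, 1.0, 0, 0, 0)},
            edge_fn=door_edges)
assert g.edges()[(0, 1)].model == "rigid"     # handle unturned: door fixed
g.vertices[2] = (0.75, 0, 1.0, 0.9, 0, 0)     # turn the handle
assert g.edges()[(0, 1)].model == "revolute"  # revolute DoF activated
```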

Fig. 2 shows an example of how the GK-Graph represents a state for a 1-D drawer. GK-Graph G1 is the initial state of the object, with 8 vertices and 10 edges. The edges v4 → v7 and v5 → v8 are prismatic, while all others are rigid. Applying a force of −1 on v7 pushes the drawer in, and results in the state G2, while a positive force of 1 opens the drawer, producing the state G3. Applying a unit force on point v7 in G3, and a force of 3 on point v7 in G2, both result in the same configuration G4. While simplistic, this example indicates how a GK-Graph might be useful for planning. Additionally, the actions that affect the GK-Graph state could be more sophisticated than simple forces applied on contact points on the object. The degree to which we can compute the resulting state depends on the fidelity of the simulation used and the input action types it supports. Since the focus of this work is on efficient planning, we will use a basic rigid body simulator that supports articulation constraints, while not worrying about such details as friction.

IV. PLANNING UNDER MODEL UNCERTAINTY

A. Problem Formulation

First, we assume that there are N candidate mechanism models, out of which one represents the true mechanism of the object. The planning problem then is to complete a specific task, whose goal is expressed as a particular configuration of the GK-Graph. This configuration could also

be partially specified; for instance, the goal might be to have the GK-Graph vertex corresponding to the door handle be at a particular 6-DoF pose in the workspace. We will use x ∈ X to denote the set of vertices in the GK-Graph. Each candidate mechanism model θ ∈ {1, 2, …, N} defines a function f_θ(x) which specifies the edge tuples in the GK-Graph as a function of the vertices x. Let A denote the space of possible actions that can be applied on the object.

Since θ is unobserved, the planning problem as formulated is a POMDP with states s ∈ S denoting the tuple ⟨x, θ⟩. Let b ∈ B be a belief state representing a probability distribution over s. Solving the problem as a POMDP would require us to compute the optimal policy π* : B → A. A naive approach would be inefficient since the dimensionality of B is N|X|, in addition to b being continuous.

We make two simplifying assumptions that reduce the problem to a tractable MDP on the belief state that can then be solved efficiently. First, we assume that the uncertainty lies only in θ, which is equivalent to saying that the vertices of the GK-Graph are fully observable, i.e., the set of 6-DoF poses on the object can be tracked without uncertainty. Second, we assume that the GK-Graph transition model is deterministic. That is, given θ, the vertices x and action a, the next state x′ is completely determined by x′ = SIM(x, θ, a), where SIM is the simulator for the GK-Graph.

Since x is completely observed, the POMDP can be rewritten as a Mixed-Observability MDP, or MOMDP [10]. Specifically, we only need to keep track of beliefs over θ for a given x. Let this be denoted as b_x(θ). The dimensionality of this new belief space is only N, the number of candidate mechanism models. Following [10], we can factorize the transition update for the MOMDP after executing action a at b_x(θ) as:

b′_{x′}(θ′) = η Σ_θ p(x′ | x, θ, a) p(θ′ | x, θ, a, x′) b_x(θ),   (1)

where η is a normalization constant. Because the transition model for s was assumed to be deterministic, and θ does not change over the planning horizon (i.e., θ has no motion model), the above expression simplifies to

b′_{x′}(θ′) = η Σ_θ 1_{x′}(SIM(x, θ, a)) 1_{θ′}(θ) b_x(θ)
            = η 1_{x′}(SIM(x, θ′, a)) b_x(θ′)
            = η b_x(θ′)  if x′ = SIM(x, θ′, a),  and 0 otherwise,   (2)

where 1_(·) is the indicator function. A key result from the above is that the number of successors for a belief state b_x(θ), given an action a, is finite, and bounded by the number of mechanism models. This greatly simplifies the planning problem. Note that we do not include observations explicitly in updating the belief. This is because our only observation is x, the 6-DoF poses which are already part of the state.

B. Planning on the Belief MDP

Now that we have a tractable belief MDP (b-MDP) where the number of successors for a given belief state is


Algorithm 1 LAO* for b-MDPs
1: procedure COMPUTEPOLICY(b_start)
2:   G ← {b_start}
3:   G* ← {b_start}
4:   while G* has non-terminal states do
5:     T ← DFSTRAVERSAL(G*)
6:     for all b ∈ T do
7:       if b is not yet expanded then
8:         [S, P, C] ← GETSUCCESSORS(b)
9:         for all i ∈ {1, 2, …, |S|} do
10:          b′ ← S_i
11:          v(b′) ← GETHEURISTIC(b′)
12:          G ← G ∪ {b′}
13:      a_best(b) ← argmin_{a ∈ A(b)} (cost(b, a) + Σ_{b′ ∈ SUCC(b, a)} p(b′ | b, a) v(b′))
14:      v(b) ← cost(b, a_best(b)) + Σ_{b′ ∈ SUCC(b, a_best(b))} p(b′ | b, a_best(b)) v(b′)
15:    update the best partial solution graph G* from G
16:  IMPROVEVALUES(G*)
17: return G*

bounded by the number of candidate models, we can use any MDP solver to obtain a policy over the b-MDP. For notational convenience, we will represent b_x(θ) as b and assume that b has two components: b.x, the 6-DoF poses, and b.θ, an N-vector representing the probability distribution over θ. Given a start state b_start = ⟨x_start, θ_start⟩, the planning problem now is to find the optimal policy π*_b : REACHABLE(b_start) → A, where REACHABLE(b) is the set of all belief states that can be reached starting from b under the optimal policy. Since the successors of a state are determined easily for a given belief state b, LAO* (shown in Algorithm 1) is an attractive algorithm for solving this problem, as it efficiently interleaves forward expansions and dynamic backup updates while ensuring that we never look at the states unreachable from the start under an optimal policy [4]. LAO* maintains a partial solution graph G* that stores all the belief states reachable from b_start under the current policy, and the corresponding best action to take at each belief state. Then, it iteratively performs the following two operations until there are no non-terminal states in G*: first, a depth-first traversal of G* is obtained, and every state in the traversal order is expanded (Algorithm 2) if it is not a terminal state, and then a backup update is performed (lines 13 and 14 in Algorithm 1). The heuristic for a leaf state (line 11) is an admissible estimate of the cost-to-go for that state. Next, the best solution graph G* is updated based on the newly computed v-values (cost-to-go), and the entire process is repeated. Once there are no non-terminal states in G*, value iteration is performed on the best solution graph (again in the DFS traversal order), until the v-values have converged to some ε-tolerance. The final solution graph G* is returned as the policy to be executed.

C. Practical Considerations

Although we compute a full policy by solving the belief MDP, we do so under the assumption that we will visit states only under the optimal policy. While this is fine in general,

Algorithm 2 Successor Generation for b-MDP States
1: procedure GETSUCCESSORS(b)
2:   S ← {} // successors
3:   P ← {} // transition probabilities
4:   C ← {} // costs
5:   x ← b.x; θ ← b.θ
6:   for all a ∈ A(x) do
7:     for all i ∈ {1, 2, …, N} do
8:       x′_i ← SIM(x, i, a)
9:     for all i such that x′_i is unique do
10:      θ′ ← zeros[N]
11:      p ← 0
12:      for all j such that x′_i = x′_j do
13:        θ′[j] ← θ[j]
14:        p ← p + θ[j]
15:      θ′ ← NORMALIZE(θ′)
16:      b′ ← ⟨x′_i, θ′⟩
17:      S ← S ∪ {b′}
18:      P ← P ∪ {p}
19:      C ← C ∪ {cost(x, a, x′_i)}
20: return {S, P, C}
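The successor generation for a single action can be sketched directly in Python (hypothetical names; `sim` stands in for the SIM routine, and models are indexed by integers). Candidate models that predict the same next poses x′ are grouped into a single successor belief, which is why the number of successors is bounded by N, as per Eq. (2):

```python
from collections import defaultdict

def get_successors(b_x, b_theta, action, sim):
    """b_x: current poses; b_theta: list of N model probabilities; sim(x, model, a)."""
    N = len(b_theta)
    # Predict the next poses under each candidate model.
    predictions = [sim(b_x, model, action) for model in range(N)]
    # Group models by identical predicted outcome (x' must be hashable here).
    groups = defaultdict(list)
    for model, x_next in enumerate(predictions):
        groups[x_next].append(model)
    successors, probs, costs = [], [], []
    for x_next, models in groups.items():
        p = sum(b_theta[m] for m in models)  # mass of models agreeing on x'
        if p == 0:
            continue
        # Renormalized belief over models, zeroing the disagreeing ones.
        theta_next = [b_theta[m] / p if m in models else 0.0 for m in range(N)]
        successors.append((x_next, theta_next))
        probs.append(p)
        costs.append(1.0)  # placeholder for cost(x, a, x')
    return successors, probs, costs

# Toy example with N = 3 models: models 0 and 1 predict the same next pose,
# so the action has only two successor beliefs.
sim = lambda x, m, a: x + (1 if m < 2 else 2)
succ, probs, _ = get_successors(0, [0.5, 0.25, 0.25], "push", sim)
assert sorted(probs) == [0.25, 0.75]
```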

Algorithm 3 Sense-Plan-Act Cycle
1: procedure MAIN()
2:   while not SATISFIESGOAL(b_start) do
3:     π ← COMPUTEPOLICY(b_start)
4:     BEGINPOLICYEXECUTION(π)
5:     wait for new observation z
6:     b_start ← UPDATEBELIEF(z, b_start, π_executed)

this would never be true in practice. There could be multiple sources of error, either from the simulation of the GK-Graph (the transition stochasticity), or from the sensing, both of which were assumed to be non-stochastic in the planning phase. A practical approach to address this is to assume that stochasticity in the observations received during execution arises in fact from the transition model, and then to update our belief before replanning again. From Eq. 1, we write the transition update for the belief

b′_z(θ′) = η Σ_θ p(z | x, θ, a) p(θ′ | x, θ, a, x′) b_x(θ)
         = η Σ_θ p(z | x, θ, a) 1_{θ′}(θ) b_x(θ)
         = η p(z | x, θ′, a) b_x(θ′),   (3)

where this time, we have not replaced p(z | x, θ′, a) with the indicator function. On the other hand, it would be reasonable to use a stochastic transition model here to ensure that the belief updates capture uncertainty to some extent. For instance, the particular belief update we use in our experiments is

b′_z(θ′) = η N(z | SIM(x, θ′, a), Σ_motion) b_x(θ′),   (4)

where Σ_motion is a covariance matrix that captures the transition uncertainty. Once the belief update is done, we replan with the new start belief state b′_z(θ).
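The update in Eq. (4) can be sketched as follows (a Python sketch with hypothetical names; an isotropic Σ_motion is used for brevity, and the constant factor of the Gaussian density is absorbed into η):

```python
import math

def update_belief(b_theta, x, action, z, sim, sigma=0.05):
    """Reweight each model by a Gaussian likelihood of the observed poses z,
    centered on the poses that model predicts, then renormalize (Eq. 4)."""
    weights = []
    for model, prior in enumerate(b_theta):
        predicted = sim(x, model, action)
        # Isotropic stand-in for N(z | SIM(x, theta', a), Sigma_motion).
        sq_err = sum((zi - pi) ** 2 for zi, pi in zip(z, predicted))
        weights.append(prior * math.exp(-0.5 * sq_err / sigma ** 2))
    eta = sum(weights)
    return [w / eta for w in weights]

# Two candidate models: the observation z matches model 1's prediction better,
# so the posterior shifts toward model 1.
sim = lambda x, m, a: (x[0] + (0.10 if m == 0 else 0.20),)
post = update_belief([0.5, 0.5], (0.0,), "pull", z=(0.19,), sim=sim)
assert post[1] > post[0]
assert abs(sum(post) - 1.0) < 1e-9
```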


(a) Planar segmentation. (b) Rectangle detection. (c) Generating candidate revolute and prismatic axes. (d) Assigning edge tuples to the GK-Graph.

Fig. 3: The perception pipeline for auto-generating candidate articulation models.

The main loop for replanning and updating the belief is shown in Algorithm 3.

V. EXPERIMENTS

We tested our planner on Willow Garage's PR2 robot for manipulating commonly encountered articulated objects. For these experiments, we used our perception system to auto-generate candidate mechanisms for each object, to provide an input to the planner. We first describe the details of our perception pipeline.

A. Perception Pipeline for Auto-generating Models

Since the focus of this work is on planning, we use fiducial markers on the objects to track the vertices of the GK-Graph. These fiducial markers are placed randomly on the object, with no attention to articulation details. The only requirement is that there is at least one pair of fiducial markers that can explain the underlying articulation. The tracked 6-DoF poses of the fiducial markers directly serve as the vertices of the GK-Graph. To compute the edges in the graph, we use a nearest neighbor method to connect every vertex with those within a radius of 50 cm.
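This neighbor-connection step can be sketched as follows (a Python sketch with hypothetical names; marker poses are reduced to 3D positions in meters):

```python
import math
from itertools import combinations

def radius_edges(vertices, radius=0.5):
    """Connect every pair of marker vertices within `radius` meters.
    vertices: {id: (x, y, z)} tracked marker positions."""
    return [(i, j) for i, j in combinations(sorted(vertices), 2)
            if math.dist(vertices[i], vertices[j]) <= radius]

# Markers 0 and 1 are 30 cm apart; marker 2 is too far from both.
markers = {0: (0.0, 0.0, 0.0), 1: (0.3, 0.0, 0.0), 2: (1.0, 0.0, 0.0)}
assert radius_edges(markers) == [(0, 1)]
```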

The next step is to generate candidate edge tuples for the GK-Graph. This is done as a two-step process:

Estimating articulation axes: We use the Point Cloud Library [1] to segment out planes in the RGB-D sensor data. Next, for each plane segmentation, we run a RANSAC method for detecting rectangles in the contours of the plane. For each rectangle generated, we compute 5 articulation axes: 4 revolute axes, one for each edge of the rectangle, and 1 prismatic axis normal to the rectangle. These steps are shown in Fig. 3.

Mincut for GK-Graph segmentation: Once the articulation axes are estimated, they need to be translated into edge tuples for the GK-Graph, given the arbitrary positions of the fiducial

markers. This is done using a mincut operation on the GK-Graph [13], with edge weights computed as follows. Let x_i and x_j be the Cartesian 3D locations of the vertices v_i and v_j. Let d_ij = ‖x_i − x_j‖ and w_ij be the weight assigned to edge e_ij for computing the mincut. We define a model-specific angle θ, which for prismatic models is the angle between the edge x_i − x_j and the prismatic axis. For revolute models, θ is the angle between the lever arms from x_i to the revolute axis and from x_j to the revolute axis.

The weights for the prismatic and revolute cases are assigned as follows:

w_ij = exp(−(α · d_ij + β · cos θ))  if prismatic
w_ij = exp(−(α · d_ij + β · sin θ))  if revolute

where α and β are weight factors controlling the tradeoff between the spatial and kinematic components. The edges in the computed mincut are then assigned the corresponding kinematic model parameters (obtained from the rectangle segmentation), and the rest are assigned rigid model parameters. Intuitively, the spatial component encourages edges between vertices close to each other to have rigid model assignments, while the kinematic component encourages edges on a planar surface to have prismatic assignments in the prismatic case, and rigid assignments in the revolute case.
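The weight assignment can be sketched as follows (a Python sketch with hypothetical names; the default values of α and β are illustrative tuning constants, not values from the paper):

```python
import math

def edge_weight(xi, xj, theta, model, alpha=1.0, beta=2.0):
    """Mincut weight for edge (vi, vj): the spatial term alpha*d_ij makes short
    edges heavy (hard to cut, so nearby markers stay rigidly connected); the
    kinematic term depends on the angle theta to the candidate axis."""
    d = math.dist(xi, xj)
    if model == "prismatic":
        return math.exp(-(alpha * d + beta * math.cos(theta)))
    if model == "revolute":
        return math.exp(-(alpha * d + beta * math.sin(theta)))
    raise ValueError(f"unknown model: {model}")

# A short edge outweighs a long one at the same angle, so it is less likely
# to end up in the mincut (and thus keeps a rigid assignment).
w_near = edge_weight((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), 0.0, "prismatic")
w_far = edge_weight((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.0, "prismatic")
assert w_near > w_far
```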

B. PR2 Experiments

We tested our closed-loop perception-planning system on Willow Garage's PR2 robot, for manipulating a variety of objects such as drawers and cabinets in offices, and shelves, drawers and a refrigerator in the kitchen. Fig. 4 shows some snapshots from these experiments. The GK-Graph was set up using fiducial markers on the object, and candidate kinematic mechanisms were auto-generated as discussed in Sec. V-A. Grasp locations on the objects were specified using a simple virtual fixture, as done in the recent DARPA Robotics Challenge. Once a grasp location is provided, a new vertex is automatically added to the GK-Graph and connected to the nearest existing vertex with rigid model parameters. The initial belief state of the robot was set as a uniform prior over the candidate models. Since the task was to open these objects, the goal state was defined as a GK-Graph state in which the end-effector of the robot (or the grasp vertex in the GK-Graph) had moved a particular distance d_goal away from its initial position. While this seems to be an atypical way of specifying the goal, it approximates well a real-life task goal such as 'open the object until an item of interest inside is visible or reachable'. Our action set consisted of 20 forces sampled uniformly on a unit sphere. The heuristic used for LAO* encouraged states with grasp vertices far from the start state to be expanded first. Specifically, for a belief state b, the heuristic was computed as h(b) = max(0, d_goal − ‖b.x[v_grasp] − b_start.x[v_grasp]‖). Finally, to execute an action returned by the policy, we use an inverse kinematics controller to move the end-effector of the robot from the grasp vertex of the current belief state to that of the next.
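This heuristic can be sketched as follows (a Python sketch with hypothetical names; the clipping is written as max(0, ·) so the cost-to-go estimate stays nonnegative, with farther grasp vertices receiving smaller values and hence being expanded first):

```python
import math

def heuristic(b_x, start_x, grasp, d_goal):
    """Admissible cost-to-go estimate: the distance the grasp vertex still has
    to travel, clipped at zero once it has moved d_goal from its start pose."""
    moved = math.dist(b_x[grasp], start_x[grasp])
    return max(0.0, d_goal - moved)

# Grasp vertex 7 has moved 0.3 m of the required 0.5 m: 0.2 m remain.
h = heuristic({7: (0.3, 0.0, 0.0)}, {7: (0.0, 0.0, 0.0)}, grasp=7, d_goal=0.5)
assert abs(h - 0.2) < 1e-9
```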


Fig. 4: The PR2 robot opening a variety of articulated objects in an office space and a kitchen.

Fig. 5: Illustration of the end-to-end pipeline for manipulating articulated objects. The robot first encounters an unknown articulated object, a kitchen cabinet in this case. The perception system described in Sec. V-A generates candidate articulation models, which the planner then uses to generate a policy. After executing the first action, the robot observes the GK-Graph vertices (the fiducial markers) and updates its belief over the candidate models. In particular, for this example, the robot believes that the revolute axis on the right is more likely than the one on the left after taking the first action. The planner again computes a new policy using this updated belief, and the process repeats until the task is completed.

Using the approach described above, the PR2 robot could open every object shown in Fig. 4. While the system was successful in most of the trials, we observed two common failure modes. First, the planning does not include the kinematics of the robot manipulator. This leads to partial plans (Cartesian trajectories of 6-DoF end-effector poses) that are infeasible for execution, thereby causing a standstill. The correct solution would be to plan in the combined belief space and configuration space of the robot. However, this increases the dimensionality of the problem dramatically and is perhaps unnecessarily complicated. Instead, a practical solution would be to obtain the Cartesian end-effector trajectory in the usual way, and then use a full-body planner for the robot (joint planning in arm and base motions) to find a plan that is feasible for execution. The second failure mode arises when the fiducial markers on the object go out of the field of view, or are occluded by the robot during execution of a plan. Once again, while the correct solution would be to model the GK-Graph state as partially observable, a more practical solution would be to run an independent state estimator on the tracked GK-Graph.

The associated video contains trials of the PR2 opening a variety of articulated objects, including those in Fig. 4. In Fig. 5, we show the end-to-end pipeline for manipulating an unknown articulated object.

C. Planner Efficiency Tests

To test the robustness of our method and quantify the planner's efficiency, we ran 25 trials each on an office cabinet and drawer, with the task being to open the object starting from a closed state. For the efficiency tests, the goals were specified as particular end-effector locations that must be attained by the robot. The heuristic for LAO* is then simply the Euclidean distance between the 3D coordinates of the grasp vertex at the current state and the goal state. The success rate was 100% on these efficiency-test trials. Fig. 6 shows the average planning time and expansions for the trials on opening drawers and cabinets, as a function of the replan number. In our trials, the robot had to replan up to 4 times during execution based on its observations. The statistics are computed over trials that had the same number of replans, and 'expansions' refers to the number of times the LAO* planner called for a simulation of a GK-Graph state. While the average number of expansions decreases monotonically with the replan number, the planning time does not. This is a consequence of the fact that although the underlying GK-Graph state may be the same, the time taken for value


Fig. 6: Planning statistics (average planning time in seconds and average number of expansions vs. replan number) for trials on opening (a) drawers and (b) cabinets.

iteration to converge on the partial solution graph depends on the initial belief state over the models. Since there is stochasticity in our observations over the trials, it is expected that the time taken for the v-values to converge will differ in each trial. This can also be seen from the error bars on the planning times.

VI. CONCLUSIONS

As a primary contribution, we have presented an efficient planner for manipulating articulated objects under uncertainty in the true mechanism of the object. The efficiency arises from assuming a deterministic transition model for the object given the true mechanism type, and a completely observable object state, which together reduce the original intractable POMDP planning problem to a belief MDP with a finite number of successor states. By using a heuristic MDP solver (LAO*) that discards extraneous parts of the belief space, we can run a real-time planning and execution loop. We also provide an expressive representation of articulated objects, the Generalized Kinematic Graph, for modeling complex kinematic mechanisms, and a heuristic method based on graph segmentation for auto-generating candidate articulation models from RGB-D data. Extensive trials and demonstrations on the PR2 robot showcase the capability of our planner and its real-time applicability in personal-robot manipulation tasks.
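The reduction summarized above hinges on a simple belief update: with a deterministic transition per candidate model and a fully observed object state, an observation either agrees with a model's prediction or rules that model out, so each belief has only finitely many possible successors. A minimal sketch (the model set, the tolerance check, and the drawer-vs-rigid example are hypothetical):

```python
import math

def update_belief(belief, models, state, action, observation, tol=1e-3):
    """Bayes update over candidate articulation models. Each model's
    deterministic prediction either matches the observed (fully
    observable) object state, keeping its probability mass, or
    mismatches, zeroing it out; the result is renormalized."""
    new_belief = {}
    for name, predict in models.items():
        agrees = math.dist(predict(state, action), observation) < tol
        new_belief[name] = belief[name] if agrees else 0.0
    total = sum(new_belief.values())
    if total == 0.0:
        raise ValueError("observation inconsistent with every candidate model")
    return {m: p / total for m, p in new_belief.items()}

# Hypothetical example: is the grasped handle on a prismatic part
# (e.g., a drawer sliding along x) or on a rigid part? Apply a unit
# pull along +x and observe the grasp vertex.
models = {
    "prismatic_x": lambda s, a: [s[0] + a[0], s[1], s[2]],
    "rigid":       lambda s, a: list(s),
}
b0 = {"prismatic_x": 0.5, "rigid": 0.5}
obs = [1.0, 0.0, 0.0]            # the grasp vertex actually moved
b1 = update_belief(b0, models, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], obs)
```

Here a single informative action collapses the belief onto the prismatic model; in general, actions whose candidate-model predictions disagree are the ones that split the belief into distinct successors.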

A natural extension and future work is to handle stochastic transition models for the object and sensing uncertainty. While this would be expected to improve planner performance and success rates, it is unclear if there is structure in the problem that can be exploited for solving the complete POMDP. Another interesting direction for future work is to handle candidate models that are only approximate or under-specified. This introduces a transition model for the unobserved part of the state. Although this would increase the planning complexity, there is still structure in the problem since the object state is completely observed.
