
Data-Driven Model Predictive Control for the Contact-Rich Task of Food Cutting

Ioanna Mitsioni1, Yiannis Karayiannidis2, Johannes A. Stork3 and Danica Kragic1

Abstract— Modelling of contact-rich tasks is challenging and cannot be entirely solved using classical control approaches due to the difficulty of constructing an analytic description of the contact dynamics. Additionally, in a manipulation task like food cutting, purely learning-based methods such as Reinforcement Learning require either a vast amount of data that is expensive to collect on a real robot, or a highly realistic simulation environment, which is currently not available. This paper presents a data-driven control approach that employs a recurrent neural network to model the dynamics for a Model Predictive Controller. We build upon earlier work limited to torque-controlled robots and redefine it for velocity-controlled ones. We incorporate force/torque sensor measurements, and reformulate and further extend the control problem formulation. We evaluate the performance on objects used for training, as well as on unknown objects, by means of the cutting rates achieved, and demonstrate that the method can efficiently treat different cases with only one dynamic model. Finally, we investigate the behavior of the system during force-critical instances of cutting and illustrate its adaptive behavior in difficult cases.

I. INTRODUCTION

In contact-rich manipulation tasks, such as almost any physical interaction with objects, the contact dynamics exhibit a high degree of variation based on the type of task, as well as the physical properties of the objects. Humans are highly compliant and able to gracefully manipulate different objects and tools by adjusting the exerted force based on their multi-modal sensory feedback, extensive training and long interaction experience. Depending on the task and the type of objects one interacts with, there are various parameters that affect the interaction and require different types of control. For example, different types of knives need to be controlled differently depending on their properties, such as weight or material. Even when the same knife is used for cutting various types of vegetables, the controller needs to take into account the size and hardness properties of each type. Furthermore, even for the same vegetable class, for example a pepper, we cannot assume that the hardness is either constant or uniform, given the various pepper types or, for a single pepper, given the empty core.

1 Division of Robotics, Perception and Learning (RPL), CAS, EECS, KTH Royal Institute of Technology, Stockholm, Sweden. mitsioni, [email protected]

2 Division of Systems and Control, Dept. of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden. [email protected]

3 Center for Applied Autonomous Sensor Systems (AASS), Örebro University, Örebro, Sweden. [email protected]

This work was supported by the Swedish Foundation for Strategic Research project GMT14-0082 FACT and by the Knut and Alice Wallenberg Foundation projects WASP and IPSYS.

(a) Data collection

(b) Online deployment

Fig. 1: System overview during data collection and online deployment

In robotic manipulation, to address such parameter variations, classical force control approaches require a lot of explicit tuning. On the other hand, data-driven learning approaches require vast amounts of data that may be considerably expensive or impossible to acquire in a real setting. Finally, all these parameters cannot be simulated in a realistic manner given the available simulators. A popular control scheme that does not require a lot of fine tuning is Model Predictive Control (MPC) [1]. Controllers of this type are able to optimize the control inputs by considering the long-term performance based on an accurate model of the system dynamics. However, for tasks exhibiting a large degree of variation in the dynamics, providing such a detailed analytic model can be challenging or even practically unattainable.

In this work, we address a challenging task, that of food cutting, by combining an MPC with a data-driven approach as shown in Fig. 1. Our system employs deep learning and recurrent neural networks to act as the basis for the controller. As a result, we obtain a solution that leverages the modeling capability of deep learning and combines it with the object-independent control of MPC to address the aforementioned variations. In contrast to open-loop policies, where the complete control sequence depends only on the initial state, receding horizon control schemes, such as MPC, implement a closed-loop policy with the control action depending on the measured state at every timestep. This way, potential model/plant mismatches, disturbances or discontinuities in the dynamics can be accounted for at the next sampling instant, preventing the system from exhibiting unstable behaviour. Additionally, the model and the controller itself are not coupled since MPC does not require training, so if a different behaviour needs to be generated, the user can simply change the cost function parameters to accommodate it.

We chose to examine the potential of the data-driven MPC approach in the context of robotic food cutting as its intrinsic dynamics provide an excellent test-bed. Although the problem has previously been addressed in the seminal work of Lenz et al. [2], it has not been revisited with a data-driven approach. We believe that this is due to the complexity of the task itself, as well as the challenge of adjusting the original method of [2] to different robotic settings, considering the differences in technical aspects such as sensory information or payloads. It should be noted that most robots used in similar research topics are torque-controlled and inherently compliant, unlike the majority of widely used robots that are either position- or velocity-controlled. This adds another level of complexity as the state variables cannot include joint torques and the control laws need to be resolved through either position or velocity. With that in mind, we take advantage of force/torque sensors to implement a reactive controller that encodes the desired behavior while also allowing for a task-space control scheme.

The main motivation behind our work is to find a balance between classical control approaches that require tuning and an analytic description of the task on one side, and purely learning-based approaches that require a vast amount of data on the other. In this work, we thus reformulate [2] for velocity-controlled manipulators with no access to joint torque measurements, but that are instead equipped with force sensing. We illustrate the effectiveness of our method in learning the dynamics of the manipulation task in a series of experiments and additionally demonstrate its generalization to unknown objects. Finally, we further evaluate its adaptation ability in difficult, force-critical cases where the dynamics might lead to a failed cut.

II. RELATED WORK

The goal of our work is to develop a data-driven control approach that allows for online adaptation of the system's behavior in a setting with complex dynamics. Towards this aim, we learn the physical dynamics through a recurrent deep network that allows the prediction of future system states while performing online calculations for a Model Predictive Controller. Fig. 1 presents schematically the control process during data collection (Fig. 1a) and online deployment (Fig. 1b). Below, we summarize relevant work both in terms of classical control and data-driven approaches.

Robotic manipulation has been mainly treated through force control [3]–[5]. Despite their robustness and good performance in simple, well-defined tasks, force controllers lack the ability to discern long-term interactions. In a complicated task that displays a lot of variation in the contact dynamics, that translates to tedious tuning in order to make the controller effective [6]. Adaptive control [7], [8] offers a mechanism for adjusting such parameters online and can account for some degree of parametric uncertainty in the system. However, it requires much simpler models and, even then, results in a greedy policy as opposed to the locally optimal ones we address. Parallel and hybrid position/force approaches [9], [10] provide efficient formulations for a plethora of tasks but are still only applicable to simpler interaction models where the geometries can be accurately defined by hand-picking the corresponding position/force controlled axes through the selection matrices. Finally, optimal control approaches such as LQR [11] and its variants, although capable of adapting to the observed dynamics, tend to require linear or linearized models and quadratic costs, which inevitably hinders the complexity they can handle.

In an attempt to overcome the difficulties of classical control, several categories of data-driven approaches have emerged in manipulation. Some of them employ human demonstrations in order to learn the force profiles needed to successfully complete complex tasks [12]–[17]. However, in our scenario, recording a demonstration by leading the robot through the motion would prove problematic, since it would be impossible to distinguish the wrench applied by the human from the one applied by the object without resorting to external tracking solutions such as the one in [18]. Considerable progress has been made by the reinforcement learning community [19], [20], but a large body of these methods is model-free, which is limiting due to sample complexity. On the other hand, model-based methods [21]–[25] require fewer samples but employ simpler approximators such as linear representations, which are not appropriate for highly non-linear dynamics, or Gaussian processes, which cannot handle discontinuities. In addition, reinforcement learning requires either online exploration, which would be dangerous in this setting, or training in simulation, which is not realistic as it is infeasible to accurately model the dynamics during cutting.

Similar to our work, several researchers have used neural networks to learn the dynamics model for an MPC. In [26], [27] however, the authors focus on tasks with a narrow set of dynamics and the final goal is to improve an RL policy through real-world trials. Lastly, in [28], [29] the authors combine deep networks and MPC in a manner akin to ours, but in the distinct fields of self-tuning optical systems and robot-assisted dressing, which involve simpler contact dynamics.

III. PROBLEM FORMULATION

Consider a robotic manipulator equipped with force sensing. Let $\mathbf{p} \in \mathbb{R}^3$ denote the translation part of the end-effector pose in the world frame and $\mathbf{F}_s \in \mathbb{R}^3$ the sensor's force measurements. Let further $\mathbf{p}_d, \dot{\mathbf{p}}_d$ denote a desired position and velocity of a trajectory, $\mathbf{F}_r, \mathbf{F}_d$ the reference and desired force, and $\mathbf{u}$ a velocity control input. If the goal is to follow a predefined trajectory in a compliant manner, we can employ a variation of admittance control, called the inverse damping control law

$$\mathbf{u} = \mathbf{K}_a(\mathbf{F}_s - \mathbf{F}_r). \quad (1)$$

We can then define the desired compliant behavior as

$$\mathbf{F}_r = \mathbf{F}_d - \mathbf{K}_a^{-1}(\dot{\mathbf{p}}_d - \mathbf{K}_p \mathbf{e}_p) \quad (2)$$


Fig. 2: World frame orientation. The cutting axis corresponds to Z and the sawing to Y. FTS denotes the force/torque sensors.

where $\mathbf{K}_p, \mathbf{K}_a \in \mathbb{R}^{3\times 3}$ are the stiffness and compliance gain matrices and $\mathbf{e}_p = \mathbf{p} - \mathbf{p}_d$ is the position error. Substituting Eq. (2) in Eq. (1) and noting that the control input corresponds to the Cartesian velocity results in the desired dynamic behavior

$$\dot{\mathbf{e}}_p + \mathbf{K}_p \mathbf{e}_p = \mathbf{K}_a \mathbf{e}_f \quad (3)$$

where $\dot{\mathbf{e}}_p = \dot{\mathbf{p}} - \dot{\mathbf{p}}_d$ and $\mathbf{e}_f = \mathbf{F}_s - \mathbf{F}_d \in \mathbb{R}^3$ are the velocity and force errors.
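To make the control law concrete, the following Python sketch evaluates Eqs. (1) and (2) for a single control cycle. The function name, the diagonal gain values and the example inputs are illustrative assumptions, not taken from the paper.

import numpy as np

def inverse_damping_control(F_s, p, p_d, p_d_dot, F_d, K_a, K_p):
    # Reference force encoding the desired compliant behavior, Eq. (2)
    e_p = p - p_d
    F_r = F_d - np.linalg.solve(K_a, p_d_dot - K_p @ e_p)
    # Cartesian velocity command from the inverse damping law, Eq. (1)
    u = K_a @ (F_s - F_r)
    return u, F_r

# Example with assumed diagonal gains
K_a = np.diag([0.01, 0.01, 0.01])   # compliance gains
K_p = np.diag([2.0, 2.0, 2.0])      # stiffness gains
u, F_r = inverse_damping_control(F_s=np.array([0.0, 1.0, -5.0]),
                                 p=np.zeros(3), p_d=np.zeros(3),
                                 p_d_dot=np.array([0.0, 0.0, -0.01]),
                                 F_d=np.zeros(3), K_a=K_a, K_p=K_p)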

The variations in the contact properties during a cutting task make it impossible to globally define a fixed trajectory $\mathbf{p}_d, \dot{\mathbf{p}}_d$ or a desired force $\mathbf{F}_d$. For instance, for objects of different size, different types of motion are required to ensure that the knife will make initial contact. Moreover, a fast motion with large velocity that is appropriate for less stiff objects will not be applicable to stiffer ones. Nonetheless, we can produce the desired behavior by determining a reference force $\mathbf{F}_r$ for the axes that are involved in the cutting motion in Eq. (3). Since the geometry and the dynamics of the contact are unknown, we model them as a discrete-time dynamics function $\mathbf{p}_{t+1} = f(\mathbf{p}_t, \mathbf{F}_{s_t}, \mathbf{F}_{r_t})$ and determine an optimal reference force such that it minimizes a cost $C(\mathbf{p}_t, \mathbf{F}_{r_t})$ over a time horizon $T$, by solving the optimization problem

$$\mathbf{F}_r^* = \arg\min_{\mathbf{F}_r} \sum_{k=0}^{T} C(\mathbf{p}_{t+k}, \mathbf{F}_{r_{t+k}}). \quad (4)$$

We parametrize the dynamics function $f(\mathbf{p}_t, \mathbf{F}_{s_t}, \mathbf{F}_{r_t})$ as a deep recurrent network that receives the current positions, measured and reference forces, and outputs the estimated future positions. We define the model's state as the augmented state vector $\mathbf{x}_t = \{\mathbf{p}_t, \mathbf{F}_{s_t}\}$ for brevity and denote $\mathbf{v}_t = \mathbf{F}_{r_t}$, resulting in the formulation

$$\mathbf{p}_{t+1} = f(\mathbf{x}_t, \mathbf{v}_t). \quad (5)$$

Incorporating the force measurements in the augmented state vector and defining the transition function in such a way allows this method to be used with velocity-controlled robots. However, if a torque interface is available, substituting $\mathbf{F}_s$ with the current end-effector wrench provided by the robot's joints and $\mathbf{F}_r$ with the future control input will result in the formulation proposed in [2].
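The formulation above fixes a simple interface for the learned model: the augmented state stacks positions and sensed forces, and the reference force acts as the control input. A minimal sketch of that interface is given below; the class, method and attribute names are assumptions used only to anchor the later examples.

import numpy as np

class LearnedDynamics:
    # Wrapper around the learned model of Eq. (5): p_{t+1} = f(x_t, v_t)
    def __init__(self, net):
        self.net = net  # trained recurrent network, described in Sec. IV-A

    def predict(self, p_t, F_s_t, F_r_t):
        x_t = np.concatenate([p_t, F_s_t])  # augmented state x_t = {p_t, F_s_t}
        v_t = F_r_t                         # control input v_t = F_r_t
        return self.net(x_t, v_t)           # estimated next position p_{t+1}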

We simplify the problem by considering only the translational components of the motion, namely Y and Z as seen in Fig. 2, as the state of the MPC. Since we do not require any trajectories for these axes and $\mathbf{F}_r^*$ acts as $\mathbf{F}_d$ during online deployment, the respective gains are set to zero. Accordingly, the rest of the axes are stabilized through a set-point stiffness control law by setting $\dot{\mathbf{x}}_d$ and the corresponding compliance gains to zero.

Fig. 3: Network architecture for implementing Eq. (5).

IV. METHOD

In this section, we describe the architecture of the network implementing Eq. (5) and the training procedure in Sec. IV-A. In Sec. IV-B we explain the data collection process and finally, in Sec. IV-C, we present the procedure for the online deployment of the method.

A. Network Architecture and Training

The network architecture is depicted in Fig. 3 and consists of 6 fully connected (FCi) layers and 2 recurrent layers with 30 units each (RNNlatent). Since the network serves as a dynamic model, it is imperative that i) the representation captures the most relevant parts of the input, and ii) it is able to handle the task's time-varying nature. Regarding the former, in contact tasks, positions and forces carry redundant information that may compromise the network's performance. By embedding the measurements in a lower-dimensional latent representation, the most salient features of the data are kept, thus allowing learning to be more effective. To accomplish this, the system state is initially processed by 2 fully connected layers that are structured as an Encoder (FCen) and provide the latent representation of the dynamics by learning a mapping from the initial 6-dimensional input to a 3-dimensional latent feature vector.

To address the time dependency of the dynamics, we employ a recurrent layer (RNNlatent) that aims at modelling long-term relationships in the latent representation before the output layer. Accordingly, the immediate dependencies on the current state and control input are treated by FCstate and FCinput and concatenated at the output layer.
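As a rough PyTorch sketch of this architecture, the model below combines an encoder to a 3-dimensional latent space, a two-layer recurrent block with 30 units, and separate state and input branches whose features are concatenated before the output layer. The hidden widths of the encoder and branches, the block length M and the exact output head are assumptions; only the 6-dimensional input, the 3-dimensional latent space and the 30-unit recurrent layers are stated in the text.

import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    def __init__(self, block_len=5, hidden=30):
        super().__init__()
        self.block_len = block_len
        # FC_en: 6-D state (relative positions + forces) -> 3-D latent features
        self.fc_en = nn.Sequential(nn.Linear(6, 16), nn.Tanh(),
                                   nn.Linear(16, 3), nn.Tanh())
        # RNN_latent: two recurrent layers over the latent sequence
        self.rnn_latent = nn.RNN(input_size=3, hidden_size=hidden, num_layers=2,
                                 nonlinearity='tanh', batch_first=True)
        # FC_state / FC_input: immediate dependence on current state and control input
        self.fc_state = nn.Sequential(nn.Linear(6, hidden), nn.Tanh())
        self.fc_input = nn.Sequential(nn.Linear(3, hidden), nn.Tanh())
        # Output layer over the concatenated features -> next block of positions
        self.fc_out = nn.Linear(3 * hidden, 3 * block_len)

    def forward(self, x_block, v_block, hx=None):
        # x_block: (B, M, 6) current block; v_block: (B, 3) reference force
        z = self.fc_en(x_block)                 # (B, M, 3) latent sequence
        r, hx = self.rnn_latent(z, hx)          # (B, M, hidden)
        feat = torch.cat([r[:, -1],
                          self.fc_state(x_block[:, -1]),
                          self.fc_input(v_block)], dim=-1)
        p_next = self.fc_out(feat).view(-1, self.block_len, 3)
        return p_next, hx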

Before feeding the data to the network, we applied the same preprocessing suggested by [2]: instead of considering single points in time, we built non-overlapping blocks consisting of M timesteps of positions and forces. This allows the input vector to be encoded as a sequence that models the actual state in a more appropriate manner than a single timestep would. Every block $b$ then corresponds to the sampling interval $[bM, (b+1)M-1]$, leading to $\mathbf{x}_b \in \mathbb{R}^{M\times 6}$ and $\mathbf{v}_b, \mathbf{p}_b \in \mathbb{R}^{M\times 3}$. The position part of $\mathbf{x}_b$ is expressed relative to the previous block's last position. In that way, a relative displacement over time is acquired which serves as a generalized velocity. This intuitively agrees with our goal of learning interaction control as a mapping between velocities and forces by drawing a parallel to the system's mechanical impedance. Although we could have used measured Cartesian velocities directly, in a real robotic setting this is not advisable as the measurements are usually very noisy and not easy to acquire. Finally, both relative positions and forces in every block are normalized by removing the mean and scaling to unit variance to make sure the network weighs them equally.
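A sketch of this preprocessing, assuming position and force streams sampled at the same rate; the block length M = 5 is an illustrative choice, not a value reported in the paper.

import numpy as np

def make_blocks(p, f, M=5):
    # Split (T, 3) position and force streams into non-overlapping blocks of M steps
    T = (len(p) // M) * M
    p, f = p[:T].reshape(-1, M, 3), f[:T].reshape(-1, M, 3)
    # Express positions relative to the previous block's last position
    rel = p.copy()
    rel[1:] -= p[:-1, -1:, :]
    rel[0] -= p[0, :1, :]
    # Stack into (B, M, 6) state blocks and standardize per feature
    x = np.concatenate([rel, f], axis=-1)
    x = (x - x.mean(axis=(0, 1))) / (x.std(axis=(0, 1)) + 1e-8)
    return x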

Since we use relative positions that form time blocks, the sign indicates the direction of the motion. Depending on the material at hand or the tool used, the forward and backward motions might not have the same dynamics, e.g. using a serrated knife results in completely different force profiles for the two motions. To facilitate learning and ensure that negative values are represented properly, the non-linearities of the network are hyperbolic tangents, which do not threshold at zero.

The final network is trained to predict a sequence of $T_{horizon}$ timesteps (or $H_b = T_{horizon}/M$ time-blocks) in the future by iteratively using its previous predicted outputs as the new position inputs. During training, the corresponding force input blocks are the forces of the respective future blocks, and during online deployment, the actual force measurements. The MPC cost can then be calculated as

$$C = C(\mathbf{p}_b, \mathbf{u}_b) + \sum_{i=1}^{H_b} C(\mathbf{p}_{b+i}, \mathbf{v}_{b+i}). \quad (6)$$

To address the common problem of error accumulation in long-term predictions, we utilize the three-stage training approach that proved to be superior to random initialization according to [2]. Before attempting to predict the next state, the FCen layer is first initialized by training it as an AutoEncoder [30], minimizing the L2 reconstruction error for the current input. During this stage, only the current state is considered and the goal is to ensure that the latent representation has captured the dynamics of the task before it is asked to predict the future positions. Next, the rest of the network, excluding the RNN layers, is trained for a single-step prediction into the future by minimizing the L2 norm for the immediate next step. Finally, the weights of that model are loaded into the full network that provides the positions for the entirety of the prediction horizon. The lack of a torque interface does not allow us to record control forces during data collection. However, we assume that the controller in Eq. (1) is capable of tracking the desired force immediately. Consequently, for training purposes we can use the sensor forces as $\mathbf{u}_b$, in order to enable the network to directly encapsulate the effect of measured forces on the next states, instead of solely relying on their latent representation.
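A minimal sketch of the iterative multi-step prediction, reusing the DynamicsModel interface sketched earlier; the force blocks fed back in would be future sensor forces during training and live measurements during deployment, as described above.

import torch

def rollout(model, x_block, v_blocks, f_blocks):
    # Predict H_b blocks ahead by feeding predicted positions back as inputs
    preds, hx = [], None
    for v, f in zip(v_blocks, f_blocks):
        p_next, hx = model(x_block, v, hx)        # next block of relative positions
        preds.append(p_next)
        x_block = torch.cat([p_next, f], dim=-1)  # reuse prediction, attach force block
    return torch.cat(preds, dim=1)                # (B, H_b*M, 3) predicted positions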

Fig. 4: Training set consisting of fruits, vegetables and a cake. These objects offer a wide range of interactions during cutting, spanning from homogeneous, easy-to-cut cases (cake, cucumber) that do not offer a lot of resistance and do not display high friction in the lateral axis of motion, to stiff objects with viscous properties (zucchinis, cheese) and finally, to objects displaying high variation in their dynamics and stiffness (bell peppers, red peppers, lemons, unpeeled bananas).

B. Dataset Collection

In order to collect data, we employ the controller described in Eq. (3), as seen in Fig. 1a. Our training set consisted of cutting trials on fruits, vegetables, cheese and cake, see Fig. 4, and had a total of 513,587 data points.

For every trial, a 5th-order polynomial trajectory that simulated the cutting motion was commanded as $\mathbf{p}_d, \dot{\mathbf{p}}_d$, and the stiffness and compliance gains were set manually by the operator to cover a wide range of interaction modes for every object. Force data were recorded from the sensor placed on the wrist and Cartesian positions were acquired through the robot's forward kinematics. The prescribed motion consisted of a periodic triangular trajectory for the sawing axis and a steady descent for the cutting one. The sawing range was kept constant for every trial but we changed the number of repetitions for thicker objects to ensure a complete cut.
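For illustration, the sketch below generates such a prescribed motion: a quintic (5th-order) descent along the cutting axis and a triangular wave along the sawing axis. All numerical values (duration, depths, sawing amplitude and repetitions) are placeholders chosen per trial, not values reported in the paper.

import numpy as np

def prescribed_motion(t, T=10.0, z_start=0.10, z_end=0.0, amp=0.03, n_saw=8):
    s = np.clip(t / T, 0.0, 1.0)
    s5 = 10 * s**3 - 15 * s**4 + 6 * s**5          # quintic blend, zero vel/acc at ends
    z_d = z_start + (z_end - z_start) * s5         # steady descent on the cutting axis
    phase = (t * n_saw / T) % 1.0
    y_d = amp * (4.0 * np.abs(phase - 0.5) - 1.0)  # periodic triangular sawing motion
    return y_d, z_d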

During data collection, we used clamps to stabilize the objects on the table. Although our robot is bi-manual, immobilizing the object with a set-point stiffness controller on the second arm would make that arm part of the dynamical system through the restoring forces it exerts. This would introduce additional dynamics due to the relative motion between the object and the knife, which can either enable or impede the cutting itself.

C. Online Model Predictive Control

During online deployment, we follow the process shown in Fig. 1b. In that setting, we first allow a small motion driven by the controller that was used for data collection in order to make contact with the object and initialize the system.

Solving Eq. (4) for non-linear systems is not trivial even in the case of quadratic costs, as the time the optimizer requires to converge might not allow for real-time control. Instead, we employ a shooting method [26], [29], [31] and demonstrate that it can achieve effective cutting rates for different object classes. For every iteration, shooting methods generate K potential control inputs that act as the feasible forces for this optimization round. The solution associated with the lowest cost is then chosen as $\mathbf{F}_r^*$. When the state space increases, shooting methods might be slower than the optimization approach, but in our case this method is preferable as it is much simpler and allows us to implicitly set input constraints by modifying the distributions that we sample inputs from.

Fig. 5: Cutting rates obtained with a fixed trajectory controller and this method.

After the initial motion, for every MPC iteration, the set of feasible actions for the shooting method is generated by sampling from a uniform distribution that limits the amplitude of the generated forces to 8 N. This range was chosen heuristically to be appropriate both for the robot's limits and for cutting most objects. To make the solution computationally more tractable, every time we generate an action, we assume that it will stay constant during the prediction horizon. The network is queried to predict the next states resulting from these actions, giving us the next Cartesian positions for which the accumulated cost is calculated, and the best one is chosen as the next control input. The cost function used comprised two basic terms that encouraged the sawing motion around the center point $p_{center}$ of the sawing range while moving downwards until the knife reached the table's height $p_{table}$, a terminal cost, and the norm of the control inputs. Namely, for every $H_b$-block horizon the cost was given by

$$C(\mathbf{p}, \mathbf{v}) = c_{cut} \sum_{k=1}^{H_b} (p_{z_k} - p_{table})^2 + p_{z_{H_b}} + c_{saw} \sum_{k=1}^{H_b} (p_{y_k} - p_{center})^2 + c_v \sum_{k=1}^{H_b} \|\mathbf{v}_k\|^2 \quad (7)$$

where $c_{cut}, c_{saw}$ are positive constants weighting the contribution of the costs associated with the cutting and sawing actions respectively to the total cost, while $c_v$ is the weighting constant for the control input quadratic term.
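The sketch below ties the shooting procedure to the cost of Eq. (7): K candidate reference forces are sampled uniformly within ±8 N, each is held constant over the horizon, the learned model is rolled out, and the candidate with the lowest accumulated cost is returned. The rollout helper, the gain values and K are assumptions consistent with the earlier sketches, not the authors' exact implementation.

import numpy as np

def shooting_mpc_step(rollout_fn, x_block, K=100, H_b=2, f_max=8.0,
                      p_table=0.0, p_center=0.0,
                      c_cut=1.0, c_saw=1.0, c_v=0.01):
    candidates = np.random.uniform(-f_max, f_max, size=(K, 3))
    best_cost, best_force = np.inf, None
    for fr in candidates:
        # rollout_fn (hypothetical helper) returns (H_b, 3) predicted positions
        p = rollout_fn(x_block, fr, H_b)
        cost = (c_cut * np.sum((p[:, 2] - p_table) ** 2) + p[-1, 2]   # cutting + terminal cost
                + c_saw * np.sum((p[:, 1] - p_center) ** 2)           # sawing around center
                + c_v * H_b * float(np.dot(fr, fr)))                  # control effort
        if cost < best_cost:
            best_cost, best_force = cost, fr
    return best_force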

V. EVALUATION

In the experiments, we used a YuMi-IRB 14000 collaborative robot manufactured by ABB with an OptoForce 6-axis Force/Torque sensor mounted on its wrist. The MPC operated at a rate of 10 Hz.

A. Cutting Rate

To evaluate the proposed method, we conducted over a hundred experiments where the goal was to cut through the object completely. We deliberately included objects that offered a wide range of interactions during cutting, spanning from homogeneous, easy-to-cut cases (cake, cucumber) that do not offer a lot of resistance and do not display high friction in the lateral axis of motion, to stiff objects with viscous properties (zucchinis, cheese) and finally, to objects displaying high variation in their dynamics and stiffness (bell peppers, red peppers, lemons, unpeeled bananas). All of these types had been encountered in training, but we additionally included the difficult case of an unseen object, i.e. potatoes, to investigate the system's behavior under unknown dynamics. For every object class we conducted 5 trials.

(a) MPC    (b) Fixed trajectory

Fig. 6: Performance while cutting a potato. The fixed trajectory controller fails to overcome friction in order to completely cut through the object and stops at a height of approx. 0.185 cm.

As a proof of concept, the performance was evaluated, similarly to [2], in terms of the cutting rate achieved by the proposed method as compared to the ones from the standard trajectory-tracking controller we used for data collection. To make the comparison fair, we tuned the baseline controller for each class separately by modifying both the stiffness and the trajectory followed, to increase the cutting rate as much as possible for every object type. From the experimental results seen in Fig. 5 it is evident that our method outperforms the baseline, except in the case of cheese, where the smoother motions produced by the fixed trajectory controller were more effective at breaking friction than their more aggressive MPC counterparts. The difference in the rates achieved is more palpable for the red peppers and the potatoes. In the former case, the object at hand is not homogeneous with consistent density, so cutting through the skin and moving to the hollow interior directly alleviates the resistance, thus allowing for an immediate downward motion. On the contrary, in the latter case, the MPC could overcome the resistance induced by the knife being fully embedded in the flesh of the potato by switching to higher-frequency sawing motions (Fig. 6a), as opposed to the fixed trajectory controller (Fig. 6b).

Fig. 7: Evaluation at force-critical points. The letters correspond to the respective cutting instances denoted in the figure.

B. Force-Critical Points

The ultimate goal in this work is to develop a flexible manipulation scheme that is able to adjust its behavior and respect its limitations in order to successfully complete the objective. To accomplish this, it is necessary to come up with an evaluation criterion that encompasses both of these aspects and is indicative of the behavior we are seeking. For instance, although the dynamics model's prediction accuracy is an intuitive measure to this end, a small error is not enough to reflect the network's ability to discern the dynamic behavior. Even with the normalization of the input vectors, the kinematic features have a more easily discernible structure when compared to the noisy, instant-dependent forces, which can lead the network to concentrate more on them. This can prove problematic in force-critical cases, as simply moving towards the goal when progress is halted by a very stiff material can potentially lead to failure due to joint torque limits.

Additionally, although the cutting rate is also an intuitive measure for this task, its applicability and expressiveness are strongly correlated with the robot's potential at exerting forces and the controller used as a baseline. Arguably, it is possible to design control laws that maximize the cutting rate, albeit with tuning between executions, thus confirming our first goal. However, consider the extreme case of a strong industrial manipulator. A far faster cutting rate could be achieved simply by using a position controller, since the resistance from the object would never exceed the arm's potential. In our setting, this is not relevant, primarily due to the hardware at hand, but mostly because this criterion fails to depict the aforementioned desired behavior.

To this end, we focus on force-critical points where the dynamics are imperative and would lead to a failed cut. To evaluate this, we chose to use carrots, as they were not encountered during training and their core is almost impossible to cut through with our robotic setting. As seen in Fig. 7, the normal cutting strategy fails the moment the knife hits the thicker center part of the carrot, so it retreats until the tip is no longer in contact with that part. The corresponding figures in Fig. 7 show the controller's outputs and the resulting positions during the same test. From t = 0 s until t = 4.5 s we observe an intense sawing motion to break the friction, which enables the downward motion until approximately t = 6 s, where the knife reaches the stiff part. At that point, the controller tries to lift it to re-apply pressure while only commanding positive forces on Y that pull the knife away from the force-critical point in order to minimize the cost.

VI. CONCLUSIONS AND FUTURE WORK

In this article, we presented a data-driven MPC approach for the challenging task of food cutting. We reformulated an older work by incorporating measurements from a force/torque sensor and restated the control problem. Instead of solely considering previously encountered objects during evaluation, we demonstrated that, with the same model, this method can generalize to unseen objects and treat different object types by adjusting its behavior. We investigated the system's failure modes by introducing an object that warranted an unsuccessful cut and observed the resulting trajectory. Although the outcome is a sensible solution, the direct effect that the learned dynamics model has on this behavior is unclear and will be further investigated in future work.

Robotic food cutting is, however, only one example of numerous possible applications for this type of controller. Integrating a complex, non-linear data-driven model into MPC can potentially allow for elegant solutions that are able to adapt without the need for fine-tuning. This would add significant flexibility in difficult manipulation scenarios, such as ones involving deformable objects, where the contact dynamics are also challenging to model.

An interesting addition to our work would be to incorporate the second arm as an explicit part of the system and explore the dynamics of the coupled motion. By actively controlling the second arm, we can induce more motion when necessary to move the knife faster out of areas with considerable friction in the lateral axis, or stabilize the object better to accelerate the motion. However, by doing so, the control task becomes even more complicated and the state space will increase accordingly. As a consequence, shooting will be insufficient and there will be a need for a more sophisticated solution, such as the one presented in [32], which offers guarantees of global optimality.

REFERENCES

[1] D. Q. Mayne and H. Michalska, "Receding horizon control of nonlinear systems," IEEE Transactions on Automatic Control, vol. 35, no. 7, pp. 814–824, July 1990.

[2] I. Lenz, R. A. Knepper, and A. Saxena, "DeepMPC: Learning deep latent features for model predictive control," in Robotics: Science and Systems, 2015.

[3] N. Hogan, "Impedance control: An approach to manipulation," in 1984 American Control Conference, June 1984, pp. 304–313.

[4] B. Siciliano and L. Villani, Robot Force Control, 1st ed. Norwell, MA, USA: Kluwer Academic Publishers, 2000.

[5] J. De Schutter and H. Van Brussel, "Compliant robot motion I. A formalism for specifying compliant motion tasks," The International Journal of Robotics Research, vol. 7, no. 4, pp. 3–17, 1988. [Online]. Available: https://doi.org/10.1177/027836498800700401

[6] Y. Karayiannidis and Z. Doulgeri, "An adaptive law for slope identification and force position regulation using motion variables," vol. 2006, 06 2006, pp. 3538–3543.

[7] D. Zhang and B. Wei, "A review on model reference adaptive control of robotic manipulators," Annual Reviews in Control, vol. 43, pp. 188–198, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1367578816301110

[8] Y. Karayiannidis, C. Smith, F. E. V. Barrientos, P. Ögren, and D. Kragic, "An adaptive control approach for opening doors and drawers under uncertainties," IEEE Transactions on Robotics, vol. 32, no. 1, pp. 161–175, Feb 2016.

[9] S. Chiaverini and L. Sciavicco, "The parallel approach to force/position control of robotic manipulators," IEEE Transactions on Robotics and Automation, vol. 9, no. 4, pp. 361–373, Aug 1993.

[10] M. H. Raibert and J. J. Craig, "Hybrid position/force control of manipulators," Journal of Dynamic Systems, Measurement, and Control, vol. 103, 12 1980.

[11] V. Mehrmann, "The autonomous linear quadratic control problem: Theory and numerical solution," vol. 163, 01 1991.

[12] K. Kronander and A. Billard, "Learning compliant manipulation through kinesthetic and tactile human-robot interaction," IEEE Transactions on Haptics, vol. 7, pp. 367–380, 07 2014.

[13] M. Li, H. Yin, K. Tahara, and A. Billard, "Learning object-level impedance control for robust grasping and dexterous manipulation," in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 6784–6791.

[14] A. X. Lee, H. Lu, A. Gupta, S. Levine, and P. Abbeel, "Learning force-based manipulation of deformable objects from multiple demonstrations," in 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015, pp. 177–184.

[15] S. Manschitz, M. Gienger, J. Kober, and J. Peters, "Mixture of attractors: A novel movement primitive representation for learning motor skills from demonstrations," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 926–933, April 2018.

[16] B. Akgun and A. Thomaz, "Simultaneously learning actions and goals from demonstration," Auton. Robots, vol. 40, no. 2, pp. 211–227, Feb. 2016. [Online]. Available: https://doi.org/10.1007/s10514-015-9448-x

[17] B. Huang, M. Li, R. L. De Souza, J. J. Bryson, and A. Billard, "A modular approach to learning manipulation strategies from human demonstration," Autonomous Robots, vol. 40, no. 5, pp. 903–927, Jun 2016. [Online]. Available: https://doi.org/10.1007/s10514-015-9501-9

[18] T. Tang, H.-C. Lin, and M. Tomizuka, "A learning-based framework for robot peg-hole-insertion," in ASME 2015 Dynamic Systems and Control Conference, 10 2015, p. V002T27A002.

[19] D. Kalashnikov, A. Irpan, P. Pastor, J. Ibarz, A. Herzog, E. Jang, D. Quillen, E. Holly, M. Kalakrishnan, V. Vanhoucke, and S. Levine, "Scalable deep reinforcement learning for vision-based robotic manipulation," in Proceedings of The 2nd Conference on Robot Learning, ser. Proceedings of Machine Learning Research, A. Billard, A. Dragan, J. Peters, and J. Morimoto, Eds., vol. 87. PMLR, 29–31 Oct 2018, pp. 651–673. [Online]. Available: http://proceedings.mlr.press/v87/kalashnikov18a.html

[20] S. Levine, C. Finn, T. Darrell, and P. Abbeel, "End-to-end training of deep visuomotor policies," J. Mach. Learn. Res., vol. 17, no. 1, pp. 1334–1373, Jan. 2016. [Online]. Available: http://dl.acm.org/citation.cfm?id=2946645.2946684

[21] S. Levine and V. Koltun, "Learning complex neural network policies with trajectory optimization," vol. 3, 06 2014.

[22] J. Boedecker, J. T. Springenberg, J. Wülfing, and M. Riedmiller, "Approximate real-time optimal control based on sparse Gaussian process models," in 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), Dec 2014, pp. 1–8.

[23] R. Lioutikov, A. Paraschos, J. Peters, and G. Neumann, "Sample-based information-theoretic stochastic optimal control," in 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014, pp. 3896–3902.

[24] S. Levine and P. Abbeel, "Learning neural network policies with guided policy search under unknown dynamics," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 1071–1079. [Online]. Available: http://papers.nips.cc/paper/5444-learning-neural-network-policies-with-guided-policy-search-under-unknown-dynamics.pdf

[25] M. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," 01 2011, pp. 465–472.

[26] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, "Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning," arXiv preprint arXiv:1708.02596, 2017.

[27] G. Williams, N. Wagener, B. Goldfain, P. Drews, J. M. Rehg, B. Boots, and E. A. Theodorou, "Information theoretic MPC for model-based reinforcement learning," Proceedings - IEEE International Conference on Robotics and Automation, pp. 1714–1721, 2017.

[28] T. Baumeister, S. L. Brunton, and J. N. Kutz, "Deep Learning and Model Predictive Control for Self-Tuning Mode-Locked Lasers," pp. 1–9, 2017.

[29] Z. Erickson, H. M. Clever, G. Turk, C. K. Liu, and C. C. Kemp, "Deep haptic model predictive control for robot-assisted dressing," arXiv preprint arXiv:1709.09735, 2017.

[30] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press, 2007, pp. 153–160. [Online]. Available: http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf

[31] A. Rao, "A survey of numerical methods for optimal control," Advances in the Astronautical Sciences, vol. 135, 01 2010.

[32] Y. Chen, Y. Shi, and B. Zhang, "Optimal Control Via Neural Networks: A Convex Approach," pp. 1–26, 2018. [Online]. Available: http://arxiv.org/abs/1805.11835