
DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Generative adversarial networks as integrated forward and inverse model for motor control

MOVITZ LENNINGER

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Generative adversarial networks as integrated forward and inverse model for motor control

MOVITZ LENNINGER

Master in Machine Learning
Date: December 22, 2017
Supervisor: Hansol Choi (University of Freiburg), Jeanette Hellgren Kotaleski (KTH)
Examiner: Erik Fransén
Swedish title: Generativa konkurrerande nätverk som integrerad framåtriktad och invers modell för rörelsekontroll
School of Computer Science and Communication


Abstract

Internal models are believed to be crucial components in human motor control. It has been suggested that the central nervous system (CNS) uses forward and inverse models as internal representations of the motor systems. However, it is still unclear how the CNS implements the high-dimensional control of our movements. In this project, generative adversarial networks (GANs) are studied as a generative model of movement data. It is shown that, for a relatively small number of effectors, it is possible to train a GAN which produces new movement samples that are plausible given a simulator environment. It is believed that these models can be extended to generate high-dimensional movement data. Furthermore, this project investigates the possibility of using a trained GAN as an integrated forward and inverse model for motor control.


Sammanfattning

Interna modeller tros vara en viktig del av mänsklig rörelsekontroll. Det har föreslagits att det centrala nervsystemet (CNS) använder sig av framåtriktade modeller och inversa modeller för intern representation av motorsystemen. Dock är det fortfarande okänt hur det centrala nervsystemet implementerar denna högdimensionella kontroll. Detta examensarbete undersöker användningen av generativa konkurrerande nätverk som generativ modell av rörelsedata. Experiment visar att dessa nätverk kan tränas till att generera ny rörelsedata av en tvådelad arm och att den genererade datan efterliknar träningsdatan. Vi tror att nätverken även kan modellera mer högdimensionell rörelsedata. I projektet undersöks även användningen av dessa nätverk som en integrerad framåtriktad och invers modell.


Contents

1 Introduction
  1.1 Introduction of the project
  1.2 Contribution
  1.3 Delimitations
  1.4 Ethical considerations
  1.5 Acknowledgment

2 Motor control
  2.1 Coordination of movement
  2.2 Action and perception
    2.2.1 Motor control theory
    2.2.2 Predictive coding

3 Generative models
  3.1 Representation learning and Deep learning
  3.2 Generative models
    3.2.1 Popular generative models
  3.3 Generative Adversarial Networks - GAN
    3.3.1 Training GANs
    3.3.2 Optimality
    3.3.3 GAN - A special case of f-GAN
    3.3.4 Wasserstein GAN
    3.3.5 Inference in latent space - Reconstructing missing elements

4 Related work

5 Method
  5.1 Simulator environment - a toy model
    5.1.1 Training data
  5.2 Training GAN
    5.2.1 Wasserstein GAN
    5.2.2 Quantifying training progression
  5.3 Experiments
    5.3.1 Forward model
    5.3.2 Inverse model
    5.3.3 Inverse model - Selecting minimal action
    5.3.4 Exploring the latent space

6 Experimental results
  6.1 Training phase
  6.2 Forward model
  6.3 Inverse model
  6.4 Choosing generative model
  6.5 Inverse model - Selecting minimal action
  6.6 Exploring the latent space

7 Discussion and Conclusions
  7.1 Discussion of experiments
  7.2 Discussion of implementation
  7.3 Difficulties
  7.4 Connection to Optimal control?
  7.5 Future development

Bibliography


Chapter 1

Introduction

The human body consists of many joints and limbs, which our motor systems have to coordinate across both space and time. Thus, the problem of motor control is inherently high-dimensional [9]. It has been suggested that the brain uses dimensionality reduction to obtain a low-dimensional control space; however, the principle behind such a dimensionality reduction is unknown [7, 58]. Additionally, the use of forward and inverse models in human motor control has been extensively studied [61, 42, 21]. Forward models are specialized at learning the causal dynamics of the body's interaction with the external world in order to predict the future state of the body. Inverse models, on the other hand, are the forward processes in reverse: given a desired future state of the body, an inverse model produces motor commands which guide the body towards the desired state. Although it is conjectured that the brain makes use of forward and inverse models to handle the high-dimensional coordination of the body, it is not known how the brain implements these models, even at an algorithmic level.

In machine learning, generative adversarial networks (GANs) [29] were initially developed as a high-dimensional image generation model. Given a large collection of digital images, a generator network can be taught to generate new, original images similar to those in the training set. In addition, studies have shown that these generative networks can be used to reconstruct corrupted images. The GAN framework is not limited to images, however, and has been applied to other domains as well, such as astrophysics [45] and natural language processing [62].
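The adversarial objective behind these networks can be made concrete with a small numeric sketch of the GAN value function $V(D, G) = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))]$. The helper name and the toy discriminator outputs below are illustrative, not from this thesis:

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN value function
    V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator outputs on real and on generated samples."""
    term_real = sum(math.log(d) for d in d_real) / len(d_real)
    term_fake = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return term_real + term_fake

# A perfectly confused discriminator outputs 0.5 everywhere, which gives
# V = log(1/2) + log(1/2) = -log 4, the value at the theoretical optimum
# where generated samples are indistinguishable from real ones.
v = gan_value([0.5] * 4, [0.5] * 4)
```

The discriminator maximizes this quantity while the generator minimizes it; at convergence neither player can improve, which is the sense in which the networks "compete".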


1.1 Introduction of the project

Motor control suffers from several problems, including the degrees-of-freedom problem. Some of these problems could be addressed by GANs, which have demonstrated the ability to generate high-dimensional data and to perform inference, properties that may be related to the analysis-by-synthesis hypothesis. This master's thesis aims to investigate whether GANs can be trained to generate new, plausible movement data and subsequently be used as an integrated forward and inverse model for motor control. In this proof-of-concept study, GANs were trained using movement data from a two-linked arm in a two-dimensional task space. The training data consisted of samples of random movement, thus resembling motor babbling. Importantly, the training was conducted without providing the networks with any prior information about the problem at hand. The networks' capacities as integrated forward and inverse models were tested by reconstructing masked (i.e., corrupted) movement samples: the networks had to re-create full movement samples from partially masked ones. The forward and inverse tasks were distinguished by masking different domains of the movement samples.
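The masking-and-reconstruction idea can be sketched in a few lines: assuming a trained generator, one searches the latent space for a code whose output matches the unmasked elements, and reads the masked elements off the generator's output. The linear stand-in generator, its hand-coded derivatives, and the learning rate below are all hypothetical, chosen only so the sketch is self-contained:

```python
def g(z):
    # Toy stand-in for a trained GAN generator, mapping a 1-D latent z
    # to a 2-D "movement sample" [action, outcome]. Purely illustrative.
    return [z, 2.0 * z]

def reconstruct(sample, mask, steps=200, lr=0.1):
    """Recover a full sample from a partially masked one by latent-space
    search: minimize the squared error on the *unmasked* elements only.
    mask[i] = 1 means element i is observed, 0 means it is masked out."""
    z = 0.0  # latent initialization
    for _ in range(steps):
        out = g(z)
        # Gradient of sum_i mask_i * (g(z)_i - sample_i)^2 w.r.t. z,
        # using the known derivatives dg/dz = [1, 2] of the toy generator.
        grad = 2 * mask[0] * (out[0] - sample[0]) * 1.0 \
             + 2 * mask[1] * (out[1] - sample[1]) * 2.0
        z -= lr * grad
    return g(z)

# "Forward"-style task: the outcome (second element) is masked;
# it is filled in consistently with the generator, so full ≈ [3.0, 6.0].
full = reconstruct([3.0, 0.0], mask=[1, 0])
```

Masking the action instead of the outcome turns the same procedure into an inverse-model query, which is exactly why a single generator can serve both roles.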

1.2 Contribution

This project contributes to the field of computational neuroscience by suggesting a new approach to model the generation of movement. Although this project only deals with relatively low-dimensional movement data generated from a toy environment, the method could be extended to model both data of higher dimensions and data generated from more complex environments. The hope is that this approach could, in the future, be used to test ideas proposed by the theories of human motor control, or be compared with experiments conducted with human subjects.

From a machine learning perspective, the thesis demonstrates a new domain in which the GAN framework could be useful. Although the GAN framework has been most intensely researched as an image generation tool, the number of possible areas of application is believed to be huge. In addition, high-dimensional control also poses a problem in robotics, and the results of the project could be of equal interest for robotic applications.


1.3 Delimitations

In this project, only small networks consisting of fully connected layers were trained and tested. Many standard machine learning techniques, such as batch normalization, were not required. The intention was to keep the neural networks as simple as possible, as this was a proof-of-concept study. However, in future research, larger and more sophisticated networks might be necessary if the task complexity is raised.

All training data were sampled from a toy environment with linear dynamics and no external forces. The movements of the human body are, of course, more complex, but modeling more complex data is left for future research.

1.4 Ethical considerations

The project was carried out entirely using computer experiments. The results could potentially be of interest for robotic applications. Accurate high-dimensional control could enable robots to perform new tasks, aiding humans both in private life and in industry. Of course, robots could also be designed to perform tasks of morally dubious character. These are difficult issues that should not be ignored; however, neither should the benefits of robots and artificial intelligence. In addition, a deeper understanding of high-dimensional motor control could aid computational neuroscience in understanding human motor control, an understanding which could also benefit individuals and society at large.

1.5 Acknowledgment

I would like to thank everyone who has helped and supported me during this project. A special thanks to: Hansol Choi and Carsten Mehring at the University of Freiburg for their invaluable help and advice as supervisor and principal of this project, Jeanette Hällgren Kotaleski and Erik Fransén for their work as supervisor and examiner at KTH, Sara Lenninger for proofreading the report, and Joschka Boedecker and Manuel Watter at the Machine Learning Lab at the University of Freiburg for their helpful comments and interest.


Chapter 2

Motor control

In the field of neuroscience, motor control, motor adaptation and action selection have long been areas of strong interest for researchers. The human body consists of many limbs, joints and muscles, collectively and interchangeably referred to as effectors. These effectors need to be coordinated to perform various tasks. Additionally, both the body and the surrounding environment are in constant change: age and injuries are examples of changes that affect any animal, and the environment changes in various ways, too. Animals constantly need to adapt to such variations. Furthermore, movements should be part of controlled behavior. In a natural environment, an animal is presented with various demands and opportunities for action. Sometimes the available actions are mutually exclusive, and a decision has to be made. Thus, another fundamental issue in the study of motor control is that of action selection [15]. All animals have to adapt their actions to fit the environment in which they are situated. Any action requires the use of sensory information to acquire knowledge about spatial relations between the surrounding objects, and between those objects and the self. For example, grasping a cup requires information about the spatial relations between the cup and the hand. Clearly, the spatial information related to the cup has to be meaningful relative to the animal's body and hand.

In traditional cognitive science, the question of movement initialization was proposed to be resolved by a series of distinct cognitive processes [15]. First, perceptual systems process incoming sensory information and represent it through an internal model of the external world; sensory information is thus used to update the internal model of the animal's current state in relation to the environment. This information is then combined with some "desires" to decide the next action. Lastly, when an action has been selected, motor commands are generated to guide the subsequent movements, see Fig. 2.1. In more recently proposed cognitive theories, however, this serialized pattern has come into question. Instead, these processes are believed to be more mutually dependent, as will be discussed in Section 2.2.

[Figure 2.1: Traditional view of movement initialization: sensory input → cortex → motor output.]

2.1 Coordination of movement

Movement coordination requires multiple effectors to cooperate to achieve a certain goal. The coordination must occur at multiple levels: individual muscles as well as large groups of muscles need to work in synergy to create the desired movement [18]. Understanding the process of coordination has engaged researchers from various fields, from sports to medicine [65]. However, despite these efforts, a unified theory of movement control is still missing. The complexity of the problem partly arises from the vast number of effectors involved, even in basic movements. The dimensionality of the configuration space, i.e., the number of effectors, often greatly exceeds that of the task space, which is most often confined to the three spatial dimensions [18]. Thus, there is redundancy in the configuration space, and any single point-to-point movement in task space can be achieved by several different trajectories through the configuration space.

The methods used to investigate and model movement coordination vary. When investigating cortical control of arm movements, studies have primarily focused either on inferring single-neuron responses as a function of movement parameters, or on inferring movement responses as a function of neuron population activity [54]. That is, they either describe neural activity from a set of movement parameters, or describe muscle activations from a set of neural activities. The former focus is motivated by the fact that certain movement parameters, such as direction, distance and target position, are correlated with the discharge of individual neurons [24, 31]. It is thus assumed that neural activities mostly encode visuospatial movement parameters, and these studies try to find a set of movement parameters and relations which can be used to predict the firing of individual neurons. They commonly try to derive a relationship of the form

$r_n(t) = f_n(\text{parameter}_1(t), \text{parameter}_2(t), \ldots)$

where $r_n(t)$ is the firing of the $n$-th neuron. However, this type of study does not give any answer to how the cortex actually generates the motor output. In contrast, the latter type of study instead tries to model muscle activities, $m(t)$, as the result of a dynamical system of neuron population activities, $r(t)$. In this view, the motor cortex is a dynamical system which controls and generates movement [13]. Such a system can be described by

$m(t) = G[r(t)],$  (2.1)

where $r(t)$ is subject to

$\tau \dot{r}(t) = h(r(t)) + u(t).$  (2.2)

Here, $G(\cdot)$ incorporates all processing steps until the neural spikes finally drive muscle activations, $\tau$ is a scaling constant, $h(\cdot)$ is some unknown function, and $u(t)$ represents external inputs [54]. In this view, the firing of single neurons follows the dynamics of the population, and the correlation with movement parameters is not necessarily regarded as a representative code [14].
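A minimal numeric sketch of Eqs. 2.1-2.2 may help fix ideas: Euler-integrate the population dynamics and read out "muscle activity" at every step. The particular choices of $h$, $G$, $u$ and the constants below are illustrative placeholders, since the equations deliberately leave these functions unspecified:

```python
def simulate(r0, u, h, G, tau=0.1, dt=0.01, steps=100):
    """Euler integration of the population dynamics
    tau * dr/dt = h(r) + u(t), reading out m(t) = G[r(t)] at each step.
    h, G and u stand in for the unknown functions of Eqs. 2.1-2.2."""
    r = r0
    trajectory = []
    for k in range(steps):
        r = r + (dt / tau) * (h(r) + u(k * dt))
        trajectory.append(G(r))
    return trajectory

# Illustrative choice: leaky dynamics h(r) = -r, constant drive u = 1,
# linear readout G(r) = 2r. r(t) relaxes toward the fixed point r* = 1,
# so the muscle readout m(t) approaches 2.
m = simulate(r0=0.0, u=lambda t: 1.0, h=lambda r: -r, G=lambda r: 2.0 * r)
```

The point of the dynamical-systems view is captured even by this scalar toy: movement-related output is a trajectory of the population state, not a lookup of movement parameters.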

A completely different approach is taken by optimal control theory, which aims to describe motions without explicitly modeling the neural activity. Instead, the idea is that every movement can be described and understood as the result of an optimization process [18]. Furthermore, it is the movement itself that is being optimized, not some internal neurological state. Optimal control theory is not only applicable to studies of coordination control among animals, but has a wide range of applications in mathematics, engineering, economics, etc. [55]. In general, a first-order, non-linear optimal control problem can be posed as finding the solution which minimizes some cost function,

$J = \Theta(x(t_f), t_f) + \int_{t_0}^{t_f} L(x, u, t)\, dt$  (2.3)

where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^m$, $\Theta : \mathbb{R}^n \times \mathbb{R}^1 \to \mathbb{R}^1$ and $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^1 \to \mathbb{R}^1$. The minimization is often constrained by a dynamical system of equations,

$\dot{x} = f(x, u, t), \quad x(t_0) = x_0$  (2.4)


where $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^1 \to \mathbb{R}^n$ [55]. Mathematically, minimizing Eq. 2.3 subject to Eq. 2.4 requires the use of variational calculus.

To model movement coordination using optimal control theory, every trajectory is assigned a value by a cost function, $J$. The cost function encodes an external goal of the movement; it can also incorporate a penalizing term to limit some feature of the movement. Of course, the performance of such a model depends on how well the cost function captures the true behavior. In addition to the cost function, a differential equation is formulated which formalizes the relation between the system outputs (e.g., end-positions, joint angles) and the system inputs (e.g., neural activity or muscle activations). In early optimal control studies, the integral over time of the squared "jerk", i.e., the rate of change of acceleration, was used as the cost measure [18, 32], motivated by the fact that it results in smooth movements. In more recent developments, however, the integrated jerk has been replaced by the sum of squared motor commands.
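The integrated-jerk cost can be estimated numerically for any sampled trajectory; the third-order finite-difference scheme and the test trajectories below are illustrative, not taken from the studies cited above:

```python
def jerk_cost(x, dt):
    """Approximate the minimum-jerk cost J = integral of (d^3x/dt^3)^2 dt
    for a sampled 1-D trajectory x, using third-order finite differences."""
    cost = 0.0
    for i in range(len(x) - 3):
        jerk = (x[i + 3] - 3 * x[i + 2] + 3 * x[i + 1] - x[i]) / dt ** 3
        cost += jerk ** 2 * dt
    return cost

dt = 0.01
t = [k * dt for k in range(101)]
quadratic = [ti ** 2 for ti in t]  # constant acceleration: jerk is zero
cubic = [ti ** 3 for ti in t]      # constant jerk of 6 everywhere

# The quadratic trajectory has (numerically) zero jerk cost, while the
# cubic accrues 6^2 = 36 per unit time across the differenced interval.
```

Under a cost like this, smooth trajectories are cheap and abrupt acceleration changes are expensive, which is exactly the property that made minimum-jerk models reproduce smooth point-to-point reaches.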

Optimal control constitutes a shift of focus, from the neurological basis of movement to the movement itself. In these models, coordination depends on the task at hand and on the body, with less focus on the internal central nervous system (CNS) [18]. An optimal control policy can be incorporated into and utilized by an internal model, as will be discussed later.

2.2 Action and perception

There has long been an interest in the link between action and perception; arguably, it has been of interest since the 19th century and the ideomotor principle [11]. In recent years, the enactive view, that perception and action are interconnected, has become more prominent. In the enactive view, perception acts as a guide for action. Furthermore, motor ability is believed to be crucial for our ability to learn how to interpret sensory sensation into the form of representation we call perception [47]. Sensory sensation, by itself, is arguably not enough to cause proper perception, i.e., a meaningful representation of the external world. Instead, perception requires sensation in combination with sensorimotor knowledge. Talking about how sensorimotor knowledge influences perception, Noë writes: "Our experiences of the roundness of a plate depends on our mastery of the plates sensorimotor profile, i.e. how the looks vary as we move around it or rotate the plate" (Noë, 2004, p. 85). Thus, Noë [47] argues that it is the relation between sensory sensation and sensorimotor knowledge that causes us, for instance, to perceive a plate as round although, from our perspective, the plate should be perceived as elliptic.


In the past three decades, the enactive view has been intensively discussed. It has now come to incorporate not only action and perception, but also other cognitive functions and abilities. It is not a homogeneous view, and a variety of different theories have been proposed, with many accounts partly overlapping each other [25]. The accounts originate in different disciplines and often aim to describe different mechanisms in the brain. Crudely, however, these accounts can be classified into three broad "families": (1) common coding, (2) internal models, and (3) simulation theory. Although the theories operate at different levels of the cognitive hierarchy and aim to explain different cognitive functions, they all share some underlying idea of a common representation of perception, action and cognition. These families can, of course, be further subdivided into more specific theories. This project aims at investigating the possible use of internal models in motor control, and therefore the focus will predominantly be put on these accounts. Naturally, there are many ideas of how internal models function and how they could be implemented in the CNS. In the context of this project, it is most natural to focus on two accounts: motor control theory and predictive coding.

2.2.1 Motor control theory

Motor control theory is grounded in studies of human posture control and movement trajectory predictions [25]. In the late 1980s and early 1990s, a growing number of studies implied that the CNS might implement internal models for motor control strategies [61, 42, 21]. The theory relies on two separate systems for action and perception, both of which depend on the motor system and the current motor signals. According to this view, the forward model represents the causal relation between motor signals and the resulting sensory input (both proprioceptive and exteroceptive). It does so by learning the causal dynamics of the interaction between the body's motor system and the environment [42]. The forward model does not itself engage in any motor activity; instead it receives a copy of the motor command, an efference copy, and produces an estimate of the sensory feedback, a corollary discharge. The theory thus assumes two separate systems, both using the motor command as input: the actual motor plant (carrying out the movement) and the forward model (estimating the resulting state from that motor command). Both systems share the same input but produce separate outputs. To avoid later confusion, this type of forward model will be referred to as an auxiliary forward model where there is risk of ambiguity. In the task of arm control, the input to the auxiliary forward model would be the current state of the body (e.g., angular positions and velocities) and a motor command. The output of the model is the predicted future state of the body, given the chosen motor command. The forward model can therefore be seen as a transition mapping between states. Due to the redundancy of the configuration space (see the discussion above), this mapping is not necessarily one-to-one but could potentially be many-to-one.


That is, the same end-state can be reached through a multitude of state trajectories.

The forward model gives an account of how the CNS could predict future states of the body. However, the forward model does not generate the motor commands itself. Instead, the theory also predicts the existence of a distinct inverse model, which inverts the process of the forward model. The inverse model uses information about the current state of the body, together with a desired future state, to generate appropriate motor commands. The inverse model thus defines the control policy of the system. Optimal control theory (see Sec. 2.1) proposes that the control policy is the solution to a minimization of some task-dependent cost function $J$ [18]. Note that, again due to the redundancy of the configuration space, the solution might not be unique: the inverse can be a one-to-many mapping, since the same state can be reached by a manifold of paths through the configuration space. The system can also be augmented with a feedback loop, integrating the predicted future state, the corollary discharge, with temporally delayed sensory feedback. A schematic illustration of a possible implementation of a motor control system using a separate plant and forward model, i.e., an auxiliary forward model, can be seen in Fig. 2.2a.
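For a two-link planar arm, of the kind used later in this thesis, the forward mapping from configuration space to task space is plain trigonometry, and the non-uniqueness of its inverse is easy to demonstrate. The link lengths and angles below are assumed for illustration, not taken from the thesis setup:

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a two-link planar arm: a minimal
    forward mapping from joint angles (configuration space) to the
    2-D task space. theta2 is measured relative to the first link."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

# Two distinct configurations ("elbow down" vs the mirrored "elbow up")
# reach the same end-point, so the inverse mapping is not unique.
p_down = forward_kinematics(0.3, 0.8)
p_up = forward_kinematics(0.3 + 0.8, -0.8)
```

The forward direction is a well-defined function; inverting it forces a choice among configurations, which is exactly the redundancy an inverse model (or an added cost criterion) must resolve.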

2.2.2 Predictive coding

Already during the 1950s, the analysis-by-synthesis theory [40, 11] depicted a process in which sensory information is constantly integrated with current knowledge to make inferences about the cause of sensation. Originating from the field of psychology, the analysis-by-synthesis idea suggests that the brain does not form its understanding by the sole accumulation of sensory data, but rather through continuously matching bottom-up processing of the incoming sensory information with high-level, abstract models of the causes of the sensory input. The brain thus predicts and interprets the sensory input based on the currently most suitable high-level model. Low-level sensory information is often noisy, ambiguous and complex, so interpreting the information on the basis of a high-level understanding of its cause eases the computational load. The analysis-by-synthesis strategy has therefore been suggested as an efficient mechanism for inference and has been applied to, for example, computer vision tasks [64].

Predictive coding theories are rooted in theories of perception. However, they have later been extended to incorporate motor systems through the idea of action-oriented predictive coding. Predictive coding has in recent years become one of the most influential ideas about action and perception [11]. It is believed to be a fundamental process which drives the cognitive domains. In this view, perception is itself a form of prediction: the core function of perceptual systems is to match the incoming sensory inputs with top-down predictions [16]. Predictive processing refers to any cognitive process which requires predictions not only about the future state of the system, but also about its current state. This information about the states has to be generated by the system itself, thus requiring a generative model. With generative models generating predictions of current and future states, the brain's core function is to minimize the prediction error, or surprisal, caused by the mismatch between the prediction and the actual sensory activation.

[Figure 2.2: Two architectures for motor control. (a) The auxiliary forward model, which requires separate inverse and forward models: a desired goal state enters the inverse model, which sends a motor command u to the motor plant and an efference copy to the forward model; the forward model emits a corollary discharge y′, which sensory integration compares with sensory information y from the sensory system, and the resulting prediction error ε feeds back to the inverse model. (b) The integral forward model, in which no separation of inverse and forward models is required: from a desired goal state, descending predictions are issued, proprioceptive predictions acting as "motor commands" and exteroceptive predictions acting as "corollary discharge", while motor plant responses return as prediction errors ε.]

The brain is, to some extent, hierarchically and multi-relationally organized [49]. Predictive coding theories posit top-down predictions, where higher-level systems predict the activation of lower-level systems; this prediction process is repeated and iterated, reaching lower and lower levels. For a generative model to make accurate predictions, it must capture some informative statistical structure of the external world. The brain thus makes a crucial assumption about the regularity of events: certain events are connected by deterministic or probabilistic relations. These regularities allow the brain to combine prior knowledge with current information to predict future events [11]. It is believed that this utilization of expectation is what allows the brain to construct a coherent and stable representation of the external world, despite the continual presence of noisy and incomplete data. Inferring these relations is therefore a crucial task for a generative model. The core idea is that an accurate model yields accurate predictions. The predictions are "virtual" sensory sensations, formed by the generative model's understanding of the causal interaction with the environment. The goal of the generative model is to, as accurately as possible, predict and thus "explain away" incoming sensory information; only the surprisal then needs to be propagated forward through the system. This has been shown, for instance in computer science, to be an effective encoding and data compression strategy [16].
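A minimal sketch of this error-minimization loop, assuming a scalar internal state and a linear (identity) generative model; all names, constants and the single-level structure are illustrative simplifications of the hierarchical story above:

```python
def predictive_coding(observation, prior, g=lambda phi: phi,
                      lr=0.05, steps=400):
    """Minimal predictive-coding sketch: the internal state phi is
    adjusted to reduce the squared prediction error between the
    top-down prediction g(phi) and the observation, balanced against
    a pull toward the prior expectation."""
    phi = prior
    for _ in range(steps):
        err_obs = observation - g(phi)     # bottom-up prediction error
        err_prior = prior - phi            # error against the prior
        phi += lr * (err_obs + err_prior)  # gradient step on total error
    return phi

# With equal weight on prior (0.0) and observation (2.0), the internal
# state settles midway between them, at phi = 1.0.
phi = predictive_coding(observation=2.0, prior=0.0)
```

Only the residual errors drive updates: once the prediction matches the input, nothing further propagates, which is the "explaining away" described above.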

The predictive coding theories can be augmented with an "action-oriented predictive processing" model, where the process also encompasses the generation of motor commands. Predictions about future proprioceptive sensations, caused by intended movement, give rise to error signals which propagate top-down and are subsequently minimized by actual movement. In this account, sometimes referred to as active inference, the predictions of future proprioceptive states function as the motor commands driving our movements. Movements are then caused to fit our predictions concerning the sensations those very movements are expected to bring. Thus, it is the idea of movement that causes movement [50]. In contrast to the theory of auxiliary forward models, there are no separate systems for implementing forward and inverse models. Instead, both the prediction of future states and action control are the result of multi-level predictions, see Fig. 2.2b. This type of motor control implementation can be referred to as an integral forward model. In this perspective, perception and action selection are driven using the same core logic. However, whereas in the perception models the brain changes internal states to match the incoming sensory information, in the action-oriented models the brain can also elicit motor commands which actively change the incoming sensory information to match the predictions [16]. One example of a predictive coding theory is the free-energy formulation [23, 22]. According to the free-energy formulation, the fundamental process of the brain is to reduce "the free energy". In this active inference formulation, free energy is a measure of how well the model represents the actual world. The formulation depicts a process that explains how an agent can maintain a stable internal organization and representation of the world despite the disorganizing effects of the second law of thermodynamics [50].


Chapter 3

Generative models

3.1 Representation learning and Deep learning

Representation, the structuring and systematization of data, is a crucial task for any intelligent system. In complex environments, observations are often caused by many independent explanatory factors. Therefore, any (intelligent) interaction with the environment requires a situated agent to sufficiently separate the different explanatory factors. In addition, it is often assumed that these factors are hierarchically organized. In other words, abstract concepts can be described by, or broken down into, other less abstract concepts. Thus, more abstract concepts sit higher up in the hierarchy and are defined by the composition of lower-level concepts. The performance of modern machine learning algorithms strongly depends on the capability of finding a suitable representation of the available data [5]. It is conjectured that the success of these models depends on their ability to find a representation which allows the different factors of variation to be disentangled. For instance, both deep learning approaches and Bayesian approaches to computer vision exploit the fact that complex and abstract objects, such as faces, can be (iteratively) decomposed into collections of simpler objects and even into collections of shapes [35, 57].

Another important assumption about most real-world data is that the data's probability mass resides on a low dimensional manifold M. Thus, the dimensionality of M, d_M, is much smaller than that of the original space, R^{d_x}, in which the data are measured (quantized) and represented [5]. This is referred to as the manifold hypothesis. In unsupervised learning, the primary task can be seen as learning an efficient representation of the structure of the manifold which supports the observed data. By focusing on this manifold, it is possible to reduce the model complexity yet still capture the complexity of the variations of the data. The representation can be identified as an "intrinsic" coordinate system of the manifold. A well-known example of such a method is Principal Component Analysis (PCA) [10].
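As a concrete miniature of this idea (the data and method here are illustrative, not taken from the thesis), PCA recovers the intrinsic axis of points lying near a one-dimensional manifold embedded in R^2. A plain power-iteration sketch:

```python
import math
import random

def leading_pc(points, iters=200):
    """Leading principal component of 2-D points via power iteration
    on the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# Data that lies (noisily) on a 1-D manifold embedded in R^2: y ~ 2x.
random.seed(0)
pts = [(t, 2 * t + random.gauss(0, 0.01)) for t in [i / 100 for i in range(-50, 50)]]
v = leading_pc(pts)
direction = v[1] / v[0]  # slope of the recovered intrinsic axis, close to 2
```

The recovered axis serves as the one intrinsic coordinate of the data, even though each sample is measured in two dimensions.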

In deep learning, the model itself is hierarchically organized through the composition of layers of non-linear transformations stacked upon each other, thus forming a deep neural network [28]. The structure of the architecture (e.g. the number of layers) plays a key role in the function of the neural network. Apart from defining the capacity of the network, the composition of layers potentially leads to a hierarchical representation of the data. Lastly, it is common to assume that, in a good representation, the different factors are in simple, linear relations to each other. This is assumed in most deep learning architectures when a linear predictor is added as the last layer of the network.

3.2 Generative models

Often in machine learning, the crucial task is to predict some target values given an observation. In other words, given an observation x (a vector of input features) the task is to predict some target vector c [6]. From a probability perspective, this implies modeling the conditional probability distribution p(c|x) (i.e. the probability of c being the target vector for input x). Frequently, such a model is obtained by fitting a set of model parameters, θ_model, from a set of data pairs {c_i, x_i}. If c is continuous, the task is referred to as a regression task; if c is discrete, it is referred to as a classification task. However, it is not necessary to restrict ourselves to modeling only the conditional distribution p(c|x). Instead, it is possible to model the joint distribution of the variables, p(c, x), or simply p(x) if no distinction between target and input is necessary. These types of models are referred to as generative models. A generative model is also capable of synthesizing new data points, x_g, similar to the observations in the training set, hence the name generative model. The task of a generative model is not restricted to predicting some target vector. Instead, a generative model aims to infer, i.e. 'uncover', the underlying probability distribution of some events. If these events are governed by a probability distribution P, then the goal of a generative model is to estimate P as accurately as possible, see Fig. 3.1. Note that some generative models can also be used for classification/regression tasks by evaluating the conditional probability p(c|x) given the model of p(c, x) and an observation x.
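To make the distinction concrete, the following toy sketch (all numbers are illustrative, not from the thesis) fits a tiny generative model of the joint p(c, x), namely a class prior times one-dimensional Gaussian class-conditionals, and then recovers the classifier p(c|x) via Bayes' rule:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# A tiny generative model of the joint p(c, x): class prior p(c) times a
# Gaussian class-conditional p(x | c). All parameter values are illustrative.
prior = {0: 0.5, 1: 0.5}
params = {0: (-1.0, 1.0), 1: (1.0, 1.0)}  # (mean, std) per class

def joint(c, x):
    mu, sigma = params[c]
    return prior[c] * gauss_pdf(x, mu, sigma)

def posterior(c, x):
    """p(c | x) obtained from the joint model via Bayes' rule."""
    z = sum(joint(k, x) for k in prior)
    return joint(c, x) / z

# By symmetry, x = 0 is equally likely under both classes.
p = posterior(1, 0.0)
```

The same fitted joint model could equally well be sampled from, which is exactly the "generative" use that a purely discriminative model of p(c|x) does not support.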


Figure 3.1: Schematic illustration of the task of a generative model. The task of a generative model is to estimate a distribution Q which minimizes the divergence, shown in red, between Q and the true distribution P. The ellipse indicates all possible estimations, "Qs", under the selected model.

In general, the true probability distribution P, which governs the observations, is unknown. However, using some set of observations, X, a generative model can estimate P. There is a wide variety of generative models, and they all have different properties. For example, some model the probability distribution through an explicit representation, others do not. For a comprehensive overview of different generative models used in deep learning, see [28]. Many generative models, such as (deep) Boltzmann machines or MLE-based models, rely on a parametric family being specified a priori, which effectively limits the functional form of the estimated probability distribution. Furthermore, such models often lack tractable likelihood functions and therefore require numerous approximations [29]. Other models, sometimes called generative machines, do not explicitly represent the likelihood function. By not expressing the likelihood explicitly, these models are often less limited in their functional form and can more freely adapt to the distribution of the observed data. Of course, in some applications, modeling the likelihood explicitly is desirable. However, note that generative machines are still capable of generating new data, thus implicitly representing the likelihood function.

3.2.1 Popular generative models

Three currently popular generative models are fully visible belief networks (FVBN), variational autoencoders (VAE) and generative adversarial networks (GAN) [27]. These models are all used in current AI developments; however, for this project only GANs will be considered, and they will therefore be covered extensively in the following sections. In short, an FVBN models the joint probability of x = {x_1, ..., x_T}


through the factorization of conditional probabilities

p(x) = ∏_{t=1}^{T} p(x_t | x_1, ..., x_{t−1})    (3.1)

This type of model has been used to create DeepMind's WaveNet, for example [59]. However, the disadvantage of such a model is that the factorization cannot be computed in parallel. Instead, the factor p(x_t | x_1, ..., x_{t−1}) has to be computed and x_t sampled before starting on p(x_{t+1} | x_1, ..., x_t) [27]. For high-dimensional data, this takes a long time. Thus, this type of model is not suitable for online control of high-dimensional movement data.
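The inherently sequential nature of this sampling can be sketched with a stand-in first-order conditional (a hypothetical toy, not WaveNet): each symbol can only be drawn after all previous symbols exist.

```python
import random

random.seed(1)

def p_next(history):
    """Stand-in for a learned conditional p(x_t | x_1, ..., x_{t-1}):
    here simply a probability of emitting 1 that depends on the last symbol."""
    if not history:
        return 0.5
    return 0.8 if history[-1] == 1 else 0.2

def sample_sequence(T):
    x = []
    for _ in range(T):  # each step must wait for all previous samples
        p1 = p_next(x)
        x.append(1 if random.random() < p1 else 0)
    return x

seq = sample_sequence(10)
```

The loop cannot be parallelized across t, which is the cost this section refers to for high-dimensional sequences.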

Another popular generative model is the variational autoencoder (VAE) [33]. Simplified, the VAE is an autoencoder with extra constraints put on the distribution of the latent space. As in autoencoders, the VAE consists of two networks: an inference model (an encoder network) and a generative model (a decoder network). The inference model learns to encode data samples into some hidden (i.e. latent) representation, whereas the generative model learns to reconstruct the data samples from their hidden (latent) representation. The distribution of the latent space is penalized if it deviates from a prior distribution (the standard normal distribution). The VAE is thus trained using the following loss function

L = −E_{z∼q_θ(z|x)}[log p_φ(x|z)] + KL(q_θ(z|x) || p(z)),    (3.2)

where q_θ is the inference network with parameters θ and p_φ is the generator network with parameters φ.
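When q_θ(z|x) is a diagonal Gaussian N(μ, diag(σ²)) and the prior p(z) is the standard normal, the KL term in Eq. 3.2 has the well-known closed form KL = (1/2) Σ_j (μ_j² + σ_j² − log σ_j² − 1). A small sketch of that formula (the example values are illustrative):

```python
import math

def kl_diag_gauss_to_std_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), the regularizer
    in Eq. 3.2 when the inference network outputs a diagonal Gaussian."""
    return 0.5 * sum(m * m + s * s - math.log(s * s) - 1.0
                     for m, s in zip(mu, sigma))

kl_zero = kl_diag_gauss_to_std_normal([0.0, 0.0], [1.0, 1.0])  # q equals the prior
kl_pos = kl_diag_gauss_to_std_normal([1.0, -1.0], [0.5, 2.0])  # q deviates from it
```

The term is zero exactly when the encoder's output matches the prior, and strictly positive otherwise, which is what makes it usable as a penalty.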

3.3 Generative Adversarial Networks - GAN

In 2014, Goodfellow et al. presented a novel framework for training generative models [29]. The framework is called generative adversarial nets (networks), or GAN for short. They proposed an adversarial training process in which two networks are trained simultaneously. A generator network g is trained to capture the distribution of some data set, X, and a discriminator network D is trained to distinguish the samples produced by the generator from those belonging to the data set X. If it is assumed that the distribution which gave rise to the observed data, P_data, is continuous over some input space χ, then it also has a density function p_data. Through the GAN training process, the ultimate goal is for g to learn a distribution p_g which mimics p_data.

The generator g and discriminator D can be implemented as any type of neural network. In the original paper, they were implemented using multilayer perceptrons (MLP). As


input to the generator g, a random noise vector z is sampled from a probability distribution p_z, defined over some low dimensional space Z. The generator network therefore defines a function g(z) : Z → χ. Thus, the generator g also implicitly defines a probability distribution p_g through the mapping g(Z). Again, neither the distribution p_g nor p_data is explicitly available to us, but both can be approximated if a large number of samples from each class are available. Therefore, the GAN framework builds upon the idea of generative machines discussed previously. The input to the discriminator is either a sample from the data set or a sample produced by the generator. Given a sample, the discriminator estimates the probability of that sample coming from the data set (i.e. being sampled from p_data). Thus, the discriminator defines a mapping D(x) : χ → [0, 1].

3.3.1 Training GANs

In the training process, D is trained to minimize its mistakes when classifying a sample as "real" or "fake", and g is trained to "fool" D into making mistakes. This scheme results in a minimax two-player game. The minimax game can be characterized by the following value function V(g, D)

min_g max_D V(g, D) = min_g max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(g(z)))].    (3.3)

According to Eq. 3.3, the discriminator D should be trained optimally for every instance of the generator g. However, directly optimizing Eq. 3.3 is not feasible and would, using finite datasets, result in overfitting D, as it might simply memorize the samples in the training set. Instead, Goodfellow et al. suggested alternating between training D and g. By optimizing D for k steps and then optimizing g once, the hope is to keep D close to its optimum while still providing meaningful gradients to g. The proposed algorithm thus becomes


Algorithm 1 GAN: original algorithm. Minibatch stochastic gradient descent training of generative adversarial networks. Note: in the original paper, k = 1 for all experiments.

for number of training iterations do
    for k steps do
        • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_z(z).
        • Sample a minibatch of m data samples {x^(1), ..., x^(m)} from the data distribution p_data(x).
        • Update the discriminator by ascending its stochastic gradient:
              ∇_{θ_d} (1/m) ∑_{i=1}^{m} [log D(x^(i)) + log(1 − D(g(z^(i))))]
    end for
    • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_z(z).
    • Update the generator by descending its stochastic gradient:
          ∇_{θ_g} (1/m) ∑_{i=1}^{m} [log(1 − D(g(z^(i))))]
end for

Thus, in practice, training GAN according to Alg. 1 aims to obtain a Nash equilibrium of a non-convex, two-player game with high-dimensional, continuous parameters [53]. In this two-player game, each player is assigned its own cost function, J^(D)(θ^(D), θ^(g)) for the discriminator and J^(g)(θ^(D), θ^(g)) for the generator. A Nash equilibrium is a point (θ^(D), θ^(g)) such that J^(D) is minimized w.r.t. θ^(D) and J^(g) is minimized w.r.t. θ^(g). Hence, a Nash equilibrium is reached when both players have an optimal strategy given the other player's strategy. Unfortunately, this equilibrium is difficult to reach using backpropagation, because updating θ^(g) to move closer to the optimum of J^(g)(θ^(D), θ^(g)) most often changes the optimum configuration of θ^(D) for J^(D)(θ^(D), θ^(g)).
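As a minimal numeric sketch of the alternating updates in Alg. 1, the toy below pits an affine generator g(z) = az + b against a logistic discriminator on one-dimensional data, with hand-derived gradients. All choices (data distribution N(4, 0.5), learning rate, batch size, iteration count) are illustrative assumptions; the point is the structure of the two alternating gradient steps, not convergence, which Alg. 1 does not guarantee.

```python
import math
import random

random.seed(0)
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

# Affine generator g(z) = a*z + b; logistic discriminator D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0
w, c = 0.0, 0.0
lr, m = 0.05, 32

real_batch = lambda: [random.gauss(4.0, 0.5) for _ in range(m)]
noise_batch = lambda: [random.gauss(0.0, 1.0) for _ in range(m)]

for _ in range(500):
    # --- discriminator step: ascend E[log D(x)] + E[log(1 - D(g(z)))] ---
    xs, zs = real_batch(), noise_batch()
    gw = gc = 0.0
    for x in xs:                      # d/dw log D(x) = (1 - D(x)) * x
        d = sigmoid(w * x + c)
        gw += (1 - d) * x / m
        gc += (1 - d) / m
    for z in zs:                      # d/dw log(1 - D(g(z))) = -D(g(z)) * g(z)
        xf = a * z + b
        d = sigmoid(w * xf + c)
        gw -= d * xf / m
        gc -= d / m
    w += lr * gw
    c += lr * gc
    # --- generator step: descend E[log(1 - D(g(z)))] ---
    ga = gb = 0.0
    for z in noise_batch():           # d/da log(1 - D(g(z))) = -D(g(z)) * w * z
        xf = a * z + b
        d = sigmoid(w * xf + c)
        ga -= d * w * z / m
        gb -= d * w / m
    a -= lr * ga
    b -= lr * gb
```

After training, b (the mean of the generated distribution, since E[z] = 0) has drifted toward the data mean, driven only by the discriminator's gradient signal.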

3.3.2 Optimality

This section aims to show that, if trained optimally, the generator will capture the true distribution of the training data, i.e. p_g = p_data.

First, note that for a fixed generator g, the optimal discriminator D^opt_g is

D^opt_g(x) = p_data(x) / (p_data(x) + p_g(x)).    (3.4)

We now seek to show that the value function in Eq. 3.3 has a global optimum for p_g = p_data. From the generator's point of view, it is possible to reformulate Eq. 3.3 as


C(g) = max_D V(g, D) = E_{x∼p_data(x)}[log D^opt_g(x)] + E_{z∼p_z(z)}[log(1 − D^opt_g(g(z)))]
     = E_{x∼p_data(x)}[ log( p_data(x) / (p_data(x) + p_g(x)) ) ] + E_{x∼p_g}[ log( p_g(x) / (p_data(x) + p_g(x)) ) ].    (3.5)

Next, it is possible to rewrite Eq. 3.5 using the Kullback-Leibler divergence and ultimately the Jensen-Shannon divergence

C(g) = − log(4) + KL( p_data || (p_data + p_g)/2 ) + KL( p_g || (p_data + p_g)/2 )
     = − log(4) + 2 · JSD( p_data || p_g ).    (3.6)

The Jensen-Shannon divergence is a symmetrical, non-negative measure of the divergence between two probability distributions p and q [37]. The divergence is zero if and only if p = q. Thus, C(g) has a global minimum of − log(4) if and only if p_g = p_data. At this optimum, even the optimal discriminator D^opt_g cannot separate "fake" samples from "real" ones, and will therefore return D^opt_g(x) = 1/2 for any sample x (see Eq. 3.4).

This is an important result: it shows that the GAN framework provides a training strategy with a unique global minimum at which the generator has learned to perfectly mimic the data distribution. In other words, the generator g has learned to embed p_data into some low dimensional manifold Z. However, as we will see, obtaining this global optimum has unfortunately proved to be a difficult task. Furthermore, note that these results require the discriminator D to constantly be optimal w.r.t. the current generator g. This might not be the case in practice, for example because the computational power of any neural network is limited, or because D_g is not trained until it converges (see Alg. 1). Additionally, this assumes an infinite data set, as otherwise there is no guarantee that the generator mimics the entire state space of possible data correctly.
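The global-minimum property of Eq. 3.6 can be checked numerically for discrete distributions, where the JSD is a finite sum. This sketch (the distributions are chosen arbitrarily for illustration) confirms that C(g) equals −log 4 exactly when p_g = p_data and exceeds it otherwise:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def C(p_data, p_g):
    """Generator cost C(g) of Eq. 3.6 for discrete distributions."""
    return -math.log(4) + 2 * jsd(p_data, p_g)

p_data = [0.2, 0.5, 0.3]
c_match = C(p_data, p_data)          # p_g = p_data: the global minimum -log(4)
c_off = C(p_data, [0.5, 0.2, 0.3])   # p_g != p_data: strictly larger cost
```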

3.3.3 GAN - A special case of f-GAN

In [48], Nowozin et al. showed how the GAN algorithm can be seen as a more general variational divergence estimation principle. They then extended the GAN framework to minimize any divergence measure belonging to a class of divergences called f-divergences.

Given two probability distributions P and Q, with continuous density functions p and q defined over a domain χ, the f-divergence is defined as:

D_f(P || Q) = ∫_χ q(x) f( p(x) / q(x) ) dx    (3.7)


where f : R+ → R is a convex function [36]. This class of divergences clearly includes, for example, the Kullback-Leibler divergence. For a list of different f-divergences, see [48]. [46, 48] showed that the f-divergences are bounded below by

D_f(P || Q) ≥ sup_{T∈T} { E_{x∼P}[T(x)] − E_{x∼Q}[f*(T(x))] }    (3.8)

where f* is the Fenchel conjugate of the convex function f and T is any class of functions T : χ → dom_{f*}. Consider Q as the distribution being encoded by a generative model and P as the distribution from which the training data is drawn. Each instance of the generative model is parametrized by some vector θ and thus encodes a distribution Q_θ. The variational function, T, is parametrized by a vector ω and is thus denoted T_ω. Led by Eq. 3.8, a value function F(θ, ω) can be defined as

F(θ, ω) = E_{x∼P}[T_ω(x)] − E_{x∼Q_θ}[f*(T_ω(x))].    (3.9)

To ensure that the variational function T_ω respects the domain of f* for any choice of f-divergence, Nowozin et al. chose to represent T_ω as the composition T_ω = g_f(V_ω(x)). Here, g_f is an output activation function specific to the choice of f-divergence. Using this decomposition of T_ω, Eq. 3.9 can be re-written as

F(θ, ω) = E_{x∼P}[g_f(V_ω(x))] + E_{x∼Q_θ}[−f*(g_f(V_ω(x)))].    (3.10)

By maximizing Eq. 3.10 w.r.t. ω, an approximation of the greatest lower bound of D_f(P || Q) (see Eq. 3.8) is achieved

D_f(P || Q) ≥ sup_{T∈T} { E_{x∼P}[T(x)] − E_{x∼Q}[f*(T(x))] }
           ≥ max_ω { E_{x∼P}[g_f(V_ω(x))] + E_{x∼Q_θ}[−f*(g_f(V_ω(x)))] }
           = max_ω F(θ, ω) = C(θ).    (3.11)

To see how this relates to the GAN framework, let the generative model Q_θ and the function V_ω be implemented by neural networks. From Eq. 3.6, it is clear that the GAN algorithm decreases the following divergence

D_GAN(P || Q) = − log(4) + 2 · JSD(P || Q)
             = − log(4) + ∫_x [ p(x) log( p(x) / ((p(x)+q(x))/2) ) + q(x) log( q(x) / ((p(x)+q(x))/2) ) ] dx.    (3.12)

For this choice of f-divergence, the convex function f and the corresponding Fenchel conjugate f* become [48]

f(u) = u log u − (u + 1) log(u + 1)
f*(t) = − log(1 − exp(t)).
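As a sanity check (not part of the original derivation), the stated conjugate can be verified numerically from its definition f*(t) = sup_u { ut − f(u) } by a brute-force grid search:

```python
import math

def f(u):
    """Convex function generating the GAN f-divergence (per Nowozin et al.)."""
    return u * math.log(u) - (u + 1) * math.log(u + 1)

def f_star_closed(t):
    """Stated closed form of the Fenchel conjugate, valid for t < 0."""
    return -math.log(1 - math.exp(t))

def f_star_numeric(t, n=200000, u_max=20.0):
    """Fenchel conjugate f*(t) = sup_u { u*t - f(u) }, approximated on a grid."""
    best = -float("inf")
    for i in range(1, n):
        u = u_max * i / n
        best = max(best, u * t - f(u))
    return best

t = -1.0
closed, numeric = f_star_closed(t), f_star_numeric(t)
```

The grid maximum agrees with the closed form to high precision, confirming the table entry used above.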


By assuming the activation function g_f(v) = − log(1 + exp(−v)) and identifying the discriminator D as the sigmoid D(x) = D_ω(x) = 1/(1 + e^{−V_ω(x)}), the GAN value function (Eq. 3.3) can be recovered

V(θ, D) = E_{x∼p_data(x)}[log D(x)] + E_{x∼Q_θ}[log(1 − D(x))].

In conclusion, the training algorithm proposed by Goodfellow et al. can be seen as a special case of a more general GAN framework. In this view, the discriminator strives to estimate a lower bound of the divergence measure specific to the choice of loss function. The generator, on the other hand, tries to minimize this estimated divergence. Nowozin et al. showed that many different choices of f-divergences are possible. To each choice of f-divergence, there is a corresponding loss function. This result does not concern the training scheme of GANs, but shows that one can easily adapt the original GAN formulation to minimize divergences other than the Jensen-Shannon divergence. In the next section, the GAN algorithm is extended beyond f-divergences, to minimizing the Wasserstein metric between two distributions.

3.3.4 Wasserstein GAN

It was shown above that GAN can utilize many different loss functions. This naturally leads to the question of how the choice of loss function affects the generative model. First, remember that the goal of GAN is to recover the true distribution of the data, P_data. Thus, intuitively, the GAN algorithm needs to "guide" the generative model's distribution P_g towards P_data. The algorithm therefore needs to define a measure of the divergence between the distributions which can provide information on how to change P_g. Iteratively updating P_g using this information creates a sequence of distributions (P_{g_n})_{n∈{1,...,N}}. Thus, it is important that the sequence (P_{g_n})_{n∈{1,...,N}} converges to P_data under the chosen metric (at least in the limit N → ∞), but also that convergence in this metric implies a meaningful learning of P_data. However, a sequence can converge in one metric but not in another [26]. The GAN framework has been shown to be flexible in the choice of loss function, and many different ones have been proposed.

In a series of papers, Wasserstein GAN (WGAN) [1, 2, 30] provided a theoretical analysis of the impact different divergence measures have on training GANs. The analysis resulted in a new suggested divergence measure, namely the Wasserstein-1 metric (or Earth Mover distance). It was shown that for the original formulation of GAN, the loss function, i.e. the Jensen-Shannon divergence between the data's distribution P_data and the generator's distribution P_g, is often maxed out if the discriminator is trained until convergence [1]. It was also shown that this can only happen if the distributions


either have disjoint support or are not (absolutely) continuous. This is a major drawback, because in this case an optimal discriminator will not provide any meaningful gradients to update the generator (the gradients will be zero). Admittedly, due to the limitations of neural networks, the optimal discriminator might not be obtained. However, it was also shown that approximations of the optimal discriminator suffer from either vanishing or inaccurate gradients. This is assumed to be one of the reasons GANs have proven so difficult to train: if the discriminator does not provide any meaningful gradients to the generator, then the generator cannot converge to accurately model the data distribution, P_data. Thus, it is a delicate balance to train the discriminator enough to provide good gradients, but not so much that the gradients vanish.

Thus, to overcome the difficulties of training GANs, the authors in [1, 2] proposed a new loss measure based on the Wasserstein-1 metric. In general, the Wasserstein metric is defined as follows

W_p(P, Q) = ( inf_{γ∈Γ} ∫_{χ×χ} d(x, y)^p dγ(x, y) )^{1/p},    (3.13)

where Γ denotes the set of all joint distributions on χ × χ (χ is a compact set over R^n) with marginal distributions P and Q. For an in-depth explanation of optimal transport and the Wasserstein metric, see [60], or for a summary of different probability metrics and their bounds, see [26]. It has been shown [26, 2, 60] that the Wasserstein metric metrizes weak convergence, given that the state space has a bounded diameter. Weak convergence means convergence in (cumulative) distribution. That is, W(P_{g_n}, P_data) → 0 implies P_{g_n} → P_data in distribution. Furthermore, convergence in Kullback-Leibler divergence implies convergence in Total Variation distance, which in turn implies convergence in the Wasserstein metric [26, 2]. In addition, [2] also showed that convergence in Total Variation distance is equivalent to convergence in Jensen-Shannon divergence. All in all, this means that if a sequence converges in any of the mentioned metrics/divergences, it also converges in the Wasserstein metric. Furthermore, since the Wasserstein metric metrizes weak convergence, convergence in the Wasserstein metric is enough to guarantee convergence in distribution. Thus, convergence in the Wasserstein metric does in fact imply a meaningful learning and representation of P_data by the generator. To summarize, the idea is that the Wasserstein metric can provide meaningful gradients even when the Jensen-Shannon divergence cannot. The authors of [2] and [30] therefore argue that the Wasserstein metric is a more suitable choice of loss function than, for example, the Jensen-Shannon divergence.

For WGAN, the Wasserstein-1 metric was chosen (p = 1) together with the L1-distance (d(x, y) = ||x − y||). Thus, the metric becomes

W(P, Q) = inf_{γ∈Γ} E_{(x,y)∼γ}[ ||x − y|| ].    (3.14)

This metric is also known as the Earth Mover (EM) distance [51]. The term was introduced in works covering image retrieval in databases [51, 52]. Intuitively, given two distributions P and Q, the EM distance corresponds to the minimum work required to re-organize (i.e. "move") the mass of Q to fit that of the distribution P. That is, the work required to transform Q into P, if work is defined as transporting one unit of mass over one unit of distance. The name Earth Mover distance derives from the analogy that it measures the work required to move and re-organize some earth from Q to P. The EM distance thus corresponds to the cost of the optimal transport plan, γ*. Solving Eq. 3.14 directly is computationally intractable due to the infimum operator. However, it can be re-formulated using the Kantorovich-Rubinstein duality [60] as

W(P, Q) = sup_{||f||_L ≤ 1} E_{x∼P}[f(x)] − E_{x∼Q}[f(x)].    (3.15)
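In one dimension, the optimal transport plan is simply monotone matching, so the EM distance between two equal-size empirical samples reduces to the mean absolute difference of sorted samples. A small sketch (this closed form is standard for the 1-D case, not taken from the thesis):

```python
def w1_empirical(xs, ys):
    """Wasserstein-1 (Earth Mover) distance between two equal-size 1-D
    empirical distributions: match sorted samples, which is the optimal
    transport plan in one dimension."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Moving the mass {0, 1} onto {2, 3} costs 2 units of distance on average.
d = w1_empirical([0.0, 1.0], [2.0, 3.0])
identical = w1_empirical([1.0, 5.0], [5.0, 1.0])  # same points, zero cost
```

Note that d stays finite and informative even when the two supports are disjoint, which is exactly the regime where the Jensen-Shannon divergence saturates.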

Here the supremum is over all 1-Lipschitz functions, i.e. all functions f : χ → R such that ||f(x) − f(y)|| ≤ ||x − y||.

How can the Wasserstein-1 metric, defined in Eq. 3.15, be implemented? The Wasserstein metric is the supremum over 1-Lipschitz functions. The problem can be simplified by considering all K-Lipschitz functions, noting that this corresponds to scaling the metric by a constant factor, K · W(P_data, P_g) [2]. This problem can, in turn, be approximated by considering only a subset of all K-Lipschitz functions. Let {f_w}_{w∈W} be some K-Lipschitz functions with weights w, where w belongs to some compact set of allowed weights W; then the following bound holds

max_{w∈W} E_{x∼P_data}[f_w(x)] − E_{x∼P_g}[f_w(x)] ≤ sup_{||f||_L ≤ K} E_{x∼P_data}[f(x)] − E_{x∼P_g}[f(x)] = K · W(P_data, P_g).    (3.16)

If the bound in Eq. 3.16 is attained for some weights w, then the Wasserstein objective is determined up to a constant factor. Furthermore, differentiating Eq. 3.16 will then yield the gradient of W, again scaled by a constant factor. If f_w is implemented by a neural network, and g_θ is also a neural network, parametrized by θ, such that g_θ : Z → χ, then it has been shown by [2] that the gradient of W(P_data, P_g) w.r.t. θ is

∇_θ W(P_data, P_g) = −E_{z∼p(z)}[∇_θ f_w(g_θ(z))].    (3.17)

Thus, a discriminator neural network f_w can be trained to approximate f. To enforce the K-Lipschitz constraint on the discriminator network, the weights w are limited to some compact weight space W. Furthermore, given this network, a generator g_θ can


be trained by backpropagating the loss function using Eq. 3.17. The corresponding value function is

min_θ max_w E_{x∼P_data}[f_w(x)] − E_{z∼p_z}[f_w(g_θ(z))].    (3.18)

Note that the K-Lipschitz constraint is enforced not by the weights w in f_w themselves, but by the space W to which the weights w must belong. In the first paper introducing WGAN, the compactness of W was enforced by weight clipping: simply clip all weight values that fall outside some allowed range. The WGAN algorithm is detailed in Alg. 2. It should be added that a more recent version of WGAN [30] enforces the Lipschitz constraint by penalizing the gradient norm of the discriminator instead of clipping the weights.
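Weight clipping itself is a one-line operation; a minimal sketch (representing the weights as a flat list of floats is a simplification of real network parameters):

```python
def clip_weights(weights, c):
    """Weight clipping as used by the original WGAN: project every weight
    back into the compact set W = [-c, c], a crude way of bounding the
    Lipschitz constant of f_w."""
    return [max(-c, min(c, w)) for w in weights]

clipped = clip_weights([-0.5, 0.003, 0.2], c=0.01)  # [-0.01, 0.003, 0.01]
```

Because the clipping range is so small, most weights end up sitting on the boundary of W, which is one of the practical issues that motivated the gradient-penalty variant [30].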

Algorithm 2 WGAN: original algorithm. Note: in the original paper, k = 5 and c = 0.01 for all experiments.

for number of training iterations do
    for k steps do
        • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_z(z).
        • Sample a minibatch of m data samples {x^(1), ..., x^(m)} from the data distribution p_data(x).
        • Update the discriminator by ascending its stochastic gradient:
              ∇_w (1/m) ∑_{i=1}^{m} [f_w(x^(i)) − f_w(g_θ(z^(i)))]
        • Clip the weights w to fall in the range [−c, c]
    end for
    • Sample a minibatch of m noise samples {z^(1), ..., z^(m)} from the noise prior p_z(z).
    • Update the generator by descending its stochastic gradient:
          ∇_θ (1/m) ∑_{i=1}^{m} [−f_w(g_θ(z^(i)))]
end for

3.3.5 Inference in latent space - Reconstructing missing elements

Although GAN has proven to be a powerful model for generating high-dimensional data samples, it is less suited to inference tasks than other generative models (e.g. VAE). In other words, the mapping g : Z → χ is easy, but the inverse mapping g^{−1} : χ → Z is not. However, several suggestions have been put forth to alleviate this disadvantage. Adversarially Learned Inference (ALI) [20] and Bidirectional GAN (BiGAN) [19] both try to integrate an inference machine into the GAN framework by training both an inference machine and a generative machine in an adversarial manner. A different approach is taken by InfoGAN [12]. InfoGAN is an


adversarially trained network which maximizes the mutual information between a subset of the latent space and the observations, thus hoping to learn interpretable representations. Yet another approach has been to invert the flow of the generator network [17, 38, 63]. Given the computational graph of a pre-trained generator g, backpropagation can be used to infer the corresponding latent vector of a sample. This approach is discussed in more detail below in the setting of image reconstruction.

In [63], Yeh et al. approach the task of semantic reconstruction of images from an image-generation perspective. The core assumption is that a corrupted image y lies close to the original, uncorrupted image in input space, yet does not reside on the manifold p_data of "natural" images. In other words, even though the corrupted image is "unnatural" due to the corruption, it is still similar to the original image. Thus it is plausible that, given the corrupted image, reconstructing the corrupted areas from the "closest" image on the manifold p_data will yield a good approximation of the original image.

In order to define the "closest" data point, assume a generator network g is given, trained to embed the data manifold p_data in a latent space Z. A point is then sought which minimizes some measure of "closeness" to the corrupted data point. From a data-generation perspective, the closest point can be defined as the point generated from the latent vector z which minimizes some loss function L:

z* = arg min_z L(z).

In the work by Yeh et al., a combination of perceptual and contextual loss was used as the loss function. The perceptual loss encourages the approximated image to be "natural" by trying to fool the discriminator D into believing that the approximated image is a natural image:

L_perceptual(z) = −D(g(z)).

The contextual loss ensures that the generated image agrees with the uncorrupted parts of the corrupted image. Using a binary mask M, denoting missing elements by 0 and uncorrupted elements by 1, the contextual loss is defined as

L_contextual(z) = ||M ⊙ g(z) − M ⊙ y||_1

where ⊙ denotes element-wise multiplication of two matrices.

The loss function L is defined as a weighted sum of the perceptual and contextual losses:

L(z) = L_contextual(z) + λ L_perceptual(z)   (3.19)


where λ is chosen to be a small value in order to ensure that the generated image is constrained by the uncorrupted elements. Using this loss function to find the "closest" image on the manifold, g(z*), the reconstructed image y_rec is easily obtained as

y_rec = M ⊙ y + (1 − M) ⊙ g(z*).   (3.20)

Note that although [63] was concerned with the task of image reconstruction, the method is general and can be applied to reconstruction tasks in other domains as well, as long as the closeness assumption is valid.
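The loss components above can be sketched with a few array-valued functions. This is a minimal, framework-free sketch under the assumption that g(z) and the discriminator score D(g(z)) have already been computed elsewhere by a trained generator and discriminator; the function names are illustrative.

```python
import numpy as np

def contextual_loss(g_z, y, mask):
    # L1 distance between g(z) and y on the unmasked (known) elements only
    return np.abs(mask * g_z - mask * y).sum()

def perceptual_loss(d_of_gz):
    # -D(g(z)): low when the discriminator believes g(z) is "natural"
    return -d_of_gz

def total_loss(g_z, y, mask, d_of_gz, lam=0.1):
    # Eq. (3.19): weighted sum of the two components
    return contextual_loss(g_z, y, mask) + lam * perceptual_loss(d_of_gz)

def reconstruct(y, mask, g_z_star):
    # Eq. (3.20): keep the known elements, fill the masked ones from g(z*)
    return mask * y + (1 - mask) * g_z_star
```

Note that the contextual term only "sees" the unmasked elements, so the masked elements of the reconstruction come entirely from the generator.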


Chapter 4

Related work

• Berniker & Kording (2015) [8]. In "Deep networks for motor control functions", Berniker and Kording showed that it is possible to train an autoencoder as a combined forward and inverse model. The model could produce time-varying commands and state trajectories governing point-to-point reaches. Furthermore, the network could be trained to produce near-optimal trajectories.

• Mnih et al. (2015) [44]. In "Human-level control through deep reinforcement learning", Mnih et al. showed that a DQN network (a model-free reinforcement learning algorithm) could be trained to achieve above human-level performance on a range of Atari games.

• Kulkarni et al. (2016) [34]. Successor reinforcement learning is a less well-known variant of reinforcement learning. By decomposing the value function into two components, a reward predictor and a successor map, the successor model tries to combine some of the advantages of model-free and model-based reinforcement learning, respectively. In "Deep successor reinforcement learning", Kulkarni et al. trained a successor network which performed on par with the DQN but adapted more quickly to changes in distal rewards. Furthermore, they showed how the successor network can be used to approximately identify bottleneck states, a task which is of use in planning.


Chapter 5

Method

5.1 Simulator environment - a toy model

A simulator environment was defined, by which training data were produced and generated samples were evaluated. The simulator environment defined the motion of a two-linked arm in a two-dimensional task space, see Fig. 5.1. The time domain was discretized. The state of each link was defined by its angular position, θ_i, and angular velocity, θ̇_i. The end-point position of the arm, X, was represented in Cartesian coordinates and was determined by the angular positions and the lengths of the links. Each link of the two-linked arm used in this project was of unit length. Thus, the end-point was described by

X(θ) = [cos(θ1) + cos(θ1 + θ2), sin(θ1) + sin(θ1 + θ2)]

where θ_i is the angle of the i-th link. Together, these variables represent the state of the arm, S_t = {X_t, θ_t, θ̇_t}, at time step t. The motion dynamics of the toy environment were linear and did not incorporate any external forces, nor any internal forces such as friction. Given the current state of the arm, S_t, and a choice of action, u_t, the simulator E computes the next state of the arm, S_{t+1}, as

S_{t+1} = E(S_t, u_t):

    θ̇_{t+1} = θ̇_t + u_t
    θ_{t+1} = θ_t + θ̇_{t+1} = θ_t + θ̇_t + u_t
    X_{t+1} = X(θ_{t+1})   (5.1)

As there is no general method to evaluate and compare the performance of generativemodels, the simulator environment offers the possibility to compare the generatedsamples to a ground truth.
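The dynamics in Eq. 5.1 are simple enough to state directly in code. The following is a minimal sketch of the simulator E for the unit-length two-linked arm; the function names are illustrative, not taken from the thesis implementation.

```python
import numpy as np

def end_point(theta):
    # X(theta): end point of a planar two-link arm with unit-length links
    t1, t2 = theta
    return np.array([np.cos(t1) + np.cos(t1 + t2),
                     np.sin(t1) + np.sin(t1 + t2)])

def step(theta, theta_dot, u):
    # Eq. (5.1): linear dynamics with no external or internal forces
    theta_dot_next = theta_dot + u          # new angular velocities
    theta_next = theta + theta_dot_next     # new angular positions
    return theta_next, theta_dot_next, end_point(theta_next)
```

For example, a fully extended arm at rest (θ = θ̇ = u = 0) keeps its end point at (2, 0), the maximal reach of two unit links.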


Figure 5.1: Two-linked arm; the 'x' marks the end-point position of the arm.

5.1.1 Training data

To train a generative model (see 3.2), it is necessary to provide a large amount of training data. The movement samples used to train the generative models were defined as y = {S_t, u_t, S_{t+1}}. The initial state of the arm, S_t, and the choice of action (or motor command), u_t, were randomly sampled, and the subsequent state of the arm, S_{t+1}, was computed using the simulator environment (see Eq. 5.1). The angular positions were sampled from uniform distributions, θ_{1,t} ∼ U[0, π/2] and θ_{2,t} ∼ U[−π/4, π/4], the angular velocities from U[−2π/12, 2π/12], and the actions from U[−2π/12, 2π/12]. In total, 100,000 training samples were produced and used in training. The training data were zero-centered and normalized to [−1, 1].
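The sampling scheme above can be sketched as follows. This is a self-contained, vectorized approximation of the data-generation step, with illustrative variable names; the exact normalization used in the thesis is not specified beyond zero-centering and scaling to [−1, 1], so a per-column max-absolute scaling is assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Initial angles, angular velocities and actions (section 5.1.1 ranges)
theta = np.stack([rng.uniform(0, np.pi / 2, N),
                  rng.uniform(-np.pi / 4, np.pi / 4, N)], axis=1)
theta_dot = rng.uniform(-np.pi / 6, np.pi / 6, (N, 2))   # 2*pi/12 = pi/6
u = rng.uniform(-np.pi / 6, np.pi / 6, (N, 2))

def end_point(th):   # X(theta) for unit-length links, vectorized over rows
    return np.stack([np.cos(th[:, 0]) + np.cos(th[:, 0] + th[:, 1]),
                     np.sin(th[:, 0]) + np.sin(th[:, 0] + th[:, 1])], axis=1)

theta_dot_next = theta_dot + u           # Eq. (5.1)
theta_next = theta + theta_dot_next

# One 14-dimensional sample per row: {S_t, u_t, S_{t+1}}
y = np.hstack([end_point(theta), theta, theta_dot, u,
               end_point(theta_next), theta_next, theta_dot_next])

# Zero-center, then normalize each column to [-1, 1] (assumed scheme)
y = y - y.mean(axis=0)
y = y / np.abs(y).max(axis=0)
print(y.shape)   # (100000, 14)
```

The 14 columns correspond to the ordering used by the masks in section 5.3: X_t (2), θ_t (2), θ̇_t (2), u_t (2), X_{t+1} (2), θ_{t+1} (2), θ̇_{t+1} (2).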


5.2 Training GAN

5.2.1 Wasserstein GAN

[Figure 5.2 omitted: schematic fully connected network diagrams showing input, hidden and output layers with their weights. (a) A generator network mapping latent inputs z1–z6 to the 14 sample variables. (b) A discriminator network mapping the 14 sample variables to a single output.]

Figure 5.2: Examples of network architectures.

Wasserstein GANs (see 3.3) were trained¹ on the data provided by the toy environment (see 5.1), using several different combinations of generator and discriminator architectures. All networks consisted of fully connected layers with Leaky ReLU [39] activation functions, except for the output layers. The output layer of the generator instead used a tanh activation, limiting the output to the range [−1, 1]. The output layer of the discriminator did not include any activation function. Since only fully connected layers were used, the WGAN algorithm (Alg. 2) had to learn the mapping without any prior assumptions about dependencies imposed by the network structures. The network architectures for the discriminators and generators are detailed in Table 5.1 and Table 5.2, respectively. In addition, Fig. 5.2 shows a schematic representation of a discriminator and a generator network. All trainings used the RMSProp optimizer [56], and inputs to the generators were sampled from a uniform distribution (U[−1, 1]). Weight clipping was used to enforce the Wasserstein constraint. Table 5.3 lists the hyperparameters used for training the networks. These values were mostly taken from the literature, i.e. from [2]. The structure of the GAN training setup is shown in Fig. 5.3.

¹WGAN was implemented using TensorFlow. The code was based upon and extended from: 1) Modifications for Inpainting, Brandon Amos (http://bamos.github.io); 2) generative-models, Agustinus Kristiadi (https://github.com/wiseodd). The code was validated by training it on the MNIST data set and generating new images of digits, a setup shown to work in [29].


Table 5.1: Discriminator architectures. FC.10 + LReLU = fully connected layer with 10 units and Leaky ReLU non-linearity.

                 Layer 1        Layer 2        Layer 3        Layer 4
Discriminator 1  FC.5 + LReLU   FC.1           -              -
Discriminator 2  FC.7 + LReLU   FC.1           -              -
Discriminator 3  FC.9 + LReLU   FC.1           -              -
Discriminator 4  FC.7 + LReLU   FC.7 + LReLU   FC.1           -
Discriminator 5  FC.7 + LReLU   FC.7 + LReLU   FC.4 + LReLU   FC.1

Table 5.2: Generator architectures. FC.10 + LReLU = fully connected layer with 10 units and Leaky ReLU non-linearity.

             Layer 1         Layer 2         Layer 3         Layer 4
Generator 1  FC.8 + LReLU    FC.14 + tanh    -               -
Generator 2  FC.10 + LReLU   FC.14 + tanh    -               -
Generator 3  FC.12 + LReLU   FC.14 + tanh    -               -
Generator 4  FC.10 + LReLU   FC.10 + LReLU   FC.14 + tanh    -
Generator 5  FC.10 + LReLU   FC.10 + LReLU   FC.12 + LReLU   FC.14 + tanh

Table 5.3: List of hyperparameters and their values for training the WGAN networks.

Hyperparameter                         Value           Description
Dimensions of latent space             6               Number of random inputs to the generator network.
Learning rate, discriminator network   0.00005         Learning rate used by RMSProp for optimizing the discriminator.
Learning rate, generator network       0.00005         Learning rate used by RMSProp for optimizing the generator.
RMSProp decay, discriminator           0.9             Discounting factor for the history/coming gradient.
RMSProp decay, generator               0.9             Discounting factor for the history/coming gradient.
Gradient momentum, discriminator       0               Gradient momentum used by RMSProp for optimizing the discriminator.
Gradient momentum, generator           0               Gradient momentum used by RMSProp for optimizing the generator.
Weight clip                            [-0.01, 0.01]   Range of allowed weight values for the discriminator network.
Generator update frequency             5               Number of discriminator updates per generator update.
Batch size                             64              Number of samples in each mini-batch fed to the networks.
Training steps                         1,000,000       Total number of generator updates during training.
Number of evaluation samples           50,000          Number of samples generated by each model to evaluate performance during training.

Table 5.4: List of hyperparameters and their values for reconstructing data samples.

Hyperparameter                         Value    Description
Learning rate                          0.0005   Learning rate used in SGD optimization.
Momentum                               0.9      Momentum used in SGD optimization.
Perceptual loss scaling factor, λ      0.1      Scaling factor for the perceptual loss component of the reconstruction loss.
Motor command loss scaling factor, β   0.5      Scaling factor for the motor command loss component of the reconstruction loss.
Number of starting points              64       Number of random starting points in the latent space for each masked sample.
Number of updates                      1000     Number of backpropagation steps per reconstruction.


[Figure 5.3 omitted: diagram of the WGAN training loop, in which samples from the simulator environment and samples from the generator are fed to the discriminator, whose output defines the loss.]

Figure 5.3: Training WGAN.

5.2.2 Quantifying training progression

To quantify the performance of the generative networks, two measures were employed. The first measures how well the generative models have learned the dynamics of the simulator environment. The second measures how well the generative models mimic the distribution of samples in the training data. A good generator should create movement samples which do not defy the "physical rules" defined by the simulator environment (i.e. Eq. 5.1), rules that are implicitly available in the training data. Furthermore, especially in the early formulations of GAN, mode collapse was frequently reported [29, 27]. It is therefore important to verify that the generative models produce a wide variety of plausible movement samples. Combined, the two measures provide a reasonably accurate picture of the quality of the generative models. Each model was evaluated after every 100,000 generator updates. The evaluation was done by first randomly sampling 50,000 validation samples from the generator and then evaluating the samples as described below.

Measuring the learned dynamics

To measure how well a generative model captures the dynamics of the environment, the initial state S_t and action u_t from each generated validation sample were passed to the simulator. The future state predicted by the generator was then compared to the true resulting future state, given by the simulator. This was done by measuring the RMS error of S^i_{t+1} for each generated validation sample y^i_val, where the ground truth was given by E(S^i_t, u^i_t). Thus, the first measure was calculated as

ε_RMS = sqrt( (1/N) Σ_{i=1}^{N} (S^i_{t+1} − E(S^i_t, u^i_t))² )   (5.2)

Note that this error was measured over the normalized data samples.
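Eq. 5.2 amounts to a root-mean-square over all state components of all N validation samples, which can be stated in a few lines (function and argument names are illustrative):

```python
import numpy as np

def rms_error(s_next_gen, s_next_true):
    # Eq. (5.2): RMS over all components of all N validation samples,
    # computed on the normalized state vectors
    diff = np.asarray(s_next_gen) - np.asarray(s_next_true)
    return np.sqrt(np.mean(diff ** 2))
```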

Measuring diversity

To measure the diversity of the generated samples, the Jensen-Shannon divergence between the discretized distributions of {θ_t, θ̇_t, u_t} from the generated validation samples and from the training data was employed. By limiting the divergence measure to these variables, the size of the discretized state space was reduced. For θ_{1,t} and θ_{2,t}, 10 discrete states each were differentiated, defined as equidistant intervals over the allowed range of the respective variable (see 5.1.1). For the variables in θ̇_t, 5 discrete states each were similarly defined, and for u_t only 3 states each. Note that these states only encompass the allowed initial states; all samples with disallowed initial values were gathered in a single additional discrete state. Despite the crude partitioning, the combination of these states results in 22,501 discrete states (22,500 allowed + 1 disallowed). This large state space requires a large amount of data for the Jensen-Shannon divergence to give reliable information, hence the large number of validation samples. In addition, to avoid division by zero, a small number was added to each state count, at the cost of introducing a bias.
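The diversity measure can be sketched as follows, assuming the samples have already been binned into discrete state counts. The small additive constant plays the same role as the one mentioned in the text: it avoids log(0) and division by zero at the cost of a small bias.

```python
import numpy as np

def js_divergence(p, q, eps=1e-10):
    # Jensen-Shannon divergence (in nats) between two discrete
    # distributions given as histograms/counts over the same states.
    p = np.asarray(p, float) + eps   # eps avoids log(0) / division by zero
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):   # Kullback-Leibler divergence in nats
        return np.sum(a * np.log(a / b))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The JS divergence is 0 for identical distributions and bounded above by ln 2 ≈ 0.693 nats for distributions with disjoint support, which makes it a convenient bounded summary of how well the generator covers the training distribution.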

5.3 Experiments

Trained networks were tested as integrated forward and inverse models (see 2.2.1 and 2.2.2). To do so, the forward and inverse tasks were re-defined and cast as problems of data reconstruction. By providing partially obscured data samples, y_obs = y ⊙ M_obs, where M_obs is a binary mask, and letting a generator re-create the full sample, a wide variety of control tasks can be re-formulated within a common logic. The difference between the forward and the inverse model then reduces to completing elements in different domains. The obscured samples provide partial evidence, which the generator must use in order to find a suitable solution for the missing elements. Furthermore, the completed sample must adhere to the dynamics of the environment, thus passing as a valid movement sample. In order to find a suitable reconstructed sample, a random latent vector z is first sampled. Passing the vector through the generator yields a generated sample g(z). This sample can then be compared to the unmasked parts of the data sample y_obs which should be reconstructed. The latent vector z is then iteratively updated by backpropagating the gradients of the following loss function through the generator (for further details, see 3.3.5 and Table 5.4):

L(z) = L_contextual(z) + λ L_perceptual(z)
     = ||g(z) ⊙ M_obs − y_obs||_1 − λ D(g(z)).   (5.3)

Due to the non-convex nature of this optimization task, several different starting locations z_i were used (see Table 5.4). Among them, the sample g(z*) generated by the latent vector with the lowest loss, z* = arg min_{z_i} L(z_i), was selected and used for reconstructing the movement data:

y_rec = y_obs + (1 − M_obs) ⊙ g(z*).   (5.4)
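The full reconstruction procedure (random restarts, iterative latent updates, selection of z*, and Eq. 5.4) can be sketched end-to-end. The linear-tanh generator below is a stand-in assumption, not the trained thesis network, and for simplicity the toy uses only the contextual term (λ = 0) with plain subgradient steps instead of momentum SGD.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a trained generator: g(z) = tanh(A z), 6-dim latent
# space, 14-dim samples (an assumption for illustration only).
A = rng.normal(size=(14, 6)) / np.sqrt(6)
g = lambda z: np.tanh(A @ z)

z_true = rng.uniform(-1, 1, 6)
y = g(z_true)                            # "observed" full movement sample
M = np.array([1] * 8 + [0] * 6, float)   # forward-model mask, Eq. (5.5)

def loss(z):                             # contextual L1 term only (lambda = 0)
    return np.abs(M * (g(z) - y)).sum()

def grad(z):                             # its subgradient w.r.t. z
    gz = g(z)
    return A.T @ ((1 - gz ** 2) * M * np.sign(gz - y))

best_z, best_l = None, np.inf
for _ in range(64):                      # 64 random starting points (Table 5.4)
    z = rng.uniform(-1, 1, 6)
    for _ in range(1000):                # 1000 update steps (Table 5.4)
        z -= 0.005 * grad(z)
    if loss(z) < best_l:
        best_z, best_l = z, loss(z)

y_rec = M * y + (1 - M) * g(best_z)      # Eq. (5.4): fill in the masked part
```

Because y was itself generated by g, a perfect reconstruction exists; the restarts guard against the local minima that the text mentions for the non-convex objective.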

5.3.1 Forward model

To test a model's capacity as a forward model, the generator was provided with true movement samples in which the future state of the arm had been masked:

M_forward = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],   (5.5)
y_forward = {S_t, u_t, S_{t+1}}.   (5.6)

Given an initial state and motor commands, the future state is fully determined and unique. Thus, to evaluate performance on this task, the RMS error between the predicted future state and the true future state was measured. The results are shown in chapter 6.2.

5.3.2 Inverse model

As an inverse model, the generator had to reconstruct the motor commands and the resulting future angles and angular velocities, given only the initial state of the arm and a desired future end-point position (in Cartesian coordinates):

M_inverse = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0],   (5.7)
y_inverse = {X_t, θ_t, θ̇_t, u_t, X_{t+1}, θ_{t+1}, θ̇_{t+1}}.   (5.8)

In contrast to the forward model, the solution to a given problem might not be unique. There is an inherent redundancy in the configuration space, and (almost) any end-point X can be reached by different configurations of the links (see the discussion in 2.1 and 2.2.1). In the case of the two-linked arm, there may be two distinct solutions to the same problem. Thus, directly measuring the RMS error between the predicted variables and the true masked variables might not be a suitable measure of the generator's performance. Instead, performance is measured by how close to the target end-point X_{t+1} the arm is brought by (1) the completed angles θ_{t+1} and (2) the action commands u_t together with the initial state S_t, respectively. In addition, to control for internal consistency, the difference between the predicted future angles and angular velocities of the arm and the actual future values of those variables, given the predicted motor command, is computed. Note that this last measure only uses the generator's reconstructed samples and the simulator environment. The results are shown in chapter 6.3.

5.3.3 Inverse model - Selecting minimal action

As described in 2.1, optimal control theory has been a popular approach to modelling human movement. In general, the goal is to find the solution which minimizes some cost function consisting of a desired end-state and a penalizing term (see Eq. 2.3). In modern research, the penalizing term has often been chosen to be the sum of squared motor commands. Inspired by this approach, augmenting the loss function in Eq. 5.3 with a term penalizing the sum of squared motor commands gives the following loss function:

L(z) = L_contextual(z) + λ L_perceptual(z) + β L_penalizing(z)
     = ||g(z) ⊙ M_inverse − y_inverse||_1 − λ D(g(z)) + β ||g(z) ⊙ M_{u_t}||_2².   (5.9)

Here, M_{u_t} is a binary mask in which only the elements corresponding to the motor commands u_t are set to 1:

M_{u_t} = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0].   (5.10)
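The augmented loss in Eq. 5.9 can be sketched as a small function. The array inputs are hypothetical, and the discriminator score d_of_gz is assumed to be precomputed by a trained discriminator; the defaults for λ and β follow Table 5.4.

```python
import numpy as np

def inverse_loss(g_z, y, m_inv, m_u, d_of_gz, lam=0.1, beta=0.5):
    # Eq. (5.9): contextual + perceptual + squared-motor-command penalty
    contextual = np.abs(m_inv * g_z - m_inv * y).sum()   # L1 on known parts
    perceptual = -d_of_gz                                # -D(g(z))
    penalty = np.sum((m_u * g_z) ** 2)                   # sum of squared u_t
    return contextual + lam * perceptual + beta * penalty
```

Only the penalty term differs from Eq. 5.3, so any minimizer of this loss trades end-point accuracy against the magnitude of the reconstructed motor commands, controlled by β.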

We hypothesized that, with this loss, the generator network could find solutions to the inverse tasks which not only guide the arm to the correct end-point but at the same time use significantly smaller motor commands than when using the loss function in Eq. 5.3. The results are shown in chapter 6.5.

5.3.4 Exploring the latent space

The latent space of a generator network was explored to understand how training had structured the information in the low-dimensional manifold. By randomly sampling a latent vector and then altering one element at a time, keeping the other elements fixed, the effect of each latent variable on the output of the network could be tracked. The results are shown in chapter 6.6.


Chapter 6

Experimental results

The main hypothesis investigated in this project was that a generative model can be used as an integrated forward and inverse model for motor control. This was tested using GANs with varying structures of the generator and discriminator networks (see Tables 5.1 and 5.2). The samples used to train the GANs were produced by a simulator environment. The GAN which produced the most realistic samples was then selected for further investigation. Finally, the selected GAN was tested both as a forward and as an inverse model, using a separate test set of masked samples produced by the simulator environment.

6.1 Training phase

In the training phase, the networks' abilities to generate new movement samples were tested. Each sample was defined as

y = {S_t, u_t, S_{t+1}},

where S_t is the initial state of the arm, u_t the motor command at time t, and S_{t+1} the state of the arm at time t + 1. At different stages of training, 50,000 samples were randomly drawn from each generator network. Each generated state S^g_{t+1} was then compared to the true future state E(S^g_t, u^g_t), i.e. the true future state given the generated initial state and motor command. The RMS error between S^g_{t+1} and E(S^g_t, u^g_t) was measured using the normalized movement samples. Trained networks consistently showed lower RMS errors than their corresponding untrained (random) networks. Of the 5 × 5 combinations (five generator and five discriminator networks), discriminator network 1 and generator network 3 showed the smallest RMS error, see Table 6.1. This combination of networks was therefore chosen for the training results below and for the forward and inverse experiments. The feasibility of this choice was also tested; the results are shown in chapter 6.4.

Table 6.1: RMS error after training different combinations of discriminator and generator architectures. The RMS error scaled by the RMS error of the model before training is shown in parentheses.

             Discriminator 1   Discriminator 2   Discriminator 3   Discriminator 4   Discriminator 5
Generator 1  0.1518 (0.1047)   0.1343 (0.1174)   0.2629 (0.1645)   0.1544 (0.1049)   0.1443 (0.1121)
Generator 2  0.1298 (0.0869)   0.1437 (0.0789)   0.1434 (0.1130)   0.1846 (0.1093)   0.1548 (0.1040)
Generator 3  0.1250 (0.0901)   0.1268 (0.0851)   0.1271 (0.0670)   0.1422 (0.0978)   0.1372 (0.0881)
Generator 4  0.1317 (0.0997)   0.2078 (0.1174)   0.1331 (0.0982)   0.1581 (0.0870)   0.3208 (0.2261)
Generator 5  0.2377 (0.1711)   0.2211 (0.1947)   0.2217 (0.1664)   0.2214 (0.1548)   0.2199 (0.1394)

In Table 6.2, several randomly chosen generated samples are visualized. These visualizations do not by themselves constitute sufficient evidence for any validation, but they provide an intuitive picture of the results of the training process. The blue arm represents the initial state of the arm, S^g_t, sampled from the generator. The red arm represents the future state of the arm, S^g_{t+1}, according to the generator, and the green arm represents the true future state of the arm, E(S^g_t, u^g_t). Ideally, the red and green arms should overlap completely, meaning that the generator network has learned to produce accurate movement samples. Visual inspection of the samples in Table 6.2 suggests that the generated samples become more realistic during training. In Fig. 6.1a, the average RMS errors are shown for samples generated at different stages of training. The generated samples do in fact become more realistic as training progresses, i.e. the generated future state of the arm is better aligned with the true future state for the trained GAN than for the untrained GAN. Note that a 95% confidence interval is included in Fig. 6.1a, but the interval is too small to be visible. The discretized JS-divergence between the training set and the set of generated samples is shown in Fig. 6.1b. Although the partitioning was crude (only ∼22,500 states), Fig. 6.1b suggests that the trained GAN has not suffered a major mode collapse.

The main conclusion from these results is that it is possible to train GANs to producenew movement samples that are more accurate than those produced by the randomnetworks.


[Table 6.2 omitted: a grid of rendered movement samples at training iterations 0 (random), 100,000, 200,000 and 1,000,000.]

Table 6.2: Examples of generated movement samples (zoom in on pdf). Blue arm: S_t from g. Green arm: true S_{t+1} given S_t. Red arm: S_{t+1} from g.


[Figure 6.1 omitted: two line plots against generator iteration, each with the untrained (random) network level marked. (a) Average RMS error. (b) JS-divergence (nats).]

Figure 6.1: Training progression of the GAN using discriminator network 1 and generator network 3. (a) shows the RMS error and (b) the JS-divergence.

6.2 Forward model

The generative models were tested as forward models. New movement samples were produced by the simulator environment and the future state in each sample was masked:

y_forward = {S_t, u_t, S_{t+1}}.

The generative models were tasked with reconstructing the masked future state, S_{t+1}, given the initial state, S_t, and the motor command, u_t. In this chapter, the results are displayed for the GAN with the lowest RMS error in chapter 6.1 (i.e. the GAN using generator network 3 and discriminator network 1). In Table 6.3, several samples and their reconstructions at different stages of training are visualized. The blue arm represents the initial state of the arm, produced by the simulator environment. The green arm represents the true future state of the arm, S_{t+1} = E(S_t, u_t), and the red arm represents the future state of the arm as reconstructed by the generator. The visualizations show a clear tendency for the generative model to improve as training progresses, i.e. the green and red arms move closer together. Fig. 6.2a displays the coefficient of determination for each variable in S_{t+1}. The coefficients for the fully trained network lie in the range 0.943 to 0.984, and in the range −2.735319 to −0.780772 for the untrained network. Thus, the fully trained GAN can explain most of the variation in the test data. Fig. 6.3 shows that the average RMS error between S^g_{t+1} and the true S_{t+1} is significantly lower for the trained network than for the untrained one. In conclusion, these results show that the trained GAN performed better as a forward model than the untrained GAN.


[Table 6.3 omitted: a grid of reconstructed forward-model samples (Samples 1-4) at training iterations 0 (random), 100,000, 200,000 and 1,000,000.]

Table 6.3: Examples of reconstructed samples as forward model (zoom in on pdf). Blue arm: S_t from E. Green arm: S_{t+1} from E. Red arm: S_{t+1} reconstructed by g.


[Figure 6.2 omitted: (a) line plot of R² for each masked variable (X_{1,t+1}, X_{2,t+1}, θ_{1,t+1}, θ_{2,t+1}, θ̇_{1,t+1}, θ̇_{2,t+1}) against generator iteration; (b) residuals against fitted values for the fully trained network.]

Figure 6.2: (a) Coefficient of determination of each missing variable as training progresses. (b) The residual plot for values fitted by the fully trained network.

[Figure 6.3 omitted: line plot of average RMS error against generator iteration, with the untrained (random) network level marked.]

Figure 6.3: Average RMS error when the GAN is used as forward model (95% confidence interval included).


6.3 Inverse model

The generative models were also tested as inverse models. The samples used to test the models were the same as for the forward task. However, instead of masking the entire future state of the arm, the motor command u_t together with the future angular positions and velocities, θ_{t+1} and θ̇_{t+1}, were masked:

y_inverse = {X_t, θ_t, θ̇_t, u_t, X_{t+1}, θ_{t+1}, θ̇_{t+1}}.

The results in this chapter are shown using the same GAN as in chapter 6.2. Table 6.4 visualizes different samples and their reconstructions at different stages of training. As in the forward task, the blue arm represents the initial state of the arm, produced by the simulator environment. The green arm represents the true future state of the arm, S_{t+1} = E(S_t, u_t), and the red arm represents the future state of the arm as reconstructed by the generator, S^g_{t+1}. Additionally, the purple arm represents the true future state of the arm given the initial state S_t and the reconstructed motor command u^g_t, i.e. the purple arm represents E(S_t, u^g_t). In the samples shown in table 6.4, the green, red and purple arms overlap, in general, more for the trained network than for the untrained network. Note that due to the redundancy of the state space, there often exist two distinct solutions to each inverse problem. Thus, as long as the end-point of the arm is located at the right coordinates, the configuration of the arm has to be considered correct. Fig. 6.4a shows the coefficients of determination of the Cartesian end-point coordinates of the future state, X_{t+1}, at different stages of training. The coefficients are shown both for the end-point given the reconstructed angular positions of the arm, X_{t+1}(θ^g_{t+1}), and for the end-point according to the simulator given the true initial state of the arm and the reconstructed motor command, X_{t+1}(E(S_t, u^g_t)). All the coefficients are in the range 0.95–0.97 for the trained network, indicating a reconstruction which explains most of the variance in the future end-point coordinates. This is further illustrated in Fig. 6.5a, which displays the RMS errors between the masked end-point X_{t+1} of the arm and the end-point of the reconstructed arm, analogously to the coefficients of determination. Fig. 6.5b displays the RMS "discrepancy" between the reconstructed angular positions θ^g_{t+1} and the angular positions given the initial state of the arm and the reconstructed motor commands, i.e. θ_{t+1}(E(S_t, u^g_t)). It shows that the discrepancy also decreases with training. A low RMS "discrepancy" indicates a coherent reconstruction in which the reconstructed motor commands actually result in the reconstructed angular positions and velocities. In conclusion, these results show that the trained GAN performed better as an inverse model than the untrained GAN.
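The redundancy of the state space can be made concrete with a planar two-link arm: a reachable end-point is generally produced by two distinct joint configurations ("elbow-up" and "elbow-down"). A minimal sketch, assuming unit link lengths (the actual arm parameters of the simulator may differ):

```python
import numpy as np

def end_point(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics of a planar two-link arm: joint angles -> end-point."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return np.array([x, y])

def inverse_solutions(x, y, l1=1.0, l2=1.0):
    """Both joint configurations reaching (x, y): elbow-down and elbow-up."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    c2 = np.clip(c2, -1.0, 1.0)
    sols = []
    for sign in (+1, -1):                      # the two elbow branches
        theta2 = sign * np.arccos(c2)
        k1 = l1 + l2 * np.cos(theta2)
        k2 = l2 * np.sin(theta2)
        theta1 = np.arctan2(y, x) - np.arctan2(k2, k1)
        sols.append((theta1, theta2))
    return sols

target = end_point(0.7, 0.9)                   # some reachable end-point
for th1, th2 in inverse_solutions(*target):
    # both distinct configurations reproduce the same end-point
    assert np.allclose(end_point(th1, th2), target)
```

This is why the evaluation above scores a reconstruction by its end-point rather than by comparing joint angles directly.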


[Table 6.4 shows arm visualizations for samples 1–4 (columns) at training iterations 0 (random), 100,000, 200,000 and 1,000,000 (rows).]

Table 6.4: Examples of reconstructed samples as inverse model (zoom in on pdf). Blue arm: S_t from E. Green arm: S_{t+1} from E. Red arm: S_{t+1} reconstructed by g. Purple arm: true S_{t+1} for given S_t and reconstructed u_t. Yellow arms: true S_{t+1} given S_t and max/min u_t.


[Figure 6.4a plots R² (0–1) against generator iteration (0–1,000,000) for X_{1,t+1}(θ^g_{t+1}), X_{2,t+1}(θ^g_{t+1}), X_{1,t+1}(E(S_t, u^g_t)) and X_{2,t+1}(E(S_t, u^g_t)). Figure 6.4b plots residuals (−0.2 to 0.2) against fitted values (−1.0 to 1.0) for the same four quantities.]

Figure 6.4: (a) Coefficient of determination of each missing variable as training progresses. (b) The residual plot for fitted values by the fully trained network.

[Figure 6.5a plots average RMS error against generator iteration for X_{t+1}(θ^g_{t+1}) and X_{t+1}(E(S_t, u^g_t)), with random-network baselines. Figure 6.5b plots average RMS deviance against generator iteration, with a random-network baseline.]

Figure 6.5: (a) Average RMS error when GAN is used as inverse model. (b) The average difference between the angular configurations as predicted by the generator and the actual configurations given initial state and generated action. A 95% confidence interval is included in both graphs.

6.4 Choosing generative model

In chapter 6.1, it was shown that the GAN model trained using the combination of generator network 3 and discriminator network 1 had the smallest sample generation error. It was assumed that this model was a good candidate for testing the performance as integrated forward and inverse model. In chapters 6.2 and 6.3, this GAN's performance as integrated forward and inverse model is shown. However, in this chapter the assumption that a low generation error leads to low errors in the forward and inverse tasks is tested. By testing all trained networks as forward and inverse models and measuring their average RMS errors, the correlation between sample generation error ("training error") and forward/inverse model error ("test error") was calculated. Fig. 6.6 shows that there are positive correlations between the training error and the test errors (both for the forward task and the inverse task), as was hypothesized. Hence, the GAN which produces the most coherent, or realistic, samples is also a good choice as integrated forward and inverse model. Note that the error of the forward model is the sum over six variables whereas the error of the inverse model is measured as the sum over only two variables.
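The reported correlations (r ≈ 0.66 and 0.67 in Fig. 6.6) are plain Pearson coefficients between the per-network training and test errors. A minimal sketch, with the error arrays as hypothetical placeholders (one entry per trained network):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Hypothetical per-network errors, one entry per trained GAN architecture.
training_error = [0.13, 0.16, 0.19, 0.22, 0.27, 0.31]
forward_test_error = [0.12, 0.18, 0.16, 0.24, 0.26, 0.33]

r = pearson_r(training_error, forward_test_error)
assert -1.0 <= r <= 1.0
```

The same computation is available as `np.corrcoef(x, y)[0, 1]`; it is written out here only to make the definition explicit.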

[Figure 6.6 is a scatter plot of test error (0.10–0.35) against training error (0.125–0.325) for the forward model (r = 0.6558) and the inverse model (r = 0.6701).]

Figure 6.6: Correlation between the networks' training performances and their performances as forward and inverse models.

6.5 Inverse model - Selecting minimal action

In chapter 6.3, it was shown that the trained GAN performed better on the inverse task than the untrained one. However, there exist multiple solutions to each inverse problem. To test if the trained network can discriminate between the different solutions, the network was tested as an inverse model which penalized the sum of motor commands. The task was still to reconstruct

y_inverse = {X_t, θ_t, θ̇_t, u_t, X_{t+1}, θ_{t+1}, θ̇_{t+1}},

however, this time using the loss function in Eq. 5.9. Fig. 6.7a shows that the average sum of squared motor commands for the reconstructed samples is significantly lower when the penalizing term is included (Inverse Model + Min. Action) than when it is not (Inverse Model). The accuracy on the inverse task with the penalizing term (solid lines) is compared to the accuracy without it (dashed lines) in Fig. 6.7b. There is no clear difference in accuracy between the two reconstruction methods. Fig. 6.8a shows the RMS errors between the masked end-point X_{t+1} of the arm and the end-point of the reconstructed arm, and Fig. 6.8b the discrepancy between the reconstructed angular positions θ^g_{t+1} and the angular positions resulting from the initial state S_t and the reconstructed motor commands u^g_t according to E. From these graphs, it is clear that the reconstructed samples both become more internally correct (i.e. coherent according to the simulator dynamics) and result in end-points closer to the target end-points. In conclusion, by adding a penalizing term to the reconstruction loss function, the GAN can find solutions to the inverse problem that require significantly smaller motor commands.
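The reconstruction procedure optimizes the latent vector z under the combined loss of Eq. 5.9. A minimal sketch of that loop, with toy linear stand-ins for the trained generator and critic and with hypothetical mask and weight values (the real implementation would backpropagate through the trained networks instead of using finite differences):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained networks (assumptions, not the thesis models):
W = rng.normal(size=(14, 6))           # "generator": linear map, z (6-d) -> sample (14-d)
g = lambda z: W @ z
D = lambda s: -np.sum(s ** 2)          # "critic": higher output means more realistic here

mask_inv = np.ones(14)
mask_inv[6:8] = 0.0                    # hide the two motor commands u_t
mask_u = 1.0 - mask_inv                # select only the motor commands
y = g(rng.normal(size=6))              # target sample to reconstruct

lam, beta = 0.1, 0.1                   # hypothetical values for lambda and beta in Eq. 5.9

def loss(z):
    s = g(z)
    contextual = np.sum(np.abs(s * mask_inv - y * mask_inv))  # match the known parts
    perceptual = -D(s)                                        # keep the sample "realistic"
    penalizing = np.sum((s * mask_u) ** 2)                    # prefer small motor commands
    return contextual + lam * perceptual + beta * penalizing

# Gradient descent on z via finite differences.
z = rng.normal(size=6)
z0 = z.copy()
for _ in range(300):
    grad = np.array([(loss(z + 1e-5 * e) - loss(z - 1e-5 * e)) / 2e-5
                     for e in np.eye(6)])
    z -= 0.01 * grad

assert loss(z) < loss(z0)              # the optimized latent vector beats its start
```

Swapping the penalizing term for another cost changes the selected solution without retraining, which is the flexibility exploited in this experiment.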

[Figure 6.7a plots the average Σ_i u²_{i,t} (0–1) of reconstructed samples against generator iteration for Inverse Model and Inverse Model + Min. Action. Figure 6.7b plots R² against generator iteration for X_{1,t+1}(θ^g_{t+1}), X_{2,t+1}(θ^g_{t+1}), X_{1,t+1}(E(S_t, u^g_t)) and X_{2,t+1}(E(S_t, u^g_t)).]

Figure 6.7: (a) Average sum of squared motor commands for inverse tasks. (b) Coefficient of determination, solid lines for Inverse Model + Min. Action and dashed lines for Inverse Model only.


[Figure 6.8a plots average RMS error against generator iteration for X_{t+1}(θ^g_{t+1}) and X_{t+1}(E(S_t, u^g_t)), with random-network baselines. Figure 6.8b plots average RMS deviance against generator iteration, with a random-network baseline.]

Figure 6.8: (a) Average RMS error when GAN is used as inverse model. A 95% confidence interval is included in the graph. (b) The average difference between the angular configurations as predicted by the generator and the actual configurations given initial state and generated action.

6.6 Exploring the latent space

Walking on the latent manifold and observing the output gave information about how the mapping from the latent space was structured. Fig. 6.9 shows the variance of each output variable (i.e. each variable in the movement samples) in response to variation in the latent variables, one at a time. Each row shows the variation of the output variables in response to changes in a specific latent variable, with one column for each variable in the output (generated movement sample). For example, z_0 (top row) mostly affects θ_{2,t} and the variables that depend on θ_{2,t}. It seems as if the GAN has encoded each of the six variables θ_{1,t}, θ_{2,t}, θ̇_{1,t}, θ̇_{2,t}, u_{1,t} and u_{2,t} into separate latent variables. From table 6.5 it is clear that the learning process has structured the latent space into a semantic and easily controlled representation of the output variables, where gradually changing one latent variable gradually changes the affected output variables shown in Fig. 6.9.
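The variance map of Fig. 6.9 can be computed by sweeping one latent coordinate while holding the others fixed and recording the variance of each generator output. A minimal sketch with a stand-in generator (the trained network is assumed, not shown):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(14, 6))                 # stand-in "generator" weights (assumption)
g = lambda z: np.tanh(W @ z)                 # hypothetical generator: z (6-d) -> sample (14-d)

def latent_sensitivity(g, z_dim=6, out_dim=14, n_points=21):
    """Variance of each output variable as each latent dim is swept over [-1, 1]."""
    sweep = np.linspace(-1.0, 1.0, n_points)
    sens = np.zeros((z_dim, out_dim))
    for i in range(z_dim):
        outputs = []
        for v in sweep:
            z = np.zeros(z_dim)              # the other latent dims are kept fixed
            z[i] = v
            outputs.append(g(z))
        sens[i] = np.var(np.asarray(outputs), axis=0)
    return sens

S = latent_sensitivity(g)
assert S.shape == (6, 14) and np.all(S >= 0.0)
```

A row of `S` with a single dominant column would indicate that the corresponding latent variable controls one output variable, the disentangled structure suggested above.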


[Figure 6.9 shows, for each latent dimension z_0–z_5 (rows, variance scale 0–0.25), the variance of each output variable X_{1,t}, X_{2,t}, θ_{1,t}, θ_{2,t}, θ̇_{1,t}, θ̇_{2,t}, u_{1,t}, u_{2,t}, X_{1,t+1}, X_{2,t+1}, θ_{1,t+1}, θ_{2,t+1}, θ̇_{1,t+1}, θ̇_{2,t+1} (columns).]

Figure 6.9: The variance of each output node of the generator in response to variation of one dimension in the latent space. Below each graph, the latent dimension which was varied is shown.


[Table 6.5 shows generated arm visualizations for each latent dimension i = 0, …, 5 (rows) at z_i = −1, −0.4, 0, 0.4, 1 (columns).]

Table 6.5: Generated samples as one element in the latent space is varied while the other elements are kept fixed (zoom in on pdf). Blue arm: S_t from g. Green arm: true S_{t+1} given S_t. Red arm: S_{t+1} from g.


Chapter 7

Discussion and Conclusions

7.1 Discussion of experiments

In this degree project, it was empirically shown that it is possible to train GANs to produce novel movement samples which are plausible given the simulator environment. At least, the networks produced more realistic samples after training than before, see table 6.1. A possible interpretation of this is that the generator networks have internalized the dynamics of the simulator environment. In addition, the results in chapters 6.2 and 6.3 suggest that GANs could be used as integrated forward and inverse models. That is, the same trained model can be used both for forward and inverse tasks. These results are robust across a range of different (small) architectures. Although not achieving perfect performance, most networks converged during the pre-specified training phase and all performed better than the random (untrained) networks. Some initial experiments, not covered in this report, also suggest that the results could be robust across a range of different dimensions of the latent space Z.

Although the simulator environment was a deliberately simplistic model, the genera-tive models faced some non-trivial tasks. In contrast to the evaluation during training(where the future states also were compared to the ground truths), the forward task re-quires the networks to reconstruct samples which they have not previously observed.This is a subtle but important difference, as it tests the generator’s ability to generalizeoutside of previously observed samples. Generalization is an important ability, as itis desirable that a generative model does not only memorize movement samples fromthe training set but actually internalizes the dynamics of the environment. Doubtshave been raised to the generalizing capabilities of current GANs [3, 4]. Althoughgeneralization has not yet been explicitly investigated in this project, the performanceon the forward and inverse tasks indicates a capability to generalize. This is further

50

Page 58: Generative adversarial networks as integrated forward and inverse …1169168/... · 2017-12-22 · could, in the future, be used to test ideas proposed by the theories of human motor

strengthen by the correlation between training error and forward and inverse errors,see Fig. 6.6. If the networks only memorized samples from the training set, then thenetworks would only reproduce real samples. Consequently, the training error wouldnot correlate with the errors on the forward and inverse tasks which use new, previ-ously unobserved samples.

In addition to the problem of generalization, the inverse task required the models to solve non-uniquely determined problems. The trained networks also improved on these problems, see Fig. 6.4a and Fig. 6.6. However, given the loss function in Eq. 5.3, a large number of different arm configurations were observed achieving roughly the same loss. In effect, this meant that the final configuration was more or less randomly selected among a set of possible configurations. To resolve this dilemma and demonstrate the flexibility of the reconstruction approach, the loss function (Eq. 5.3) was augmented with an additional penalizing term, see Eq. 5.9. By penalizing the sum of squared motor commands, the trained network found roughly as accurate solutions to the inverse tasks as before but used significantly smaller motor commands, see Fig. 6.7. In addition, it also diversified the losses of different configurations. This experiment also demonstrates one of the strengths of the GAN approach: the networks can be trained on random movement samples to learn the dynamics of the environment, and desired behavior can be imposed after training as part of the backpropagation.

From Fig. 6.9 it is clear that the generator has embedded the six variables of θ_t, θ̇_t and u_t into separate and independent latent variables. This can be interpreted as the GAN disentangling the factors of variation, as those variables are also the fewest needed to fully determine the full sample {S_t, u_t, S_{t+1}}. Furthermore, table 6.5 suggests that the GAN which achieves the lowest training error structured the latent space Z in a non-distributed and easily controllable manner. The robustness of this result has not been investigated.

7.2 Discussion of implementation

In this project, only small networks were investigated. In general, the smaller networks tended to perform on par with or better than the "larger" networks, as shown in table 6.1. In fact, in this project the GAN with the lowest data generation error was trained with the smallest discriminator network. However, the hyperparameters were not thoroughly optimized, and many standard machine learning methods, such as drop-out and batch normalization, were not used. Thus, by introducing these types of methods and optimizing the hyperparameters, it should be possible to train larger networks, with larger computational capacities, to achieve lower errors.


An important discussion is how sensitive the results are to changes in parameters. Parameters such as network architecture and the dimensionality of the latent space have already been discussed, see chapter 7.1. Other parameters of interest include learning rate, weight clipping etc. Validating the effects of all these parameters is unfortunately not feasible. However, initial testing suggests that training GANs with larger or smaller learning rates still produces GANs which perform better than their untrained counterparts. This result also holds for changes in weight clipping. Other hyperparameters, such as the RMSProp decay, were not altered. The effects of changes in arm parameters were not investigated either but were left for further investigation.

7.3 Difficulties

In the first phase of the project, the original formulation of GAN was implemented. However, the networks were prone to mode collapse and were not successful at learning the dynamics of the environment. This motivated the switch to training WGANs instead, which proved to be a more stable training algorithm.

When training WGANs, it became clear that the generator networks had most difficulty learning the mapping of the end-point coordinates, X, of the arm. This was not entirely unexpected due to the non-linear relations to the other variables. The mapping of the X_1 variable was particularly difficult. However, this difficulty was alleviated by changing the allowed range of θ_{1,t} in the sampled training data, from [−π/4, π/4] to [0, π/2]. This change of range meant that the sampled X_{1,t} coordinates had a higher variance, which appears to have had a positive impact on learning the mapping of that variable.
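The effect of the range change on the variance of X_{1,t} can be checked numerically. Looking only at the first link's contribution cos(θ_1) to X_1 (a simplification; the full coordinate also depends on θ_2), sampling θ_1 uniformly from the widened range spreads the coordinate out far more:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def var_cos(theta_min, theta_max):
    """Variance of cos(theta1) when theta1 is sampled uniformly from a range."""
    theta = rng.uniform(theta_min, theta_max, size=n)
    return float(np.var(np.cos(theta)))

v_old = var_cos(-np.pi / 4, np.pi / 4)   # original sampling range
v_new = var_cos(0.0, np.pi / 2)          # widened range used in training
assert v_new > v_old                      # the new range spreads the coordinate out more
```

Analytically the variances are about 0.008 for [−π/4, π/4] and 0.095 for [0, π/2], roughly a twelvefold increase, consistent with the improved learning of X_1 reported above.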

The evaluation of the inverse model was also less intuitive than the evaluation of the forward model. Due to the redundancy of the configuration space, it was not feasible to simply compare the reconstructed sample to the original. Instead, this project measured how accurately the motor commands and the angular configurations reached the target end-point, respectively. There are other methods to measure the performance of the inverse model, but in this project these two measures were considered both simple and adequate.


7.4 Connection to Optimal control?

In this section, I will argue that the inverse model with an additional penalizing term, e.g. Eq. 5.9, could be investigated as an approximation method for optimal control. Optimal control has previously been used to model human motor behavior, see chapter 2.1.

A general formulation of a first-order optimal control problem was presented in Eq. 2.3 and Eq. 2.4. A discretized version can be formulated as

J = Θ(x_{t_f}, t_f) + Σ_{t=t_0}^{t_f} L(x_t, u_t),
x_{t+1} = f(x_t, u_t),
x_min ≤ x_t ≤ x_max,
u_min ≤ u_t ≤ u_max,    (7.1)

where x_t is some state, u_t some action and t some discretized time. f is a set of dynamical equations governing the evolution of the system, Θ(x_{t_f}, t_f) encodes the desired goal state and L(x_t, u_t) is an optional penalizing term.

In the context of this project, an optimal control problem could be posed as finding the motor commands u_t for t = t_0, …, t_f which create an optimal state trajectory S*_t for t = t_0, …, t_f, given some loss function J = Θ(S_{t_f}, t_f) + Σ_{t=t_0}^{t_f} L(S_t, u_t). The state trajectory and motor commands must relate to each other according to the dynamics of the simulator environment. Thus, x_{t+1} = f(x_t, u_t) in Eq. 7.1 corresponds, in this setting, to S_{t+1} = E(S_t, u_t). Furthermore, in neuroscience, a standard penalizing term is the sum of squared motor commands, L = ||u_t||².

To see why the GANs could potentially approximate optimal trajectories using the inverse model with an extra penalizing term, first recall that the loss function in Eq. 5.9 was defined as

L(z) = L_contextual(z) + λ L_perceptual(z) + β L_penalizing(z)
     = |g(z) ⊙ M_inverse − y_inverse| − λ D(g(z)) + β |g(z) ⊙ M_{u_t}|².    (7.2)

The contextual loss term,

L_contextual(z) = |g(z) ⊙ M_inverse − y_inverse|,

encodes the approximated end-point and initial state of the arm,

L_contextual(z) = Θ(X_{t_f}, t_f, S_{t_0}).    (7.3)


This term is similar to the goal state term, Θ(S_{t_f}, t_f), of the optimal control problem. The penalizing loss term

L_penalizing(z) = |g(z) ⊙ M_{u_t}|²

is simply the sum of squared motor commands, scaled by a factor β in the total loss. If the penalizing term in an optimal control problem is the sum of squared motor commands, then

L_penalizing(z) = Σ_{t=t_0}^{t_f} L(S_t, u_t, t).    (7.4)

Note that the penalizing term in Eq. 5.9 can be changed to match that of the optimal control problem, if necessary.

In optimal control, the progression of states must adhere to the dynamical constraints of f. This is a hard constraint, a constraint which cannot be violated. The backpropagation method lacks such a hard constraint. Instead, a soft constraint is introduced by

L_perceptual(z) = −D(g(z)).

In the case of this project, the set of equations f would correspond to the dynamical equations E(S_t, u_t) defined by the simulator environment. The perceptual loss penalizes samples which are unlikely to have been sampled from the distribution of real movement samples. In other words, the backpropagation method does not have a hard constraint forcing the reconstructed sample to follow the dynamics of the environment. Instead, it has a loss term which penalizes samples which the discriminator does not believe are real. Thus, implicitly, the perceptual loss penalizes generated samples which do not adhere to the dynamics of the environment. Note that the boundaries in Eq. 7.1 can be included in the loss function by introducing step functions which penalize invalid parameters with some very high loss.

In conclusion, the backpropagation formulation is quite similar to the structure of the optimal control formulation, see Eq. 7.1. The main differences are that the hard constraint S_{t+1} = E(S_t, u_t) is replaced by the soft constraint L_perceptual(z), and that the initial state, S_{t_0}, also has to be approximated. Therefore, I argue that the solutions found by Eq. 5.9 could approximate optimal trajectories. Importantly, such solutions would be obtained without the generative model being trained on optimal trajectories. Instead, the trajectory is selected based on the loss function.

7.5 Future development

The biological connection of GANs is currently unclear. There is today no available suggestion of how the brain could implement GANs. However, recent research suggests that some deep learning techniques, such as gradient descent, could be implemented in the brain. For an overview of the topic of integrating neuroscience and deep learning, see [41]. Thus, investigating the plausibility of GANs as a biological model is left to future endeavors.

Of course, it can be of interest to extend the current networks to generate movement samples of higher dimensions (e.g. a greater number of effectors or more time steps). GANs have been successfully applied to high-dimensional data generation in other domains, and it is therefore plausible that GANs can be used to model high-dimensional movement as well. Furthermore, a more complex environment should be used to more accurately simulate the different properties of movement. In addition, in this project there were no targets or behaviors required in the control of the arm. The WGAN model could, however, be extended, e.g. with Conditional GAN [43], to model target reaches similar to studies done with human subjects. The model could also be extended with some reinforcement learning algorithm for other types of goal-directed behavior. It could be of interest to integrate this type of generative network into a perception-action system for robotic control.


Bibliography

[1] Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862, 2017.

[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.

[3] Sanjeev Arora, Rong Ge, Yingyu Liang, Tengyu Ma, and Yi Zhang. Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573, 2017.

[4] Sanjeev Arora and Yi Zhang. Do GANs actually learn the distribution? An empirical study. arXiv preprint arXiv:1706.08224, 2017.

[5] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[6] JM Bernardo, MJ Bayarri, JO Berger, AP Dawid, D Heckerman, AFM Smith, and M West. Generative or discriminative? Getting the best of both worlds. Bayesian Statistics, 8:3–24, 2007.

[7] Max Berniker, Anthony Jarc, Emilio Bizzi, and Matthew C Tresch. Simplified and effective motor control based on muscle synergies to exploit musculoskeletal dynamics. Proceedings of the National Academy of Sciences, 106(18):7601–7606, 2009.

[8] Max Berniker and Konrad P Kording. Deep networks for motor control functions. Frontiers in Computational Neuroscience, 9, 2015.

[9] Nikolai Aleksandrovich Bernstein. Human Motor Actions: Bernstein Reassessed. North-Holland, 1984.

[10] Christopher M Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York, 2006 (corr. 2nd printing, 2007).

[11] Andreja Bubic, D Yves Von Cramon, and Ricarda I Schubotz. Prediction, cognition and the brain. Frontiers in Human Neuroscience, 4:25, 2010.


[12] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2172–2180, 2016.

[13] Mark M Churchland, John P Cunningham, Matthew T Kaufman, Justin D Foster, Paul Nuyujukian, Stephen I Ryu, and Krishna V Shenoy. Neural population dynamics during reaching. Nature, 487(7405):51–56, 2012.

[14] Mark M Churchland, Gopal Santhanam, and Krishna V Shenoy. Preparatory activity in premotor and motor cortex reflects the speed of the upcoming reach. Journal of Neurophysiology, 96(6):3130–3146, 2006.

[15] Paul Cisek. Cortical mechanisms of action selection: the affordance competition hypothesis. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1485):1585–1599, 2007.

[16] Andy Clark. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(03):181–204, 2013.

[17] Antonia Creswell and Anil Anthony Bharath. Inverting the generator of a generative adversarial network. arXiv preprint arXiv:1611.05644, 2016.

[18] Jörn Diedrichsen, Reza Shadmehr, and Richard B Ivry. The coordination of movement: optimal feedback control and beyond. Trends in Cognitive Sciences, 14(1):31–39, 2010.

[19] Jeff Donahue, Philipp Krähenbühl, and Trevor Darrell. Adversarial feature learning. arXiv preprint arXiv:1605.09782, 2016.

[20] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Alex Lamb, Martin Arjovsky, Olivier Mastropietro, and Aaron Courville. Adversarially learned inference. arXiv preprint arXiv:1606.00704, 2016.

[21] J Randall Flanagan and Alan M Wing. Modulation of grip force with load force during point-to-point arm movements. Experimental Brain Research, 95(1):131–143, 1993.

[22] Karl Friston, Jérémie Mattout, and James Kilner. Action understanding and active inference. Biological Cybernetics, 104(1):137–160, 2011.

[23] Karl J Friston, Jean Daunizeau, James Kilner, and Stefan J Kiebel. Action and behavior: a free-energy formulation. Biological Cybernetics, 102(3):227–260, 2010.

[24] QG Fu, D Flament, JD Coltz, and TJ Ebner. Temporal encoding of movement kinematics in the discharge of primate primary motor and premotor neurons. Journal of Neurophysiology, 73(2):836–854, 1995.


[25] Antje Gentsch, Arne Weber, Matthis Synofzik, Gottfried Vosgerau, and Simone Schütz-Bosbach. Towards a common framework of grounded action cognition: Relating motor control, perception and cognition. Cognition, 146:81–89, 2016.

[26] Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.

[27] Ian Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.

[28] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[29] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.

[30] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs. arXiv preprint arXiv:1704.00028, 2017.

[31] Nicholas G Hatsopoulos and Yali Amit. Synthesizing complex movement fragment representations from motor cortical ensembles. Journal of Physiology-Paris, 106(3):112–119, 2012.

[32] Neville Hogan. An organizing principle for a class of voluntary movements. Journal of Neuroscience, 4(11):2745–2754, 1984.

[33] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.

[34] Tejas D Kulkarni, Ardavan Saeedi, Simanta Gautam, and Samuel J Gershman. Deep successor reinforcement learning. arXiv preprint arXiv:1606.02396, 2016.

[35] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[36] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.

[37] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.


[38] Zachary C Lipton and Subarna Tripathi. Precise recovery of latent vectors from generative adversarial networks. arXiv preprint arXiv:1702.04782, 2017.

[39] Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, 2013.

[40] Donald M MacKay. Towards an information-flow model of human behaviour. British Journal of Psychology, 47(1):30–43, 1956.

[41] Adam H. Marblestone, Greg Wayne, and Konrad P. Kording. Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10:94, 2016.

[42] R Christopher Miall and Daniel M Wolpert. Forward models for physiological motor control. Neural Networks, 9(8):1265–1279, 1996.

[43] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

[44] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

[45] Mustafa Mustafa, Deborah Bard, Wahid Bhimji, Rami Al-Rfou, and Zarija Lukic. Creating virtual universes using generative adversarial networks. arXiv preprint arXiv:1706.02390, 2017.

[46] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.

[47] Alva Noë. Action in Perception. MIT Press, 2004.

[48] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: Training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems, pages 271–279, 2016.

[49] Luiz Pessoa. Understanding brain networks and brain organization. Physics of Life Reviews, 11(3):400–435, 2014.

[50] Martin J Pickering and Andy Clark. Getting ahead: forward models and their place in cognitive architecture. Trends in Cognitive Sciences, 18(9):451–456, 2014.

[51] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. A metric for distributions with applications to image databases. In Computer Vision, 1998. Sixth International Conference on, pages 59–66. IEEE, 1998.

[52] Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover's distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.

[53] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.

[54] Krishna V Shenoy, Maneesh Sahani, and Mark M Churchland. Cortical control of arm movements: a dynamical systems perspective. Annual review of neuroscience, 36:337–359, 2013.

[55] Jason L Speyer and David H Jacobson. Primer on optimal control theory. Society for Industrial and Applied Mathematics, 2010.

[56] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning, 4(2):26–31, 2012.

[57] Zhuowen Tu and Song-Chun Zhu. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on pattern analysis and machine intelligence, 24(5):657–673, 2002.

[58] Michael T Turvey. Issues in the theory of action: Degrees of freedom, coordinative structures and coalitions. Attention and performance, pages 557–595, 1978.

[59] Aäron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. CoRR, abs/1609.03499, 2016.

[60] Cédric Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

[61] Daniel M Wolpert, Zoubin Ghahramani, and Michael I Jordan. An internal modelfor sensorimotor integration. Science, 269(5232):1880, 1995.

[62] Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. Improving neural machine translation with conditional sequence generative adversarial nets. arXiv preprint arXiv:1703.04887, 2017.

[63] Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, and Minh N Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.

[64] Alan Yuille and Daniel Kersten. Vision as Bayesian inference: analysis by synthesis? Trends in cognitive sciences, 10(7):301–308, 2006.

[65] Felix E Zajac. Muscle coordination of movement: a perspective. Journal of biomechanics, 26:109–124, 1993.
