Thesis presented to the Instituto Tecnológico de Aeronáutica, in partial
fulfillment of the requirements for the degree of Doctor of Science in the
Program of Electronic Engineering and Computation, Field of Informatics.
Fernanda Monteiro Eliott
A COMPUTATIONAL MODEL FOR SIMULATION
OF EMPATHY AND MORAL BEHAVIOR
Thesis approved in its final version by signatories below:
Prof. Dr. Carlos Henrique Costa Ribeiro
Advisor
Prof. Dr. Luiz Carlos Sandoval Góes
Prorector of Graduate Studies and Research
Campo Montenegro, São José dos Campos, SP - Brazil
2015
Cataloging-in Publication Data
Documentation and Information Division
Eliott, Fernanda Monteiro
A Computational Model for Simulation of Empathy and Moral Behavior / Fernanda Monteiro Eliott. São José dos Campos, 2015. 90f.
Thesis of Doctor of Science – Course of Electronic Engineering and Computation. Area of Informatics – Instituto Tecnológico de Aeronáutica, 2015. Advisor: Prof. Dr. Carlos Henrique Costa Ribeiro.
1. Arquitetura (computadores). 2. Sistemas multiagentes. 3. Comportamento afetivo. 4. Tomada de decisão. 5. Simulação computadorizada. 6. Inteligência artificial. 7. Computação. I. Instituto Tecnológico de Aeronáutica. II. Title.
BIBLIOGRAPHIC REFERENCE
ELIOTT, Fernanda Monteiro. A Computational Model for Simulation of Empathy and Moral Behavior. 2015. 90f. Thesis of Doctor of Science – Instituto Tecnológico de Aeronáutica, São José dos Campos.
CESSION OF RIGHTS
AUTHOR’S NAME: Fernanda Monteiro Eliott
PUBLICATION TITLE: A Computational Model for Simulation of Empathy and Moral Behavior.
PUBLICATION KIND/YEAR: Thesis / 2015
It is granted to Instituto Tecnológico de Aeronáutica permission to reproduce copies of this thesis and to only loan or sell copies for academic and scientific purposes. The author reserves other publication rights and no part of this thesis can be reproduced without the authorization of the author.
Fernanda Monteiro Eliott
R. Paulino Blair, 33
12232-030 – São José dos Campos–SP
A COMPUTATIONAL MODEL FOR SIMULATION
OF EMPATHY AND MORAL BEHAVIOR
Fernanda Monteiro Eliott
Thesis Committee Composition:
Prof. Dr. Paulo André Lima de Castro – Chairperson - ITA
Prof. Dr. Carlos Henrique Costa Ribeiro – Advisor - ITA
Prof. Dr. Jackson Paul Matsuura – Internal Member - ITA
Prof. Dr. Osvaldo Frota Pessoa Júnior – External Member - USP
Prof. Dr. Ricardo Ribeiro Gudwin – External Member - UNICAMP
To my parents.
Acknowledgments
First of all I would like to thank my advisor, Professor Carlos Henrique Costa Ribeiro,
and to highlight his role in helping my thoughts turn from abstraction toward a concrete
project. I also would like to thank the entire Aeronautics Institute of Technology (ITA) for
its enormous contribution to the development of my knowledge while making my evolution
in a multidisciplinary field possible.
I would like to emphasize the role of the Philosophy dept. of the University of São
Paulo (USP) in my academic background. I am always thankful.
I am also thankful to Professor Briseida Dôgo de Resende (USP, Experimental Psy-
chology dept.) for taking me as a guest in her class and research group. It was a priceless
experience.
My family was around me all the way. I appreciate it and hope that this continues.
Finally, I would like to thank CNPq for the financial support.
“Disons donc que, si toutes choses deviennent naturelles à l’homme lorsqu’il s’y habitue, seul reste dans sa nature celui qui ne désire que les choses simples et non altérées. Ainsi la première raison de la servitude volontaire, c’est l’habitude.”
— Étienne de La Boétie, 1576.
Resumo
Emoções e sentimentos são considerados cruciais no processo de decisão humana in-
teligente. Em particular, as emoções sociais nos ajudariam a reforçar o grupo e a cooperar.
Ainda é uma questão de debate o que motivaria criaturas biológicas a cooperarem ou não
com seu grupo. Todos os tipos de cooperação ocultariam interesses egoístas, ou o altruísmo
realmente existiria? Se nos debruçarmos sobre essas questões a partir de uma perspectiva
humana, acabamos passando por comportamento moral e três tipos de sujeitos: o moral,
imoral e amoral. Se nos movermos de sujeitos biológicos em direção a agentes artificiais,
observamos ser uma questão complexa ficar ileso a mecanismos ad-hoc a fim de atingir
cooperação em abordagens computacionais baseadas em utilidade. Decidimos nos inspirar
em comportamento moral como uma forma de buscar a cooperação em Sistemas Multi-
agentes. Nossa hipótese principal baseia-se na ideia de que a cooperação pode surgir a
partir do auxílio de emoções e comportamento moral durante o processo de tomada de
decisões - mesmo quando comportamento egoísta é recompensado por altos reforços. A
analogia com o comportamento moral é promovida através da simulação do sentimento
de empatia. A importância do sentimento de empatia consiste na sua função em regular
as prioridades dos agentes, permitindo a seleção de ações que, talvez, não sejam a melhor
seleção egoísta, uma vez que uma tomada de decisão não egoísta possa ser crucial para
equalizar as interações entre os agentes e resultar em cooperação. Descreveremos aqui
nossa arquitetura computacional multiagente bioinspirada (denominada MultiA), com-
posta por emoções artificiais, sentimentos e por um Módulo de Empatia responsável por
fornecer uma seleção de ações que, rudimentarmente, imite comportamento moral. Infor-
mação sensorial é acionada pelo meio ambiente e, então, a arquitetura computacional a
transforma em emoções e sentimentos artificiais básicos e sociais. Posteriormente, através
do módulo de empatia, suas próprias emoções são empregadas para estimar o estado at-
ual de outros agentes. E então, seus sentimentos artificiais proporcionam uma medida
(denominada bem-estar) do seu desempenho em resposta ao ambiente. Através daquela
medida e de técnicas de aprendizado por reforço, a arquitetura aprende um mapeamento
entre emoções e ações. Diante de recompensas para comportamento egoísta, os agentes
MultiA que adotam estratégia cooperativa, o fazem como resultado de um sentimento de
empatia (altos níveis de empatia) regulando as prioridades do agente, agindo como um
agente moral. Os agentes MultiA que não adotam a estratégia cooperativa selecionam
ações egoístas, e o fazem como resultado de baixos níveis de empatia, agindo como agente
imoral. O mecanismo de seleção de ação de MultiA pode ser alimentado a partir de dois
aspectos. O primeiro está relacionado à cooperação, uma vez que um agente MultiA em
particular tenha uma vizinhança cooperativa. Dessa forma, o agente irá cooperar por
reciprocidade. O segundo está relacionado à não-cooperação, uma vez que o entorno é
não-cooperativo (agente MultiA não cooperativo por reciprocidade). Portanto, a arquite-
tura computacional acaba por imitar rudimentarmente agentes morais e imorais. De fato,
obter agentes morais e imorais a partir de uma mesma arquitetura se encaixa em pressu-
postos filosóficos sobre o meio corromper o indivíduo. Dado que relações entre indivíduos
diferentes possam ser representadas por redes, exploramos diferentes topologias de rede
para caracterizar as interações agente-agente, definindo a vizinhança dos mesmos. A fim
de avaliar nossa arquitetura, utilizamos uma versão de um jogo evolutivo que aplica o
jogo do dilema do prisioneiro para estabelecer as alterações sobre a topologia da rede.
Os resultados indicam que, apesar de MultiA também imitar rudimentarmente agentes
imorais, um número suficiente de agentes MultiA seguiram em outra direção, assim,
através da cooperação, mantiveram a estrutura da rede da vizinhança. Portanto, estraté-
gias baseadas em simulação de comportamento moral podem auxiliar na diminuição da
recompensa interna advinda de uma seleção de ação egoísta, favorecendo a cooperação
como uma propriedade emergente de sistemas multiagentes. Nossos resultados também
indicam a viabilidade do Módulo de Empatia e coerência entre a experiência do agente e a
poĺıtica de ação adotada. Intensificamos os parâmetros de teste e ainda assim obtivemos
um número substancial de agentes MultiA cooperativos. Mas, adicionalmente, obtivemos
agentes MultiA não-cooperativos, o que decorreu também do efeito de ocultamento de
estratégia. Este consiste em um problema importante que interfere na política de ação de
agentes MultiA. Em relação ao paradigma de reciprocidade sobre o projeto de MultiA,
este se destacou através da prevenção de efeito de falha em cascata em redes descritas por
uma correlação de grau quase neutra, auxiliando os agentes a serem melhor sucedidos em
espelhar a condição dos vizinhos. Nossos resultados confirmam empiricamente a influência
do Módulo de Empatia sobre o Sistema de Decisão de MultiA.
Abstract
Emotions and feelings are now considered decisive in the human intelligent decision
process. In particular, social emotions would help us to strengthen the group and cooperate.
It is still a matter of debate what motivates biological creatures to cooperate or
not with their group. Would all kinds of cooperation hide a selfish interest, or does true
altruism exist? If we pore over those questions from a human perspective, we end
up passing through moral behavior and three kinds of individuals: the moral, immoral
and amoral. If we move from biological subjects to artificial agents, it is a complex
matter to do without ad hoc mechanisms to bring up cooperation in utility-based com-
putational approaches. We decided to take inspiration from moral behavior as a way of
moving toward cooperation in Multiagent Systems. Our leading hypothesis relies on the
idea that cooperation can emerge from the assistance of emotions and moral behavior
during the process of decision making - even when selfish behavior is rewarded by high
reinforcements. The analogy with moral behavior is promoted through simulating the
feeling of empathy. The importance of the empathy feeling is its function in regulating
the agents' priorities, enabling the selection of actions that may not be the best selfish
selection, since non-selfish decision making may be crucial to equalize the interactions
among agents and bring up cooperation. We depict herein our bioinspired computational
multiagent architecture (so-called MultiA) composed of artificial emotions, feelings and
an Empathy Module responsible for providing an action selection mechanism that rudi-
mentarily mimics both moral and immoral behaviors. Sensorial information is triggered by
the environment, then, the computational architecture transforms it into basic and social
artificial emotions and feelings. Then, the agent's own emotions are employed to estimate the
current state of other agents through an Empathy module. Finally, its artificial feelings
provide a measure (termed well-being) of its performance in response to the environment.
Through that measure and reinforcement learning techniques, the architecture learns a
mapping from emotions to actions. While facing high rewards for selfish behavior, the
MultiA agents that adopt the cooperative strategy do so as the result of an empathy
feeling (high empathy levels) regulating the agent's priorities, acting as moral agents. The
MultiA agents that do not adopt the cooperative strategy select selfish actions, and do so
as a result of low empathy levels, acting as immoral agents. The MultiA mechanism of
action selection can be driven by two aspects. The first is related to cooperation: once
a particular MultiA agent has a cooperative neighborhood, the agent will cooperate by
reciprocity. The second is related to non-cooperation: if the surroundings are non-cooperative,
the MultiA agent will be non-cooperative by reciprocity. Thus, our computational
architecture actually rudimentarily mimics both moral and immoral agents. But, as a
matter of fact, achieving moral and immoral agents from the very same architecture fits
philosophical assumptions about the environment corrupting the individual. As relations
between different subjects can be represented by networks, we explored varied network
topologies that can characterize the agent-agent interactions by defining the agents' neigh-
borhoods. For assessment of our architecture, we use a version of an evolutionary game
that applies the Prisoner Dilemma paradigm to establish changes over the network topol-
ogy. Our results show that, even though MultiA can also mimic immoral behavior, it is
more likely to mimic moral behavior. In each experiment, a sufficient number of
MultiA agents mimicked moral agents to solve the task. Thus, through cooperation, they
kept the neighboring network structure. Therefore, strategies based on the simulation of
moral behavior may help to decrease the internal reward from selfish selection of actions,
thus favoring cooperation as an emergent property of multiagent systems. Our results
also indicate the Empathy Module's feasibility and the coherence between the agent's experience
and the adopted action policy. We tested MultiA agents under stressed parameters and
we still obtained a substantial number of cooperative MultiA agents. We also obtained
non-cooperative MultiA agents, and that was also due to the shadow strategy effect. The
shadow strategy effect is an important problem interfering with the MultiA agents' action
policy. Regarding the reciprocity paradigm in the MultiA design, it stood out by preventing
a cascading failure effect on networks described by an almost neutral degree correlation,
helping the agents to be more successful in mirroring their neighbors' current condition.
Our results empirically confirm the influence of the Empathy Module on the MultiA
Decision System.
List of Figures
FIGURE 3.1 – The general scheme of the MultiA Architecture.
FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action if it is going to be executed in response to an interaction with neighbor p at match t.
FIGURE 3.3 – The ANNs structure from Learning Module, CS.
FIGURE 3.4 – a) Agent i and neighbor p are going to interact. The Learning Module of agent i provides Q_{ip}^t(E_{ip}^t, k) and the DS chooses to execute Action B. b) The agents interact and the PS of agent i calculates the value of W_i. c) Now agent i will interact with its next neighbor, neighbor p + 1. The Learning Module provides the new values of Q_{ip+1}^t(E_{ip+1}^t, k). Before the DS chooses one action, the Learning Module will update (through the Backpropagation algorithm) the weights of the ANN indexed to action B. After being updated, the ANN indexed to action B will re-calculate Q_{ip+1}^t(E_{ip+1}^t, B). Now the output values will be sent to the DS.
FIGURE 3.5 – CS: The structure of the Empathy Module.
FIGURE 3.6 – CS: The reciprocity assumption and the Empathy Module.
FIGURE 4.1 – The general scheme of MultiAA.
FIGURE 4.2 – At match t: 20 agents (4 defectors, 16 cooperators).
FIGURE 4.3 – At match t, just before match t + 1: 19 agents (3 defectors, 15 cooperators).
FIGURE 4.4 – Exp.1, MultiA: crossing strategies and ρf at each match.
FIGURE 4.5 – Exp.1, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.6 – Exp.2, MultiA: crossing strategies and ρf at each match.
FIGURE 4.7 – Exp.2, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.8 – Exp.2, MultiA final network structure: 7406 agents, 2866 defectors (red nodes).
FIGURE 4.9 – Exp.3, MultiA: crossing strategies and ρf at each match.
FIGURE 4.10 – Exp.3, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.11 – Agents Final Results. Defectors are represented by the red color and cooperators by blue.
FIGURE 4.12 – Exp.3, MultiA final network structure: 215 agents, 2 defectors (red nodes).
FIGURE 4.13 – Exp.3, MultiAA final network structure: 30 agents, all cooperators.
FIGURE 4.14 – Exp.4, MultiA: crossing strategies and ρf at each match. Final values: ρf = 80%, ρd = 52%, ρc = 48%.
FIGURE 4.15 – Exp.4, MultiAA: crossing strategies and ρf at each match. Final values: ρf = 57%, ρd = 65%, ρc = 35%.
FIGURE 4.16 – Exp.4: graphics produced at matches t = 42 and t = 55.
FIGURE 4.17 – Exp.4: graphics produced at matches t = 60 and t = 68.
FIGURE 4.18 – Different values of m and MultiA performance.
FIGURE 4.19 – Exp.5: MultiAA and MultiA non-failed simulations regarding different values of ψ.
FIGURE 4.20 – Exp.6: MultiAA and MultiA non-failed simulations regarding different values of ψ.
FIGURE 4.21 – Exp.7a (left) and 7b (right): graphics produced at match t = 55. Cooperators are represented by the blue color and defectors by the red color.
FIGURE 4.22 – Exp.7a (left) and 7b (right): graphics produced at match t = 68. Cooperators are represented by the blue color and defectors by the red color.
List of Tables
TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA
TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA
TABLE 3.3 – Updating Es1,i
TABLE 3.4 – Artificial Feelings
TABLE 3.5 – The Calculation of Ypi: If Mip < 0.5
TABLE 3.6 – The Calculation of Ypi: If Mip >= 0.5
TABLE 4.1 – Game Parameters
TABLE 4.2 – Experimental Parameters, Lattice 2D4N
TABLE 4.3 – Data Graphic Color
TABLE 4.4 – Parameters for the experiments, Sect. 4.3
Contents
1 Thesis Introduction and Statement
1.1 Thesis Statement
1.2 Thesis Structure
2 Background
2.1 Artificial Moral Agents (AMAs)
2.1.1 MultiA
3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior
3.1 MultiA Functioning: an Overview
3.1.1 MultiA and an Interaction Game
3.2 The Systems of the MultiA Architecture
3.2.1 Perceptive System (PS)
3.2.2 Cognitive System (CS)
3.2.3 Decision System (DS)
3.3 ALEC Influences on MultiA
4 Experimental Setup and Results
4.1 MultiA in an Evolutionary Game
4.1.1 MultiA Experimental Set Up and Amoral Version
4.2 Results: Lattice Network
4.2.1 Experiments
4.3 Results: Assortativity Coefficient and Moral Agent Performance
4.3.1 Experimental Parameters
4.3.2 Experiments
5 Final Remarks
5.1 Future Work and Relevance
Bibliography
Appendix A – Publications
1 Thesis Introduction and
Statement
Living beings can display behavior complex enough to stimulate research. Despite the
conflicting reasoning about it, the behavior of living beings can inspire the intuitive design
of systems that handle complex matters. The opportunity of conceiving and modeling
artificial moral behavior and empathy arises as the view that an immaterial soul partakes
in the moral behavior process deteriorates. A biological and philosophical examination
should be part of the search for a coherent and intuitive bioinspired computational
multiagent architecture that seeks to mimic moral behavior. Likewise, the embodied
implementation has to be scrutinized vis-à-vis the pursued theoretical references.
In a simplistic approach, moral behavior can be described as the act of following the
set of rules of the group, keeping it cohesive, while the progressive annexation of new customs
can change that set. By reiterating a custom and naturally incorporating it among our
thoughts, we are actually submitting ourselves to it and establishing it; hence, the laws
of consciousness essentially come from custom, and not from nature (MONTAIGNE,
2013 (1580)a). Inadvertently, we may link cooperation and the willingness to do it. Ac-
cording to Tomasello and Vaish (TOMASELLO; VAISH, 2013), from an evolutionary
standpoint, morality can be regarded as a kind of cooperation, as the association of
skills and reasons for cooperation would give rise to morality. Thus, coop-
eration would demand the equalization of the individual's self-interest with that of the others,
or its suppression. Regarding the composition of a group and the accompaniment of its members,
Tomasello (TOMASELLO, 2011) regards cooperation as a stitching action, connect-
ing the members of the group. Since cooperation among living things comprehends
complex matters, it reveals itself as a field of study. Likewise, thinking through utility-
based computational approaches, the emergence of cooperation is not easily achievable
(see Sect. 2).
If we conceive moral behavior as a form of cooperation (by following the set of rules
of the group) built upon customs among emotions and feelings, it brings up an intuitive
line of reasoning to pursue while modeling a bioinspired computational architecture sup-
posed to mimic moral behavior. Then, how can we design an autonomous artificial agent
able to socially interact and deal with conflicting tasks that require emotional guidance
to be solved? A computational agent that incorporates artificial emotional and moral in-
telligences can lead to ways of achieving cooperation among artificial creatures or between
artificial and biological ones.
Regarding the empathy exercise, we can try to divide individuals into three groups:
moral, immoral and amoral. In a simple, unpretentious approach, the first have
the social feeling of empathy properly functioning, and the immoral perform actions that
somehow hurt the established moral code of their community. The latter, in turn,
can be interpreted as moral or immoral depending on their social behavior, and
may be characterized by important issues in the mechanism that allows an individual
to put himself/herself in the place of the other and be sympathetic to his/her circum-
stances. There is a neurophysiological basis for this classification: according to Kandel et
al. (KANDEL et al., 2000), the lateral orbitofrontal cortex seems to participate in mediating
empathetic and socially appropriate responses; hence, damage to this area would be asso-
ciated with failure to respond to social cues and with a lack of empathy. A mechanism
that allows the existence of empathy is described in Damásio (DAMÁSIO, 2004) from
the cognitive aspect and, as in Proctor et al. (PROCTOR et al., 2013) and Waal (WAAL,
2009), from the emotional standpoint. Hence, if someone succeeds in de-
veloping an artificial moral agent (AMA), would guidance from moral or immoral behavior
be more useful? The answer is not as easily given as it might sound. We
can think through philosophical questions, premises and practical goals while designing
AMAs, specifically:
1. With respect to AMAs, could we inspire the design from different aspects? Think
through a multiagent task and action policies oriented by three sets of premises:
moral, immoral and amoral. Then, the design may consider which set, and under what
kind of circumstances, fits better within a particular multiagent system (MAS) task:
• The moral agent cares about all members of its group (considered as neighbors), even though that may bring its own punishment. But it may also select
actions with the aim of punishing or isolating a constantly defecting neighbor.
However, in general, moral agents tend to cooperate;
• The immoral agent also cares about neighbors, but it is concerned with the profit it can make through social attachment. It will mostly cooperate with its
immoral group, but can decide to cooperate with others if it is getting isolated
(to prevent complete isolation or high punishment). The immoral agent will cooperate (if
at all) mostly with its partners;
• The amoral agent can imitate both moral and immoral agents and be more practical in taking decisions.
2. For different kinds of agents, what is the meaning of selfish actions? An action can
sound selfish but be motivated by non-selfish goals, such as punishing a member
of the group to keep it healthy. Thus, selfishness can be executed by all three
agent types; the difference lies in the goal behind that action: if it is to make
profit (fits better with immoral and amoral agents); if it belongs to an uncertain or
exploratory phase (all three); if it is to prevent deep punishment (all three, but
moral agents with lower intensity) or, even, to isolate someone from the group (all three);
3. If the agent can observe and differentiate its neighbors, it can learn to respond dif-
ferently to them and to stop cooperating with defectors. Both immoral and amoral
agents may isolate defectors more easily than moral agents. Moreover, amoral
agents, in order to survive and keep neighbors, can mirror moral and immoral agents
for convenience;
4. Regarding the agents' action policies, can they lead to relevant differences in the network
structure (considering each agent as a node and their relations as links between
them)?
• As moral agents are naturally cooperative, they are supposed to keep bonds with moral, immoral and amoral agents. Therefore, even after an elimination
process that punishes defectors, the final population of a moral majority will
include a reasonable number of agents from all sets;
• Accordingly, immoral agents will only care about the advantage, if any, of keeping bonds with others. Thus, a continuously defecting (or failing) one will
be easily excluded;
• As the amoral agents will imitate a neighbor, they will add uncertainty as they change strategy.
5. What would we expect from artificial empathy? Would it be convenient to develop
a decision process that tends to something Machiavellian (MACHIAVELLI, 1985
(1532))? What is best: to maintain a failing neighbor in order to not lose it or just
eliminate it?
• What about having a morally hybrid AMA: immoral towards agents that fail or delay the task and moral while interacting with living creatures? Hypothe-
size a group of artificial agents supposed to coordinate activities and priorities
to complete a task (such as finding an object in a certain environment): if one agent
from the group stops working or fails, it might be better to isolate it. This
may be thought of as an analogy to cutting off the artificial empathy feeling about
that one agent. Therefore, the agent that simulates moral behavior will have
the tendency to cooperate but it can be triggered to do otherwise;
• Regarding a hybrid artificial agent that can trigger moral and immoral behavior, it might be important to autonomously activate moral action policies with
biological creatures, and immoral otherwise.
Finally, beyond philosophical and biological investigation on morality and human/
machine behavior, practical issues can be addressed through exploring decision making
by AMAs in MAS. While simulating moral behavior, AMAs may be helpful in general
social or domestic assignments, e.g. taking the role of monitoring highly dangerous crim-
inals, people in quarantine or in scenarios where there are social dilemmas to deal with.
Moreover, the artificial empathy from an artificial moral agent could be an additional re-
source in argumentation-based negotiation in MAS. AMAs may also be useful to improve
the responses to general MAS issues stressed by Wooldridge (WOOLDRIDGE, 2009), such
as how to bring up cooperation in societies of self-centered agents; how to recognize a
conflict and then reach an agreement; or, as highlighted by Matignon (MATIGNON
et al., 2012), the challenges of coordinating the agents' activities in order to cooperatively
achieve goals (see Sect. 2).
1.1 Thesis Statement
In Damásio (DAMÁSIO, 1994) emotions and feelings are described as imperative in the
human intelligent decision process. Emotions and feelings would also be deci-
sive in helping us spend less time and reduce the computational burden while making
intelligent decisions. In particular, social emotions would help us to strengthen the group
and cooperate. We depict herein our bioinspired computational multiagent architecture
(so-called MultiA) composed of artificial emotions, feelings and an Empathy Mod-
ule responsible for providing an action selection mechanism that rudimentarily mimics
moral and immoral behavior. It is not trivial to achieve cooperative self-centered agents
in a multiagent task. Our search for mimicking moral behavior, among other things, is
driven to achieve rational agents more likely to cooperate. By responding to the feeling
of empathy, MultiA should be able to produce artificial moral behavior and select
cooperative action policies. Our leading hypothesis relies on the idea that cooperation
can emerge from the assistance of emotions and moral behavior during the process of
decision making - even when selfish behavior is rewarded by high reinforcements. The
analogy with moral behavior is promoted through simulating the feeling of empathy. The
importance of such a feeling is its function in regulating MultiA agents' priorities, en-
abling the selection of actions that may not be the best selfish selection. Non-selfish
decision making may be crucial to equalize the interactions among agents and bring up
cooperation. Given the multidisciplinary complexity of moral behavior, the compu-
tational simulation of moral behavior may be approached from various angles. We
designed a computational architecture to rudimentarily mimic both moral and immoral
behaviors and developed an Empathy Module to work as the moral/immoral behavior
engine. The Empathy Module is grounded over reciprocity assumptions. Then, the agent
with a cooperative neighborhood will cooperate by reciprocity. Likewise, the agent with a
non-cooperative neighborhood will also be reciprocal by not cooperating. Therefore, the
reciprocity design can also carry selfish behavior, and not only cooperation (see Ch. 4).
Thus, our computational architecture ends up rudimentarily mimicking both moral and
immoral agents.
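The sketch below caricatures only the reciprocity assumption behind this design, reducing it to a single threshold rule; the actual Empathy Module estimates the neighbors' condition from the agent's own artificial emotions and feelings (see Ch. 3), and all names and the threshold used here are illustrative assumptions.

```python
# Caricature of the reciprocity assumption only, not the actual Empathy Module
# (which works over the agent's own artificial emotions rather than observed
# actions, see Ch. 3). Names and threshold are illustrative assumptions.
def reciprocal_choice(estimated_neighborhood_cooperativeness, threshold=0.5):
    """Cooperate if the neighborhood is estimated to be cooperative, defect
    otherwise: reciprocity can therefore also carry selfish behavior."""
    if estimated_neighborhood_cooperativeness >= threshold:
        return "cooperate"
    return "defect"

print(reciprocal_choice(0.8))   # cooperate (cooperative surroundings)
print(reciprocal_choice(0.2))   # defect (non-cooperative surroundings)
```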
Our results indicate the Empathy Module's feasibility and, in environments suitable for
the Empathy Module application (see Sect. 4.3.2), we obtained a considerable convergence
to cooperation. We modified the MultiA architecture to design the MultiAA architecture,
supposed to mimic amoral agents. We obtained interesting coherence between our final
results and the immoral, moral and amoral action policies.
1.2 Thesis Structure
This thesis comprises five chapters. In Ch. 1 we introduce a few reflections on
morality and human/machine behavior. In Ch. 2 we present the background and our
project development permeating issues. In Ch. 3 we detail our bioinspired computational
multiagent architecture designed to rudimentarily simulate moral and immoral behavior,
and, in Ch. 4, we analyze its performance in a multiagent task under different network
structures - we also present a MultiA modified version, the MultiAA architecture. Finally,
in Ch. 5 we reflect upon the final results and suggest future work. In Appendix A we
detail all publications generated in the context of this thesis.
2 Background
A computational simulation of empathy encompasses subjects about which there is dis-
agreement and ignorance. There are convincing but opposing or conflicting explanations
about theory of mind, qualia, consciousness, human universals and morality. Given the com-
plexity and the undiscovered matters, we lack a broadly accepted theory unifying those
subjects - besides, they may involve religious taboos. Therefore, for a detailed statement of
an empathy simulation, we would have to stress themes familiar to philosophy, psychology,
neurophysiology and many other fields - hence, in this thesis, our main focus is to detail
the MultiA architecture.
For the sake of feasibility, computational approaches may seek to summarize the-
oretical references and embody a moral simulation from different perspectives (e.g. a
model may try to mimic moral behavior in robotic environments, or may try to provide
answers in an ethically specific domain). Before discussing our perspective and scrutiniz-
ing our MultiA computational architecture (Sect. 3), we introduce a few computational
models that somehow approach the moral simulation (Sect. 2.1), including MultiA itself
(Sect. 2.1.1).
In the moral architecture proposed herein, we used reinforcement learning techniques
(see Sect. 3.2.2.1), which are based on mapping situations to actions to maximize the re-
inforcement, wherein the agent's experience is used as a parameter (SUTTON; BARTO, 1998).
The reinforcement consists of a numerical signal given to the agent after having executed
a certain action (including action abstention) in a certain state. Through its experience,
by selecting different actions in different states, the agent under a computational archi-
tecture that implements reinforcement learning techniques must learn to execute state-
corresponding actions to maximize the expected sum of reinforcements.
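To make that setting concrete, the sketch below shows a generic tabular Q-learning loop: a state-action value table is updated from the reinforcements the agent collects through its own experience. It is only an illustration of the general technique under an assumed environment interface (reset/step and the parameter names are assumptions); MultiA itself learns over artificial emotions with one ANN per action (see Sect. 3.2.2.1 and Ch. 3).

```python
import random

# Generic tabular Q-learning sketch; parameter names and values are
# illustrative assumptions, not the MultiA learning rule.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = env.reset()               # assumed interface: returns an integer state
        done = False
        while not done:
            # Epsilon-greedy action selection based on the agent's experience.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)   # assumed interface
            # Update the estimate of the expected sum of reinforcements.
            best_next = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q
```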
Many difficulties arise if the learning agents have no possibility of sharing data to
accomplish a task: they will have to choose strategies based on their own experiences
and, through that, learn to coordinate responses. To consider the agents interactions
in an environment, it is important to be aware of Game Theory well-studied challenges.
According to Shoham and Leyton-Brown (SHOHAM; LEYTON-BROWN, 2009), Game The-
ory would comprehend the mathematical study of the agents' interactions, and the agents'
predilections would be expressed through a function of the available options - note that the
agents' predilections may change, especially under uncertain situations. We intend to achieve
self-interested moral agents whose predilection comprehends getting high reinforcements
while avoiding bringing negative outcomes to the neighborhood - herein we consider as
neighbors those agents that may directly interact with each other. To simulate moral be-
havior we will adopt an environment described by more than one state and more than one
agent. Game Theory classical domains may provide environments and interaction descrip-
tions to test the moral agents under our computational architecture. Then, a game from
the literature will be chosen to define our agents' environment and moral interactions, the
terminal state and possible agent scenarios. Crucial Game Theory concepts came from
Neumann and Morgenstern (NEUMANN; MORGENSTERN, 1944), such as analyses of
environmental possibilities, difficulties and adequate agent policy responses to accomplish
goals.
Matignon (MATIGNON et al., 2012) describes some challenges that have to be over-
come so that agents (that do not exchange data) can coordinate their action selection in
order to provide coherently coordinated behavior, such as the alter-exploration problem
(the interference with the agent's learned policy caused by other agents' environment explo-
ration). The convergence to a cooperative action policy in self-play (each running agent
follows the very same set of code descriptions) and in general-sum stochastic games (which
allow cooperation, and in which the reinforcements received by the agents may assume
different values (MATIGNON et al., 2012); (GREENWALD et al., 2005)) is an issue: one problem is how to
achieve cooperative behavior when the Pareto-optimal solution does not coincide with the
Nash equilibrium, such as in the Prisoner Dilemma Game. The Nash equilibrium (NASH,
1951) corresponds to a collection of joint strategies (to all agents in the environment) such
that no agent can get a better outcome (by changing strategy) given that the others
will continue seeking their equilibrium strategies (choosing their best responses). The
Pareto-optimal solution occurs when there is no other joint action (combination of the
agents' actions) in which the utility of one agent may increase without decreasing the
utility of another agent.
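As a worked illustration of that mismatch, the snippet below checks, for a Prisoner's Dilemma with common textbook payoffs (an illustrative assumption, not necessarily the values used in Ch. 4), which joint actions are Nash equilibria and which are Pareto-optimal: mutual defection is the only Nash equilibrium, yet it is not Pareto-optimal.

```python
# Illustrative Prisoner's Dilemma payoffs (T > R > P > S); the specific numbers
# are a common textbook choice, not those of our experiments.
# Actions: 0 = cooperate, 1 = defect. payoff[(a1, a2)] = (to player 1, to player 2).
payoff = {
    (0, 0): (3, 3),   # mutual cooperation (R, R)
    (0, 1): (0, 5),   # cooperator exploited (S, T)
    (1, 0): (5, 0),   # defector exploits (T, S)
    (1, 1): (1, 1),   # mutual defection (P, P)
}

def is_nash(a1, a2):
    # Neither player can improve by unilaterally switching strategy.
    u1, u2 = payoff[(a1, a2)]
    best1 = all(payoff[(b, a2)][0] <= u1 for b in (0, 1))
    best2 = all(payoff[(a1, b)][1] <= u2 for b in (0, 1))
    return best1 and best2

def is_pareto_optimal(a1, a2):
    # No other joint action makes one player better off without hurting the other.
    u1, u2 = payoff[(a1, a2)]
    for v1, v2 in payoff.values():
        if v1 >= u1 and v2 >= u2 and (v1 > u1 or v2 > u2):
            return False
    return True

for joint in payoff:
    print(joint, "Nash:", is_nash(*joint), "Pareto-optimal:", is_pareto_optimal(*joint))
# Mutual defection (1, 1) is the only Nash equilibrium, yet it is not
# Pareto-optimal: mutual cooperation (0, 0) improves both payoffs.
```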
When agents share an environment but do not exchange data, they may be actually
ignoring each other's presence. Thus, those agents end up as part of the environment
itself, which means the transition probabilities related to the agents' actions/environmental
outcomes are non-stationary. Therefore, the agents' actions can be influenced by the joint
history of action selection, as the history influences the future transition probabilities
when the agent re-visits a state.
Regarding the agent itself, a deterministic game may appear to the agent as non-
deterministic: stochastic rewards or transitions may be induced by different sources such as
noise or non-observable factors, and it is a challenge for the agent to distinguish what pro-
voked the changes over the reinforcements it receives (whether noise or other agents' actions pro-
moted those changes) (MATIGNON et al., 2012). For instance, the coordination game from
Boutilier (BOUTILIER, 1999) explores different mis-coordination examples: the possible
joint agent actions determine various rewards or penalties and also lead to different states.
Since we intend to run a considerable number of agents under our architecture in
self-play, an important issue stands: how to obtain a final outcome in which the agents
achieve the best possible individual result that does not bring a bad outcome to their
neighbors? Many times the best individual outcome will not coincide with the best social
one (the best outcome for each agent if all of them choose to cooperate and reject free-
riding). In general, especially in utility-based computational approaches, cooperation is
not easily modeled. As an illustrative example, public goods provide an analogy to analyze
relations in natural societies and are best known for two main features: public goods are
freely available and are not depleted through consumption. In natural societies (also within an artificial
scope that use them as a metaphor), unfair relations are possibly common, such as an
agent taking advantage of another agent social commitment. If public services are freely
available, what would endorse other strategy than free riding? The social commitment
may be crucial to accomplish the best social outcome. For example, by paying the taxes,
we intend to keep the Public Systems functioning, but some of us are not actually paying
for anything - consider the free rider problem which, in essence, has been considered since
Plato (PLATO, 2000(IVBC)), Montaigne (MONTAIGNE, 2013 (1580)b) and many others,
and, more recently, by Cornes and Sandler (CORNES; SANDLER, 1986). Since cooperating
within the group generally results in a cost to the cooperator and defectors benefit from
common resources (WARDIL; HAUERT, 2014), a dilemma emerges between the agent's
self-interest and the group’s maintenance. In fact, public goods games are a metaphor
to describe trivial relations in natural societies and generalize the Prisoner’s Dilemma
Game (PDG) to an arbitrary number of individuals - see Hardin (HARDIN, 1971) and
Wakano and Hauert (WAKANO; HAUERT, 2011). Not unusually, commitment is required
to accomplish the best social outcome: individuals must keep choices that only as a group
will render that particular outcome. To attain it, agents have to commit themselves to
the specific action policy that only as a group will accomplish the best result. If one agent
suddenly changes its action predilection, the others may face the worst possible result:
e.g. if one of the agents stops cooperating while playing the Prisoner
Dilemma Game. Axelrod (AXELROD, 1984) provides a classical reflection on
cooperation, not only regarding the Prisoner Dilemma Game but, also, the cooperative
behavior placement in the chains of relations (exchange) between different powers.
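A minimal sketch of the public goods payoff structure illustrates the free-rider dilemma described above; the endowment and enhancement factor below are assumed illustrative values, not parameters of our experiments.

```python
# Minimal public-goods-game payoff sketch (assumed parameters: endowment of 1
# per agent and an enhancement factor r with 1 < r < n, the classic dilemma range).
def public_goods_payoffs(contributions, r=3.0, endowment=1.0):
    """contributions: list with one entry per agent, 0.0 (free ride) or endowment."""
    n = len(contributions)
    pot = r * sum(contributions)   # contributions are multiplied...
    share = pot / n                # ...and shared equally, even with free riders
    return [endowment - c + share for c in contributions]

# Four agents, one free rider: the free rider earns more than each cooperator,
# although full cooperation maximizes the group's total payoff.
print(public_goods_payoffs([1.0, 1.0, 1.0, 0.0]))   # [2.25, 2.25, 2.25, 3.25]
print(public_goods_payoffs([1.0, 1.0, 1.0, 1.0]))   # [3.0, 3.0, 3.0, 3.0]
```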
Through interacting in its multiagent environment and learning the possible outcomes,
a learning agent (that adapts its action selection to the environment) will stabilize its action
policy under the influence of other agents' actions. Other agents' strategies may put
forward environmental uncertainties and, if there is no data sharing, those uncertainties
are considered to be part of the environment itself. On the other hand, if a particular
agent never triggers any change over the others, to them, it may be as if that particular
agent never existed (neither as part of the environment). In an environment with various
states, if agent A usually collides with agent B in a particular state, then, one of them
(or even both) may end up avoiding that particular state and, depending on the game
possibilities and alternative path chosen, the agents may never find their best path to
accomplish a task. That happens because they can be unable to coordinate their action
selection while taking each other as part of the environment itself.
If two agents within the same environment are rational, once both have learned the
environmental dynamics, they will select actions in accordance with what one expects
to come from the other - even if indirectly considering the other agent, since it may be
considered as part of the environment itself. And those selected actions are expected to
be the agent's better option. Then, we have agents that will try to give their best shot in
response to what is expected to be the other agent's best shot (see, for instance, the
Minimax theorem from Neumann (NEUMANN, 1928)). Therefore, since in general rational
agents are seeking to choose the best selfish action, how is it possible to achieve a better
social outcome instead of an individual one? How to obtain a rational and cooperative
agent while avoiding ad hoc artifices?
We seek to provide a possible approach through an architecture that does not exchange
environmental data (such as the selected action). We will use a game of incomplete
information: the agents will not have access to the neighbors' actions or reinforcements.
But, at the same time, moral agents must behave morally while interacting with other
agents. The only data that will be shared by our agents consists of the neighbor ID: each
interacting agent will identify itself. Then, each one of them will be able to keep a record
of the neighbor ID and the reinforcements from interacting with that very same neighbor.
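A minimal sketch of this per-neighbor bookkeeping follows; the class and method names are illustrative assumptions, not MultiA's internal data structures (which are detailed in Ch. 3).

```python
from collections import defaultdict

class InteractionRecord:
    """Illustrative per-neighbor bookkeeping: the only shared datum is the
    neighbor ID, so each agent indexes its own reinforcement history by it.
    (Names and structure are assumptions, not MultiA's representation.)"""

    def __init__(self):
        self.history = defaultdict(list)   # neighbor ID -> list of reinforcements

    def record(self, neighbor_id, reinforcement):
        self.history[neighbor_id].append(reinforcement)

    def average(self, neighbor_id):
        past = self.history[neighbor_id]
        return sum(past) / len(past) if past else 0.0

# Usage: after interacting with (hypothetical) neighbor "p7", store the reinforcement.
record = InteractionRecord()
record.record("p7", 3.0)
record.record("p7", 0.0)
print(record.average("p7"))   # 1.5
```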
2.1 Artificial Moral Agents (AMAs)
According to Wallach and Allen (WALLACH; ALLEN, 2008), Artificial Moral Agents
(AMAs) would require the ability to access many options and work through differ-
ent evaluative aspects to present a good performance in a human moral domain - moreover,
it would be expected that AMAs would not deform that domain. Still addressing the computational
simulation of moral behavior, Wallach and Allen (WALLACH; ALLEN, 2008) emphasize the
advantages a machine could have over a human brain to respond to moral dilemmas, such
as the power of working through a higher number of matching possibilities and the exemp-
tion from sexual or emotional interference. A machine could use those advantages to come
up with a better answer than those usually provided by humans. Wallach (WALLACH,
2009) analyzes moral dilemmas brought about by philosophers and contrasts what people
morally accept to do in order to save lives with the number of lives that could actually
be saved by them - there are cases in which the human moral judgment will not lead to
saving the highest number of lives (in that case, could we say the human moral judgment
failed?). The AMA designers have to deal with those tricky situations (should we follow
the human moral judgment while developing our code or design utilitarian machines?)
and stick with a perspective while designing the machine code.
For that, it is decisive that the designers themselves reflect on their beliefs,
prejudices, perspectives (such as our bias in identifying people, the cross-race effect (FEIN-
GOLD, 1914)) and taboos, to avoid embodying them in the machines' design. For instance,
Roth (ROTH, 2013) detailed the issue that technology and other mechanisms designed to
represent the skin tone did not evolve to replicate the skin color of non-Caucasian people.
Nowadays, technology is bringing up more issues for the Ethics of Artificial Intel-
ligence. Another example regards the sex robots from TrueCompanion (TRUECOMPANION,
2010): would they badly interfere with human empathy? We are merging so deeply with
technology that machines need not embody moral behavior to affect our
moral system; technology has already changed it.
Wallach (WALLACH, 2015) pores over the latest technology resources and potentialities
(including killing possibilities) while addressing responsibility issues of developers and
users. The apprehension of AMAs causing negative influences over humans is mentioned in
Bringsjord et al. (BRINGSJORD et al., 2006). Through enabling the formalization of a moral
code, deontic logic would allow the writing of theories and dilemmas in a declarative way.
That would allow specialist analysis, thus being a method of restricting the machines'
behavior in ethically sensitive environments. Bello and Bringsjord (BELLO; BRINGSJORD,
2012) also emphasize a concern that restrictions should be inserted into the machines'
design and that those should be related to human cognition. For instance, moral
common sense and intuition should take part in that. Bringsjord et al. (BRINGSJORD et
al., 2006) present modifications over a mind reading model from Bello et al. (BELLO et al.,
2007) and, from their results, they conclude that we will have to deal with the confusing
human moral cognition to build AMAs that productively interact with humans. They also
ponder that moral machines should have a mechanism similar to common sense. That
adds matter to the debate about Lethal Autonomous Systems, as Arkin points out in
reflections in (ARKIN, 2013) and Asaro in (ASARO, 2012).
Computational simulation of moral behavior may be approached through diverse con-
texts. To exemplify the theme diversity, we detail three models:
1. LIDA Model (WALLACH, 2010), (WALLACH et al., 2008), (WALLACH et al., 2010),
(FRANKLIN et al., 2014) and (FAGHIHI et al., 2015). As a computational and concep-
tual model of human cognition, LIDA is described as a cognitive architecture de-
signed to select an action after dealing with ethically pertinent information. There-
fore, the LIDA model is expected to be able to deal with moral decisions. According
to Wallach (WALLACH, 2010), an artificial moral agent under the LIDA architecture
would be designed to, within the available time, select an action while taking into ac-
count the maximum possible quantity of ethically relevant information. This model
was influenced by the Global Workspace Theory (GWT) (BAARS, 1993 (1988)) and
by the Pandemonium Theory (JACKSON, 1987) for the automation of action selec-
tion. GWT would have distinguished itself as a theory of human cognitive processing given
its interpretation of the nervous system as distributed in parallel with different
specialized processes; and some coalitions of such processes would allow the agent
to build a sense from its sensorial data (which would come from its current envi-
ronmental situation). Other coalitions would inherit results from the sensorial data
processing that would have competed for attention and would have won. Those
would occupy the global workspace (GW), whose content would be transmitted to
all other specialized processes. Under a functional point of view, the GW content
would be conscious content and serve to recruit other processes to be used for action
selection in response to the current situation. In both GWT and LIDA, learning
would require and work through attention and would come in each conscious trans-
mission. The LIDA model is based on a cognitive cycle. Then, the human cognitive
processing would occur via continuous interaction of cognitive cycles, which would
happen asynchronously. Various asynchronous cycles could have different simultane-
ous parallel processes but that should respect the serial nature of the consciousness
process, important to keep a stable and coherent world scenario. During each cy-
cle, the LIDA agent would give sense to its current situation through updating its
internal and external environmental representations. Through a competitive pro-
cess, it would be decided which representing portion of the current situation should
receive attention. That portion would then be transmitted, becoming the current
content of consciousness and enabling the agent to choose an adequate action and
execute it. The feelings in the conscious flow would participate within many ways of
learning. New representations would be learned when generated in a cognitive cy-
cle and those that were not sufficiently stressed during the concurrent cycles would
disappear. Feelings would induce the action and the activation of environmental
schemes. Thus, the behavior selection would be influenced by its relevance over
the current situation, by the nature and importance of associated feelings and by
their relation with other behaviors, some of them being necessary to the current
behavior. To be executed, the selected behavior and feelings would be transmitted
to the Sensory-motor memory. There, the feelings would participate in the action
execution, as feelings can influence parameters as strength and speed.
2. EthEl Model (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)
and (ANDERSON; ANDERSON, 2011), whose application is related to prima facie
duties (duties that are mandatory unless overridden by stronger ones), was imple-
mented and tested within the notification context. This means an analysis of when,
how often, and whether to run a notification about a medicine to a particular pa-
tient. A typical dilemma example comes from a patient's refusal to take the medicine
recommended by a doctor. In what situation should the professional insist that the
patient change his mind? If it is crucial that the patient take the medicine, how many
times should it be mentioned to the patient and when should the doctor be notified
about the patient's refusal? EthEl (Toward a Principled Ethical
Eldercare Robot) (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)
and (ANDERSON; ANDERSON, 2011) is a model trained over a deontic context (con-
cerning the duties of the health care professional), and is a prototype that applies
ethical principles (established by learning) to choose an action. The prototype would
have learned an ethical principle in its action taking in a particular kind of dilemma,
the one that relates to prima facie duties. The duties would embody a philosophical
problem relating to the absence of a decision procedure when the duties provide
conflicting orientation. The inspiration for EthEl comes from Rawls (RAWLS, 1951).
The ethical dilemmas were presented to the prototype as an ordered set of values
for each possible solution, whose values would reflect duty violation or satisfaction.
EthEl uses inductive logic (LAVRAC; DZEROSKI, 1994) to determine the decision
principle that has to be used to deal with the proposed dilemmas. EthEl would
have discovered a consistent decision principle that would indicate the correct ac-
tion when specific duties pull in different directions in a particular kind of dilemma.
Then, the professional should question the patient's refusal if she/he is not completely
autonomous and when there is no violation of the duty of non-maleficence or severe
violation of the duty of beneficence. But EthEl would have established that vio-
lations over the duty of non-maleficence should impact more than violations over
duty of beneficence. The authors ponder that EthEl could also be used for other
sets of prima facie duties for which there is agreement among the specialists about
the correct actions.
3. To reflect on Moral Theory vis-à-vis the conflict between Generalism and Partic-
ularism, Guarini (GUARINI, 2006) and (GUARINI, 2012) draws insights from Dancy
(DANCY, 2010) while pondering whether moral reasoning, including learning, could be
done without the use of moral principles. If so, models of artificial neural networks
(ANN) could provide indications of how to do it, given the fact that ANNs would
be able to generalize new cases from those previously learned - and do it without
principles of any kind. Thereby, ANNs are modeled to classify and reclassify cases
with a moral purport, the output (acceptable or not) being an answer to moral
dilemmas attached to the questions kill or let die. Dancy (DANCY, 2010) empha-
sizes a mismatch between moral principles and the importance of the context for the
analysis of what is morally acceptable: moral decisions would depend on the context
and situation. The subject kill or let die from Guarini (GUARINI, 2006), (GUARINI,
2012) would have come from a modified analogy from Thomson (THOMSON, 1971) -
where, relating to an abortion from being pregnant by rape, it takes place a discus-
sion about the difference between murderer and letting die. The modified analogy is
as follows: there is only one person capable of keeping a particular man alive. That
person is kidnapped and placed to filter the man’s blood and should stay there,
connected to him, for nine months. After that, the man will survive and the person
may be free from him. In short: after using violence, a life became dependent on
the other. Then, would it be morally acceptable or not that the person decided to
disconnect from the man before he could be saved (leading to the man’s death)?
According to Guarini (GUARINI, 2006), (GUARINI, 2012), the results suggested that
the classification of non-trivial cases in the absence of queries about moral prin-
ciples would be more plausible than it might be supposed at first sight, although
important limitations suggest the need for principles. Regarding a reclassification,
which would be an important part of the reasoning in humans, simulations indicated
the need for moral principles.
The approaches from items 2 and 3 fall into a specific application domain: EthEl is
tested in a notification context (analysis of when, how many times and whether a notification
shall be issued). Item 3 relates to the concern of providing as output an answer to moral
dilemmas related to the questions kill or let die. Research driven to deal with moral
dilemmas is particularly important because it may be useful to design a morality mech-
anism in machine learning (see Sect 5.1). Finally, the LIDA model is a complex project
that was still under development when we started our project. Since none of the studied
works matched our intentions, we searched for other bases (see Sect. 2.1.1) to guide our
moral architecture design.
2.1.1 MultiA
We expect to obtain relevant decision making toward cooperation in MAS tasks by
designing a computational architecture endowed with artificial emotions, feelings and
moral behavior (through the empathy embodiment). We started designing our bioinspired
computational multiagent architecture by using the ALEC architecture from Gadanho
(GADANHO, 2003) as essential reference - we describe such an influence in Sect. 3.3. Our
multiagent architecture is called MultiA since it is intended for usage in multiagent systems
(Multi) and was inspired by the ALEC architecture (-A). To design a bioinspired computational
architecture, we studied biological and philosophical references while seeking to computationally mimic rudimentary mechanisms related to both moral and immoral behaviors.
Work has been done to establish the crucial role of emotions during the process of
intelligent decision making and their importance in filtering information and awakening
our attention mechanisms (see the Somatic Marker Hypothesis from Damásio (DAMÁSIO,
1994)). The vital role of emotions and feelings in rational decisions embraces social emotions
(such as sympathy and its associated feeling of empathy) and is analyzed from the aspect
of social interaction and homeostatic goals (DAMÁSIO, 2004). Damásio (DAMÁSIO, 2004)
defined social emotions using the concept of moral emotions by Haidt (HAIDT, 2003) -
we follow it while designing the artificial emotions. Haidt (HAIDT, 2003) explains
emotions as responses to a class of events perceived and understood by the self, and so
emotions usually provoke action tendencies. It is particularly important to differentiate
social emotions from other emotions: social emotions trigger action tendencies in
situations that do not represent direct harm or benefit to the self (disinterested action
tendencies); other emotions, on the other hand, are more self-centered.
The brain's dexterity at internally simulating emotional states, establishing a basis for
emotionally possible outcomes and emotion-mediated decision making, is also scruti-
nized in Damásio (DAMÁSIO, 2004). Accordingly, internal simulation takes place during
the process along which the emotion of sympathy turns into the feeling of empathy. The social
interaction would be mediated by mirror neurons (discovered in the premotor cortex area of
macaque monkeys by Pellegrino et al. (PELLEGRINO et al., 1992) and Rizzolatti et al. (RIZZOLATTI et al., 1996)), which make our brain internally simulate, for example, the movements that others
perform while in our field of vision. Such a simulation would enable us to predict
the movements required to establish communication with the other (whose
movements are mirrored). Finally, the internal simulation of our own body (e.g. when
we internally simulate ourselves executing different activities) could also be related to
mirror neurons.
Gallese and Goldman (GALLESE; GOLDMAN, 1998) reflect on the human aptitude for
simulating the mental states of others, and thus understanding their behavior by assigning
to them intentions, goals or beliefs. There is a suggestion that what might have evolved
into such a capacity is an action execution/observation matching system. Likewise, a class
of mirror neurons would be playing its role in that. Moreover, a possible activity of the
mirror neurons would be to promote learning by imitation. Nowadays there is
agreement that typical humans develop the capacity of representing the mental states of
others (this representation system is often called folk psychology). Finally,
there is the consideration that such an ability contributes to fitness, as detecting
another agent's goals and inner states can help the observer to predict the other's future
actions, which can be cooperative or not, or even threatening (researchers are continuously
providing new insights from Mind Reading related experiments).
While retaining an emotional background, the dynamics involved in the origin of empathy
can be approached from a cognitive aspect (WAAL, 2009), (PROCTOR et al., 2013). The
social emotion of sympathy feeds the feeling of empathy, and the social emotions benefit
from the internal simulation supported by mirror neurons that internally mirror the situ-
ation of the other (learning by imitation may be related to mirror neuron activity).
The feeling of empathy, however, will be more or less intense depending on the importance of a
particular other agent (DAMÁSIO, 2004). We seek to maintain negative emotions at low
levels and the positive ones at high levels (DAMÁSIO, 2004), and the purpose of homeostasis
would be to produce a state of life better than neutral, to accomplish what we identify
as well-being. Following this idea, MultiA establishes its preferences considering its
own well-being and that of its peers. While designing the Empathy Module (see Sect. 3.2.2.2) of
MultiA, we used mirror neurons as inspiration. Even though MultiA does not mir-
ror its neighbors' movements, MultiA mirrors its own emotions and preferences onto the
neighbor. Then, the current emotions of MultiA itself are applied while building an
expectation about the well-being of a neighbor - and during that process, MultiA con-
siders that the neighbor shares the very same emotional preferences (in Sect. 3.2.2.2, see
I = {{IP}, {IN}}).
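As a rough illustration of this mirroring idea, the sketch below reuses the agent's own emotions and its own preference weights to build the expectation about a neighbor. The function name, the dictionary encoding and the plain weighted sum are our assumptions for illustration only; the actual Empathy Module (including I = {{IP}, {IN}}) is defined in Sect. 3.2.2.2.

def expected_neighbor_well_being(own_current_emotions: dict, preference_weights: dict) -> float:
    """Hypothetical sketch: agent i applies its own current emotions and its own
    preference weights as if they were the neighbor's, yielding an expectation (Wpi)
    about the neighbor's well-being on the same [-1, 1] scale used for Wi."""
    return sum(preference_weights[e] * own_current_emotions[e] for e in preference_weights)

# Illustrative usage with made-up emotion values and normalizing weights:
emotions = {"Eb5": 0.4, "Eb2": -0.2}   # placeholder emotion readings of agent i
weights = {"Eb5": 0.5, "Eb2": 0.5}     # placeholder normalizing preference weights
w_pi = expected_neighbor_well_being(emotions, weights)   # 0.1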
Regarding the feeling of empathy, we have also been guided by the differentiation of
three types of agents: the moral, the immoral and the amoral. By rudimentarily mimicking those
three patterns of morality, our agents display different social interaction policies. The
moral agent (MultiA, moral agents) tries not to take advantage of the others and cooperates;
the immoral agent takes advantage of the others more easily and does not cooperate (MultiA,
immoral agents). Unlike the others, the amoral agent is not guided by social emotions and
feelings (MultiAA, Sect. 4.1.1). The entire MultiA architecture is analyzed in Ch. 3.
3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior
We propose the MultiA computational architecture, designed from reflections on the
relevance of moral behavior in the search for a rational and cooperative biologically inspired
artificial agent. We hypothesize that the simulation of emotions and moral behavior, by aiding
the computational architecture in making decisions, favors cooperation even in the face of high
reinforcements for selfish behavior. The analogy with moral behavior is implemented
through a simulation of empathy, so the agent may select actions that are not
the best selfish option but that help to enhance the interactions among agents.
Since MultiA agents have empathy more accessible for agents whose interactions have
resulted in positive reinforcements, the reciprocity assumption introduces both moral and
immoral agents, because the action selection mechanism can be driven by two aspects.
The first is related to cooperation, when the particular MultiA agent has a cooperative
neighborhood: the agent will then cooperate by reciprocity. The second is related to
non-cooperation, when the surroundings are non-cooperative (the MultiA agent becomes
non-cooperative by reciprocity). MultiA thus rudimentarily mimics both moral and immoral agents.
MultiA consists of three main systems (Fig. 3.1): the Perceptive System (PS), the
Cognitive System (CS) and the Decision System (DS). The interactions among the
three systems result in action selection derived from sensations triggered by the
environment, while provoking environmental changes that will, in turn, trigger new
sensations, and so on. As input to the PS, MultiA has artificial sensations that are triggered
by reinforcements and indexed by the agent it is interacting with.
3.1 MultiA Functioning: an Overview
FIGURE 3.1 – The general scheme of the MultiA Architecture.

While designing our computational architecture, inspired by Gadanho (GADANHO, 1999), we
took into account the animal behavioral characteristics analyzed in Hallam and Hayes (HALLAM; HAYES, 1992) that could inspire a robotic design. Among those animal characteristics
there is homeostasis, the biological capability of bodily auto-regulation, such as keeping
the temperature or the cells' pH in such a way that the internal conditions are kept on
a stable and regular basis. Through his research on organic mechanisms of biological regulation, Claude Bernard (1813-1878) used the concept of Milieu Intérieur, the precursor of
homeostasis. Later, Cannon (CANNON, 1932) described the body's steady states and some
mechanisms to control them; Cannon (CANNON, 1932) also provided an analogy between
social processes and body regulation. It may therefore be natural to associate
homeostasis with a neutral, balanced state. Nevertheless, according to Damásio (DAMÁSIO, 2004),
life regulation would be designed to comprehend the homeostatic efforts to
produce the state that we understand as well-being. The environment and our bodies
evoke ongoing homeostatic reactions that keep influencing us and our actions, through
which we keep changing our environment and ourselves. Homeostatic reactions
may continue reflecting upon us even after the particular situation that caused them has
ended.
Through the inspiration from biological homeostasis and from Gadanho and Custódio (GADANHO; CUSTÓDIO, 2002) (see Sect. 3.3), we designed the MultiA Perceptive System.
The artificial sensations feed emotions, feelings and, afterwards, through a weighted
sum over feelings, the general environmental and internal perspective of a MultiA agent i
(named Well-Being, Wi) about its own performance. MultiA follows its artificial homeostatic
goals: it selects those actions that are expected to keep the feelings and
emotions within a threshold, therefore achieving high Wi levels. The history of a MultiA
agent is reflected in the current values of its Perceptive System and in the learning of matching
emotional responses to actions. Therefore, keeping the feelings within a threshold
relies upon the selection of adequate actions in response to the environment. Wi is modeled
as a function of the feelings, and internally represents the general condition of agent
i. It is calculated with normalizing weights such that its value falls in the range [−1, 1].
From another perspective, Wi indicates how suitable the action selection (from the
DS) has been concerning the reinforcements received by the MultiA agent i itself and the
remaining feelings, such as empathy. In addition to Wi, MultiA also produces Wpi, a prospect about
the current situation of other agents.
MultiA then uses a set of its own emotions to provide itself with a prospect about the
current situation of other agents. Although there is some controversy about it (see for
instance Hickok (HICKOK, 2014)), we used mirror neurons ((PELLEGRINO et al., 1992),
(RIZZOLATTI et al., 1996)) as inspiration for the mechanism that projects MultiA's own
emotions to mirror other agents' situations. Actions related to high empathy are designed to
be avoided, since we consider that when an agent rouses high empathy levels it is because
the agent itself may be disturbing the performance of the others. For the design of the
Empathy Module, we used the utilitarian calculus from Bentham (BENTHAM, 2007 (1789))
as a guideline. This way, MultiA agents have empathy more accessible for agents whose
interactions have resulted in positive reinforcements. Furthermore, if a MultiA agent has
been receiving a high number of positive reinforcements, it is also more likely to
cooperate. The empathy is represented by S4,ip: feeling number 4 of MultiA agent i for
neighbor p (on Figure 3.1, see feeling number 4). As we designed the empathy to reflect
the impact of MultiA's action selection on its neighbors, the higher the empathy
for a specific neighbor p, the lower is Wi, all the remaining variables that feed Wi kept
constant. This means that, at a certain point, the MultiA agent may not have been
selecting its actions appropriately, since it may be negatively affecting the particular
neighbor p; thus high empathy levels are an indication of inadequate action selection.
Selected actions are considered adequate when they generate positive reinforcements
while not provoking high empathy levels. If p fires high empathy on i, p may be getting
low reinforcements and therefore its neighbors, such as i, should check their actions.

Thus, MultiA is designed to seek those actions that will not increase its levels of
empathy. Then, using the current emotions (from the PS) as input, the CS applies
artificial neural networks (ANNs) to estimate the resulting Well-Being if the corresponding
action is selected. The CS then delivers the outputs from all ANNs to the DS,
which chooses an action.
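The following sketch traces that PS, CS and DS flow for a single interaction. The object interface (method and attribute names) is hypothetical and only illustrates the order in which the three systems are consulted; it is not the thesis implementation.

def multia_step(agent, neighbor_p, reinforcement, available_actions):
    """Hypothetical one-interaction control loop for a MultiA agent.
    PS turns the reinforcement and the neighbor index into sensations and emotions;
    CS estimates the Well-Being expected from each action; DS picks one action."""
    history = agent.cs.history_for(neighbor_p)                 # the data sets the CS delivers to the PS
    emotions = agent.ps.update(reinforcement, neighbor_p, history)
    estimates = {k: agent.cs.estimate_well_being(emotions, k)  # one estimate per available action k
                 for k in available_actions}
    return agent.ds.select_action(estimates)                   # e.g. the highest-valued action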
3.1.1 MultiA and an Interaction Game
When following utility-based computational approaches, it is not trivial to model
artificial agents that reject the opportunity of taking advantage of the others' actions (e.g.
the selection of actions driven only to obtain the highest reinforcements, no matter the
consequences to others) and still commit to the choice of cooperating. In Sect. 2 we in-
troduced the public goods subject (including the related issue of somehow taking advantage
of the others' actions, i.e. free-riding) and mentioned that the Prisoner's Dilemma Game
(PDG) is generalized through public goods games (HARDIN, 1971), (WAKANO; HAUERT,
2011). We developed MultiA with the aim of providing an architecture extensible to
different domains and of showing cooperation as an emergent property. Without loss of
generality, let us hypothesize that each MultiA agent i is going to play the Prisoner's
Dilemma Game with another MultiA agent p. Hence, each MultiA agent will have to
decide whether it is going to cooperate with the other or not (to defect) - and a defector is
highly rewarded for unilateral defection (defection vs. cooperation). In Sect. 3.2 we detail
the MultiA architecture itself, while in Ch. 4 we present the MultiA agents, the artificial
learning agents under the MultiA architecture in a multiagent environment and task.
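To make the setting concrete, a minimal sketch of one PDG interaction follows. The numeric payoffs are placeholders chosen only to respect the usual PDG ordering (unilateral defection pays most, being exploited pays least) with non-negative reinforcements; they are not the values used in our experiments.

# (action_i, action_p) -> (reinforcement_i, reinforcement_p); values are illustrative only.
PDG_PAYOFFS = {
    ("cooperate", "cooperate"): (0.6, 0.6),  # mutual cooperation
    ("cooperate", "defect"):    (0.0, 1.0),  # i is exploited, p gets the temptation payoff
    ("defect",    "cooperate"): (1.0, 0.0),  # unilateral defection is highly rewarded
    ("defect",    "defect"):    (0.2, 0.2),  # mutual defection
}

def play_round(action_i: str, action_p: str) -> tuple:
    """Non-negative reinforcements received by i and p after one interaction."""
    return PDG_PAYOFFS[(action_i, action_p)]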
3.2 The Systems of the MultiA Architecture
3.2.1 Perceptive System (PS)
We consider a model where reinforcements are non-negative. As our research
is grounded on moral behavior, we intend to test and study MultiA agents interacting
among themselves. Thus, each MultiA agent i will keep a list of every agent it has
interacted with (the neighbors of i). Sensations fall in the range [0, 1] and, together
with the history provided by the CS, give rise to artificial emotions. MultiA artificial
sensations are triggered by reinforcements and by an identifying index for the neighbor it
is interacting with. Indexing is defined in the following way: every MultiA agent has an
identifying index i = {1, ..., N}, and the neighbors relating to each agent i also have an
identifying index p = {1, ..., Z}. A given p value thus refers to a particular neighbor that
is interacting with i. The CS delivers five sets of data (the history of agent i) to the PS;
a minimal data-structure sketch of this history follows the list:
1. The current number of neighbors of agent i;
2. The reinforcement history of agent i;
3. The number of times agent i has interacted with each neighbor p;
4. The number of times interactions with p ended up in positive reinforcements (Mip);
5. The value of Ypi, defined as follows. The CS accesses the current emotions from
PS. Then, the Empathy Module EM (from the CS) produces Wpi: an assumption
of i about the current condition of neighbor p. MultiA will then respond to the
current condition of neighbor p (Wpi), producing Ypi. If the neighbor p is supposed
to be facing low reinforcements, MultiA may have its empathy raised (depending
on the Ypi value) to select less selfish actions and try to cooperate in raising
the reinforcements of p. For details, see Sect. 3.2.2.2.
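One possible layout for that history is sketched below; field names and types are our assumptions for illustration, not the thesis implementation.

from dataclasses import dataclass, field

@dataclass
class NeighborRecord:
    """Per-neighbor part of the history of agent i (items 3-5 above)."""
    interactions: int = 0            # item 3: times i interacted with neighbor p
    positive_interactions: int = 0   # used to derive Mip (item 4)
    y_pi: float = 0.0                # item 5: reciprocity value addressed to p

@dataclass
class AgentHistory:
    """History of agent i delivered by the CS to the PS (items 1-5 above)."""
    reinforcements: list = field(default_factory=list)   # item 2: reinforcement history
    neighbors: dict = field(default_factory=dict)        # item 1: neighbor index p -> NeighborRecord

    def num_neighbors(self) -> int:
        return len(self.neighbors)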
There are basic emotions {Eb1,i, Eb2,i, ..., Ebd,i} and social emotions {Es1,i, Es2,i, ..., Esh,i},
all normalized to [−1, 1]. The basic emotions are associated with the general condition of
the MultiA agent itself. Social emotions are stimulated by neighbors and by the impact
of the agent's own actions on those neighbors. The artificial feelings {S1,i, S2,i, ..., Sz,i} also
fall in the range [−1, 1] and are fed by emotions. We used the reference (DAMÁSIO, 2004) as
inspiration while shaping the artificial basic emotions. Table 3.1 lists the particular
biological emotions that inspired each MultiA basic emotion.
TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA

  Biological Basic Emotion   Artificial Basic Emotion
  Anger                      Eb1,i
  Sadness                    Eb2,i
  Surprise                   Eb3,i
  Fear                       Eb4,i
  Happiness                  Eb5,i
  Disgust                    Eb6,i
The artificial basic emotions are as follows (a minimal computational sketch is given right after the list):
• Eb1,i: increases with the number of interactions of i in the same match. A match
is defined by every i interacting only once with each and all of its neighbors, and
interactions are always ordered w.r.t. the neighbor agent index. Once all neighbors have
interacted, the match ends. It is calculated according to Eq. 3.1:

E^b_{1,i} = -1 + 2 (m_i^t / V_i^1)    (3.1)

where t represents the (possibly unfinished) current match, V_i^1 is the initial number
of neighbors of agent i at the first match, and m_i^t is the number of concluded
interactions of i with its neighbors during the current match t.
• Eb2,i: indicates the difference between the sum of reinforcements r_i^{t-1}, received by i
during match t-1 (Eq. 3.2), and a threshold value R_{0,i} (range [0, 1]).

r_i^{t-1} = \sum_{j=1}^{V_i^{t-1}} R_{i,j}^{t-1}    (3.2)

where V_i^{t-1} is the number of neighbors of agent i at match t-1 and R_{i,j}^{t-1} is the
reinforcement of i after interacting with neighbor j at t-1.

Eb2,i is then calculated as

E^b_{2,i} = r_i^{t-1} - R_{0,i}    (3.3)
• Eb3,i: at each match t, it decreases with the number of lost neighbors (a neighbor is
lost when it stops interacting), Eq. 3.4:

E^b_{3,i} = 1 - 2 ((V_i^1 - V_i^t) / V_i^1)    (3.4)

where V_i^t is the number of neighbors of agent i at match t. Note that, as MultiA
social emotions are designed to be triggered by social interaction, we assume that
V_i^1 > 0.
• Eb4,i: indicates the difference between the current sum of reinforcements r_i^t and a
threshold value. That is measured by comparing the current sum of reinforcements
r_i^t and R_{0,i}, Eq. 3.5:

E^b_{4,i} = r_i^t - R_{0,i}    (3.5)
• Eb5,i: reflects the current sum of reinforcements r_i^t during the current match t, see Eq. 3.6:

E^b_{5,i} = -1 + 2 r_i^t    (3.6)
• Eb6,i: it always starts a match with value 1 and only decreases (during the current
match t) if the interaction with a neighbor does not render positive reinforcements,
see Eq. 3.7:

E^b_{6,i} = E^b_{6,i} - 2 (1/V_i^1)    (3.7)
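The sketch below transcribes Eqs. 3.1-3.7 directly; argument names follow the symbols in the text, and the caller is assumed to supply the reinforcement sums and counts already computed as described above.

def eb1(m_t: int, V1: int) -> float:
    """Eq. 3.1: grows with the number of concluded interactions in the current match."""
    return -1.0 + 2.0 * (m_t / V1)

def eb2(r_prev: float, R0: float) -> float:
    """Eq. 3.3: previous-match reinforcement sum r_i^{t-1} minus the threshold R0."""
    return r_prev - R0

def eb3(V1: int, V_t: int) -> float:
    """Eq. 3.4: decreases with the number of neighbors lost since the first match."""
    return 1.0 - 2.0 * ((V1 - V_t) / V1)

def eb4(r_curr: float, R0: float) -> float:
    """Eq. 3.5: current reinforcement sum r_i^t minus the threshold R0."""
    return r_curr - R0

def eb5(r_curr: float) -> float:
    """Eq. 3.6: rescales the current reinforcement sum to [-1, 1]."""
    return -1.0 + 2.0 * r_curr

def eb6_update(eb6: float, V1: int, got_positive: bool) -> float:
    """Eq. 3.7: starts each match at 1 and decreases after each non-positive interaction."""
    return eb6 if got_positive else eb6 - 2.0 * (1.0 / V1)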
In contrast with basic emotions, social emotions are driven by the neighbors and by
the influence of the MultiA agent on those neighbors. In the same way as for
the basic emotions, we used the reference (DAMÁSIO, 2004) as inspiration to shape the
MultiA artificial social emotions. Table 3.2 lists the particular social emotions that
inspired the artificial social emotions of MultiA.
TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA
  Biological Social Emotion   Artificial Social Emotion
  Pride                       Es1,i
  Gratitude                   Es2,i
  Compassion                  Es3,i
  Sympathy                    Es4,i
The artificial social emotions of MultiA are:
• Es1,i: emphasizes those behaviors relating to the social context that did not originate
positive outcomes to i but, still and to a minor degree, increases together with
positive reinforcements of the agent. That way, Es1,i increases at any change in Eb5,i
and, to a greater degree, at any change in Eb6,i. It always starts a match with value
-1; and s < (2/V_i^1) is a weight used to establish the importance of Eb5,i, see Table 3.3.
TABLE 3.3 – Updating Es1,i
  At any change in   The value of Es1,i becomes:
  Eb6,i              Es1,i + (2/V_i^1)
  Eb5,i              Es1,i + s
• Es2,i: the average number of variations of Eb5,i per interaction, normalized to the range
[−1, 1], from the first match until the current one. It starts at zero;
• Es3,ip: calculated according to Eq. 3.8, it reflects the average number of variations of Eb5,i
per interaction with neighbor p:

E^s_{3,ip} = -1 + 2 M_{ip}    (3.8)

where Mip is provided by the CS and is the average number of variations of Eb5,i
(i.e., the average number of increases in r_i^t) per interaction with neighbor p.
• Es4,ip: is doubly fed, both by the reciprocity value addressed to neighbor p (Ypi,
provided by the CS) and by the empathy feeling S_{4,ip}^{t-1} (see Table 3.4) for p right
after the last interaction with p (during the last match, at t-1), a residual value from
the past influencing the current emotion:

E^s_{4,ip} = c_a S_{4,ip}^{t-1} + (1 - c_a) Y_{pi}    (3.9)

where ca ∈ [0, 1] is a weight used to establish the importance of the residual value S_{4,ip}^{t-1}.
The EM from the CS sends the Ypi value of agent i to p, see Sect. 3.2.2.2.
Once in the PS, the Ypi will stimulate the social emotion Es4,ip (social emotion number
4 of agent i for neighbor p; on Figure 3.1, see social emotion number 4), then reaching the
empathy feeling S4,ip. The emotion Es4,ip is fed both by Ypi and by the empathy feeling
for p right after the last interaction with p, a residual value from the past influencing
the current emotion. Then, right before a new interaction with p, the empathy feeling
is fed both by the emotions Es4,ip and Es3,ip (social emotion number 3 of agent i for
neighbor p; on Figure 3.1, see social emotion number 3). The latter summarizes the utility
of neighbor p: the average number of times interacting with neighbor p has resulted in
positive reinforcements.
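Putting Eqs. 3.8 and 3.9 and the Table 3.4 feeding rule together, a minimal sketch of the empathy chain for one neighbor could look as follows. The weight values (ca and the two feeling weights) are placeholders, since the text only constrains ca to [0, 1] and requires the feeling weights to be normalizing.

def empathy_chain(s4_prev: float, y_pi: float, m_ip: float,
                  ca: float = 0.5, w3: float = 0.5, w4: float = 0.5):
    """Sketch of how the empathy feeling S4,ip for neighbor p is built.
    s4_prev: empathy feeling after the last interaction with p (residual value);
    y_pi:    reciprocity value addressed to p, provided by the CS;
    m_ip:    average rate of positive-reinforcement interactions with p."""
    es3_ip = -1.0 + 2.0 * m_ip                 # Eq. 3.8: utility of neighbor p
    es4_ip = ca * s4_prev + (1.0 - ca) * y_pi  # Eq. 3.9: residual empathy plus reciprocity
    s4_ip = w3 * es3_ip + w4 * es4_ip          # feeling as a weighted sum of the two emotions
    return es3_ip, es4_ip, s4_ip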
The artificial feelings {S1,i, S2,i, ..., Sn,i} fall in the range [−1, 1] and arise through a
weighted sum of emotions. The weights are set according to the relevance of each emotion
to the domain. Table 3.4 presents the set of emotions that feed each feeling (Eb1,i does not
feed any feeling). Because of its feeding set of emotions, the only feeling that adapts to
the interacting neighbor p is S4,ip. The well-being Wi uses feelings to internally represent
the general situation of agent i. It is calculated (Eq. 3.10) with normalizing weights so
that the final value falls in the range [−1, 1]:
W_i = \sum_{j=1}^{n} a_j S_{j,i}    (3.10)
where n is the number of feelings. The weights aj are set according to the relevance of each
feeling to the domain. For simplicity, the p index of S4,ip is omitted from Eq. 3.10. Wi
measures the performance of MultiA agent i in the environment, considering the empathy
feeling for p. If the empathy reaches high levels, Wi will be low: probably the last selected
actions may be causing bad outcomes to p; therefore the well-being Wi of agent i should
be low, even though its reinforcements may be high.
TABLE 3.4 – Artificial Feelings

  Feeling   Fed by Emotions
  S1,i      Eb2,i and Eb3,i
  S2,i      Eb4,i, Eb5,i and Eb6,i
  S3,i      Es1,i and Es2,i
  S4,ip     Es3,ip and Es4,ip
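A minimal sketch of Table 3.4 together with Eq. 3.10 follows. The equal weights are placeholders; the only requirement stated in the text is that the weights normalize so that each feeling and Wi stay in [−1, 1].

# Which emotions feed each feeling (Table 3.4); Eb1 feeds no feeling.
FEELING_SOURCES = {
    "S1": ["Eb2", "Eb3"],
    "S2": ["Eb4", "Eb5", "Eb6"],
    "S3": ["Es1", "Es2"],
    "S4": ["Es3_p", "Es4_p"],   # the only feeling indexed by the current neighbor p
}

def feelings(emotions: dict) -> dict:
    """Each feeling as an (equally weighted, for illustration) average of its emotions."""
    return {s: sum(emotions[e] for e in srcs) / len(srcs)
            for s, srcs in FEELING_SOURCES.items()}

def well_being(feeling_values: dict, weights: dict) -> float:
    """Eq. 3.10: Wi = sum_j a_j * S_j,i with normalizing weights a_j."""
    return sum(weights[s] * v for s, v in feeling_values.items())

# Example: equal weights summing to 1 keep Wi within [-1, 1].
example_weights = {"S1": 0.25, "S2": 0.25, "S3": 0.25, "S4": 0.25}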
3.2.2 Cognitive System (CS)
The CS consists of two Modules: Empathy (Sect. 3.2.2.2) and Learning (Sect. 3.2.2.1).
The first is responsible for producing the Ypi value to be sent to the PS. Once there, Ypi
ends up feeding the empathy feeling. The second module applies artificial neural networks
(ANNs) to estimate the Well-Being Q_ip^t(E_ip^t, k) that will result from the execution of a
specific action k (k ∈ actions) in response to the current set of emotions E_ip^t. Observe
that E_ip^t is the current set of all emotions (basic and social) of agent i at match t and
for neighbor p. In Fig. 3.2 we illustrate the functioning of the Learning Module: at match t,
agent i is going to interact with neighbor p and its current set of emotions is E_ip^t. Before
agent i takes an action, the Learning Module estimates the Well-Being values that
would probably follow from the execution of each action k. In the example, agent i has two
options of action: action A or B. If executed, action A is expected to obtain the higher
estimated Well-Being value, as Q_ip^t(E_ip^t, A) = 0.2 and Q_ip^t(E_ip^t, B) = 0.1.
FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action if it is going to be executed in response to an interaction with neighbor p at match t.
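The interface illustrated by Fig. 3.2 can be sketched as below. The predict call stands for whatever trained function approximator is used per action (one ANN per action, in our case); the object interface is an assumption of this sketch rather than a prescribed API.

def estimate_well_being(estimator_per_action: dict, emotions_vector) -> dict:
    """One estimator per available action maps the current emotion set E_ip^t to the
    expected Well-Being Q_ip^t(E_ip^t, k). `estimator_per_action` maps each action k to
    an object exposing a predict(...) method (e.g. a small feed-forward ANN wrapper)."""
    return {k: est.predict(emotions_vector) for k, est in estimator_per_action.items()}

def greedy_action(q_values: dict) -> str:
    """The DS can then pick the action with the highest estimate, e.g. action A when
    Q(E, A) = 0.2 and Q(E, B) = 0.1 as in the example of Fig. 3.2."""
    return max(q_values, key=q_values.get)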
3.2.2.1 The Learning Module
Two main references were considered while designing our Learning Module: Gadanho
(GADANHO, 1999) and Lin (LIN, 1993). The Learning System from Gadanho (GADANHO,
1999) received inspiration from Lin (LIN, 1993), which depicts the application of one ANN
for each action available to the agent, and the action policy acquisition based on the Q-
Learning algorithm (WATKINS, 1989). The ANNs from Gadanho (GADANHO, 1999) are
feed-forward and trained t