Thesis presented to the Instituto Tecnológico de Aeronáutica, in partial
fulfillment of the requirements for the degree of Doctor of Science in the
Program of Electronic Engineering and Computation, Field of Informatics.
Fernanda Monteiro Eliott
A COMPUTATIONAL MODEL FOR SIMULATION
OF EMPATHY AND MORAL BEHAVIOR
Thesis approved in its final version by signatories below:
Prof. Dr. Carlos Henrique Costa Ribeiro
Advisor
Prof. Dr. Luiz Carlos Sandoval Góes
Prorector of Graduate Studies and Research
Campo Montenegro, São José dos Campos, SP - Brazil
2015
Cataloging-in Publication Data
Documentation and Information Division
Eliott, Fernanda Monteiro
A Computational Model for Simulation of Empathy and Moral Behavior / Fernanda Monteiro Eliott. São José dos Campos, 2015. 90f.
Thesis of Doctor of Science – Course of Electronic Engineering and Computation. Area of Informatics – Instituto Tecnológico de Aeronáutica, 2015. Advisor: Prof. Dr. Carlos Henrique Costa Ribeiro.
1. Arquitetura (computadores). 2. Sistemas multiagentes. 3. Comportamento afetivo. 4. Tomada de decisão. 5. Simulação computadorizada. 6. Inteligência artificial. 7. Computação. I. Instituto Tecnológico de Aeronáutica. II. Title.
BIBLIOGRAPHIC REFERENCE
ELIOTT, Fernanda Monteiro. A Computational Model for Simulation of Empathy and Moral Behavior. 2015. 90f. Thesis of Doctor of Science – Instituto Tecnológico de Aeronáutica, São José dos Campos.
CESSION OF RIGHTS
AUTHOR’S NAME: Fernanda Monteiro Eliott
PUBLICATION TITLE: A Computational Model for Simulation of Empathy and Moral Behavior.
PUBLICATION KIND/YEAR: Thesis / 2015
It is granted to Instituto Tecnológico de Aeronáutica permission to reproduce copies of this thesis and to only loan or sell copies for academic and scientific purposes. The author reserves other publication rights and no part of this thesis can be reproduced without the authorization of the author.
Fernanda Monteiro Eliott
R. Paulino Blair, 33
12232-030 – São José dos Campos–SP
A COMPUTATIONAL MODEL FOR SIMULATION
OF EMPATHY AND MORAL BEHAVIOR
Fernanda Monteiro Eliott
Thesis Committee Composition:
Prof. Dr. Paulo André Lima de Castro – Chairperson - ITA
Prof. Dr. Carlos Henrique Costa Ribeiro – Advisor - ITA
Prof. Dr. Jackson Paul Matsuura – Internal Member - ITA
Prof. Dr. Osvaldo Frota Pessoa Júnior – External Member - USP
Prof. Dr. Ricardo Ribeiro Gudwin – External Member - UNICAMP
To my parents.
Acknowledgments
First of all I would like to thank my advisor, Professor Carlos Henrique Costa Ribeiro,
and to highlight his role in helping my thoughts turn from abstraction toward a concrete
project. I also would like to thank the entire Aeronautics Institute of Technology (ITA) for
its enormous contribution to the development of my knowledge while making my evolution
in a multidisciplinary field possible.
I would like to emphasize the role of the Philosophy dept. of the University of São
Paulo (USP) in my academic background. I am always thankful.
I am also thankful to Professor Briseida Dôgo de Resende (USP, Experimental Psy-
chology dept.) for taking me as a guest in her class and research group. It was a priceless
experience.
My family was around me all the way. I appreciate it and hope that this continues.
Finally, I would like to thank CNPq for the financial support.
“Disons donc que, si toutes choses deviennent naturelles à l’homme lorsqu’il s’y habitue, seul reste dans sa nature celui qui ne désire que les choses simples et non altérées. Ainsi la première raison de la servitude volontaire, c’est l’habitude.”
— Étienne de La Boétie, 1576.
Resumo
Emoções e sentimentos são considerados cruciais no processo de decisão humana in-
teligente. Em particular, as emoções sociais nos ajudariam a reforçar o grupo e a cooperar.
Ainda é uma questão de debate o que motivaria criaturas biológicas a cooperarem ou não
com seu grupo. Todos os tipos de cooperação ocultariam interesses egoístas, ou o altruísmo
realmente existiria? Se nos debruçarmos sobre essas questões a partir de uma perspectiva
humana, acabamos passando por comportamento moral e três tipos de sujeitos: o moral,
imoral e amoral. Se nos movermos de sujeitos biológicos em direção a agentes artificiais,
observamos ser uma questão complexa ficar ileso a mecanismos ad-hoc a fim de atingir
cooperação em abordagens computacionais baseadas em utilidade. Decidimos nos inspirar
em comportamento moral como uma forma de buscar a cooperação em Sistemas Multi-
agentes. Nossa hipótese principal baseia-se na ideia de que a cooperação pode surgir a
partir do auxílio de emoções e comportamento moral durante o processo de tomada de
decisões - mesmo quando comportamento egoísta é recompensado por altos reforços. A
analogia com o comportamento moral é promovida através da simulação do sentimento
de empatia. A importância do sentimento de empatia consiste na sua função em regular
as prioridades dos agentes, permitindo a seleção de ações que, talvez, não sejam a melhor
seleção egoísta, uma vez que uma tomada de decisão não egoísta possa ser crucial para
equalizar as interações entre os agentes e resultar em cooperação. Descreveremos aqui
nossa arquitetura computacional multiagente bioinspirada (denominada MultiA), com-
posta por emoções artificiais, sentimentos e por um Módulo de Empatia responsável por
fornecer uma seleção de ações que, rudimentarmente, imite comportamento moral. Infor-
mação sensorial é acionada pelo meio ambiente e, então, a arquitetura computacional a
transforma em emoções e sentimentos artificiais básicos e sociais. Posteriormente, através
do módulo de empatia, suas próprias emoções são empregadas para estimar o estado at-
ual de outros agentes. E então, seus sentimentos artificiais proporcionam uma medida
(denominada bem-estar) do seu desempenho em resposta ao ambiente. Através daquela
medida e de técnicas de aprendizado por reforço, a arquitetura aprende um mapeamento
entre emoções e ações. Diante de recompensas para comportamento egoísta, os agentes
MultiA que adotam estratégia cooperativa, o fazem como resultado de um sentimento de
empatia (altos níveis de empatia) regulando as prioridades do agente, agindo como um
agente moral. Os agentes MultiA que não adotam a estratégia cooperativa selecionam
ações egoístas, e o fazem como resultado de baixos níveis de empatia, agindo como agente
imoral. O mecanismo de seleção de ação de MultiA pode ser alimentado a partir de dois
aspectos. O primeiro está relacionado à cooperação, uma vez que um agente MultiA em
particular tenha uma vizinhança cooperativa. Dessa forma, o agente irá cooperar por
reciprocidade. O segundo está relacionado à não-cooperação, uma vez que o entorno é
não-cooperativo (agente MultiA não cooperativo por reciprocidade). Portanto, a arquite-
tura computacional acaba por imitar rudimentarmente agentes morais e imorais. De fato,
obter agentes morais e imorais a partir de uma mesma arquitetura se encaixa em pressu-
postos filosóficos sobre o meio corromper o indivíduo. Dado que relações entre indivíduos
diferentes possam ser representadas por redes, exploramos diferentes topologias de rede
para caracterizar as interações agente-agente, definindo a vizinhança dos mesmos. A fim
de avaliar nossa arquitetura, utilizamos uma versão de um jogo evolutivo que aplica o
jogo do dilema do prisioneiro para estabelecer as alterações sobre a topologia da rede.
Os resultados indicam que, apesar de MultiA também imitar rudimentarmente agentes
imorais, um número suficiente de agentes MultiA seguiram em outra direção, assim,
através da cooperação, mantiveram a estrutura da rede da vizinhança. Portanto, estraté-
gias baseadas em simulação de comportamento moral podem auxiliar na diminuição da
recompensa interna advinda de uma seleção de ação egoísta, favorecendo a cooperação
como uma propriedade emergente de sistemas multiagentes. Nossos resultados também
indicam a viabilidade do Módulo de Empatia e coerência entre a experiência do agente e a
poĺıtica de ação adotada. Intensificamos os parâmetros de teste e ainda assim obtivemos
um número substancial de agentes MultiA cooperativos. Mas, adicionalmente, obtivemos
agentes MultiA não-cooperativos, o que decorreu também do efeito de ocultamento de
estratégia. Este consiste em um problema importante que interfere na política de ação de
agentes MultiA. Em relação ao paradigma de reciprocidade sobre o projeto de MultiA,
este se destacou através da prevenção de efeito de falha em cascata em redes descritas por
uma correlação de grau quase neutra, auxiliando os agentes a serem melhor sucedidos em
espelhar a condição dos vizinhos. Nossos resultados confirmam empiricamente a influência
do Módulo de Empatia sobre o Sistema de Decisão de MultiA.
Abstract
Emotions and feelings are now considered decisive in the human intelligent decision
process. In particular, social emotions would help us to strengthen the group and cooperate.
It is still a matter of debate what motivates biological creatures to cooperate or
not with their group. Would all kinds of cooperation hide a selfish interest, or does true
altruism exist? If we pore over those questions from a human perspective, we end
up passing through moral behavior and three kinds of individuals: the moral, immoral
and amoral. If we move from biological subjects to artificial agents, it is a complex
matter to do without ad hoc mechanisms to bring up cooperation in utility-based com-
putational approaches. We decided to take inspiration from moral behavior as a way of
moving toward cooperation in Multiagent Systems. Our leading hypothesis relies on the
idea that cooperation can emerge from the assistance of emotions and moral behavior
during the process of decision making - even when selfish behavior is rewarded by high
reinforcements. The analogy with moral behavior is promoted through simulating the
feeling of empathy. The importance of the empathy feeling is its function in regulating
the agents' priorities, enabling the selection of actions that may not be the best selfish
selection, since non-selfish decision making may be crucial to equalize the interactions
among agents and bring up cooperation. We depict herein our bioinspired computational
multiagent architecture (so-called MultiA) composed of artificial emotions, feelings and
an Empathy Module responsible for providing an action selection mechanism that rudi-
mentarily mimics both moral and immoral behaviors. Sensorial information is triggered by
the environment, then, the computational architecture transforms it into basic and social
artificial emotions and feelings. Then, the agent's own emotions are employed to estimate the
current state of other agents through an Empathy module. Finally, its artificial feelings
provide a measure (termed well-being) of its performance in response to the environment.
Through that measure and reinforcement learning techniques, the architecture learns a
mapping from emotions to actions. While facing high rewards for selfish behavior, the
MultiA agents that adopt the cooperative strategy do so as the result of an empathy
feeling (high empathy levels) regulating the agent's priorities, acting as moral agents. The
MultiA agents that do not adopt the cooperative strategy select selfish actions, and do so
as a result of low empathy levels, acting as immoral agents. The MultiA mechanism of
action selection can be driven by two aspects. The first is related to cooperation: once
a particular MultiA agent has a cooperative neighborhood, the agent will cooperate by
reciprocity. The second is related to non-cooperation: if the surroundings are non-cooperative,
the MultiA agent will be non-cooperative by reciprocity. Thus, our computational
architecture actually rudimentarily mimics both moral and immoral agents. But, as a
matter of fact, achieving moral and immoral agents from the very same architecture fits
philosophical assumptions about the environment corrupting the individual. As relations
between different subjects can be represented by networks, we explored varied network
topologies that can characterize the agent-agent interactions by defining the agents' neigh-
borhoods. For assessment of our architecture, we use a version of an evolutionary game
that applies the Prisoner Dilemma paradigm to establish changes over the network topol-
ogy. Our results show that, even though MultiA can also mimic immoral behavior, it is
more likely to mimic moral behavior. In each experiment, a sufficient number of
MultiA agents mimicked moral agents to solve the task. Thus, through cooperation, they
kept the neighboring network structure. Therefore, strategies based on the simulation of
moral behavior may help to decrease the internal reward from selfish selection of actions,
thus favoring cooperation as an emergent property of multiagent systems. Our results
also indicate the Empathy Module's feasibility and the coherence between the agent's experience
and the adopted action policy. We tested MultiA agents under stressed parameters and
we still obtained a substantial number of cooperative MultiA agents. We also obtained
non-cooperative MultiA agents, and that was also due to the shadow strategy effect. The
shadow strategy effect is an important problem interfering with the MultiA agents' action
policy. Regarding the reciprocity paradigm in the MultiA design, it stood out by preventing
a cascading failure effect on networks described by an almost neutral degree correlation,
helping the agents to be more successful in mirroring their neighbors' current condition.
Our results empirically confirm the influence of the Empathy Module on the MultiA
Decision System.
List of Figures
FIGURE 3.1 – The general scheme of the MultiA Architecture.
FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action if it is going to be executed in response to an interaction with neighbor p at match t.
FIGURE 3.3 – The ANNs structure from Learning Module, CS.
FIGURE 3.4 – a) Agent i and neighbor p are going to interact. The Learning Module of agent i provides Q_{ip}^t(E_{ip}^t, k) and the DS chooses to execute Action B. b) The agents interact and the PS of agent i calculates the value of W_i. c) Now agent i will interact with its next neighbor, neighbor p + 1. The Learning Module provides the new values of Q_{ip+1}^t(E_{ip+1}^t, k). Before the DS chooses one action, the Learning Module will update (through the Backpropagation algorithm) the weights of the ANN indexed to action B. After being updated, the ANN indexed to action B will re-calculate Q_{ip+1}^t(E_{ip+1}^t, B). Now the output values will be sent to the DS.
FIGURE 3.5 – CS: The structure of the Empathy Module.
FIGURE 3.6 – CS: The reciprocity assumption and the Empathy Module.
FIGURE 4.1 – The general scheme of MultiAA.
FIGURE 4.2 – At match t: 20 agents (4 defectors, 16 cooperators).
FIGURE 4.3 – At match t, just before match t + 1: 19 agents (3 defectors, 15 cooperators).
FIGURE 4.4 – Exp.1, MultiA: crossing strategies and ρf at each match.
FIGURE 4.5 – Exp.1, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.6 – Exp.2, MultiA: crossing strategies and ρf at each match.
FIGURE 4.7 – Exp.2, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.8 – Exp.2, MultiA final network structure: 7406 agents, 2866 defectors (red nodes).
FIGURE 4.9 – Exp.3, MultiA: crossing strategies and ρf at each match.
FIGURE 4.10 – Exp.3, MultiAA: crossing strategies and ρf at each match.
FIGURE 4.11 – Agents Final Results. Defectors are represented by the red color and cooperators by blue.
FIGURE 4.12 – Exp.3, MultiA final network structure: 215 agents, 2 defectors (red nodes).
FIGURE 4.13 – Exp.3, MultiAA final network structure: 30 agents, all cooperators.
FIGURE 4.14 – Exp.4, MultiA: crossing strategies and ρf at each match. Final values: ρf = 80%, ρd = 52%, ρc = 48%.
FIGURE 4.15 – Exp.4, MultiAA: crossing strategies and ρf at each match. Final values: ρf = 57%, ρd = 65%, ρc = 35%.
FIGURE 4.16 – Exp.4: graphics produced at matches t = 42 and t = 55.
FIGURE 4.17 – Exp.4: graphics produced at matches t = 60 and t = 68.
FIGURE 4.18 – Different values of m and MultiA performance.
FIGURE 4.19 – Exp.5: MultiAA and MultiA non-failed simulations regarding different values of ψ.
FIGURE 4.20 – Exp.6: MultiAA and MultiA non-failed simulations regarding different values of ψ.
FIGURE 4.21 – Exp.7a (left) and 7b (right): graphics produced at match t = 55. Cooperators are represented by the blue color and defectors by the red color.
FIGURE 4.22 – Exp.7a (left) and 7b (right): graphics produced at match t = 68. Cooperators are represented by the blue color and defectors by the red color.
List of Tables
TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA
TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA
TABLE 3.3 – Updating Es1,i
TABLE 3.4 – Artificial Feelings
TABLE 3.5 – The Calculation of Ypi: If Mip < 0.5
TABLE 3.6 – The Calculation of Ypi: If Mip >= 0.5
TABLE 4.1 – Game Parameters
TABLE 4.2 – Experimental Parameters, Lattice 2D4N
TABLE 4.3 – Data Graphic Color
TABLE 4.4 – Parameters for the experiments, Sect. 4.3
Contents
1 Thesis Introduction and Statement
1.1 Thesis Statement
1.2 Thesis Structure
2 Background
2.1 Artificial Moral Agents (AMAs)
2.1.1 MultiA
3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior
3.1 MultiA Functioning: an Overview
3.1.1 MultiA and an Interaction Game
3.2 The Systems of the MultiA Architecture
3.2.1 Perceptive System (PS)
3.2.2 Cognitive System (CS)
3.2.3 Decision System (DS)
3.3 ALEC Influences on MultiA
4 Experimental Setup and Results
4.1 MultiA in an Evolutionary Game
4.1.1 MultiA Experimental Set Up and Amoral Version
4.2 Results: Lattice Network
4.2.1 Experiments
4.3 Results: Assortativity Coefficient and Moral Agent Performance
4.3.1 Experimental Parameters
4.3.2 Experiments
5 Final Remarks
5.1 Future Work and Relevance
Bibliography
Appendix A – Publications
1 Thesis Introduction and
Statement
Living beings can display behavior complex enough to stimulate research. Despite the
conflicting reasoning about it, the behavior of living beings can inspire the intuitive design
of systems that handle complex matters. The opportunity of conceiving and modeling
artificial moral behavior and empathy arises as the view that an immaterial soul partakes
in the moral behavior process deteriorates. A biological and philosophical examination
should be part of the search for a coherent and intuitive bioinspired computational
multiagent architecture that seeks to mimic moral behavior. Likewise, the embodied
implementation has to be scrutinized vis-à-vis the pursued theoretical references.
In a simplistic approach, moral behavior can be described as the act of following the
set of rules of the group, keeping it cohesive, while the progressive annexation of new customs
can change that set. By reiterating a custom and naturally incorporating it among our
thoughts, we are actually submitting ourselves to it and establishing it; hence, the laws
of consciousness essentially come from custom, and not from nature (MONTAIGNE,
2013 (1580)a). Inadvertently, we may link cooperation and the willingness to do it. Ac-
cording to Tomasello and Vaish (TOMASELLO; VAISH, 2013), from an evolutionary
standpoint, morality can be regarded as a kind of cooperation, as the association of
skills and reasons for cooperation would give rise to morality. Thus, coop-
eration would demand the equalization of the individual's self-interest with that of the others,
or its suppression. Regarding the composition of a group and the accompaniment of its members,
Tomasello (TOMASELLO, 2011) regards cooperation as a stitching action, connect-
ing the members of the group. Since cooperation among living things comprehends
complex matters, it reveals itself as a field of study. Likewise, thinking through utility-
based computational approaches, the emergence of cooperation is not easily achievable
(see Sect. 2).
If we conceive moral behavior as a form of cooperation (by following the set of rules
of the group) built upon customs among emotions and feelings, it brings up an intuitive
line of reasoning to pursue while modeling a bioinspired computational architecture sup-
posed to mimic moral behavior. Then, how can we design an autonomous artificial agent
able to socially interact and deal with conflicting tasks that require emotional guidance
to be solved? A computational agent that incorporates artificial emotional and moral in-
telligences can lead to ways of achieving cooperation among artificial creatures or between
artificial and biological ones.
Regarding the empathy exercise, we can try to divide individuals into three groups:
moral, immoral and amoral. In a simple, unpretentious approach, the first have
the social feeling of empathy properly functioning, and the immoral perform actions that
somehow hurt the established moral code of their community. The latter, in turn,
can be interpreted as moral or immoral depending on their social behavior, and
may be characterized by important issues in the mechanism that allows an individual
to put himself/herself in the place of the other and be sympathetic to his/her circum-
stances. There is a neurophysiological basis for this classification: according to Kandel et
al. (KANDEL et al., 2000), the lateral orbitofrontal cortex seems to participate in mediating
empathetic and socially appropriate responses; hence, damage to this area would be asso-
ciated with failure to respond to social cues and with a lack of empathy. A mechanism
that allows the existence of empathy is described in Damásio (DAMÁSIO, 2004) from
the cognitive aspect and, as in Proctor et al. (PROCTOR et al., 2013) and Waal (WAAL,
2009), from the emotional standpoint. Hence, if someone succeeds in de-
veloping an artificial moral agent (AMA), would guidance from moral or immoral behavior
be more useful? The answer is not as easily given as it might sound. We
can think through philosophical questions, premises and practical goals while designing
AMAs, specifically:
1. With respect to AMAs, could we inspire the design from different aspects? Think
through a multiagent task and action policies oriented by three sets of premises:
moral, immoral and amoral. Then, the design may consider which set, and under what
kind of circumstances, fits better within a particular multiagent system (MAS) task:
• The moral agent cares about all members of its group (considered as neighbors), even though that may bring its own punishment. But it may also select
actions with the aim of punishing or isolating a constantly defecting neighbor.
However, in general, moral agents tend to cooperate;
• The immoral agent also cares about neighbors, but it is concerned with the profit it can make through social attachment. It will mostly cooperate with its
immoral group, but can decide to cooperate with others if it is getting isolated
(to prevent complete isolation or high punishment). The immoral agent will cooperate (if
at all) mostly with its partners;
• The amoral agent can imitate both moral and immoral agents and be more practical in taking decisions.
2. For different kinds of agents, what is the meaning of selfish actions? An action can
sound selfish but be motivated by non-selfish goals, such as punishing a member
of the group to keep it healthy. Thus, selfishness can be executed by all three
agent types; the difference lies in the goal behind that action: if it is to make
profit (fits better with immoral and amoral agents); if it belongs to an uncertain or
exploratory phase (all three); if it is to prevent deep punishment (all three, but
moral agents with lower intensity) or, even, to isolate someone from the group (all three);
3. If the agent can observe and differentiate its neighbors, it can learn to respond dif-
ferently to them and to stop cooperating with defectors. Both immoral and amoral
agents may isolate defectors more easily than moral agents. Moreover, amoral
agents, in order to survive and keep neighbors, can mirror moral and immoral agents
for convenience;
4. Regarding the agents' action policies, can they lead to relevant differences in the network
structure (considering each agent as a node and their relations as links between
them)?
• As moral agents are naturally cooperative, they are supposed to keep bonds with moral, immoral and amoral agents. Therefore, even after an elimination
process that punishes defectors, the final population of a moral majority will
include a reasonable number of agents from all sets;
• Accordingly, immoral agents will only care about the advantage, if any, of keeping bonds with others. Thus, a continuously defecting (or failing) one will
be easily excluded;
• As the amoral agents will imitate a neighbor, they will add uncertainty as they change strategy.
5. What would we expect from artificial empathy? Would it be convenient to develop
a decision process that tends to something Machiavellian (MACHIAVELLI, 1985
(1532))? What is best: to maintain a failing neighbor in order to not lose it or just
eliminate it?
• What about having a morally hybrid AMA: immoral towards agents that fail or delay the task and moral while interacting with living creatures? Hypothe-
size a group of artificial agents supposed to coordinate activities and priorities
to complete a task (such as finding an object in a certain environment): if one agent
from the group stops working or fails, it might be better to isolate it. This
may be thought of as an analogy to cutting off the artificial empathy feeling about
that one agent. Therefore, the agent that simulates moral behavior will have
the tendency to cooperate but it can be triggered to do otherwise;
• Regarding a hybrid artificial agent that can trigger moral and immoral behavior, it might be important to autonomously activate moral action policies with
biological creatures, and immoral otherwise.
Finally, beyond philosophical and biological investigation on morality and human/
machine behavior, practical issues can be addressed through exploring decision making
by AMAs in MAS. While simulating moral behavior, AMAs may be helpful in general
social or domestic assignments, e.g. taking the role of monitoring highly dangerous crim-
inals, people in quarantine or in scenarios where there are social dilemmas to deal with.
Moreover, the artificial empathy from an artificial moral agent could be an additional re-
source in argumentation-based negotiation in MAS. AMAs may also be useful to improve
the responses to general MAS issues stressed by Wooldridge (WOOLDRIDGE, 2009), such
as how to bring up cooperation in societies of self-centered agents; how to recognize a
conflict and then reach an agreement; or, as highlighted by Matignon (MATIGNON
et al., 2012), the challenges of coordinating the agents' activities in order to cooperatively
achieve goals (see Sect. 2).
1.1 Thesis Statement
In Damásio (DAMÁSIO, 1994) emotions and feelings are described as imperative in the
human intelligent decision process. Emotions and feelings would also be deci-
sive in helping us spend less time and reduce the computational burden while making
intelligent decisions. In particular, social emotions would help us to strengthen the group
and cooperate. We depict herein our bioinspired computational multiagent architecture
(so-called MultiA) composed of artificial emotions, feelings and an Empathy Mod-
ule responsible for providing an action selection mechanism that rudimentarily mimics
moral and immoral behavior. It is not trivial to achieve cooperative self-centered agents
in a multiagent task. Our search for mimicking moral behavior, among other things, is
driven to achieve rational agents more likely to cooperate. By responding to the feeling
of empathy, MultiA should be able to produce artificial moral behavior and select
cooperative action policies. Our leading hypothesis relies on the idea that cooperation
can emerge from the assistance of emotions and moral behavior during the process of
decision making - even when selfish behavior is rewarded by high reinforcements. The
analogy with moral behavior is promoted through simulating the feeling of empathy. The
importance of such a feeling is its function in regulating MultiA agents' priorities, en-
abling the selection of actions that may not be the best selfish selection. Non-selfish
decision making may be crucial to equalize the interactions among agents and bring up
cooperation. Given the multidisciplinary complexity of moral behavior, the compu-
tational simulation of moral behavior may be approached from various angles. We
designed a computational architecture to rudimentarily mimic both moral and immoral
behaviors and developed an Empathy Module to work as the moral/immoral behavior
engine. The Empathy Module is grounded over reciprocity assumptions. Then, the agent
with a cooperative neighborhood will cooperate by reciprocity. Likewise, the agent with a
non-cooperative neighborhood will also be reciprocal by not cooperating. Therefore, the
reciprocity design can also carry selfish behavior, and not only cooperation (see Ch. 4).
Thus, our computational architecture ends up rudimentarily mimicking both moral and
immoral agents.
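The sketch below caricatures only the reciprocity assumption behind this design, reducing it to a single threshold rule; the actual Empathy Module estimates the neighbors' condition from the agent's own artificial emotions and feelings (see Ch. 3), and all names and the threshold used here are illustrative assumptions.

```python
# Caricature of the reciprocity assumption only, not the actual Empathy Module
# (which works over the agent's own artificial emotions rather than observed
# actions, see Ch. 3). Names and threshold are illustrative assumptions.
def reciprocal_choice(estimated_neighborhood_cooperativeness, threshold=0.5):
    """Cooperate if the neighborhood is estimated to be cooperative, defect
    otherwise: reciprocity can therefore also carry selfish behavior."""
    if estimated_neighborhood_cooperativeness >= threshold:
        return "cooperate"
    return "defect"

print(reciprocal_choice(0.8))   # cooperate (cooperative surroundings)
print(reciprocal_choice(0.2))   # defect (non-cooperative surroundings)
```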
Our results indicate the Empathy Module's feasibility and, in environments suitable for
the Empathy Module application (see Sect. 4.3.2), we obtained a considerable convergence
to cooperation. We modified the MultiA architecture to design the MultiAA architecture,
supposed to mimic amoral agents. We obtained interesting coherence between our final
results and the immoral, moral and amoral action policies.
1.2 Thesis Structure
This thesis comprises five chapters. In Ch. 1 we introduce a few reflections on
morality and human/machine behavior. In Ch. 2 we present the background and our
project development permeating issues. In Ch. 3 we detail our bioinspired computational
multiagent architecture designed to rudimentarily simulate moral and immoral behavior,
and, in Ch. 4, we analyze its performance in a multiagent task under different network
structures - we also present a MultiA modified version, the MultiAA architecture. Finally,
in Ch. 5 we reflect upon the final results and suggest future work. In Appendix A we
detail all publications generated in the context of this thesis.
2 Background
A computational simulation of empathy encompasses subjects about which there is dis-
agreement and ignorance. There are convincing but opposing or conflicting explanations
about theory of mind, qualia, consciousness, human universals and morality. Given the com-
plexity and the undiscovered matters, we lack a broadly accepted theory unifying those
subjects - besides, they may involve religious taboos. Therefore, for a detailed statement of
an empathy simulation, we would have to stress themes familiar to philosophy, psychology,
neurophysiology and many other fields - hence, in this thesis, our main focus is to detail
the MultiA architecture.
For the sake of feasibility, computational approaches may seek to summarize the-
oretical references and embody a moral simulation from different perspectives (e.g. a
model may try to mimic moral behavior in robotic environments, or may try to provide
answers in an ethically specific domain). Before discussing our perspective and scrutiniz-
ing our MultiA computational architecture (Sect. 3), we introduce a few computational
models that somehow approach the moral simulation (Sect. 2.1), including MultiA itself
(Sect. 2.1.1).
In the moral architecture proposed herein, we used reinforcement learning techniques
(see Sect. 3.2.2.1), which are based on mapping situations to actions to maximize the re-
inforcement, wherein the agent's experience is used as a parameter (SUTTON; BARTO, 1998).
The reinforcement consists of a numerical signal given to the agent after having executed
a certain action (including action abstention) in a certain state. Through its experience,
by selecting different actions in different states, the agent under a computational archi-
tecture that implements reinforcement learning techniques must learn to execute state-
corresponding actions to maximize the expected sum of reinforcements.
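To make that setting concrete, the sketch below shows a generic tabular Q-learning loop: a state-action value table is updated from the reinforcements the agent collects through its own experience. It is only an illustration of the general technique under an assumed environment interface (reset/step and the parameter names are assumptions); MultiA itself learns over artificial emotions with one ANN per action (see Sect. 3.2.2.1 and Ch. 3).

```python
import random

# Generic tabular Q-learning sketch; parameter names and values are
# illustrative assumptions, not the MultiA learning rule.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = env.reset()               # assumed interface: returns an integer state
        done = False
        while not done:
            # Epsilon-greedy action selection based on the agent's experience.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)   # assumed interface
            # Update the estimate of the expected sum of reinforcements.
            best_next = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
            state = next_state
    return Q
```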
Many difficulties arise if the learning agents have no possibility of sharing data to
accomplish a task: they will have to choose strategies based on their own experiences
and, through that, learn to coordinate responses. To consider the agents interactions
in an environment, it is important to be aware of Game Theory well-studied challenges.
According to Shoham and Leyton-Brown (SHOHAM; LEYTON-BROWN, 2009), Game The-
ory would comprehend the mathematical study of the agents' interactions, and the agents'
predilections would be expressed through a function of the available options - note that the
agents' predilections may change, especially under uncertain situations. We intend to achieve
self-interested moral agents whose predilection comprehends getting high reinforcements
while avoiding bringing negative outcomes to the neighborhood - herein we consider as
neighbors those agents that may directly interact with each other. To simulate moral be-
havior we will adopt an environment described by more than one state and more than one
agent. Game Theory classical domains may provide environments and interaction descrip-
tions to test the moral agents under our computational architecture. Then, a game from
the literature will be chosen to define our agents' environment and moral interactions, the
terminal state and possible agent scenarios. Crucial Game Theory concepts came from
Neumann and Morgenstern (NEUMANN; MORGENSTERN, 1944), such as analyses of
environmental possibilities, difficulties and adequate agent policy responses to accomplish
goals.
Matignon (MATIGNON et al., 2012) describes some challenges that have to be over-
come so that agents (that do not exchange data) can coordinate their action selection in
order to provide coherently coordinated behavior, such as the alter-exploration problem
(the interference with the agent's learned policy caused by other agents' environment explo-
ration). The convergence to a cooperative action policy in self-play (each running agent
follows the very same set of code descriptions) and in general-sum stochastic games (which
allow cooperation, and in which the reinforcements received by the agents may assume
different values (MATIGNON et al., 2012); (GREENWALD et al., 2005)) is an issue: one problem is how to
achieve cooperative behavior when the Pareto-optimal solution does not coincide with the
Nash equilibrium, such as in the Prisoner Dilemma Game. The Nash equilibrium (NASH,
1951) corresponds to a collection of joint strategies (to all agents in the environment) such
that no agent can get a better outcome (by changing strategy) given that the others
will continue seeking their equilibrium strategies (choosing their best responses). The
Pareto-optimal solution occurs when there is no other joint action (combination of the
agents' actions) in which the utility of one agent may increase without decreasing the
utility of another agent.
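As a worked illustration of that mismatch, the snippet below checks, for a Prisoner's Dilemma with common textbook payoffs (an illustrative assumption, not necessarily the values used in Ch. 4), which joint actions are Nash equilibria and which are Pareto-optimal: mutual defection is the only Nash equilibrium, yet it is not Pareto-optimal.

```python
# Illustrative Prisoner's Dilemma payoffs (T > R > P > S); the specific numbers
# are a common textbook choice, not those of our experiments.
# Actions: 0 = cooperate, 1 = defect. payoff[(a1, a2)] = (to player 1, to player 2).
payoff = {
    (0, 0): (3, 3),   # mutual cooperation (R, R)
    (0, 1): (0, 5),   # cooperator exploited (S, T)
    (1, 0): (5, 0),   # defector exploits (T, S)
    (1, 1): (1, 1),   # mutual defection (P, P)
}

def is_nash(a1, a2):
    # Neither player can improve by unilaterally switching strategy.
    u1, u2 = payoff[(a1, a2)]
    best1 = all(payoff[(b, a2)][0] <= u1 for b in (0, 1))
    best2 = all(payoff[(a1, b)][1] <= u2 for b in (0, 1))
    return best1 and best2

def is_pareto_optimal(a1, a2):
    # No other joint action makes one player better off without hurting the other.
    u1, u2 = payoff[(a1, a2)]
    for v1, v2 in payoff.values():
        if v1 >= u1 and v2 >= u2 and (v1 > u1 or v2 > u2):
            return False
    return True

for joint in payoff:
    print(joint, "Nash:", is_nash(*joint), "Pareto-optimal:", is_pareto_optimal(*joint))
# Mutual defection (1, 1) is the only Nash equilibrium, yet it is not
# Pareto-optimal: mutual cooperation (0, 0) improves both payoffs.
```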
When agents share an environment but do not exchange data, they may be actually
ignoring each other's presence. Thus, those agents end up as part of the environment
itself, which means the transition probabilities related to the agents' actions/environmental
outcomes are non-stationary. Therefore, the agents' actions can be influenced by the joint
history of action selection, as the history influences the future transition probabilities
when the agent re-visits a state.
Regarding the agent itself, a deterministic game may appear to the agent as non-
deterministic: stochastic rewards or transitions may be induced by different sources such as
noise or non-observable factors, and it is a challenge for the agent to distinguish what pro-
voked the changes over the reinforcements it receives (whether noise or other agents' actions pro-
moted those changes) (MATIGNON et al., 2012). For instance, the coordination game from
Boutilier (BOUTILIER, 1999) explores different mis-coordination examples: the possible
joint agent actions determine various rewards or penalties and also lead to different states.
Since we intend to run a considerable number of agents under our architecture in
self-play, an important issue stands: how to obtain a final outcome in which the agents
achieve the best possible individual result that does not bring a bad outcome to their
neighbors? Many times the best individual outcome will not coincide with the best social
one (the best outcome for each agent if all of them choose to cooperate and reject free-
riding). In general, especially in utility-based computational approaches, cooperation is
not easily modeled. As an illustrative example, public goods provide an analogy to analyze
relations in natural societies and are best known for two main features: public goods are
freely available and are not depleted through consumption. In natural societies (also within an artificial
scope that use them as a metaphor), unfair relations are possibly common, such as an
agent taking advantage of another agent social commitment. If public services are freely
available, what would endorse other strategy than free riding? The social commitment
may be crucial to accomplish the best social outcome. For example, by paying the taxes,
we intend to keep the Public Systems functioning, but some of us are not actually paying
for anything - consider the free rider problem which, in essence, has been considered since
Plato (PLATO, 2000(IVBC)), Montaigne (MONTAIGNE, 2013 (1580)b) and many others,
and, more recently, by Cornes and Sandler (CORNES; SANDLER, 1986). Since cooperating
within the group generally results in a cost to the cooperator and defectors benefit from
common resources (WARDIL; HAUERT, 2014), a dilemma emerges between the agent's
self-interest and the group’s maintenance. In fact, public goods games are a metaphor
to describe trivial relations in natural societies and generalize the Prisoner’s Dilemma
Game (PDG) to an arbitrary number of individuals - see Hardin (HARDIN, 1971) and
Wakano and Hauert (WAKANO; HAUERT, 2011). Not unusually, commitment is required
to accomplish the best social outcome: individuals must keep choices that only as a group
will render that particular outcome. To attain it, agents have to commit themselves to
the specific action policy that only as a group will accomplish the best result. If one agent
suddenly changes its action predilection, the others may face the worst possible result:
e.g. if one of the agents stops cooperating while playing the Prisoner
Dilemma Game. Axelrod (AXELROD, 1984) provides a classical reflection on
cooperation, not only regarding the Prisoner Dilemma Game but, also, the cooperative
behavior placement in the chains of relations (exchange) between different powers.
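A minimal sketch of the public goods payoff structure illustrates the free-rider dilemma described above; the endowment and enhancement factor below are assumed illustrative values, not parameters of our experiments.

```python
# Minimal public-goods-game payoff sketch (assumed parameters: endowment of 1
# per agent and an enhancement factor r with 1 < r < n, the classic dilemma range).
def public_goods_payoffs(contributions, r=3.0, endowment=1.0):
    """contributions: list with one entry per agent, 0.0 (free ride) or endowment."""
    n = len(contributions)
    pot = r * sum(contributions)   # contributions are multiplied...
    share = pot / n                # ...and shared equally, even with free riders
    return [endowment - c + share for c in contributions]

# Four agents, one free rider: the free rider earns more than each cooperator,
# although full cooperation maximizes the group's total payoff.
print(public_goods_payoffs([1.0, 1.0, 1.0, 0.0]))   # [2.25, 2.25, 2.25, 3.25]
print(public_goods_payoffs([1.0, 1.0, 1.0, 1.0]))   # [3.0, 3.0, 3.0, 3.0]
```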
Through interacting in its multiagent environment and learning the possible outcomes,
a learning agent (that adapts its action selection to the environment) will stabilize its action
policy under the influence of other agents' actions. Other agents' strategies may put
forward environmental uncertainties and, if there is no data sharing, those uncertainties
are considered to be part of the environment itself. On the other hand, if a particular
agent never triggers any change over the others, to them, it may be as if that particular
agent never existed (neither as part of the environment). In an environment with various
states, if agent A usually collides with agent B in a particular state, then, one of them
(or even both) may end up avoiding that particular state and, depending on the game
possibilities and alternative path chosen, the agents may never find their best path to
accomplish a task. That happens because they can be unable to coordinate their action
selection while taking each other as part of the environment itself.
If two agents within the same environment are rational, once both have learned the
environmental dynamics, they will select actions in accordance with what one expects
to come from the other - even if indirectly considering the other agent, since it may be
considered as part of the environment itself. And those selected actions are expected to
be the agent's better option. Then, we have agents that will try to give their best shot in
response to what is expected to be the other agent's best shot (see, for instance, the
Minimax theorem from Neumann (NEUMANN, 1928)). Therefore, since in general rational
agents are seeking to choose the best selfish action, how is it possible to achieve a better
social outcome instead of an individual one? How to obtain a rational and cooperative
agent while avoiding ad hoc artifices?
We seek to provide a possible approach through an architecture that does not exchange
environmental data (such as the selected action). We will use a game of incomplete
information: the agents will not have access to the neighbors' actions or reinforcements.
But, at the same time, moral agents must behave morally while interacting with other
agents. The only data that will be shared by our agents consists of the neighbor ID: each
interacting agent will identify itself. Then, each one of them will be able to keep a record
of the neighbor ID and the reinforcements from interacting with that very same neighbor.
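A minimal sketch of this per-neighbor bookkeeping follows; the class and method names are illustrative assumptions, not MultiA's internal data structures (which are detailed in Ch. 3).

```python
from collections import defaultdict

class InteractionRecord:
    """Illustrative per-neighbor bookkeeping: the only shared datum is the
    neighbor ID, so each agent indexes its own reinforcement history by it.
    (Names and structure are assumptions, not MultiA's representation.)"""

    def __init__(self):
        self.history = defaultdict(list)   # neighbor ID -> list of reinforcements

    def record(self, neighbor_id, reinforcement):
        self.history[neighbor_id].append(reinforcement)

    def average(self, neighbor_id):
        past = self.history[neighbor_id]
        return sum(past) / len(past) if past else 0.0

# Usage: after interacting with (hypothetical) neighbor "p7", store the reinforcement.
record = InteractionRecord()
record.record("p7", 3.0)
record.record("p7", 0.0)
print(record.average("p7"))   # 1.5
```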
2.1 Artificial Moral Agents (AMAs)
According to Wallach and Allen (WALLACH; ALLEN, 2008), Artificial Moral Agents
(AMAs) would require the ability to access many options and work through differ-
ent evaluative aspects to present a good performance in a human moral domain - moreover,
it would be expected that AMAs would not deform that domain. Still addressing the computational
simulation of moral behavior, Wallach and Allen (WALLACH; ALLEN, 2008) emphasize the
advantages a machine could have over a human brain to respond to moral dilemmas, such
as the power of working through a higher number of matching possibilities and the exemp-
tion from sexual or emotional interference. A machine could use those advantages to come
up with a better answer than those usually provided by humans. Wallach (WALLACH,
2009) analyzes moral dilemmas brought about by philosophers and contrasts what people
morally accept to do in order to save lives with the number of lives that could actually
be saved by them - there are cases in which the human moral judgment will not lead to
saving the highest number of lives (in that case, could we say the human moral judgment
failed?). The AMA designers have to deal with those tricky situations (should we follow
the human moral judgment while developing our code or design utilitarian machines?)
and stick with a perspective while designing the machine code.
For that, it is decisive that the designers themselves reflect on their beliefs,
prejudices, perspectives (such as our bias in identifying people, the cross-race effect (FEIN-
GOLD, 1914)) and taboos, to avoid embodying them in the machines' design. For instance,
Roth (ROTH, 2013) detailed the issue that technology and other mechanisms designed to
represent the skin tone did not evolve to replicate the skin color of non-Caucasian people.
Nowadays, technology is bringing up more issues for the Ethics of Artificial Intel-
ligence. Another example regards the sex robots from TrueCompanion (TRUECOMPANION,
2010): would they badly interfere with human empathy? We are merging so deeply with
technology that machines need not embody moral behavior to affect our
moral system; technology has already changed it.
Wallach (WALLACH, 2015) pores over the latest technology resources and potentialities
(including killing possibilities) while addressing responsibility issues of developers and
users. The apprehension of AMAs causing negative influences over humans is mentioned in
Bringsjord et al. (BRINGSJORD et al., 2006). Through enabling the formalization of a moral
code, deontic logic would allow the writing of theories and dilemmas in a declarative way.
That would allow specialist analysis, thus being a method of restricting the machines'
behavior in ethically sensitive environments. Bello and Bringsjord (BELLO; BRINGSJORD,
2012) also emphasize a concern that restrictions should be inserted into the machines'
design and that those should be related to human cognition. For instance, moral
common sense and intuition should take part in that. Bringsjord et al. (BRINGSJORD et
al., 2006) present modifications over a mind reading model from Bello et al. (BELLO et al.,
2007) and, from their results, they conclude that we will have to deal with the confusing
human moral cognition to build AMAs that productively interact with humans. They also
ponder that moral machines should have a mechanism similar to common sense. That
adds matter to the debate about Lethal Autonomous Systems, as Arkin points out in
reflections in (ARKIN, 2013) and Asaro in (ASARO, 2012).
Computational simulation of moral behavior may be approached through diverse con-
texts. To exemplify the theme diversity, we detail three models:
1. LIDA Model (WALLACH, 2010), (WALLACH et al., 2008), (WALLACH et al., 2010),
(FRANKLIN et al., 2014) and (FAGHIHI et al., 2015). As a computational and concep-
tual model of human cognition, LIDA is described as a cognitive architecture de-
signed to select an action after dealing with ethically pertinent information. There-
fore, the LIDA model is expected to be able to deal with moral decisions. According
to Wallach (WALLACH, 2010), an artificial moral agent under the LIDA architecture
would be designed to, within the available time, select an action while taking into ac-
count the maximum possible quantity of ethically relevant information. This model
was influenced by the Global Workspace Theory (GWT) (BAARS, 1993 (1988)) and
by the Pandemonium Theory (JACKSON, 1987) for the automation of action selec-
tion. GWT would have distinguished itself as a theory of human cognitive processing given
its interpretation of the nervous system as distributed in parallel with different
specialized processes; and some coalitions of such processes would allow the agent
to build a sense from its sensorial data (which would come from its current envi-
ronmental situation). Other coalitions would inherit results from the sensorial data
processing that would have competed for attention and would have won. Those
would occupy the global workspace (GW), whose content would be transmitted to
all other specialized processes. Under a functional point of view, the GW content
would be conscious content and serve to recruit other processes to be used for action
selection in response to the current situation. In both GWT and LIDA, learning
would require and work through attention and would come in each conscious trans-
mission. The LIDA model is based on a cognitive cycle. Then, the human cognitive
processing would occur via continuous interaction of cognitive cycles, which would
happen asynchronously. Various asynchronous cycles could have different simultane-
ous parallel processes but that should respect the serial nature of the consciousness
process, important to keep a stable and coherent world scenario. During each cy-
cle, the LIDA agent would give sense to its current situation through updating its
internal and external environmental representations. Through a competitive pro-
cess, it would be decided which representing portion of the current situation should
receive attention. That portion would then be transmitted, becoming the current
content of consciousness and enabling the agent to choose an adequate action and
execute it. The feelings in the conscious flow would participate within many ways of
learning. New representations would be learned when generated in a cognitive cy-
cle and those that were not sufficiently stressed during the concurrent cycles would
disappear. Feelings would induce the action and the activation of environmental
schemes. Thus, the behavior selection would be influenced by its relevance over
the current situation, by the nature and importance of associated feelings and by
their relation with other behaviors, some of them being necessary to the current
behavior. To be executed, the selected behavior and feelings would be transmitted
to the Sensory-motor memory. There, the feelings would participate in the action
execution, as feelings can influence parameters as strength and speed.
2. EthEl Model (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)
and (ANDERSON; ANDERSON, 2011), whose application is related to prima facie
duties (duties that are mandatory unless overridden by stronger ones), was imple-
mented and tested within the notification context. This means an analysis of when,
how often, and whether to run a notification about a medicine to a particular pa-
tient. A typical dilemma example comes from a patient's refusal to take the medicine
recommended by a doctor. In what situation should the professional insist that the
patient change his mind? If it is crucial that the patient take the medicine, how many
times should it be mentioned to the patient and when should the doctor be notified
about the patient's refusal? EthEl (Toward a Principled Ethical
Eldercare Robot) (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)
and (ANDERSON; ANDERSON, 2011) is a model trained over a deontic context (con-
cerning the duties of the health care professional), and is a prototype that applies
ethical principles (established by learning) to choose an action. The prototype would
have learned an ethical principle in its action taking in a particular kind of dilemma,
the one that relates to prima facie duties. The duties would embody a philosophical
problem relating to the absence of a decision procedure when the duties provide
conflicting orientation. The inspiration for EthEl comes from Rawls (RAWLS, 1951).
The ethical dilemmas were presented to the prototype as an ordered set of values
for each possible solution, whose values would reflect duty violation or satisfaction.
EthEl uses inductive logic (LAVRAC; DZEROSKI, 1994) to determine the decision
principle that has to be used to deal with the proposed dilemmas. EthEl would
have discovered a consistent decision principle that would indicate the correct ac-
tion when specific duties pull in different directions in a particular kind of dilemma.
Then, the professional should question the patient's refusal if she/he is not completely
autonomous and when there is no violation of the duty of non-maleficence or severe
violation of the duty of beneficence. But EthEl would have established that vio-
lations over the duty of non-maleficence should impact more than violations over
duty of beneficence. The authors ponder that EthEl could also be used for other
sets of prima facie duties for which there is agreement among the specialists about
the correct actions.
3. To reflect on Moral Theory vis-à-vis the conflict between Generalism and Partic-
ularism, Guarini (GUARINI, 2006) and (GUARINI, 2012) draws insights from Dancy
(DANCY, 2010) while pondering whether moral reasoning, including learning, could be
done without the use of moral principles. If so, models of artificial neural networks
(ANN) could provide indications of how to do it, given the fact that ANNs would
be able to generalize new cases from those previously learned - and do it without
principles of any kind. Thereby, ANNs are modeled to classify and reclassify cases
with a moral purport, the output (acceptable or not) being an answer to moral
dilemmas attached to the questions kill or let die. Dancy (DANCY, 2010) empha-
sizes a mismatch between moral principles and the importance of the context for the
analysis of what is morally acceptable: moral decisions would depend on the context
and situation. The subject kill or let die from Guarini (GUARINI, 2006), (GUARINI,
2012) would have come from a modified analogy from Thomson (THOMSON, 1971) -
where, relating to an abortion from being pregnant by rape, it takes place a discus-
sion about the difference between murderer and letting die. The modified analogy is
as follows: there is only one person capable of keeping a particular man alive. That
person is kidnapped and placed to filter the man’s blood and should stay there,
connected to him, for nine months. After that, the man will survive and the person
may be free from him. In short: after using violence, a life became dependent on
the other. Then, would it be morally acceptable or not that the person decided to
disconnect from the man before he could be saved (leading to the man’s death)?
According to Guarini (GUARINI, 2006), (GUARINI, 2012), the results suggested that
the classification of non-trivial cases in the absence of queries about moral prin-
ciples would be more plausible than it might be supposed at first sight, although
important limitations suggest the need for principles. Regarding a reclassification,
which would be an important part of the reasoning in humans, simulations indicated
the need for moral principles.
The approaches from items 2 and 3 fall into a specific application domain: EthEl is
tested in a notification context (analysis of when, how many times and whether a notification
shall be issued). Item 3 relates to the concern of providing as output an answer to moral
dilemmas related to the questions kill or let die. Research driven to deal with moral
dilemmas is particularly important because it may be useful to design a morality mech-
anism in machine learning (see Sect 5.1). Finally, the LIDA model is a complex project
that was still under development when we started our project. Since none of the studied
works matched our intentions, we searched for other bases (see Sect. 2.1.1) to guide our
moral architecture design.
2.1.1 MultiA
We expect to obtain relevant decision making toward cooperation in MAS tasks by
designing a computational architecture endowed with artificial emotions, feelings and
moral behavior (through the empathy embodiment). We started designing our bioinspired
computational multiagent architecture by using the ALEC architecture from Gadanho
(GADANHO, 2003) as essential reference - we describe such an influence in Sect. 3.3. Our
multiagent architecture is called MultiA since it is intended for usage in multiagent systems
(Multi) and was inspired by the ALEC architecture (-A). To design a bioinspired computational
architecture, we studied biological and philosophical references while seeking to computationally mimic rudimentary mechanisms related to both moral and immoral behaviors.
Work has been done to establish the crucial role of emotions during the process of
intelligent decision making and their importance in filtering information and awakening
our attention mechanisms (see the Somatic Marker Hypothesis from Damásio (DAMÁSIO,
1994)). The vital role of emotions and feelings in rational decisions embraces social emotions
(such as sympathy and its associated feeling of empathy) and is analyzed from the aspect
of social interaction and homeostatic goals (DAMÁSIO, 2004). Damásio (DAMÁSIO, 2004)
defined social emotions using the concept of moral emotions by Haidt (HAIDT, 2003) -
we follow it while designing the artificial emotions. Haidt (HAIDT, 2003) explains
emotions as responses to a class of events perceived and understood by the self, and so
emotions usually provoke action tendencies. It is particularly important to differentiate
social emotions from other emotions: social emotions trigger action tendencies in
situations that do not represent direct harm or benefit to the self (disinterested action
tendencies); other emotions, on the other hand, are more self-centered.
The brain's dexterity at internally simulating emotional states, establishing a basis for
emotionally possible outcomes and emotion-mediated decision making, is also scruti-
nized in Damásio (DAMÁSIO, 2004). Accordingly, internal simulation takes place during
the process along which the emotion of sympathy turns into the feeling of empathy. The social
interaction would be mediated by mirror neurons (discovered in the premotor cortex area of
macaque monkeys by Pellegrino et al. (PELLEGRINO et al., 1992) and Rizzolatti et al. (RIZZOLATTI et al., 1996)), which make our brain internally simulate, for example, the movements that others
perform while in our field of vision. Such a simulation would enable us to predict
the movements required to establish communication with the other (whose
movements are mirrored). Finally, the internal simulation of our own body (e.g. when
we internally simulate ourselves executing different activities) could also be related to
mirror neurons.
Gallese and Goldman (GALLESE; GOLDMAN, 1998) reflect on the human aptitude for
simulating the mental states of others, and thus understanding their behavior by assigning
to them intentions, goals or beliefs. There is a suggestion that what might have evolved
into such a capacity is an action execution/observation matching system. Likewise, a class
of mirror neurons would be playing its role in that. Moreover, a possible activity of the
mirror neurons would be to promote learning by imitation. Nowadays there is
agreement that typical humans develop the capacity of representing the mental states of
others (this representation system is often called folk psychology). Finally,
there is the consideration that such an ability contributes to fitness, as detecting
another agent's goals and inner states can help the observer to predict the other's future
actions, which can be cooperative or not, or even threatening (researchers are continuously
providing new insights from Mind Reading related experiments).
While retaining an emotional background, the dynamics involved in the origin of empathy
can be approached from a cognitive aspect (WAAL, 2009), (PROCTOR et al., 2013). The
social emotion of sympathy feeds the feeling of empathy, and the social emotions benefit
from the internal simulation supported by mirror neurons that internally mirror the situ-
ation of the other (learning by imitation may be related to mirror neuron activity).
The feeling of empathy, however, will be more or less intense depending on the importance of a
particular other agent (DAMÁSIO, 2004). We seek to maintain negative emotions at low
levels and the positive ones at high levels (DAMÁSIO, 2004), and the purpose of homeostasis
would be to produce a state of life better than neutral, to accomplish what we identify
as well-being. Following this idea, MultiA establishes its preferences considering its
own well-being and that of its peers. While designing the Empathy Module (see Sect. 3.2.2.2) of
MultiA, we used mirror neurons as inspiration. Even though MultiA does not mir-
ror its neighbors' movements, MultiA mirrors its own emotions and preferences onto the
neighbor. Then, the current emotions of MultiA itself are applied while building an
expectation about the well-being of a neighbor - and during that process, MultiA con-
siders that the neighbor shares the very same emotional preferences (in Sect. 3.2.2.2, see
I = {{IP}, {IN}}).
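As a rough illustration of this mirroring idea, the sketch below reuses the agent's own emotions and its own preference weights to build the expectation about a neighbor. The function name, the dictionary encoding and the plain weighted sum are our assumptions for illustration only; the actual Empathy Module (including I = {{IP}, {IN}}) is defined in Sect. 3.2.2.2.

def expected_neighbor_well_being(own_current_emotions: dict, preference_weights: dict) -> float:
    """Hypothetical sketch: agent i applies its own current emotions and its own
    preference weights as if they were the neighbor's, yielding an expectation (Wpi)
    about the neighbor's well-being on the same [-1, 1] scale used for Wi."""
    return sum(preference_weights[e] * own_current_emotions[e] for e in preference_weights)

# Illustrative usage with made-up emotion values and normalizing weights:
emotions = {"Eb5": 0.4, "Eb2": -0.2}   # placeholder emotion readings of agent i
weights = {"Eb5": 0.5, "Eb2": 0.5}     # placeholder normalizing preference weights
w_pi = expected_neighbor_well_being(emotions, weights)   # 0.1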
Regarding the feeling of empathy, we have also been guided by the differentiation of
three types of agents: the moral, the immoral and the amoral. By rudimentarily mimicking those
three patterns of morality, our agents display different social interaction policies. The
moral agent (MultiA, moral agents) tries not to take advantage of the others and cooperates;
the immoral agent takes advantage of the others more easily and does not cooperate (MultiA,
immoral agents). Unlike the others, the amoral agent is not guided by social emotions and
feelings (MultiAA, Sect. 4.1.1). The entire MultiA architecture is analyzed in Ch. 3.
3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior
We propose the MultiA computational architecture, designed from reflections on the
relevance of moral behavior in the search for a rational and cooperative biologically inspired
artificial agent. We hypothesize that the simulation of emotions and moral behavior, by aiding
the computational architecture in making decisions, favors cooperation even in the face of high
reinforcements for selfish behavior. The analogy with moral behavior is implemented
through a simulation of empathy, so the agent may select actions that are not
the best selfish option but that help to enhance the interactions among agents.
Since MultiA agents have empathy more accessible for agents whose interactions have
resulted in positive reinforcements, the reciprocity assumption introduces both moral and
immoral agents, because the action selection mechanism can be driven by two aspects.
The first is related to cooperation, when the particular MultiA agent has a cooperative
neighborhood: the agent will then cooperate by reciprocity. The second is related to
non-cooperation, when the surroundings are non-cooperative (the MultiA agent becomes
non-cooperative by reciprocity). MultiA thus rudimentarily mimics both moral and immoral agents.
MultiA consists of three main systems (Fig. 3.1): the Perceptive System (PS), the
Cognitive System (CS) and the Decision System (DS). The interactions among the
three systems result in action selection derived from sensations triggered by the
environment, while provoking environmental changes that will, in turn, trigger new
sensations, and so on. As input to the PS, MultiA has artificial sensations that are triggered
by reinforcements and indexed by the agent it is interacting with.
3.1 MultiA Functioning: an Overview
FIGURE 3.1 – The general scheme of the MultiA Architecture.

While designing our computational architecture, inspired by Gadanho (GADANHO, 1999), we
took into account the animal behavioral characteristics analyzed in Hallam and Hayes (HALLAM; HAYES, 1992) that could inspire a robotic design. Among those animal characteristics
there is homeostasis, the biological capability of bodily auto-regulation, such as keeping
the temperature or the cells' pH in such a way that the internal conditions are kept on
a stable and regular basis. Through his research on organic mechanisms of biological regulation, Claude Bernard (1813-1878) used the concept of Milieu Intérieur, the precursor of
homeostasis. Later, Cannon (CANNON, 1932) described the body's steady states and some
mechanisms to control them; Cannon (CANNON, 1932) also provided an analogy between
social processes and body regulation. It may therefore be natural to associate
homeostasis with a neutral, balanced state. Nevertheless, according to Damásio (DAMÁSIO, 2004),
life regulation would be designed to comprehend the homeostatic efforts to
produce the state that we understand as well-being. The environment and our bodies
evoke ongoing homeostatic reactions that keep influencing us and our actions, through
which we keep changing our environment and ourselves. Homeostatic reactions
may continue reflecting upon us even after the particular situation that caused them has
ended.
Through the inspiration from biological homeostasis and from Gadanho and Custódio (GADANHO; CUSTÓDIO, 2002) (see Sect. 3.3), we designed the MultiA Perceptive System.
The artificial sensations feed emotions, feelings and, afterwards, through a weighted
sum over feelings, the general environmental and internal perspective of a MultiA agent i
(named Well-Being, Wi) about its own performance. MultiA follows its artificial homeostatic
goals: it selects those actions that are expected to keep the feelings and
emotions within a threshold, therefore achieving high Wi levels. The history of a MultiA
agent is reflected in the current values of its Perceptive System and in the learning of matching
emotional responses to actions. Therefore, keeping the feelings within a threshold
relies upon the selection of adequate actions in response to the environment. Wi is modeled
as a function of the feelings, and internally represents the general condition of agent
i. It is calculated with normalizing weights such that its value falls in the range [−1, 1].
From another perspective, Wi indicates how suitable the action selection (from the
DS) has been concerning the reinforcements received by the MultiA agent i itself and the
remaining feelings, such as empathy. In addition to Wi, MultiA also produces Wpi, a prospect about
the current situation of other agents.
MultiA then uses a set of its own emotions to provide itself with a prospect about the
current situation of other agents. Although there is some controversy about it (see for
instance Hickok (HICKOK, 2014)), we used mirror neurons ((PELLEGRINO et al., 1992),
(RIZZOLATTI et al., 1996)) as inspiration for the mechanism that projects MultiA's own
emotions to mirror other agents' situations. Actions related to high empathy are designed to
be avoided, since we consider that when an agent rouses high empathy levels it is because
the agent itself may be disturbing the performance of the others. For the design of the
Empathy Module, we used the utilitarian calculus from Bentham (BENTHAM, 2007 (1789))
as a guideline. This way, MultiA agents have empathy more accessible for agents whose
interactions have resulted in positive reinforcements. Furthermore, if a MultiA agent has
been receiving a high number of positive reinforcements, it is also more likely to
cooperate. The empathy is represented by S4,ip: feeling number 4 of MultiA agent i for
neighbor p (on Figure 3.1, see feeling number 4). As we designed the empathy to reflect
the impact of MultiA's action selection on its neighbors, the higher the empathy
for a specific neighbor p, the lower is Wi, all the remaining variables that feed Wi kept
constant. This means that, at a certain point, the MultiA agent may not have been
selecting its actions appropriately, since it may be negatively affecting the particular
neighbor p; thus high empathy levels are an indication of inadequate action selection.
Selected actions are considered adequate when they generate positive reinforcements
while not provoking high empathy levels. If p fires high empathy on i, p may be getting
low reinforcements and therefore its neighbors, such as i, should check their actions.

Thus, MultiA is designed to seek those actions that will not increase its levels of
empathy. Then, using the current emotions (from the PS) as input, the CS applies
artificial neural networks (ANNs) to estimate the resulting Well-Being if the corresponding
action is selected. The CS then delivers the outputs from all ANNs to the DS,
which chooses an action.
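The following sketch traces that PS, CS and DS flow for a single interaction. The object interface (method and attribute names) is hypothetical and only illustrates the order in which the three systems are consulted; it is not the thesis implementation.

def multia_step(agent, neighbor_p, reinforcement, available_actions):
    """Hypothetical one-interaction control loop for a MultiA agent.
    PS turns the reinforcement and the neighbor index into sensations and emotions;
    CS estimates the Well-Being expected from each action; DS picks one action."""
    history = agent.cs.history_for(neighbor_p)                 # the data sets the CS delivers to the PS
    emotions = agent.ps.update(reinforcement, neighbor_p, history)
    estimates = {k: agent.cs.estimate_well_being(emotions, k)  # one estimate per available action k
                 for k in available_actions}
    return agent.ds.select_action(estimates)                   # e.g. the highest-valued action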
3.1.1 MultiA and an Interaction Game
When following utility-based computational approaches, it is not trivial to model
artificial agents that reject the opportunity of taking advantage of the others' actions (e.g.
the selection of actions driven only to obtain the highest reinforcements, no matter the
consequences to others) and still commit to the choice of cooperating. In Sect. 2 we in-
troduced the public goods subject (including the related issue of somehow taking advantage
of the others' actions, i.e. free-riding) and mentioned that the Prisoner's Dilemma Game
(PDG) is generalized through public goods games (HARDIN, 1971), (WAKANO; HAUERT,
2011). We developed MultiA with the aim of providing an architecture extensible to
different domains and of showing cooperation as an emergent property. Without loss of
generality, let us hypothesize that each MultiA agent i is going to play the Prisoner's
Dilemma Game with another MultiA agent p. Hence, each MultiA agent will have to
decide whether it is going to cooperate with the other or not (to defect) - and a defector is
highly rewarded for unilateral defection (defection vs. cooperation). In Sect. 3.2 we detail
the MultiA architecture itself, while in Ch. 4 we present the MultiA agents, the artificial
learning agents under the MultiA architecture in a multiagent environment and task.
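To make the setting concrete, a minimal sketch of one PDG interaction follows. The numeric payoffs are placeholders chosen only to respect the usual PDG ordering (unilateral defection pays most, being exploited pays least) with non-negative reinforcements; they are not the values used in our experiments.

# (action_i, action_p) -> (reinforcement_i, reinforcement_p); values are illustrative only.
PDG_PAYOFFS = {
    ("cooperate", "cooperate"): (0.6, 0.6),  # mutual cooperation
    ("cooperate", "defect"):    (0.0, 1.0),  # i is exploited, p gets the temptation payoff
    ("defect",    "cooperate"): (1.0, 0.0),  # unilateral defection is highly rewarded
    ("defect",    "defect"):    (0.2, 0.2),  # mutual defection
}

def play_round(action_i: str, action_p: str) -> tuple:
    """Non-negative reinforcements received by i and p after one interaction."""
    return PDG_PAYOFFS[(action_i, action_p)]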
3.2 The Systems of the MultiA Architecture
3.2.1 Perceptive System (PS)
We consider a model where reinforcements are non-negative. As our research
is grounded on moral behavior, we intend to test and study MultiA agents interacting
among themselves. Thus, each MultiA agent i will keep a list of every agent it has
interacted with (the neighbors of i). Sensations fall in the range [0, 1] and, together
with the history provided by the CS, give rise to artificial emotions. MultiA artificial
sensations are triggered by reinforcements and by an identifying index for the neighbor it
is interacting with. Indexing is defined in the following way: every MultiA agent has an
identifying index i = {1, ..., N}, and the neighbors relating to each agent i also have an
identifying index p = {1, ..., Z}. A given p value thus refers to a particular neighbor that
is interacting with i. The CS delivers five sets of data (the history of agent i) to the PS;
a minimal data-structure sketch of this history follows the list:
1. The current number of neighbors of agent i;
2. The reinforcement history of agent i;
3. The number of times agent i has interacted with each neighbor p;
4. The number of times interactions with p ended up in positive reinforcements (Mip);
5. The value of Ypi, defined as follows. The CS accesses the current emotions from
PS. Then, the Empathy Module EM (from the CS) produces Wpi: an assumption
of i about the current condition of neighbor p. MultiA will then respond to the
current condition of neighbor p (Wpi), producing Ypi. If the neighbor p is supposed
to be facing low reinforcements, MultiA may have its empathy raised (depending
on the Ypi value) to select less selfish actions and try to cooperate in raising
the reinforcements of p. For details, see Sect. 3.2.2.2.
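One possible layout for that history is sketched below; field names and types are our assumptions for illustration, not the thesis implementation.

from dataclasses import dataclass, field

@dataclass
class NeighborRecord:
    """Per-neighbor part of the history of agent i (items 3-5 above)."""
    interactions: int = 0            # item 3: times i interacted with neighbor p
    positive_interactions: int = 0   # used to derive Mip (item 4)
    y_pi: float = 0.0                # item 5: reciprocity value addressed to p

@dataclass
class AgentHistory:
    """History of agent i delivered by the CS to the PS (items 1-5 above)."""
    reinforcements: list = field(default_factory=list)   # item 2: reinforcement history
    neighbors: dict = field(default_factory=dict)        # item 1: neighbor index p -> NeighborRecord

    def num_neighbors(self) -> int:
        return len(self.neighbors)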
There are basic emotions {Eb1,i, Eb2,i, ..., Ebd,i} and social emotions {Es1,i, Es2,i, ..., Esh,i},
all normalized to [−1, 1]. The basic emotions are associated with the general condition of
the MultiA agent itself. Social emotions are stimulated by neighbors and by the impact
of the agent's own actions on those neighbors. The artificial feelings {S1,i, S2,i, ..., Sz,i} also
fall in the range [−1, 1] and are fed by emotions. We used the reference (DAMÁSIO, 2004) as
inspiration while shaping the artificial basic emotions. Table 3.1 lists the particular
biological emotions that inspired each MultiA basic emotion.
TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA

  Biological Basic Emotion   Artificial Basic Emotion
  Anger                      Eb1,i
  Sadness                    Eb2,i
  Surprise                   Eb3,i
  Fear                       Eb4,i
  Happiness                  Eb5,i
  Disgust                    Eb6,i
The artificial basic emotions are as follows (a minimal computational sketch is given right after the list):
• Eb1,i: increases with the number of interactions of i in the same match. A match
is defined by every i interacting only once with each and all of its neighbors, and
interactions are always ordered w.r.t. the neighbor agent index. Once all neighbors have
interacted, the match ends. It is calculated according to Eq. 3.1:

E^b_{1,i} = -1 + 2 (m_i^t / V_i^1)    (3.1)

where t represents the (possibly unfinished) current match, V_i^1 is the initial number
of neighbors of agent i at the first match, and m_i^t is the number of concluded
interactions of i with its neighbors during the current match t.
• Eb2,i: indicates the difference between the sum of reinforcements r_i^{t-1}, received by i
during match t-1 (Eq. 3.2), and a threshold value R_{0,i} (range [0, 1]).

r_i^{t-1} = \sum_{j=1}^{V_i^{t-1}} R_{i,j}^{t-1}    (3.2)

where V_i^{t-1} is the number of neighbors of agent i at match t-1 and R_{i,j}^{t-1} is the
reinforcement of i after interacting with neighbor j at t-1.

Eb2,i is then calculated as

E^b_{2,i} = r_i^{t-1} - R_{0,i}    (3.3)
• Eb3,i: at each match t, it decreases with the number of lost neighbors (a neighbor is
lost when it stops interacting), Eq. 3.4:

E^b_{3,i} = 1 - 2 ((V_i^1 - V_i^t) / V_i^1)    (3.4)

where V_i^t is the number of neighbors of agent i at match t. Note that, as MultiA
social emotions are designed to be triggered by social interaction, we assume that
V_i^1 > 0.
• Eb4,i: indicates the difference between the current sum of reinforcements r_i^t and a
threshold value. That is measured by comparing the current sum of reinforcements
r_i^t and R_{0,i}, Eq. 3.5:

E^b_{4,i} = r_i^t - R_{0,i}    (3.5)
• Eb5,i: reflects the current sum of reinforcements r_i^t during the current match t, see Eq. 3.6:

E^b_{5,i} = -1 + 2 r_i^t    (3.6)
• Eb6,i: it always starts a match with value 1 and only decreases (during the current
match t) if the interaction with a neighbor does not render positive reinforcements,
see Eq. 3.7:

E^b_{6,i} = E^b_{6,i} - 2 (1/V_i^1)    (3.7)
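The sketch below transcribes Eqs. 3.1-3.7 directly; argument names follow the symbols in the text, and the caller is assumed to supply the reinforcement sums and counts already computed as described above.

def eb1(m_t: int, V1: int) -> float:
    """Eq. 3.1: grows with the number of concluded interactions in the current match."""
    return -1.0 + 2.0 * (m_t / V1)

def eb2(r_prev: float, R0: float) -> float:
    """Eq. 3.3: previous-match reinforcement sum r_i^{t-1} minus the threshold R0."""
    return r_prev - R0

def eb3(V1: int, V_t: int) -> float:
    """Eq. 3.4: decreases with the number of neighbors lost since the first match."""
    return 1.0 - 2.0 * ((V1 - V_t) / V1)

def eb4(r_curr: float, R0: float) -> float:
    """Eq. 3.5: current reinforcement sum r_i^t minus the threshold R0."""
    return r_curr - R0

def eb5(r_curr: float) -> float:
    """Eq. 3.6: rescales the current reinforcement sum to [-1, 1]."""
    return -1.0 + 2.0 * r_curr

def eb6_update(eb6: float, V1: int, got_positive: bool) -> float:
    """Eq. 3.7: starts each match at 1 and decreases after each non-positive interaction."""
    return eb6 if got_positive else eb6 - 2.0 * (1.0 / V1)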
In contrast with basic emotions, social emotions are driven by the neighbors and by
the influence of the MultiA agent on those neighbors. In the same way as for
the basic emotions, we used the reference (DAMÁSIO, 2004) as inspiration to shape the
MultiA artificial social emotions. Table 3.2 lists the particular social emotions that
inspired the artificial social emotions of MultiA.
TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA
  Biological Social Emotion   Artificial Social Emotion
  Pride                       Es1,i
  Gratitude                   Es2,i
  Compassion                  Es3,i
  Sympathy                    Es4,i
The artificial social emotions of MultiA are:
• Es1,i: emphasizes those behaviors relating to the social context that did not originate
positive outcomes to i but, still and to a minor degree, increases together with
positive reinforcements of the agent. That way, Es1,i increases at any change in Eb5,i
and, to a greater degree, at any change in Eb6,i. It always starts a match with value
-1; and s < (2/V_i^1) is a weight used to establish the importance of Eb5,i, see Table 3.3.
TABLE 3.3 – Updating Es1,i
  At any change in   The value of Es1,i becomes:
  Eb6,i              Es1,i + (2/V_i^1)
  Eb5,i              Es1,i + s
• Es2,i: the average number of variations of Eb5,i per interaction, normalized to the range
[−1, 1], from the first match until the current one. It starts at zero;
• Es3,ip: calculated according to Eq. 3.8, it reflects the average number of variations of Eb5,i
per interaction with neighbor p:

E^s_{3,ip} = -1 + 2 M_{ip}    (3.8)

where Mip is provided by the CS and is the average number of variations of Eb5,i
(i.e., the average number of increases in r_i^t) per interaction with neighbor p.
• Es4,ip: is doubly fed, both by the reciprocity value addressed to neighbor p (Ypi,
provided by the CS) and by the empathy feeling S_{4,ip}^{t-1} (see Table 3.4) for p right
after the last interaction with p (during the last match, at t-1), a residual value from
the past influencing the current emotion:

E^s_{4,ip} = c_a S_{4,ip}^{t-1} + (1 - c_a) Y_{pi}    (3.9)

where ca ∈ [0, 1] is a weight used to establish the importance of the residual value S_{4,ip}^{t-1}.
The EM from the CS sends the Ypi value of agent i to p, see Sect. 3.2.2.2.
Once in the PS, the Ypi will stimulate the social emotion Es4,ip (social emotion number
4 of agent i for neighbor p; on Figure 3.1, see social emotion number 4), then reaching the
empathy feeling S4,ip. The emotion Es4,ip is fed both by Ypi and by the empathy feeling
for p right after the last interaction with p, a residual value from the past influencing
the current emotion. Then, right before a new interaction with p, the empathy feeling
is fed both by the emotions Es4,ip and Es3,ip (social emotion number 3 of agent i for
neighbor p; on Figure 3.1, see social emotion number 3). The latter summarizes the utility
of neighbor p: the average number of times interacting with neighbor p has resulted in
positive reinforcements.
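Putting Eqs. 3.8 and 3.9 and the Table 3.4 feeding rule together, a minimal sketch of the empathy chain for one neighbor could look as follows. The weight values (ca and the two feeling weights) are placeholders, since the text only constrains ca to [0, 1] and requires the feeling weights to be normalizing.

def empathy_chain(s4_prev: float, y_pi: float, m_ip: float,
                  ca: float = 0.5, w3: float = 0.5, w4: float = 0.5):
    """Sketch of how the empathy feeling S4,ip for neighbor p is built.
    s4_prev: empathy feeling after the last interaction with p (residual value);
    y_pi:    reciprocity value addressed to p, provided by the CS;
    m_ip:    average rate of positive-reinforcement interactions with p."""
    es3_ip = -1.0 + 2.0 * m_ip                 # Eq. 3.8: utility of neighbor p
    es4_ip = ca * s4_prev + (1.0 - ca) * y_pi  # Eq. 3.9: residual empathy plus reciprocity
    s4_ip = w3 * es3_ip + w4 * es4_ip          # feeling as a weighted sum of the two emotions
    return es3_ip, es4_ip, s4_ip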
The artificial feelings {S1,i, S2,i, ..., Sn,i} fall in the range [−1, 1] and arise through a
weighted sum of emotions. The weights are set according to the relevance of each emotion
to the domain. Table 3.4 presents the set of emotions that feed each feeling (Eb1,i does not
feed any feeling). Because of its feeding set of emotions, the only feeling that adapts to
the interacting neighbor p is S4,ip. The well-being Wi uses feelings to internally represent
the general situation of agent i. It is calculated (Eq. 3.10) with normalizing weights so
that the final value falls in the range [−1, 1]:
W_i = \sum_{j=1}^{n} a_j S_{j,i}    (3.10)
where n is the number of feelings. The weights aj are set according to the relevance of each
feeling to the domain. For simplicity, the p index of S4,ip is omitted from Eq. 3.10. Wi
measures the performance of MultiA agent i in the environment, considering the empathy
feeling for p. If the empathy reaches high levels, Wi will be low: probably the last selected
actions may be causing bad outcomes to p; therefore the well-being Wi of agent i should
be low, even though its reinforcements may be high.
TABLE 3.4 – Artificial Feelings

  Feeling   Fed by Emotions
  S1,i      Eb2,i and Eb3,i
  S2,i      Eb4,i, Eb5,i and Eb6,i
  S3,i      Es1,i and Es2,i
  S4,ip     Es3,ip and Es4,ip
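A minimal sketch of Table 3.4 together with Eq. 3.10 follows. The equal weights are placeholders; the only requirement stated in the text is that the weights normalize so that each feeling and Wi stay in [−1, 1].

# Which emotions feed each feeling (Table 3.4); Eb1 feeds no feeling.
FEELING_SOURCES = {
    "S1": ["Eb2", "Eb3"],
    "S2": ["Eb4", "Eb5", "Eb6"],
    "S3": ["Es1", "Es2"],
    "S4": ["Es3_p", "Es4_p"],   # the only feeling indexed by the current neighbor p
}

def feelings(emotions: dict) -> dict:
    """Each feeling as an (equally weighted, for illustration) average of its emotions."""
    return {s: sum(emotions[e] for e in srcs) / len(srcs)
            for s, srcs in FEELING_SOURCES.items()}

def well_being(feeling_values: dict, weights: dict) -> float:
    """Eq. 3.10: Wi = sum_j a_j * S_j,i with normalizing weights a_j."""
    return sum(weights[s] * v for s, v in feeling_values.items())

# Example: equal weights summing to 1 keep Wi within [-1, 1].
example_weights = {"S1": 0.25, "S2": 0.25, "S3": 0.25, "S4": 0.25}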
3.2.2 Cognitive System (CS)
The CS consists of two Modules: Empathy (Sect. 3.2.2.2) and Learning (Sect. 3.2.2.1).
The first is responsible for producing the Ypi value to be sent to the PS. Once there, Ypi
ends up feeding the empathy feeling. The second module applies artificial neural networks
(ANNs) to estimate the Well-Being Q_ip^t(E_ip^t, k) that will result from the execution of a
specific action k (k ∈ actions) in response to the current set of emotions E_ip^t. Observe
that E_ip^t is the current set of all emotions (basic and social) of agent i at match t and
for neighbor p. In Fig. 3.2 we illustrate the functioning of the Learning Module: at match t,
agent i is going to interact with neighbor p and its current set of emotions is E_ip^t. Before
agent i takes an action, the Learning Module estimates the Well-Being values that
would probably follow from the execution of each action k. In the example, agent i has two
options of action: action A or B. If executed, action A is expected to obtain the higher
estimated Well-Being value, as Q_ip^t(E_ip^t, A) = 0.2 and Q_ip^t(E_ip^t, B) = 0.1.
FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action if it is going to be executed in response to an interaction with neighbor p at match t.
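The interface illustrated by Fig. 3.2 can be sketched as below. The predict call stands for whatever trained function approximator is used per action (one ANN per action, in our case); the object interface is an assumption of this sketch rather than a prescribed API.

def estimate_well_being(estimator_per_action: dict, emotions_vector) -> dict:
    """One estimator per available action maps the current emotion set E_ip^t to the
    expected Well-Being Q_ip^t(E_ip^t, k). `estimator_per_action` maps each action k to
    an object exposing a predict(...) method (e.g. a small feed-forward ANN wrapper)."""
    return {k: est.predict(emotions_vector) for k, est in estimator_per_action.items()}

def greedy_action(q_values: dict) -> str:
    """The DS can then pick the action with the highest estimate, e.g. action A when
    Q(E, A) = 0.2 and Q(E, B) = 0.1 as in the example of Fig. 3.2."""
    return max(q_values, key=q_values.get)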
3.2.2.1 The Learning Module
Two main references were considered while designing our Learning Module: Gadanho
(GADANHO, 1999) and Lin (LIN, 1993). The Learning System from Gadanho (GADANHO,
1999) received inspiration from Lin (LIN, 1993), which depicts the application of one ANN
for each action available to the agent, and the action policy acquisition based on the Q-
Learning algorithm (WATKINS, 1989). The ANNs from Gadanho (GADANHO, 1999) are
feed-forward and trained t