
Thesis presented to the Instituto Tecnológico de Aeronáutica, in partial

    fulfillment of the requirements for the degree of Doctor of Science in the

    Program of Electronic Engineering and Computation, Field of Informatics.

    Fernanda Monteiro Eliott

    A COMPUTATIONAL MODEL FOR SIMULATION

    OF EMPATHY AND MORAL BEHAVIOR

Thesis approved in its final version by the signatories below:

    Prof. Dr. Carlos Henrique Costa Ribeiro

    Advisor

    Prof. Dr. Luiz Carlos Sandoval Góes

    Prorector of Graduate Studies and Research

Campo Montenegro
São José dos Campos, SP - Brazil

    2015

Cataloging-in-Publication Data
Documentation and Information Division

Eliott, Fernanda Monteiro
A Computational Model for Simulation of Empathy and Moral Behavior / Fernanda Monteiro Eliott. São José dos Campos, 2015. 90f.

Thesis of Doctor of Science – Course of Electronic Engineering and Computation, Area of Informatics – Instituto Tecnológico de Aeronáutica, 2015. Advisor: Prof. Dr. Carlos Henrique Costa Ribeiro.

1. Arquitetura (computadores). 2. Sistemas multiagentes. 3. Comportamento afetivo. 4. Tomada de decisão. 5. Simulação computadorizada. 6. Inteligência artificial. 7. Computação. I. Instituto Tecnológico de Aeronáutica. II. Title.

    BIBLIOGRAPHIC REFERENCE

ELIOTT, Fernanda Monteiro. A Computational Model for Simulation of Empathy and Moral Behavior. 2015. 90f. Thesis of Doctor of Science – Instituto Tecnológico de Aeronáutica, São José dos Campos.

    CESSION OF RIGHTS

AUTHOR’S NAME: Fernanda Monteiro Eliott
PUBLICATION TITLE: A Computational Model for Simulation of Empathy and Moral Behavior.
PUBLICATION KIND/YEAR: Thesis / 2015

It is granted to Instituto Tecnológico de Aeronáutica permission to reproduce copies of this thesis and to only loan or to sell copies for academic and scientific purposes. The author reserves other publication rights and no part of this thesis can be reproduced without the authorization of the author.

Fernanda Monteiro Eliott
R. Paulino Blair, 331
12232030 – São José dos Campos–SP

A COMPUTATIONAL MODEL FOR SIMULATION

    OF EMPATHY AND MORAL BEHAVIOR

    Fernanda Monteiro Eliott

    Thesis Committee Composition:

Prof. Dr. Paulo André Lima de Castro – Chairperson – ITA
Prof. Dr. Carlos Henrique Costa Ribeiro – Advisor – ITA
Prof. Dr. Jackson Paul Matsuura – Internal Member – ITA
Prof. Dr. Osvaldo Frota Pessoa Júnior – External Member – USP
Prof. Dr. Ricardo Ribeiro Gudwin – External Member – UNICAMP

    ITA

To my parents.

Acknowledgments

    First of all I would like to thank my advisor, Professor Carlos Henrique Costa Ribeiro,

and highlight his role in helping my thoughts to turn from abstraction toward a concrete

    project. I also would like to thank the entire Aeronautics Institute of Technology (ITA) for

    its enormous contribution to the development of my knowledge while making my evolution

    in a multidisciplinary field possible.

    I would like to emphasize the role of the Philosophy dept. of the University of São

    Paulo (USP) in my academic background. Always Thankful.

    I am also thankful to Professor Briseida Dôgo de Resende (USP, Experimental Psy-

    chology dept.) for taking me as a guest in her class and research group. It was a priceless

    experience.

My family was around me all the way. I appreciate it and hope that continues.

Finally, I would like to thank CNPq for the financial support.

“Disons donc que, si toutes choses deviennent naturelles à l’homme lorsqu’il s’y habitue, seul reste dans sa nature celui qui ne désire que les choses simples et non altérées. Ainsi la première raison de la servitude volontaire, c’est l’habitude.”
— Étienne de La Boétie, 1576.

Resumo

    Emoções e sentimentos são considerados cruciais no processo de decisão humana in-

    teligente. Em particular, as emoções sociais nos ajudariam a reforçar o grupo e a cooperar.

    Ainda é uma questão de debate o que motivaria criaturas biológicas a cooperarem ou não

com seu grupo. Todos os tipos de cooperação ocultariam interesses egoístas, ou o altruísmo

    realmente existiria? Se nos debruçarmos sobre essas questões a partir de uma perspectiva

    humana, acabamos passando por comportamento moral e três tipos de sujeitos: o moral,

    imoral e amoral. Se nos movermos de sujeitos biológicos em direção a agentes artificiais,

    observamos ser uma questão complexa ficar ileso a mecanismos ad-hoc a fim de atingir

    cooperação em abordagens computacionais baseadas em utilidade. Decidimos nos inspirar

    em comportamento moral como uma forma de buscar a cooperação em Sistemas Multi-

    agentes. Nossa hipótese principal baseia-se na ideia de que a cooperação pode surgir a

partir do auxílio de emoções e comportamento moral durante o processo de tomada de

decisões - mesmo quando comportamento egoísta é recompensado por altos reforços. A

    analogia com o comportamento moral é promovida através da simulação do sentimento

de empatia. A importância do sentimento de empatia consiste na sua função em regular

    as prioridades dos agentes, permitindo a seleção de ações que, talvez, não sejam a melhor

seleção egoísta, uma vez que uma tomada de decisão não egoísta possa ser crucial para

    equalizar as interações entre os agentes e resultar em cooperação. Descreveremos aqui

    nossa arquitetura computacional multiagente bioinspirada (denominada MultiA), com-

    posta por emoções artificiais, sentimentos e por um Módulo de Empatia responsável por

    fornecer uma seleção de ações que, rudimentarmente, imite comportamento moral. Infor-

    mação sensorial é acionada pelo meio ambiente e, então, a arquitetura computacional a

    transforma em emoções e sentimentos artificiais básicos e sociais. Posteriormente, através

    do módulo de empatia, suas próprias emoções são empregadas para estimar o estado at-

    ual de outros agentes. E então, seus sentimentos artificiais proporcionam uma medida

    (denominada bem-estar) do seu desempenho em resposta ao ambiente. Através daquela

    medida e de técnicas de aprendizado por reforço, a arquitetura aprende um mapeamento

    entre emoções e ações. Diante de recompensas para comportamento egóısta, os agentes

    MultiA que adotam estratégia cooperativa, o fazem como resultado de um sentimento de

empatia (altos níveis de empatia) regulando as prioridades do agente, agindo como um


    agente moral. Os agentes MultiA que não adotam a estratégia cooperativa selecionam

ações egoístas, e o fazem como resultado de baixos níveis de empatia, agindo como agente

    imoral. O mecanismo de seleção de ação de MultiA pode ser alimentado a partir de dois

    aspectos. O primeiro está relacionado à cooperação, uma vez que um agente MultiA em

    particular tenha uma vizinhança cooperativa. Dessa forma, o agente irá cooperar por

    reciprocidade. O segundo está relacionado à não-cooperação, uma vez que o entorno é

    não-cooperativo (agente MultiA não cooperativo por reciprocidade). Portanto, a arquite-

    tura computacional acaba por imitar rudimentarmente agentes morais e imorais. De fato,

    obter agentes morais e imorais a partir de uma mesma arquitetura se encaixa em pressu-

postos filosóficos sobre o meio corromper o indivíduo. Dado que relações entre indivíduos

    diferentes possam ser representadas por redes, exploramos diferentes topologias de rede

    para caracterizar as interações agente-agente, definindo a vizinhança dos mesmos. A fim

    de avaliar nossa arquitetura, utilizamos uma versão de um jogo evolutivo que aplica o

    jogo do dilema do prisioneiro para estabelecer as alterações sobre a topologia da rede.

    Os resultados indicam que, apesar de MultiA também imitar rudimentarmente agentes

    imorais, um número suficiente de agentes MultiA seguiram em outra direção, assim,

    através da cooperação, mantiveram a estrutura da rede da vizinhança. Portanto, estraté-

    gias baseadas em simulação de comportamento moral podem auxiliar na diminuição da

    recompensa interna advinda de uma seleção de ação egóısta, favorecendo a cooperação

    como uma propriedade emergente de sistemas multiagentes. Nossos resultados também

    indicam a viabilidade do Módulo de Empatia e coerência entre a experiência do agente e a

política de ação adotada. Intensificamos os parâmetros de teste e ainda assim obtivemos

    um número substancial de agentes MultiA cooperativos. Mas, adicionalmente, obtivemos

    agentes MultiA não-cooperativos, o que decorreu também do efeito de ocultamento de

estratégia. Este consiste em um problema importante que interfere na política de ação de

    agentes MultiA. Em relação ao paradigma de reciprocidade sobre o projeto de MultiA,

    este se destacou através da prevenção de efeito de falha em cascata em redes descritas por

    uma correlação de grau quase neutra, auxiliando os agentes a serem melhor sucedidos em

    espelhar a condição dos vizinhos. Nossos resultados confirmam empiricamente a influência

    do Módulo de Empatia sobre o Sistema de Decisão de MultiA.

Abstract

Emotions and feelings are now considered decisive in the human intelligent decision

process. In particular, social emotions would help us to strengthen the group and cooperate.

It is still a matter of debate what motivates biological creatures to cooperate or

not with their group. Would all kinds of cooperation hide a selfish interest, or does

true altruism exist? If we pore over those questions from a human perspective, we end

    up passing through moral behavior and three kinds of individuals: the moral, immoral

    and amoral. If we move from biological subjects onto artificial agents, it is a complex

    matter to go without ad hoc mechanisms to bring up cooperation in utility-based com-

    putational approaches. We decided to take inspiration from moral behavior as a way of

    moving toward cooperation in Multiagent Systems. Our leading hypothesis relies on the

    idea that cooperation can emerge from the assistance of emotions and moral behavior

    during the process of decision making - even when selfish behavior is rewarded by high

    reinforcements. The analogy with moral behavior is promoted through simulating the

feeling of empathy. The importance of the feeling of empathy lies in its function of regulating

the agents' priorities, enabling the selection of actions that may not be the best selfish

    selection, since non selfish decision making may be crucial to equalize the interactions

    among agents and bring up cooperation. We depict herein our bioinspired computational

multiagent architecture (called MultiA), composed of artificial emotions, feelings and

an Empathy Module responsible for providing an action selection mechanism that rudi-

mentarily mimics both moral and immoral behaviors. Sensorial information is triggered by

    the environment, then, the computational architecture transforms it into basic and social

artificial emotions and feelings. Then, its own emotions are employed to estimate the

    current state of other agents through an Empathy module. Finally, its artificial feelings

    provide a measure (termed well-being) of its performance in response to the environment.

    Through that measure and reinforcement learning techniques, the architecture learns a

    mapping from emotions to actions. While facing high rewards to selfish behavior, the

MultiA agents that adopt the cooperative strategy do so as a result of the empathy

feeling (high empathy levels) regulating the agent's priorities, acting as moral agents. The

MultiA agents that do not adopt the cooperative strategy select selfish actions, and do so

as a result of low empathy levels, acting as immoral agents. The MultiA mechanism of


action selection can be driven by two aspects. The first is related to cooperation, when

the particular MultiA agent has a cooperative neighborhood. Then, the agent will coop-

erate by reciprocity. The second is related to non-cooperation, when the surroundings are

non-cooperative (the MultiA agent is non-cooperative by reciprocity). Thus, our computational

architecture actually rudimentarily mimics both moral and immoral agents. But, as a

    matter of fact, achieving moral and immoral agents from the very same architecture fits

    philosophical assumptions about the environment corrupting the individual. As relations

    between different subjects can be represented by networks, we explored varied network

    topologies that can characterize the agent-agent interactions, by defining the agents neigh-

    borhood. For assessment of our architecture, we use a version of an evolutionary game

    that applies the prisoner dilemma paradigm to establish changes over the network topol-

    ogy. Our results show that, even though MultiA can also mimic immoral behavior, it is

more likely to mimic moral behavior. Thus, in each experiment, a sufficient number of

    MultiA agents mimicked moral agents to solve the task. Thus, through cooperation, they

kept the neighboring network structure. Therefore, strategies relying on the simulation of

    moral behavior may help to decrease the internal reward from selfish selection of actions,

    thus favoring cooperation as an emergent property of multiagent systems. Our results

also indicate the feasibility of the Empathy Module and the coherence between the agents' experience

    and the adopted action policy. We tested MultiA agents under stressed parameters and

    we still obtained a substantial number of cooperative MultiA agents. We also obtained

non-cooperative MultiA agents, which was also due to the shadow strategy effect. The

shadow strategy effect is an important problem interfering with the MultiA agents' action

policies. Regarding the reciprocity paradigm in the MultiA design, it stood out particularly

by preventing a cascading failure effect on networks described by an al-

most neutral degree correlation, aiding the agents in more successfully mirroring their

neighbors' current condition. Our results empirically confirm the influence of the Empathy

Module on the MultiA Decision System.

List of Figures

FIGURE 3.1 – The general scheme of the MultiA Architecture.

FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action if it is going to be executed in response to an interaction with neighbor p at match t.

FIGURE 3.3 – The ANNs structure from the Learning Module, CS.

FIGURE 3.4 – a) Agent i and neighbor p are going to interact. The Learning Module of agent i provides $Q_{ip}^{t}(E_{ip}^{t}, k)$ and the DS chooses to execute Action B. b) The agents interact and the PS of agent i calculates the value of $W_i$. c) Now agent i will interact with its next neighbor, neighbor p + 1. The Learning Module provides the new values of $Q_{i,p+1}^{t}(E_{i,p+1}^{t}, k)$. Before the DS chooses one action, the Learning Module will update (through the Backpropagation algorithm) the weights of the ANN indexed to action B. After being updated, the ANN indexed to action B will re-calculate $Q_{i,p+1}^{t}(E_{i,p+1}^{t}, B)$. Now the output values will be sent to the DS.

FIGURE 3.5 – CS: The structure of the Empathy Module.

FIGURE 3.6 – CS: The reciprocity assumption and the Empathy Module.

FIGURE 4.1 – The general scheme of MultiAA.

FIGURE 4.2 – At match t: 20 agents (4 defectors, 16 cooperators).

FIGURE 4.3 – At match t, just before match t + 1: 19 agents (3 defectors, 15 cooperators).

FIGURE 4.4 – Exp.1, MultiA: crossing strategies and ρf at each match.

FIGURE 4.5 – Exp.1, MultiAA: crossing strategies and ρf at each match.

FIGURE 4.6 – Exp.2, MultiA: crossing strategies and ρf at each match.

FIGURE 4.7 – Exp.2, MultiAA: crossing strategies and ρf at each match.

FIGURE 4.8 – Exp.2, MultiA final network structure: 7406 agents, 2866 defectors (red nodes).

FIGURE 4.9 – Exp.3, MultiA: crossing strategies and ρf at each match.

FIGURE 4.10 – Exp.3, MultiAA: crossing strategies and ρf at each match.

FIGURE 4.11 – Agents Final Results. Defectors are represented by the red color and cooperators by blue.

FIGURE 4.12 – Exp.3, MultiA final network structure: 215 agents, 2 defectors (red nodes).

FIGURE 4.13 – Exp.3, MultiAA final network structure: 30 agents, all cooperators.

FIGURE 4.14 – Exp.4, MultiA: crossing strategies and ρf at each match. Final values: ρf = 80%, ρd = 52%, ρc = 48%.

FIGURE 4.15 – Exp.4, MultiAA: crossing strategies and ρf at each match. Final values: ρf = 57%, ρd = 65%, ρc = 35%.

FIGURE 4.16 – Exp.4: graphics produced at matches t = 42 and t = 55.

FIGURE 4.17 – Exp.4: graphics produced at matches t = 60 and t = 68.

FIGURE 4.18 – Different values of m and MultiA performance.

FIGURE 4.19 – Exp.5: MultiAA and MultiA non-failed simulations regarding different values of ψ.

FIGURE 4.20 – Exp.6: MultiAA and MultiA non-failed simulations regarding different values of ψ.

FIGURE 4.21 – Exp.7a (left) and 7b (right): graphics produced at match t = 55. Cooperators are represented by the blue color and defectors by the red color.

FIGURE 4.22 – Exp.7a (left) and 7b (right): graphics produced at match t = 68. Cooperators are represented by the blue color and defectors by the red color.

List of Tables

TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA

TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA

TABLE 3.3 – Updating $E_{s1,i}$

TABLE 3.4 – Artificial Feelings

TABLE 3.5 – The Calculation of $Y_{pi}$: If $M_{ip} < 0.5$

TABLE 3.6 – The Calculation of $Y_{pi}$: If $M_{ip} \ge 0.5$

TABLE 4.1 – Game Parameters

TABLE 4.2 – Experimental Parameters, Lattice 2D4N

TABLE 4.3 – Data Graphic Color

TABLE 4.4 – Parameters for the experiments, Sect. 4.3

Contents

1 Thesis Introduction and Statement
1.1 Thesis Statement
1.2 Thesis Structure

2 Background
2.1 Artificial Moral Agents (AMAs)
2.1.1 MultiA

3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior
3.1 MultiA Functioning: an Overview
3.1.1 MultiA and an Interaction Game
3.2 The Systems of the MultiA Architecture
3.2.1 Perceptive System (PS)
3.2.2 Cognitive System (CS)
3.2.3 Decision System (DS)
3.3 ALEC Influences on MultiA

4 Experimental Setup and Results
4.1 MultiA in an Evolutionary Game
4.1.1 MultiA Experimental Set Up and Amoral Version
4.2 Results: Lattice Network
4.2.1 Experiments
4.3 Results: Assortativity Coefficient and Moral Agent Performance
4.3.1 Experimental Parameters
4.3.2 Experiments

5 Final Remarks
5.1 Future Work and Relevance

Bibliography

Appendix A – Publications

1 Thesis Introduction and Statement

Living beings can display behavior complex enough to stimulate research. Despite the incongruous reasoning about it, the behavior of living beings can inspire the intuitive design of systems that handle complex matters. The opportunity to conceive and model artificial moral behavior and empathy arises as the perspective of an immaterial soul partaking in the moral behavior process deteriorates. A biological and philosophical examination should be part of the search for a coherent and intuitive bioinspired computational multiagent architecture that seeks to mimic moral behavior. Likewise, its embodied emplacement has to be scrutinized vis-à-vis the pursued theoretical references.

In a simplistic approach, moral behavior can be described as the act of following the set of rules of the group, keeping it cohesive, while the progressive annexation of new customs can change that set. By reiterating a custom and naturally incorporating it among our thoughts, we are actually submitting ourselves to it and establishing it; thus, the laws of conscience essentially come from custom, and not from nature (MONTAIGNE, 2013 (1580)a). Inadvertently, we may link cooperation and the willingness to engage in it. According to Tomasello and Vaish (TOMASELLO; VAISH, 2013), from an evolutionary perspective morality can be presumed to be a kind of cooperation, as the association of skills and reasons for cooperation would provide the emergence of morality. Thus, cooperation would demand equalizing the individual's self-interest with that of the others, or suppressing it. Regarding the composition of a group and the accompaniment of its members, Tomasello (TOMASELLO, 2011) ponders over cooperation as a stitching action, connecting the members of the group. Since cooperation among living things involves complex matters, it is a field of study in its own right. Likewise, in utility-based computational approaches, the emergence of cooperation is not easily achievable (see Sect. 2).

If we conceive moral behavior as a form of cooperation (following the set of rules of the group) built upon customs among emotions and feelings, it brings up an intuitive line of reasoning to pursue while modeling a bioinspired computational architecture that is supposed to mimic moral behavior. Then, how can we design an autonomous artificial agent able to socially interact and deal with conflicting tasks that require emotional guidance to be solved? A computational agent that incorporates artificial emotional and moral intelligences can lead to ways of achieving cooperation between artificial creatures, or from artificial creatures toward biological ones.

Regarding the exercise of empathy, we can try to divide individuals into three groups: moral, immoral and amoral. Unpretentiously, in a simple approach, the moral individual has the social feeling of empathy properly functioning, whereas the immoral individual performs actions that somehow hurt the established moral code of his/her community. The amoral individual, on the other hand, can be interpreted as moral or immoral, depending on his/her social behavior, and may be characterized by important issues in the mechanism that allows the individual to put himself/herself in the place of the other and be sympathetic to his/her circumstances. There is a neurophysiological basis for this classification: according to Kandel et al. (KANDEL et al., 2000), the lateral orbitofrontal cortex seems to participate in mediating empathetic and socially appropriate responses, so damage to this area would be associated with failure to respond to social cues and with a lack of empathy. A mechanism that allows the existence of empathy is described in Damásio (DAMÁSIO, 2004) through the cognitive aspect and, as in Proctor et al. (PROCTOR et al., 2013) and Waal (WAAL, 2009), from the emotional standpoint. Hence, if someone succeeds in developing an artificial moral agent (AMA), would guidance from moral or immoral behavior be more useful? The answer is not as easily given as it may sound. We can think through philosophical questions, premises and practical goals while designing AMAs, specifically:

    1. With respect to AMAs, could we inspire the design from different aspects? Think

    through a multiagent task and action policies oriented by three sets of premises:

moral, immoral and amoral. Then, the design may consider which set, and under what

    kind of circumstances, fits better within a particular multiagent system (MAS) task:

• The moral agent cares about all members of its group (considered as neighbors), even though that may bring its own punishment. But it may also select

    actions with the aim of punishing or isolating a constantly defecting neighbor.

    However, in general, moral agents tend to cooperate;

• The immoral agent also cares about neighbors, but it is concerned with the profit it can make through social attachment. It will mostly cooperate with its

    immoral group, but can decide to cooperate with others if it is getting isolated

(to prevent complete isolation or high punishment). The immoral agent will cooperate

(if at all) mostly with its partners;


• The amoral agent can imitate both moral and immoral agents and be more practical in taking decisions.

    2. For different kinds of agents, what is the meaning of selfish actions? An action can

sound selfish but can be motivated by non-selfish goals, such as punishing a member

of the group to keep it healthy. Thus, selfishness can be executed by all three

agent types; the difference lies in the goal behind that action: if it is to make

profit (fits better with immoral and amoral agents); if it belongs to an uncertain or

exploratory phase (all three); if it is to prevent deep punishment (all three, but

moral agents with a lower intensity) or, even, to isolate someone from the group (all three);

    3. If the agent can observe and differentiate its neighbors, it can learn to respond dif-

    ferently to them and to stop cooperating with defectors. Both immoral and amoral

    agents may isolate defectors more easily than moral agents. Moreover, amoral

    agents, in order to survive and keep neighbors, can mirror moral and immoral agents

    for convenience;

4. Regarding the agents' action policies, can they lead to a relevant difference in the network

    structure (considering each agent as a node and their relations as links between

    them)?

• As moral agents are naturally cooperative, they are supposed to keep bonds with moral, immoral and amoral agents. Therefore, even after an elimination

    process that punishes defectors, the final population of a moral majority will

    contemplate a reasonable number of agents from all sets;

• Accordingly, immoral agents will only care about the advantage, if any, of keeping bonds with others. Thus, a continuously defecting (or failing) one will

    be easily excluded;

• As the amoral agents will imitate a neighbor, they will add uncertainty as they change strategy.

    5. What would we expect from artificial empathy? Would it be convenient to develop

    a decision process that tends to something Machiavellian (MACHIAVELLI, 1985

(1532))? What is better: to maintain a failing neighbor in order not to lose it, or just

    eliminate it?

• What about a morally hybrid AMA: immoral toward agents that fail or delay the task and moral while interacting with living creatures? Hypothe-

    size a group of artificial agents supposed to coordinate activities and priorities

to complete a task (such as finding an object in a certain environment): if one agent

    from the group stops working or fails, it might be better to isolate it. This


may be thought of as analogous to cutting off the artificial empathy feeling toward

    that one agent. Therefore, the agent that simulates moral behavior will have

    the tendency to cooperate but it can be triggered to do otherwise;

• Regarding a hybrid artificial agent that can trigger moral and immoral behavior, it might be important to autonomously activate moral action policies with

    biological creatures, and immoral otherwise.

    Finally, beyond philosophical and biological investigation on morality and human/

    machine behavior, practical issues can be addressed through exploring decision making

    by AMAs in MAS. While simulating moral behavior, AMAs may be helpful in general

    social or domestic assignments, e.g. taking the role of monitoring highly dangerous crim-

    inals, people in quarantine or in scenarios where there are social dilemmas to deal with.

    Moreover, the artificial empathy from an artificial moral agent could be an additional re-

    source in argumentation-based negotiation in MAS. AMAs may also be useful to improve

    the responses to general MAS issues stressed by Wooldridge (WOOLDRIDGE, 2009), such

    as how to bring up cooperation in societies of self-centered agents; how to recognize a

conflict and then reach an agreement; or, as highlighted by Matignon (MATIGNON

et al., 2012), the challenge of coordinating the agents' activities in order to cooperatively

achieve goals (see Sect. 2).

    1.1 Thesis Statement

    In Damásio (DAMÁSIO, 1994) emotions and feelings are described as imperative in the

human intelligent decision process. Emotions and feelings would also be decisive

in helping us to spend less time and to reduce the computational burden while making

intelligent decisions. In particular, social emotions would help us to strengthen the group

    and cooperate. We depict herein our bioinspired computational multiagent architecture

(called MultiA), composed of artificial emotions, feelings and an Empathy Mod-

ule responsible for providing an action selection mechanism that rudimentarily mimics

moral and immoral behavior. It is not trivial to achieve cooperative self-centered agents

    in a multiagent task. Our search for mimicking moral behavior, among other things, is

    driven to achieve rational agents more likely to cooperate. By responding to the feeling

of empathy, MultiA should be able to produce artificial moral behavior and select

    cooperative action policies. Our leading hypothesis relies on the idea that cooperation

    can emerge from the assistance of emotions and moral behavior during the process of

    decision making - even when selfish behavior is rewarded by high reinforcements. The

    analogy with moral behavior is promoted through simulating the feeling of empathy. The

importance of such a feeling is its function in regulating MultiA agents' priorities, en-


abling the selection of actions that may not be the best selfish selection. Non-selfish

    decision making may be crucial to equalize the interactions among agents and bring up

cooperation. Given the multidisciplinary complexity of moral behavior, the compu-

tational simulation of moral behavior may be approached from various angles. We

designed a computational architecture to rudimentarily mimic both moral and immoral

    behaviors and developed an Empathy Module to work as the moral/immoral behavior

engine. The Empathy Module is grounded on reciprocity assumptions. Then, the agent

    with a cooperative neighborhood will cooperate by reciprocity. Likewise, the agent with a

non-cooperative neighborhood will also be reciprocal by not cooperating. Therefore, the

reciprocity design can lead to selfish behavior, and not only cooperation (see Ch. 4).

Thus, our computational architecture ends up rudimentarily mimicking both moral and

    immoral agents.
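
To make the reciprocity assumption above concrete, the toy sketch below biases an agent's choice by an empathy level derived from how cooperative its neighborhood has recently been. Every name and formula here is illustrative only and is not taken from the MultiA code; the actual Empathy Module is specified in Ch. 3.

```python
# Toy illustration of reciprocity-driven empathy (hypothetical names, not the MultiA implementation).

def empathy_level(neighbor_cooperation_rates):
    """High when the neighborhood has recently been cooperative, low otherwise."""
    if not neighbor_cooperation_rates:
        return 0.5  # no interaction history yet: stay neutral
    return sum(neighbor_cooperation_rates) / len(neighbor_cooperation_rates)


def choose_action(selfish_value, cooperative_value, empathy):
    """Empathy regulates priorities by discounting the internal reward of the selfish option."""
    adjusted_selfish = (1.0 - empathy) * selfish_value
    adjusted_cooperative = empathy * cooperative_value
    return "cooperate" if adjusted_cooperative >= adjusted_selfish else "defect"


# A cooperative neighborhood (high empathy) makes cooperation win even though
# defection carries the higher raw reward (5.0 > 3.0).
print(choose_action(selfish_value=5.0, cooperative_value=3.0,
                    empathy=empathy_level([0.9, 0.8, 1.0])))  # -> cooperate
```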

Our results indicate the feasibility of the Empathy Module and, in environments suitable for

    the Empathy Module application (see Sect. 4.3.2), we obtained a considerable convergence

    to cooperation. We modified the MultiA architecture to design the MultiAA architecture,

    supposed to mimic amoral agents. We obtained interesting coherence between our final

results and the immoral, moral and amoral action policies.

    1.2 Thesis Structure

This thesis comprises five chapters. In Ch. 1 we introduce a few reflections on

morality and human/machine behavior. In Ch. 2 we present the background and the issues

permeating our project development. In Ch. 3 we detail our bioinspired computational

multiagent architecture designed to rudimentarily simulate moral and immoral behavior,

    and, in Ch. 4, we analyze its performance in a multiagent task under different network

    structures - we also present a MultiA modified version, the MultiAA architecture. Finally,

    in Ch. 5 we reflect upon the final results and suggest future work. In Appendix A we

    detail all publications generated in the context of this thesis.

2 Background

A computational simulation of empathy encompasses subjects about which there is disagreement and ignorance. There are convincing but opposing or conflicting explanations about theory of mind, qualia, consciousness, human universals and morality. Given the complexity and the undiscovered matters, we lack a broadly accepted theory unifying those subjects - besides, it may involve religious taboos. Therefore, for a detailed statement of an empathy simulation, we would have to address themes familiar to philosophy, psychology, neurophysiology and many other fields - hence, in this thesis, our main focus is to detail the MultiA architecture.

    For the sake of feasibility, computational approaches may seek to summarize the-

    oretical references and embody a moral simulation from different perspectives (e.g. a

    model may try to mimic moral behavior in robotic environments, or may try to provide

    answers in an ethically specific domain). Before discussing our perspective and scrutiniz-

    ing our MultiA computational architecture (Sect. 3), we introduce a few computational

    models that somehow approach the moral simulation (Sect. 2.1), including MultiA itself

    (Sect. 2.1.1).

In the moral architecture proposed herein, we used reinforcement learning techniques (see Sect. 3.2.2.1), which are based on mapping situations to actions so as to maximize the reinforcement, wherein the agent's experience is used as a parameter (SUTTON; BARTO, 1998). The reinforcement consists of a numerical signal given to the agent after it has executed a certain action (including action abstention) in a certain state. Through its experience, by selecting different actions in different states, an agent under a computational architecture that implements reinforcement learning techniques must learn to execute state-corresponding actions so as to maximize the expected sum of reinforcements.
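
As a concrete reference for the description above, the standard discounted-return formulation from Sutton and Barto can be written as follows; the discount factor $\gamma$ and learning rate $\alpha$ are generic symbols of that framework and are not parameters taken from the MultiA specification:

```latex
% Expected discounted return that the agent's action policy should maximize
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma \le 1
% One-step Q-learning update for state s_t, action a_t and reinforcement r_{t+1}
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```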

Many difficulties arise if the learning agents have no possibility of sharing data to

    accomplish a task: they will have to choose strategies based on their own experiences

    and, through that, learn to coordinate responses. To consider the agents interactions

in an environment, it is important to be aware of Game Theory's well-studied challenges.

    According to Shoham and Leyton-Brown (SHOHAM; LEYTON-BROWN, 2009), Game The-

ory would comprehend the mathematical study of the agents' interactions, and the agents'


predilections would be explained as a function of the available options - note that the

agent predilections may change, especially under uncertain situations. We intend to achieve

    self-interested moral agents whose predilection comprehends getting high reinforcements

while avoiding bringing a negative outcome to the neighborhood - herein we consider to be

    neighbors those agents that may directly interact with each other. To simulate moral be-

    havior we will adopt an environment described by more than one state and more than one

agent. Classical Game Theory domains may provide environments and interaction descrip-

    tions to test the moral agents under our computational architecture. Then, a game from

the literature will be chosen to define our agents' environment and moral interactions, the

    terminal state and possible agent scenarios. Crucial Game Theory concepts came from

    Neumann and Morgenstern (NEUMANN; MORGENSTERN, 1944), such as analysis about

    environmental possibilities, difficulties and adequate agent policy response to accomplish

    goals.

Matignon (MATIGNON et al., 2012) describes some challenges that have to be overcome so that agents (that do not exchange data) can coordinate their action selection and provide a coherently coordinated behavior, such as the alter-exploration problem (the interference over an agent's learned policy caused by the other agents' exploration of the environment). The convergence to a cooperative action policy in self-play (each running agent follows the very same set of code descriptions) and in general-sum stochastic games (which allow cooperation, and in which the reinforcements received by the agents may assume different values (MATIGNON et al., 2012); (GREENWALD et al., 2005)) is an issue: one problem is how to achieve cooperative behavior when the Pareto-optimal solution does not coincide with the Nash equilibrium, as in the Prisoner's Dilemma Game. The Nash equilibrium (NASH, 1951) corresponds to a collection of joint strategies (for all agents in the environment) such that no agent may get a better outcome (by changing strategy) given that the others continue seeking their equilibrium strategies (choosing their best responses). A Pareto-optimal solution occurs when there is no other combination of joint actions in which the utility of one agent may increase without decreasing another agent's utility.
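
A compact formal statement of these two concepts, using generic game-theoretic notation ($u_i$ for the utility of agent $i$, $s$ for a joint strategy profile) rather than symbols defined in this thesis, is:

```latex
% Nash equilibrium: no agent i improves its outcome by deviating unilaterally from s^*
u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*) \qquad \forall i, \ \forall s_i
% Pareto optimality of s: there is no s' that raises one agent's utility
% without decreasing another agent's utility
\nexists \, s' : \; u_i(s') \ge u_i(s) \ \forall i \quad \text{and} \quad u_j(s') > u_j(s) \ \text{for some } j
```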

    When agents share an environment but do not exchange data, they may be actually

ignoring each other's presence. Thus, those agents end up as part of the environment

    itself, which means the transition probabilities related to the agent actions/environmental

outcomes are non-stationary. Therefore, the agents' actions can be influenced by the joint

    history of action selection, as the history influences the future transition probabilities

when the agent re-visits a state.
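
One way to see why the environment becomes non-stationary from a single agent's viewpoint is to fold the other agents into the transition model; the notation below (policies $\pi^{-i}_t$ and joint actions $a^{-i}$ of the other agents) is generic and not taken from the thesis:

```latex
% Effective single-agent transition probability, with the other agents' policies hidden inside it
P^{i}_{t}(s' \mid s, a^{i}) = \sum_{a^{-i}} \pi^{-i}_{t}(a^{-i} \mid s) \, P(s' \mid s, a^{i}, a^{-i})
% Since the learned policies \pi^{-i}_t change over time, so does P^i_t: the dynamics are non-stationary.
```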

Regarding the agent itself, a deterministic game may appear to the agent as non-deterministic: rewards or stochastic transitions may be induced by different sources, such as noise or non-observable factors, and it is a challenge for the agent to distinguish what provoked the changes in the reinforcements it receives (whether noise or other agents' actions promoted those changes) (MATIGNON et al., 2012). For instance, the coordination game from

    Boutilier (BOUTILIER, 1999) explores different mis-coordination examples: the possible

    joint agent actions determine various rewards or penalties and also lead to different states.

    Since we intend to run a considerable number of agents under our architecture in

    self-play, an important issue stands: how to obtain a final outcome in which the agents

achieve the best possible individual result that does not bring a bad outcome to their

neighbors? Often the best individual outcome will not reconcile with the best social

one (the best outcome for each agent if all of them choose to cooperate and reject free-

    riding). In general, especially in utility-based computational approaches, cooperation is

    not easily modeled. As an illustrative example, public goods provide an analogy to analyze

relations in natural societies and are best known for two main features: public goods are

public (non-excludable) and not depleted through consumption. In natural societies (also within an artificial

scope that uses them as a metaphor), unfair relations are possibly common, such as an

agent taking advantage of another agent's social commitment. If public services are freely

available, what would endorse a strategy other than free riding? Social commitment

    may be crucial to accomplish the best social outcome. For example, by paying the taxes,

    we intend to keep the Public Systems functioning, but some of us are not actually paying

    for anything - consider the free rider problem which, in essence, has been considered since

    Plato (PLATO, 2000(IVBC)), Montaigne (MONTAIGNE, 2013 (1580)b) and many others,

    and, more recently, by Cornes and Sandler (CORNES; SANDLER, 1986). Since cooperating

within the group generally results in a cost to the cooperator while defectors benefit from

common resources (WARDIL; HAUERT, 2014), a dilemma emerges between the agent's

    self-interest and the group’s maintenance. In fact, public goods games are a metaphor

    to describe trivial relations in natural societies and generalize the Prisoner’s Dilemma

    Game (PDG) to an arbitrary number of individuals - see Hardin (HARDIN, 1971) and

    Wakano and Hauert (WAKANO; HAUERT, 2011). Not unusually, commitment is required

    to accomplish the best social outcome: individuals must keep choices that only as a group

    will render that particular outcome. To attain the best social outcome, agents have to

commit themselves to pursuing the specific action policy that only as a group will accomplish the best result. If one agent suddenly changes its action predilection, the others may face the worst possible result: e.g. if one of the agents stops cooperating while playing the Prisoner's Dilemma Game. Axelrod (AXELROD, 1984) provides a classical example of reflection on cooperation, not only regarding the Prisoner's Dilemma Game but also the placement of cooperative behavior in the chains of relations (exchange) between different powers.
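
For reference, the canonical Prisoner's Dilemma payoff structure referred to above can be summarized as follows; the symbols T, R, P, S and their ordering are the standard textbook convention, not the particular values used in the experiments of Ch. 4:

```latex
% Payoffs (row player, column player); C = cooperate, D = defect
\begin{array}{c|cc}
   & C      & D      \\ \hline
 C & (R, R) & (S, T) \\
 D & (T, S) & (P, P)
\end{array}
\qquad \text{with } T > R > P > S \ \text{(and, for repeated play, } 2R > T + S\text{)}
```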

    Through interacting in its multiagent environment and learning the possible outcomes,

a learning agent (one that adapts its action selection to the environment) will stabilize its action

policy under the influence of other agents' actions. Other agents' strategies may put


    forward environmental uncertainties and, if there is no data sharing, those uncertainties

    are considered to be part of the environment itself. On the other hand, if a particular

    agent never triggers any change over the others, to them, it may be as if that particular

agent never existed (not even as part of the environment). In an environment with various

    states, if agent A usually collides with agent B in a particular state, then, one of them

    (or even both) may end up avoiding that particular state and, depending on the game

    possibilities and alternative path chosen, the agents may never find their best path to

    accomplish a task. That happens because they can be unable to coordinate their action

    selection while taking each other as part of the environment itself.

    If two agents within the same environment are rational, once both have learned the

    environmental dynamics, they will select actions in accordance with what one expects

    to come from the other - even if indirectly considering the other agent, since it may be

    considered as part of the environment itself. And those selected actions are expected to

be the agent's best option. Then, we have agents that will try to give their best shot in

response to what is expected to be the other agents' best shot (see, for instance, the

    Minimax theorem from Neumann (NEUMANN, 1928)). Therefore, since in general rational

    agents are seeking to choose the best selfish action, how is it possible to achieve a better

    social outcome instead of an individual one? How to obtain a rational and cooperative

    agent while avoiding ad hoc artifices?

    We seek to provide a possible approach through an architecture that does not exchange

    environmental data (such as the selected action). We will use a game of incomplete

information: the agents will not have access to the neighbors' actions or reinforcements.

    But, at the same time, moral agents must morally behave while interacting with other

    agents. The only data that will be shared by our agents consists of the neighbor ID: each

    interacting agent will identify itself. Then, each one of them will be able to keep a record

    of the neighbor ID and the reinforcements from interacting with that very same neighbor.
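
As a minimal illustration of this last point, the sketch below keeps, per neighbor ID, a record of the reinforcements obtained when interacting with that neighbor; the class and method names are hypothetical and do not come from the thesis implementation.

```python
# Illustrative sketch: per-neighbor interaction memory keyed by the shared neighbor ID.
# Hypothetical names; not the MultiA code.
from collections import defaultdict


class InteractionMemory:
    """Keeps, for each neighbor ID, the reinforcements received when interacting with it."""

    def __init__(self):
        self.history = defaultdict(list)  # neighbor_id -> list of reinforcements

    def record(self, neighbor_id, reinforcement):
        """Store the reinforcement obtained after one interaction with neighbor_id."""
        self.history[neighbor_id].append(reinforcement)

    def average_reinforcement(self, neighbor_id):
        """Average outcome experienced with this neighbor (0.0 if never met)."""
        outcomes = self.history[neighbor_id]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Usage: the agent can respond differently to each identified neighbor.
memory = InteractionMemory()
memory.record("agent_7", 1.0)
memory.record("agent_7", -0.5)
print(memory.average_reinforcement("agent_7"))  # -> 0.25
```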

    2.1 Artificial Moral Agents (AMAs)

    According to Wallach and Allen (WALLACH; ALLEN, 2008), Artificial Moral Agents

(AMAs) would require the ability to access many options and work through differ-

    ent evaluative aspects to present a good performance in a human moral domain - moreover,

    it would be expected that AMAs would not deform it. Still addressing the computational

    simulation of moral behavior, Wallach and Allen (WALLACH; ALLEN, 2008) emphasize the

    advantages a machine could have over a human brain to respond to moral dilemmas, such

    as the power of working through a higher number of matching possibilities and the exemp-

tion from sexual or emotional interference. Machines could use those advantages to come


up with better answers than those usually provided by humans. Wallach (WALLACH,

    2009) analyzes moral dilemmas brought about by philosophers and contrasts what people

    morally accept to do in order to save lives and the number of lives that could actually

    be saved by them - there are cases in which the human moral judgment will not lead to

saving the highest number of lives (in that case, could we say that human moral judgment

    failed?). The AMA designers have to deal with those tricky situations (should we follow

    the human moral judgment while developing our code or design utilitarian machines?)

    and stick with a perspective while designing the machine code.

For that, it is decisive that the designers themselves reflect on their own beliefs,

    prejudices, perspectives (such as our bias to identify people, the cross-race effect (FEIN-

GOLD, 1914)) and taboos, to avoid embodying them in the machines' design. For instance,

    Roth (ROTH, 2013) detailed the issue that technology and other mechanisms designed to

    represent the skin tone did not evolve to replicate the skin color of non-Caucasian people.

Nowadays technology keeps bringing up more issues for the Ethics of Artificial Intel-

ligence. Another example regards the sex robots from TrueCompanion (TRUECOMPANION,

2010): would they badly interfere with human empathy? We are so deeply merged with

technology that it is not required that machines embody moral behavior to affect our

moral system; technology has already changed our moral system.

    Wallach (WALLACH, 2015) pores over the latest technology resources and potentialities

    (including killing possibilities) while addressing responsibility issues of developers and

    users. The apprehension of AMAs causing negative influences over humans is mentioned in

    Bringsjord et al. (BRINGSJORD et al., 2006). Through enabling the formalization of a moral

    code, deontic logic would allow the writing of theories and dilemmas in a declarative way.

That would allow analysis by specialists, thus being a method of restricting the machines'

    behavior in ethically sensitive environments. Bello and Bringsjord (BELLO; BRINGSJORD,

2012) also emphasize a concern that restrictions should be inserted into the machines'

design and that those should be related to human cognition. For instance, the moral

    common sense and intuition should take part in that. Bringsjord et al. (BRINGSJORD et

    al., 2006) present modifications over a mind reading model from Bello et al. (BELLO et al.,

2007) and, from their results, they conclude that we will have to deal with the confusing

    human moral cognition to build AMAs that productively interact with humans. They also

    ponder that moral machines should have a mechanism similar to common sense. That

    adds matter to the debate about Lethal Autonomous Systems, as Arkin points out in

    reflections in (ARKIN, 2013) and Asaro in (ASARO, 2012).

    Computational simulation of moral behavior may be approached through diverse con-

    texts. To exemplify the theme diversity, we detail three models:

    1. LIDA Model (WALLACH, 2010), (WALLACH et al., 2008), (WALLACH et al., 2010),


    (FRANKLIN et al., 2014) and (FAGHIHI et al., 2015). As a computational and concep-

    tual model of human cognition, LIDA is described as a cognitive architecture de-

    signed to select an action after dealing with ethically pertinent information. There-

    fore, the LIDA model is expected to be able to deal with moral decisions. According

    to Wallach (WALLACH, 2010), an artificial moral agent under the LIDA architecture

    would be designed to, within the available time, select an action while taking into ac-

    count the maximum possible quantity of ethically relevant information. This model

    was influenced by the Global Workspace Theory (GWT) (BAARS, 1993 (1988)) and

    by the Pandemonium Theory (JACKSON, 1987) for the automation of action selec-

tion. GWT would have stood out as a theory of human cognitive processing given

its interpretation of the nervous system as distributed in parallel with different

    specialized processes; and some coalitions of such processes would allow the agent

    to build a sense from its sensorial data (which would come from its current envi-

    ronmental situation). Other coalitions would inherit results from the sensorial data

    processing that would have competed for attention and would have won. Those

    would occupy the global workspace (GW), whose content would be transmitted to

    all other specialized processes. Under a functional point of view, the GW content

    would be conscious content and serve to recruit other processes to be used for action

    selection in response to the current situation. In both GWT and LIDA, learning

    would require and work through attention and would come in each conscious trans-

    mission. The LIDA model is based on a cognitive cycle. Then, the human cognitive

    processing would occur via continuous interaction of cognitive cycles, which would

    happen asynchronously. Various asynchronous cycles could have different simultane-

    ous parallel processes but that should respect the serial nature of the consciousness

    process, important to keep a stable and coherent world scenario. During each cy-

    cle, the LIDA agent would give sense to its current situation through updating its

    internal and external environmental representations. Through a competitive pro-

    cess, it would be decided which representing portion of the current situation should

    receive attention. That portion would then be transmitted, becoming the current

    content of consciousness and enabling the agent to choose an adequate action and

    execute it. The feelings in the conscious flow would participate within many ways of

    learning. New representations would be learned when generated in a cognitive cy-

    cle and those that were not sufficiently stressed during the concurrent cycles would

    disappear. Feelings would induce the action and the activation of environmental

    schemes. Thus, the behavior selection would be influenced by its relevance over

    the current situation, by the nature and importance of associated feelings and by

    their relation with other behaviors, some of them being necessary to the current

    behavior. To be executed, the selected behavior and feelings would be transmitted

    to the Sensory-motor memory. There, the feelings would participate in the action


    execution, as feelings can influence parameters as strength and speed.

    2. EthEl Model (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)

    and (ANDERSON; ANDERSON, 2011), which application is related to prima facie

    duties (duties that are mandatory, unless overpassed by stronger ones), was imple-

    mented and tested within the notification context. This means an analysis of when,

    how often, and whether to run a notification about a medicine to a particular pa-

    tient. A typical dilemma example comes from the rejection of a patient of taking

    the recommended medicine from a doctor. In what situation should the professional

    insist the patient changes his mind? If it is crucial that the patient do take the

    medicine, how many times should it be mentioned to the patient and when should

    the doctor be notified about the patient refuse? EthEl (Toward a Principled Ethical

    Eldercare Robot) (ANDERSON; ANDERSON, 2008b), (ANDERSON; ANDERSON, 2008a)

    and (ANDERSON; ANDERSON, 2011) is a model trained over a deontic context (con-

    cerning the duties of the health care professional), and is a prototype that applies

    ethical principles (established by learning) to choose an action. The prototype would

    have learned an ethical principle in its action taking in a particular kind of dilemma,

    the one that relates to prima facie duties. The duties would embody a philosophical

    problem relating to the absence of a decision procedure when the duties provide

    conflicting orientation. The inspiration to EthEl come from Rawls (RAWLS, 1951).

    The ethical dilemmas were presented to the prototype by a ordered set of values

    to each possible solution, whose values would reflect violation or duty satisfaction.

    EthEl uses the Inductive Logic (LAVRAC; DZEROSKI, 1994) to measure the decision

    principle that has to be used to deal with the proposed dilemmas. EthEl would

    have discovered a consistent decision principle that would indicate the correct ac-

    tion when specific duties place in different directions a particular kind of dilemma.

    Then, the professional should question the patient refuse if she/he is not completely

    autonomous and when there is no violation of the duty of non-maleficence or severe

    violation of the duty of beneficence. But EthEl would have established that vio-

    lations over the duty of non-maleficence should impact more than violations over

    duty of beneficence. The authors ponder that EthEl could also be used to other

    sets of prima facie duties to which there is agreement among the specialists about

    the correct actions.

3. To reflect on moral theory vis-à-vis the conflict between Generalism and Particularism, Guarini (GUARINI, 2006) and (GUARINI, 2012) draws insights from Dancy (DANCY, 2010) while asking whether moral reasoning, including learning, could be done without the use of moral principles. If so, models of artificial neural networks (ANNs) could provide indications of how to do it, given that ANNs are able to generalize to new cases from those previously learned - and to do so without principles of any kind. Thereby, ANNs are modeled to classify and reclassify cases with a moral purport, the output (acceptable or not) being an answer to moral dilemmas attached to questions of killing or letting die. Dancy (DANCY, 2010) emphasizes a mismatch between moral principles and the importance of context in the analysis of what is morally acceptable: moral decisions would depend on context and situation. The kill-or-let-die subject in Guarini (GUARINI, 2006), (GUARINI, 2012) comes from a modified analogy from Thomson (THOMSON, 1971), where, in relation to abortion after a pregnancy resulting from rape, there is a discussion about the difference between murdering and letting die. The modified analogy is as follows: there is only one person capable of keeping a particular man alive. That person is kidnapped and made to filter the man's blood, and should stay there, connected to him, for nine months. After that, the man will survive and the person may be freed from him. In short: after an act of violence, one life became dependent on another. Would it then be morally acceptable or not for the person to decide to disconnect from the man before he could be saved (leading to the man's death)? According to Guarini (GUARINI, 2006), (GUARINI, 2012), the results suggested that the classification of non-trivial cases in the absence of appeals to moral principles is more plausible than one might suppose at first sight, although important limitations suggest the need for principles. Regarding reclassification, which would be an important part of reasoning in humans, the simulations indicated the need for moral principles.
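As a purely illustrative aid, the decision principle reported for EthEl (item 2) can be sketched as a simple predicate over duty-violation scores. The function below, its argument names and its numeric thresholds are our own hypothetical choices for exposition; they are not part of the original prototype, which learns the principle through Inductive Logic Programming rather than hard-coding it.

    # Illustrative sketch only: a hand-coded stand-in for the decision principle
    # that EthEl is reported to have learned (not the authors' implementation).

    def should_challenge_refusal(fully_autonomous: bool,
                                 nonmaleficence_violation: float,
                                 beneficence_violation: float,
                                 severe_threshold: float = 0.7) -> bool:
        """Return True if the caregiver should question the patient's refusal.

        Scores are assumed to lie in [0, 1]; `severe_threshold` is a hypothetical
        cut-off for what counts as a *severe* violation of beneficence.
        """
        if fully_autonomous:
            # A fully autonomous refusal is respected.
            return False
        # Non-maleficence weighs more: any violation triggers a challenge,
        # whereas beneficence must be severely violated.
        return nonmaleficence_violation > 0.0 or beneficence_violation >= severe_threshold


    # Example: a not-fully-autonomous refusal with a mild harm risk.
    print(should_challenge_refusal(False, 0.2, 0.1))  # True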

The approaches from items 2 and 3 fall into a specific application domain: EthEl is tested in a notification context (analysis of when, how many times, and whether a notification shall be issued), and item 3 is concerned with providing as output an answer to moral dilemmas related to kill-or-let-die questions. Research aimed at dealing with moral dilemmas is particularly important because it may be useful for designing a morality mechanism in machine learning (see Sect. 5.1). Finally, the LIDA model is a complex project that was still under development when we started our project. Since none of the studied works matched our intentions, we searched for other bases (see Sect. 2.1.1) to guide our moral architecture design.

    2.1.1 MultiA

We expect to obtain relevant decision making toward cooperation in MAS tasks by designing a computational architecture endowed with artificial emotions, feelings and moral behavior (through the embodiment of empathy). We started designing our bioinspired computational multiagent architecture by using the ALEC architecture from Gadanho (GADANHO, 2003) as an essential reference - we describe this influence in Sect. 3.3. Our multiagent architecture is called MultiA since it is intended for use in multiagent systems (Multi) and was inspired by the ALEC architecture (-A). To design a bioinspired computational architecture, we studied biological and philosophical references while seeking to computationally mimic rudimentary mechanisms related to both moral and immoral behaviors.

Work has been done to establish the crucial role of emotions in intelligent decision making and their importance in filtering information and awakening our attention mechanisms (see the Somatic Marker Hypothesis from Damásio (DAMÁSIO, 1994)). The vital role of emotions and feelings in rational decisions embraces social emotions (such as sympathy and its associated feeling of empathy), which are analyzed from the perspective of social interaction and homeostatic goals (DAMÁSIO, 2004). Damásio (DAMÁSIO, 2004) defined social emotions using the concept of moral emotions by Haidt (HAIDT, 2003) - we follow it while designing the artificial emotions. Haidt (HAIDT, 2003) explains emotions as responses to a class of events perceived and understood by the self, so emotions usually provoke action tendencies. It is particularly important to differentiate social emotions from other emotions: social emotions trigger action tendencies in situations that do not represent direct harm or benefit to the self (disinterested action tendencies); other emotions, on the other hand, are more self-centered.

The brain's ability to internally simulate emotional states, establishing a basis for emotionally possible outcomes and emotion-mediated decision making, is also scrutinized in Damásio (DAMÁSIO, 2004). Internal simulation takes place during the process along which the emotion of sympathy turns into the feeling of empathy. Social interaction would be supported by mirror neurons (discovered in the premotor cortex of macaque monkeys by Pellegrino et al. (PELLEGRINO et al., 1992) and Rizzolatti et al. (RIZZOLATTI et al., 1996)), which make our brain internally simulate, for example, the movements performed by others in our field of vision. Such a simulation would enable us to predict the movements required to establish communication with the other (whose movements are mirrored). Finally, the internal simulation of our own body (e.g., when we internally simulate ourselves executing different activities) could also be related to mirror neurons.

Gallese and Goldman (GALLESE; GOLDMAN, 1998) reflect on the human aptitude for simulating the mental states of others, and thus understanding their behavior by assigning to them intentions, goals or beliefs. They suggest that what might have evolved into such a capacity is an action execution/observation matching system, with a class of mirror neurons playing a role in it. Moreover, a possible activity of the mirror neurons would be to promote learning by imitation. Nowadays there is agreement that normal humans develop the capacity of representing the mental states of others (this representation system often receives the name folk psychology). Finally, it is considered that such an ability may contribute to fitness, as detecting another agent's goals and inner states can help the observer predict the other's future actions, which can be cooperative or not, or even threatening (researchers are continuously providing new insights from Mind Reading related experiments).

Although it holds an emotional background, the dynamics involved in the origin of empathy can also be approached from a cognitive aspect (WAAL, 2009), (PROCTOR et al., 2013). The social emotion of sympathy feeds the feeling of empathy, and the social emotions benefit from the internal simulation supported by mirror neurons that internally mirror the situation of the other (learning by imitation may be related to mirror-neuron activity). The feeling of empathy, however, will be more or less intense depending on the importance of the particular other agent (DAMÁSIO, 2004). We seek to maintain negative emotions at low levels and positive ones at high levels (DAMÁSIO, 2004), and the purpose of homeostasis would be to produce a state of life better than neutral, accomplishing what we identify as well-being. Following this idea, MultiA establishes its preferences considering its own well-being and that of its peers. While designing the Empathy Module of MultiA (see Sect. 3.2.2.2), we used mirror neurons as inspiration. Even though MultiA does not mirror its neighbors' movements, it mirrors its own emotions and preferences onto the neighbor. Then, the current emotions of MultiA itself are applied while building an expectation about the well-being of a neighbor - and during that process, MultiA considers that the neighbor shares the very same emotional preferences (in Sect. 3.2.2.2, see I = {{IP}, {IN}}); a sketch of this mirroring idea is given below.
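Purely as an illustration of this mirroring idea, the following sketch projects an agent's own emotional preferences onto a neighbor's observed situation to form an expectation of that neighbor's well-being. The function name, the preference weights and the input signals are hypothetical; the actual Empathy Module and the sets I = {{IP}, {IN}} are specified in Sect. 3.2.2.2.

    # Hypothetical sketch of "mirroring": agent i reuses its OWN preference weights
    # to estimate how a neighbor p is doing, assuming p shares the same preferences.

    def mirrored_wellbeing(own_preference_weights, observed_neighbor_signals):
        """Weighted sum of the signals observed about the neighbor, using the
        agent's own preference weights; result clipped to [-1, 1]."""
        estimate = sum(w * s for w, s in zip(own_preference_weights,
                                             observed_neighbor_signals))
        return max(-1.0, min(1.0, estimate))


    # Example: three observed signals about a neighbor, weighted by the agent's
    # own (illustrative) preferences.
    weights = [0.5, 0.25, 0.25]
    signals = [0.5, -1.0, 0.5]   # normalized cues about the neighbor's outcomes
    print(mirrored_wellbeing(weights, signals))  # 0.125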

Regarding the feeling of empathy, we have also been guided by the differentiation of three types of agents: the moral, the immoral and the amoral. By rudimentarily mimicking those three patterns of morality, our agents display different social interaction policies. The moral agent (MultiA, moral agents) tries not to take advantage of the others and cooperates; the immoral agent takes advantage of the others more easily and does not cooperate (MultiA, immoral agents). Unlike the others, the amoral agent is not guided by social emotions and feelings (MultiAA, Sect. 4.1.1). The entire MultiA architecture is analyzed in Sect. 3.

3 MultiA: A Computational Model for Simulation of Empathy and Moral Behavior

We propose the MultiA computational architecture, designed from reflections on the relevance of moral behavior in the search for a rational and cooperative, biologically inspired artificial agent. We hypothesize that having the simulation of emotions and moral behavior aid the computational architecture in making decisions favors cooperation even in the face of high reinforcements for selfish behavior. The analogy with moral behavior is implemented through a simulation of empathy, so the agent is able to select actions that may not be the best selfish option but that help to enhance the interactions among agents. Since MultiA agents feel empathy more readily toward agents whose interactions have resulted in positive reinforcements, this reciprocity assumption gives rise to both moral and immoral agents, because the action selection mechanism can be driven from two directions. The first is related to cooperation: when a particular MultiA agent has a cooperative neighborhood, the agent will cooperate by reciprocity. The second is related to non-cooperation: when the surroundings are non-cooperative, the MultiA agent becomes non-cooperative by reciprocity. Thus, MultiA rudimentarily mimics both moral and immoral agents.

MultiA consists of three main systems (Fig. 3.1): the Perceptive System (PS), the Cognitive System (CS) and the Decision System (DS). The interactions among the three systems result in an action selection derived from sensations triggered by the environment; the selected action provokes environmental changes that, in turn, trigger new sensations, and so on. As input to the PS, MultiA receives artificial sensations that are triggered by reinforcements and indexed by the agent it is interacting with. A schematic sketch of this sense-think-act loop appears below.
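The skeleton below is a minimal, hypothetical sketch of how the three systems could be wired together in code; the class and method names are our own illustrative choices and do not correspond to a released implementation.

    # Minimal illustrative skeleton of the PS -> CS -> DS loop (hypothetical names).

    class PerceptiveSystem:
        def perceive(self, reinforcement, neighbor_id, history):
            """Turn a reinforcement (and the interacting neighbor) into
            emotions and feelings; returns the current emotional state."""
            return {"reinforcement": reinforcement, "neighbor": neighbor_id}


    class CognitiveSystem:
        def estimate(self, emotional_state, actions):
            """Estimate the well-being expected after each candidate action."""
            return {a: 0.0 for a in actions}  # placeholder estimates


    class DecisionSystem:
        def choose(self, estimates):
            """Pick the action with the highest estimated well-being."""
            return max(estimates, key=estimates.get)


    # One interaction step: sensation -> emotions -> estimates -> action.
    ps, cs, ds = PerceptiveSystem(), CognitiveSystem(), DecisionSystem()
    state = ps.perceive(reinforcement=1.0, neighbor_id=3, history=None)
    action = ds.choose(cs.estimate(state, actions=["cooperate", "defect"]))
    print(action)  # with placeholder estimates, ties resolve to the first listed action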

    3.1 MultiA Functioning: an Overview

While designing our computational architecture, which draws on Gadanho (GADANHO, 1999), we took into account the animal behavioral characteristics analyzed in Hallam and Hayes (HALLAM; HAYES, 1992) that could inspire a robotic design.

FIGURE 3.1 – The general scheme of the MultiA Architecture.

Among those animal characteristics there is homeostasis, the biological capability of body self-regulation, such as keeping the temperature or the cells' pH within bounds, in such a way that the internal conditions are kept on a stable and regular basis. Through his research on organic mechanisms of biological regulation, Claude Bernard (1813-1878) introduced the concept of Milieu Intérieur, the precursor of homeostasis. Later, Cannon (CANNON, 1932) described the body's steady states and some mechanisms that control them; Cannon (CANNON, 1932) also provided an analogy between social processes and body regulation. It may therefore seem natural to associate homeostasis with a neutral, balanced state. Nevertheless, according to Damásio (DAMÁSIO, 2004), life regulation would be designed so that the homeostatic efforts produce the state that we understand as well-being. The environment and our bodies evoke ongoing homeostatic reactions that keep influencing us and our actions, through which we keep changing our environment and ourselves. Homeostatic reactions may continue to affect us even after the particular situation that caused them has ended.

Inspired by biological homeostasis and by Gadanho and Custódio (GADANHO; CUSTÓDIO, 2002) (see Sect. 3.3), we designed the MultiA Perceptive System. The artificial sensations feed emotions and feelings and, afterwards, through a weighted sum over the feelings, the general environmental and internal perspective of a MultiA agent i about its own performance (named Well-Being, W_i). MultiA follows its artificial homeostatic goals: it selects the actions that are expected to keep its feelings and emotions within a threshold, thereby achieving high W_i levels. The history of a MultiA agent is reflected in the current values of its Perceptive System and in the learning of matching emotional responses to actions. Therefore, keeping the feelings within a threshold relies upon the selection of adequate actions in response to the environment. W_i is modeled as a function of the feelings and internally represents the general condition of agent i. It is calculated with normalizing weights such that its value falls in the range [-1, 1]. From another perspective, W_i shows how suitable the action selection (from the DS) has been, considering both the reinforcements received by the MultiA agent i itself and the remaining feelings, such as empathy. In addition to W_i, MultiA also produces W_{pi}, a prospect about the current situation of other agents.

MultiA then uses a set of its own emotions to provide itself with a prospect of the current situation of other agents. Although there is some controversy about them (see for instance Hickok (HICKOK, 2014)), we used mirror neurons ((PELLEGRINO et al., 1992), (RIZZOLATTI et al., 1996)) as inspiration for the mechanism that projects MultiA's own emotions to mirror the situation of other agents. Actions related to high empathy are designed to be avoided, since we consider that when an agent arouses high empathy levels it is because the agent itself may be disturbing the performance of the others. For the design of the Empathy Module, we used the utilitarian calculus from Bentham (BENTHAM, 2007 (1789)) as a guideline. This way, MultiA agents feel empathy more readily toward agents whose interactions have resulted in positive reinforcements. Furthermore, if a MultiA agent has been receiving a high number of positive reinforcements, it is also more inclined to cooperate. The empathy is represented by S_{4,ip}: feeling number 4 of MultiA agent i for neighbor p (on Figure 3.1, see feeling number 4). As we designed the empathy to reflect the impact of MultiA's action selection on its neighbors, the higher the empathy for a specific neighbor p, the lower is W_i, all the remaining variables that feed W_i being kept constant. This means that, at a certain point, the MultiA agent may not have been selecting its actions appropriately, since it may be negatively affecting the particular neighbor p; thus, high empathy levels are an indication of inadequate action selection. Selected actions are considered adequate when they generate positive reinforcements without provoking high empathy levels. If p triggers high empathy in i, p may be getting low reinforcements, and therefore its neighbors, such as i, should review their actions.

Thus, MultiA is designed to seek actions that will not increase its levels of empathy. Taking the current emotions (from the PS) as input, the CS uses artificial neural networks (ANNs) to estimate the Well-Being that would result if the corresponding action were selected. The CS then delivers the outputs of all ANNs to the DS, which chooses an action.


    3.1.1 MultiA and an Interaction Game

When following utility-based computational approaches, it is not trivial to model artificial agents that reject the opportunity of taking advantage of the actions of others (e.g., selecting actions driven only by the highest reinforcements, regardless of the consequences to others) and still commit to the choice of cooperating. In Sect. 2 we introduced the public goods subject (including the related issue of somehow taking advantage of the actions of others, i.e., free-riding) and mentioned that the Prisoner's Dilemma Game (PDG) is generalized by public goods games (HARDIN, 1971), (WAKANO; HAUERT, 2011). We developed MultiA with the aim of providing an architecture extensible to different domains and of showing cooperation as an emergent property. Without loss of generality, let us hypothesize that each MultiA agent i is going to play the Prisoner's Dilemma Game with another MultiA agent p. Hence, each MultiA agent will have to decide whether to cooperate with the other or not (to defect) - and a defector is highly rewarded for unilateral defection (defection vs. cooperation); an illustrative payoff structure is sketched below. In Sect. 3.2 we detail the MultiA architecture itself, while in Ch. 4 we present the MultiA agents, the artificial learning agents under the MultiA architecture, in a multiagent environment and task.
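For concreteness only, the snippet below encodes one standard Prisoner's Dilemma payoff matrix; the specific numeric values (T=5, R=3, P=1, S=0) are a common textbook choice, not the reinforcements used in the experiments of this thesis.

    # Illustrative Prisoner's Dilemma payoffs (T > R > P > S); values are a
    # textbook example, not the reinforcements used in our experiments.
    PAYOFF = {
        ("cooperate", "cooperate"): (3, 3),  # mutual cooperation (R, R)
        ("cooperate", "defect"):    (0, 5),  # sucker vs. temptation (S, T)
        ("defect",    "cooperate"): (5, 0),  # temptation vs. sucker (T, S)
        ("defect",    "defect"):    (1, 1),  # mutual defection (P, P)
    }

    def play(action_i, action_p):
        """Return the (agent i, neighbor p) reinforcements for one interaction."""
        return PAYOFF[(action_i, action_p)]

    print(play("defect", "cooperate"))  # (5, 0): unilateral defection pays most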

    3.2 The Systems of the MultiA Architecture

    3.2.1 Perceptive System (PS)

We consider a model where reinforcements are non-negative. Since our research is grounded on moral behavior, we intend to test and study MultiA agents interacting among themselves. Thus, each MultiA agent i keeps a list of every agent it has interacted with (the neighbors of i). Sensations fall in the range [0, 1] and, together with the history provided by the CS, give rise to artificial emotions. The MultiA artificial sensations are triggered by reinforcements and by an identifying index for the neighbor the agent is interacting with. Indexing is defined in the following way: every MultiA agent has an identifying index i ∈ {1, ..., N}, and the neighbors related to each agent i also have an identifying index p ∈ {1, ..., Z}. A given p value thus refers to a particular neighbor that is interacting with i. The CS delivers five sets of data (the history of agent i) to the PS, and a sketch of this history record is given after the list:

    1. The current number of neighbors of agent i;

    2. The reinforcement history of agent i;

    3. The number of times agent i has interacted with each neighbor p;

4. The number of times interactions with p resulted in positive reinforcements (M_{ip});


5. The value of Y_{pi}, defined as follows. The CS accesses the current emotions from the PS. Then, the Empathy Module EM (from the CS) produces W_{pi}: an assumption of i about the current condition of neighbor p. MultiA then responds to the current condition of neighbor p (W_{pi}), producing Y_{pi}. If neighbor p is assumed to be facing low reinforcements, MultiA may have its empathy raised (depending on the Y_{pi} value), so as to select less selfish actions and try to cooperate in raising the reinforcements of p. For details, see Sect. 3.2.2.2.
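A minimal sketch of the five-item history record that the CS hands to the PS is given below; the dataclass and field names are hypothetical labels for the quantities listed above, not identifiers from an actual implementation.

    # Hypothetical container for the history that the CS delivers to the PS.
    from dataclasses import dataclass, field
    from typing import Dict, List


    @dataclass
    class AgentHistory:
        num_neighbors: int                 # (1) current number of neighbors of agent i
        reinforcements: List[float]        # (2) reinforcement history of agent i
        interactions_with: Dict[int, int]  # (3) interactions per neighbor p
        positive_with: Dict[int, int]      # (4) positive-reinforcement interactions per p (M_ip)
        reciprocity: Dict[int, float] = field(default_factory=dict)  # (5) Y_pi per neighbor p


    history = AgentHistory(
        num_neighbors=2,
        reinforcements=[1.0, 0.0, 1.0],
        interactions_with={1: 2, 2: 1},
        positive_with={1: 2, 2: 0},
        reciprocity={1: 0.4, 2: -0.1},
    )
    print(history.positive_with[1])  # 2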

There are basic emotions {E^b_{1,i}, E^b_{2,i}, ..., E^b_{d,i}} and social emotions {E^s_{1,i}, E^s_{2,i}, ..., E^s_{h,i}}, all normalized to [-1, 1]. The basic emotions are associated with the general condition of the MultiA agent itself. Social emotions are stimulated by neighbors and by the impact of the agent's own actions on those neighbors. The artificial feelings {S_{1,i}, S_{2,i}, ..., S_{n,i}} also fall in the range [-1, 1] and are fed by emotions. We used the reference (DAMÁSIO, 2004) as inspiration while shaping the artificial basic emotions. Table 3.1 makes explicit the particular biological emotion that inspired each MultiA basic emotion.

    TABLE 3.1 – Basic Emotions and the Artificial Basic Emotions of MultiA

Biological Basic Emotion    Artificial Basic Emotion
Anger                       E^b_{1,i}
Sadness                     E^b_{2,i}
Surprise                    E^b_{3,i}
Fear                        E^b_{4,i}
Happiness                   E^b_{5,i}
Disgust                     E^b_{6,i}

The artificial basic emotions are defined below; a consolidated sketch of Eqs. 3.1-3.7 follows the list:

• E^b_{1,i}: increases with the number of interactions of i in the same match. A match is defined by agent i interacting exactly once with each and every one of its neighbors, and interactions are always ordered w.r.t. the neighbor agent index. Once all neighbors have interacted, the match ends. It is calculated according to Eq. 3.1:

E^b_{1,i} = -1 + 2 (m^t_i / V^1_i)    (3.1)

where t represents the (possibly unfinished) current match, V^1_i is the initial number of neighbors of agent i at the first match, and m^t_i is the number of concluded interactions of i with its neighbors during the current match t.

• E^b_{2,i}: indicates the difference between the sum of reinforcements r^{t-1}_i received by i during match t-1 (Eq. 3.2) and a threshold value R_{0,i} (range [0, 1]):

r^{t-1}_i = \sum_{j=1}^{V^{t-1}_i} R^{t-1}_{i,j}    (3.2)

where V^{t-1}_i is the number of neighbors of agent i at match t-1 and R^{t-1}_{i,j} is the reinforcement of i after interacting with neighbor j at t-1. E^b_{2,i} is then calculated as

E^b_{2,i} = r^{t-1}_i - R_{0,i}    (3.3)

• E^b_{3,i}: at each match t, it decreases with the number of lost neighbors (a neighbor is lost when it stops interacting), Eq. 3.4:

E^b_{3,i} = 1 - 2 ((V^1_i - V^t_i) / V^1_i)    (3.4)

where V^t_i is the number of neighbors of agent i at match t. Note that, as the MultiA social emotions are designed to be triggered by social interaction, we assume that V^1_i > 0.

• E^b_{4,i}: indicates the difference between the current sum of reinforcements r^t_i and a threshold value; it is measured by comparing r^t_i and R_{0,i}, Eq. 3.5:

E^b_{4,i} = r^t_i - R_{0,i}    (3.5)

• E^b_{5,i}: determined by the current r^t_i during the current match t, see Eq. 3.6:

E^b_{5,i} = -1 + 2 r^t_i.    (3.6)

• E^b_{6,i}: it always starts a match with value 1 and only decreases (during the current match t) when an interaction with a neighbor does not render positive reinforcement, according to the update in Eq. 3.7:

E^b_{6,i} = E^b_{6,i} - 2 (1/V^1_i)    (3.7)
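To make Eqs. 3.1-3.7 concrete, the helper below computes the six artificial basic emotions from the quantities defined above; the function signature and variable names are illustrative only, and the update of E^b_{6,i} is shown as the decrement applied after a non-positive interaction.

    # Illustrative computation of the artificial basic emotions (Eqs. 3.1-3.7).
    # All names are hypothetical; values are assumed already in the expected ranges.

    def basic_emotions(m_t, V_1, V_t, r_prev, r_cur, R_0, e6_prev, had_positive):
        e1 = -1.0 + 2.0 * (m_t / V_1)         # Eq. 3.1: interactions in this match
        e2 = r_prev - R_0                     # Eq. 3.3: last match's reinforcements vs. threshold
        e3 = 1.0 - 2.0 * ((V_1 - V_t) / V_1)  # Eq. 3.4: penalizes lost neighbors
        e4 = r_cur - R_0                      # Eq. 3.5: current reinforcements vs. threshold
        e5 = -1.0 + 2.0 * r_cur               # Eq. 3.6: current reinforcement level
        e6 = e6_prev if had_positive else e6_prev - 2.0 * (1.0 / V_1)  # Eq. 3.7 decrement
        return [e1, e2, e3, e4, e5, e6]


    # Example: 2 of 4 neighbors visited, one neighbor lost, modest reinforcements.
    print(basic_emotions(m_t=2, V_1=4, V_t=3, r_prev=0.5, r_cur=0.6,
                         R_0=0.4, e6_prev=1.0, had_positive=False))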

In contrast with the basic emotions, social emotions are driven by the neighbors and by the influence of the MultiA agent on those neighbors. In the same way as for the basic emotions, we used the reference (DAMÁSIO, 2004) as inspiration to shape the MultiA artificial social emotions. Table 3.2 makes explicit the particular social emotions that inspired the artificial social emotions of MultiA.


    TABLE 3.2 – Social Emotions and the Artificial Social Emotions of MultiA

Biological Social Emotion    Artificial Social Emotion
Pride                        E^s_{1,i}
Gratitude                    E^s_{2,i}
Compassion                   E^s_{3,i}
Sympathy                     E^s_{4,i}

The artificial social emotions of MultiA are defined below; an illustrative update sketch follows the list:

• E^s_{1,i}: emphasizes those behaviors, related to the social context, that did not generate positive outcomes for i but, still and to a lesser degree, also increases with the positive reinforcements of the agent. Thus, E^s_{1,i} increases at any change in E^b_{5,i} and, to a greater degree, at any change in E^b_{6,i}. It always starts a match with value -1, and s < 2/V^1_i is a weight used to establish the importance of E^b_{5,i}; see Table 3.3.

TABLE 3.3 – Updating E^s_{1,i}

At any change in    The value of E^s_{1,i} becomes
E^b_{6,i}           E^s_{1,i} + (2/V^1_i)
E^b_{5,i}           E^s_{1,i} + s

• E^s_{2,i}: the average number of variations of E^b_{5,i} per interaction, normalized to the range [-1, 1], from the first match until the current one. It starts at zero;

• E^s_{3,ip}: calculated according to Eq. 3.8, reflects the average number of variations of E^b_{5,i} per interaction with neighbor p:

E^s_{3,ip} = -1 + 2 M_{ip}    (3.8)

where M_{ip} is provided by the CS and is the average number of variations of E^b_{5,i} (i.e., the average number of increases in r^t_i) per interaction with neighbor p.

• E^s_{4,ip}: is doubly fed, both by the reciprocity value addressed to neighbor p (Y_{pi}, provided by the CS) and by the empathy feeling for p right after the last interaction with p, S^{t-1}_{4,ip} (see Table 3.4), computed during the last match at t-1 - a residual value from the past influencing the current emotion:

E^s_{4,ip} = c_a S^{t-1}_{4,ip} + (1 - c_a) Y_{pi}    (3.9)

where c_a ∈ [0, 1] is a weight used to establish the importance of the residual value S^{t-1}_{4,ip}.
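As an illustration of Eqs. 3.8 and 3.9 (and of the Table 3.3 update), the snippet below updates the social emotions from the quantities just defined; the function and parameter names are hypothetical.

    # Illustrative updates for the artificial social emotions (Table 3.3, Eqs. 3.8-3.9).

    def update_es1(es1, changed_e6, changed_e5, V_1, s):
        """Table 3.3: bigger increment for changes in E^b_6, smaller one for E^b_5."""
        if changed_e6:
            es1 += 2.0 / V_1
        if changed_e5:
            es1 += s          # with s < 2 / V_1
        return es1

    def es3(M_ip):
        """Eq. 3.8: average rate of positive outcomes with neighbor p, mapped to [-1, 1]."""
        return -1.0 + 2.0 * M_ip

    def es4(s4_prev, y_pi, c_a):
        """Eq. 3.9: blend of residual empathy for p and the reciprocity value Y_pi."""
        return c_a * s4_prev + (1.0 - c_a) * y_pi


    print(update_es1(es1=-1.0, changed_e6=True, changed_e5=False, V_1=4, s=0.3))  # -0.5
    print(es3(M_ip=0.75))                        # 0.5
    print(es4(s4_prev=0.2, y_pi=-0.4, c_a=0.5))  # -0.1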

The EM of the CS sends the Y_{pi} value of agent i to p (see Sect. 3.2.2.2). Once in the PS, Y_{pi} stimulates the social emotion E^s_{4,ip} (social emotion number 4 of agent i for neighbor p; on Figure 3.1, see social emotion number 4), which then reaches the empathy feeling S_{4,ip}. The emotion E^s_{4,ip} is fed both by Y_{pi} and by the empathy feeling for p right after the last interaction with p, a residual value from the past influencing the current emotion. Then, right before a new interaction with p, the empathy feeling is fed by both emotions E^s_{4,ip} and E^s_{3,ip} (social emotion number 3 of agent i for neighbor p; on Figure 3.1, see social emotion number 3). The latter summarizes the utility of neighbor p: the average number of times that interacting with neighbor p has resulted in positive reinforcements.

The artificial feelings {S_{1,i}, S_{2,i}, ..., S_{n,i}} fall in the range [-1, 1] and arise through a weighted sum of emotions. The weights are set according to the relevance of each emotion to the domain. Table 3.4 presents the set of emotions that feeds each feeling (E^b_{1,i} does not feed any feeling). Because of its feeding set of emotions, the only feeling that adapts to the interacting neighbor p is S_{4,ip}. The well-being W_i uses the feelings to internally represent the general situation of agent i. It is calculated (Eq. 3.10) with normalizing weights so that the final value falls in the range [-1, 1]:

W_i = \sum_{j=1}^{n} a_j S_{j,i}    (3.10)

where n is the number of feelings. The weights a_j are set according to the relevance of each feeling to the domain. For simplicity, the p index of S_{4,ip} is omitted from Eq. 3.10. W_i measures the performance of MultiA agent i in the environment, taking into account the empathy feeling for p. If the empathy reaches high levels, W_i will be low: the last selected actions were probably causing bad outcomes to p; therefore, the well-being W_i of agent i should be low, even though its reinforcements may be high. An illustrative sketch of this feeling and well-being computation follows Table 3.4.

    TABLE 3.4 – Artificial Feelings

Feeling      Fed by Emotion
S_{1,i}      E^b_{2,i} and E^b_{3,i}
S_{2,i}      E^b_{4,i}, E^b_{5,i} and E^b_{6,i}
S_{3,i}      E^s_{1,i} and E^s_{2,i}
S_{4,ip}     E^s_{3,ip} and E^s_{4,ip}
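A compact sketch of Table 3.4 and Eq. 3.10 is shown below: each feeling is a weighted sum of its feeding emotions, and the well-being is a weighted sum of the feelings. The specific weight values are invented for illustration and are not the settings used in the experiments.

    # Illustrative mapping from emotions to feelings (Table 3.4) and to W_i (Eq. 3.10).
    # Weight values below are arbitrary placeholders, not the thesis settings.

    def feelings_from_emotions(eb, es):
        """eb: basic emotions [E^b_1..E^b_6]; es: social emotions [E^s_1, E^s_2, E^s_3_ip, E^s_4_ip]."""
        s1 = 0.5 * eb[1] + 0.5 * eb[2]        # S_1    <- E^b_2, E^b_3
        s2 = (eb[3] + eb[4] + eb[5]) / 3.0    # S_2    <- E^b_4, E^b_5, E^b_6
        s3 = 0.5 * es[0] + 0.5 * es[1]        # S_3    <- E^s_1, E^s_2
        s4 = 0.5 * es[2] + 0.5 * es[3]        # S_4,ip <- E^s_3_ip, E^s_4_ip (empathy)
        return [s1, s2, s3, s4]

    def well_being(feelings, a=(0.25, 0.25, 0.25, 0.25)):
        """Eq. 3.10: normalized weighted sum of the feelings, staying in [-1, 1]."""
        return sum(w * s for w, s in zip(a, feelings))


    eb = [0.0, 0.1, 0.5, 0.2, 0.2, 0.5]
    es = [-0.5, 0.0, 0.5, -0.1]
    print(well_being(feelings_from_emotions(eb, es)))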


    3.2.2 Cognitive System (CS)

The CS consists of two modules: Empathy (Sect. 3.2.2.2) and Learning (Sect. 3.2.2.1). The first is responsible for producing the Y_{pi} value to be sent to the PS; once there, Y_{pi} ends up feeding the empathy feeling. The second module applies artificial neural networks (ANNs) to estimate the Well-Being Q^t_{ip}(E^t_{ip}, k) that will result from the execution of a specific action k (k ∈ actions) in response to the current set of emotions E^t_{ip}. Observe that E^t_{ip} is the current set of all emotions (basic and social) of agent i at match t and for neighbor p. In Fig. 3.2 we illustrate the functioning of the Learning Module: at match t, agent i is going to interact with neighbor p and its current set of emotions is E^t_{ip}. Before agent i takes an action, the Learning Module estimates the Well-Being values that will probably follow from the execution of each action k. In the example, agent i has two action options: action A or action B. If executed, action A is expected to obtain the higher estimated Well-Being value, since Q^t_{ip}(E^t_{ip}, A) = 0.2 and Q^t_{ip}(E^t_{ip}, B) = 0.1.

FIGURE 3.2 – The Learning Module of agent i (represented by the black box) provides the estimated Well-Being values for each available action, if it is going to be executed in response to an interaction with neighbor p at match t.
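Purely as an illustration of Fig. 3.2, the snippet below mimics the Learning Module's role with one estimator per action and a greedy choice over the estimated Well-Being values; the dictionary of estimates stands in for the per-action ANNs described in Sect. 3.2.2.1, and all names are hypothetical.

    # Illustrative stand-in for the Learning Module of Fig. 3.2: one estimator per
    # action returns the expected Well-Being for the current emotions E_ip^t.

    def estimate_well_being(emotions, action):
        """Placeholder for the per-action ANN; returns the Fig. 3.2 example values."""
        return {"A": 0.2, "B": 0.1}[action]

    def choose_action(emotions, actions=("A", "B")):
        estimates = {k: estimate_well_being(emotions, k) for k in actions}
        return max(estimates, key=estimates.get), estimates


    action, estimates = choose_action(emotions=[0.0, 0.5, -0.2])
    print(action, estimates)  # A {'A': 0.2, 'B': 0.1}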

    3.2.2.1 The Learning Module

Two main references were considered while designing our Learning Module: Gadanho (GADANHO, 1999) and Lin (LIN, 1993). The Learning System from Gadanho (GADANHO, 1999) received inspiration from Lin (LIN, 1993), which describes the use of one ANN for each action available to the agent and the acquisition of the action policy based on the Q-Learning algorithm (WATKINS, 1989). The ANNs from Gadanho (GADANHO, 1999) are feed-forward and trained t