Goal-Driven Autonomy Learning for Long-Duration Missions
Héctor Muñoz-Avila
Goal-Driven Autonomy (GDA)
GDA is a model of introspective reasoning in which an agent revises its own goals.
Key concepts:
Expectation: the state expected after executing an action.
Discrepancy: a mismatch between the expected state and the actual state.
Explanation: a reason for the mismatch.
Goal: a state, or set of states, that the agent desires to achieve.
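To make the four components concrete, here is a minimal sketch of the GDA cycle in Python. It is illustrative only: the class names, the dictionary-of-fluents state representation, and the placeholder rules are assumptions, not the MIDCA or TREX implementations.

```python
# Minimal, illustrative GDA cycle; the state is assumed to be a dict of fluents.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Goal:
    name: str
    priority: float = 1.0

@dataclass
class GDAAgent:
    goals: list = field(default_factory=list)

    def detect_discrepancy(self, expected: dict, observed: dict) -> dict:
        # Discrepancy: fluents whose observed value differs from the expectation.
        return {k: (expected[k], observed.get(k))
                for k in expected if expected[k] != observed.get(k)}

    def explain(self, discrepancy: dict) -> str:
        # Placeholder explanation rule: name the first mismatched fluent.
        return f"unexpected change in {next(iter(discrepancy))}" if discrepancy else "none"

    def formulate_goal(self, explanation: str) -> Optional[Goal]:
        # Placeholder goal-formulation rule keyed on the explanation.
        if "obstacle" in explanation:
            return Goal("avoid-obstacle", priority=2.0)
        return None

    def manage_goals(self, new_goal: Optional[Goal]) -> None:
        # Goal management: insert the new goal and keep the agenda priority-ordered.
        if new_goal:
            self.goals.append(new_goal)
        self.goals.sort(key=lambda g: g.priority, reverse=True)

agent = GDAAgent(goals=[Goal("survey-area")])
expected = {"obstacle_ahead": False, "depth": 50}
observed = {"obstacle_ahead": True, "depth": 50}
d = agent.detect_discrepancy(expected, observed)
agent.manage_goals(agent.formulate_goal(agent.explain(d)))
print([g.name for g in agent.goals])  # ['avoid-obstacle', 'survey-area']
```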
Where does the GDA knowledge come from?
Objectives
Enable greater autonomy and flexibility for unmanned systems.
A key requirement of the project is for autonomous systems to be robust over long periods of time.
It is very difficult to encode all possible circumstances in advance (i.e., months ahead), and conditions (e.g., environmental) change over time.
Need for learning: adapt to a changing environment. GDA knowledge needs to adapt to uncertain and dynamic environments while performing long-duration activities.
Goal-Driven Autonomy (GDA)
Concrete objective: learn/adapt the GDA knowledge elements for each of the four components.
Three levels:
At the object (TREX) level,
At the meta-cognition (MIDCA) level, and
At the integrative object and meta-reasoning (MIDCA+TREX) level.
[Figure: MIDCA architecture. The object level (World = Ψ) runs a Perceive (& Listen), Interpret, Evaluate, Intend, Plan, Act (& Speak) cycle over memory containing the world model MΨ, mission & goals, episodic memory, semantic memory & ontology, plans, and percepts. The meta-level (Mental Domain = Ω) runs a corresponding Monitor, Interpret, Evaluate, Intend, Plan, Controller cycle over a self model, episodic memory, metaknowledge, and strategies, using introspective monitoring of the reasoning trace and meta-level control for goal input, goal change, goal insertion, and subgoaling.]
Goal Management Learning
An initial set of goal priorities can be set at the beginning of the deployment.
But for a long-term mission such priorities will need to be adjusted automatically as a function of changes in the environment.
For example, by default we might prioritize sonar sensory goals, e.g., to determine potentially hazardous conditions surrounding the UUV.
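As an illustration of how such priorities could be represented and later adjusted, the sketch below keeps a default priority table (with sonar sensory goals ranked highest) plus a separate adjustment table that learning can update during the mission. The goal-type names and numeric values are assumptions.

```python
# Default goal priorities; goal-type names and values are illustrative assumptions.
DEFAULT_PRIORITY = {
    "sonar-sensing": 3.0,    # hazard detection around the UUV is prioritized by default
    "identify-contact": 2.0,
    "survey-area": 1.0,
}

# Learned adjustments, empty at deployment; updated as the environment changes.
priority_adjustment: dict = {}

def goal_priority(goal_type: str) -> float:
    """Default priority plus any adjustment learned during the mission."""
    return DEFAULT_PRIORITY.get(goal_type, 1.0) + priority_adjustment.get(goal_type, 0.0)

print(goal_priority("sonar-sensing"))     # 3.0 at deployment
priority_adjustment["survey-area"] = 1.5  # adjusted later from experience
print(goal_priority("survey-area"))       # 2.5 after the adjustment
```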
Goal Management Learning - Example
Situation: four unknown contacts in the area.
Default: identification of each contact can be set as a goal. Each goal can be associated with a priority as a function of the distance to the UUV. An unknown contact has some initial sensor readings.
Adaptation: once a contact has been identified, the system might change the priority of future contacts with the same sensor readings, e.g., giving higher priority to a contact that could be a fast-moving vessel. Initial sensor readings for the same target might change as a result of changing environmental conditions.
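A minimal sketch of this adaptation, assuming identification goals start with a distance-based priority and that, once a contact is identified, its (coarsely bucketed) sensor signature earns a priority bonus for future look-alike contacts. The bucketing scheme, the threat bonus, and all constants are illustrative assumptions.

```python
# Illustrative adaptation of identification-goal priorities from past identifications.
learned_bonus: dict = {}  # rounded sensor signature -> priority bonus

def signature(readings: tuple) -> tuple:
    # Coarse bucketing so slightly different readings map to the same signature;
    # a real matcher would have to tolerate readings that drift with the environment.
    return tuple(round(r, 1) for r in readings)

def identification_priority(distance_m: float, readings: tuple) -> float:
    # Default policy: closer contacts get higher priority.
    base = 100.0 / max(distance_m, 1.0)
    return base + learned_bonus.get(signature(readings), 0.0)

def record_identification(readings: tuple, contact_type: str) -> None:
    # After identification, boost future contacts that look like fast-moving vessels.
    learned_bonus[signature(readings)] = 5.0 if contact_type == "fast-vessel" else 0.0

print(identification_priority(200.0, (0.42, 1.31)))  # 0.5: distance-based default
record_identification((0.43, 1.29), "fast-vessel")   # contact turned out to be fast-moving
print(identification_priority(200.0, (0.42, 1.31)))  # 5.5: same signature now ranked higher
```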
Goal Formulation Learning
New goals can be formulated depending on: the discrepancies encountered, the explanation generated, and the observations from the state.
Example: as before, an unknown contact has some initial sensor readings. The contact turns out to be a large mass that moves very close to the vehicle, forcing a trajectory change. New goal: keep distance from any contact with the same initial readings.
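Read as a rule, this episode maps a (discrepancy, explanation, observations) triple to a new goal. The sketch below is one possible encoding of such a learned rule; the predicate names, the explanation label, and the stand-off distance are illustrative assumptions.

```python
# Illustrative goal-formulation rule: (discrepancy, explanation, observations) -> new goal.
def formulate_goal(discrepancy: dict, explanation: str, observations: dict):
    # Rule learned from the episode above: a large mass forcing a trajectory change
    # yields a stand-off goal keyed to that contact's initial sensor readings.
    if explanation == "large-mass-near-vehicle" and discrepancy.get("trajectory_changed"):
        return {"goal": "keep-distance",
                "from_readings": observations["initial_readings"],
                "min_distance_m": 50.0}  # stand-off distance is an assumed value
    return None

new_goal = formulate_goal(
    discrepancy={"trajectory_changed": True},
    explanation="large-mass-near-vehicle",
    observations={"initial_readings": (0.42, 1.31)},
)
print(new_goal)  # {'goal': 'keep-distance', ...}
```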
Explanation Learning
Explanations are assumed to be deterministic: Discrepancy → Explanation.
But such mappings frequently assume perfect observability; this assumption needs to be relaxed to handle sensor information.
Priorities can be associated with explanations and need to be adapted over time. Prior work studied the underpinnings, but sensor readings still need to be taken into account.
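One way to relax the deterministic mapping is to score every candidate explanation against the current sensor readings and adapt those scores from feedback, so the preferred explanation can change over time. The sketch below is a simple weighted-scoring version of that idea; the candidate explanations, sensor features, and learning rate are assumptions for illustration.

```python
# Illustrative non-deterministic explanation ranking with adaptable weights.
# Each candidate explanation weights the (assumed) sensor features differently.
weights = {
    "sensor-noise":   {"turbidity": 0.8, "contact_speed": 0.1},
    "moving-contact": {"turbidity": 0.1, "contact_speed": 0.9},
}

def rank_explanations(readings: dict) -> list:
    # Score = weighted sum of the normalized sensor readings; highest score first.
    scores = {e: sum(w * readings.get(f, 0.0) for f, w in ws.items())
              for e, ws in weights.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def update(explanation: str, readings: dict, correct: bool, lr: float = 0.1) -> None:
    # Reinforce (or weaken) the chosen explanation's weights based on the outcome.
    sign = 1.0 if correct else -1.0
    for f, value in readings.items():
        weights[explanation][f] = weights[explanation].get(f, 0.0) + sign * lr * value

readings = {"turbidity": 0.9, "contact_speed": 0.2}
print(rank_explanations(readings))            # "sensor-noise" ranked first here
update("sensor-noise", readings, correct=False)
print(rank_explanations(readings))            # its score drops after negative feedback
```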
Expectation Learning
Expectations need to consider time intervals. They must take into account the plan look-ahead and the latency. These two factors can be adapted over time by reasoning at the integrative object and meta-reasoning (MIDCA+TREX) level.
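A minimal sketch of an interval-based expectation, assuming the expected predicate must be observed inside a time window derived from the plan look-ahead and the latency; the window formula and the parameter values are illustrative assumptions that the meta-level could tune.

```python
# Illustrative interval-based expectation check.
from dataclasses import dataclass
from typing import Optional

@dataclass
class IntervalExpectation:
    predicate: str            # e.g. "at-waypoint-3"
    nominal_time: float       # when the plan nominally achieves the predicate (s)
    look_ahead: float = 30.0  # plan look-ahead (s); adaptable by the meta-level
    latency: float = 5.0      # sensing/actuation latency (s); adaptable as well

    def window(self) -> tuple:
        # Assumed formula: the predicate may be observed anywhere in this interval.
        return (self.nominal_time - self.latency,
                self.nominal_time + self.look_ahead + self.latency)

    def violated(self, observed_time: Optional[float]) -> bool:
        # None means the predicate was never observed; otherwise the observation
        # must fall inside the window for the expectation to be met.
        lo, hi = self.window()
        return observed_time is None or not (lo <= observed_time <= hi)

exp = IntervalExpectation("at-waypoint-3", nominal_time=120.0)
print(exp.window())         # (115.0, 155.0)
print(exp.violated(140.0))  # False: observed inside the window
print(exp.violated(170.0))  # True: observed too late -> discrepancy
```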
Conclusions
Operating autonomously over long periods of time is a challenging task: it is too difficult to pre-define all circumstances in advance, and conditions change over time.
Our vision is for UUVs that adapt to uncertain and dynamic environments while performing long-duration activities, by learning and refining GDA knowledge.