
A Framework for Integrating Symbolic and Sub-symbolic Representations

Keith Clark, Imperial College
Bernhard Hengst, University of NSW
Maurice Pagnucco, University of NSW
David Rajaratnam, University of NSW
Peter Robinson, University of Queensland
Claude Sammut, University of NSW
Michael Thielscher, University of NSW

Abstract

This paper establishes a framework that hierarchically integrates symbolic and sub-symbolic representations in an architecture for cognitive robotics. It is formalised abstractly as nodes in a hierarchy, with each node a sub-task that maintains its own belief-state and generates behaviour. An instantiation is developed for a real robot building towers of blocks, subject to human interference; this hierarchy uses a node with a concurrent multi-tasking teleo-reactive program, a node embedding a physics simulator to provide spatial knowledge, and nodes for sensor processing and robot control.

1 Introduction

A physical symbol system as the sole basis for artificial intelligence has been criticised by many researchers. Allen Newell and Herbert Simon introduced the physical symbol system hypothesis (PSSH) [Newell and Simon, 1976], implying that human thinking is a kind of symbol manipulation process, and that we can build machines to mimic human intelligence. Detractors include Rodney Brooks, who showed that robots with superior behaviour do not necessarily use higher-level symbols at all [Brooks, 1990]. Recently the paradigm has shifted more to probabilistic robotics [Thrun et al., 2005]. Nilsson [2006] analyses some of the attacks against the PSSH and grants the need to supplement symbol systems with non-symbolic processes in intelligent systems, mostly for perceptual and motor activities close to the environment.

Our architecture for cognitive robotics accommodates both symbolic and sub-symbolic representation. The architecture comprises nodes operating at different spatial and temporal scales, linked in a hierarchy. Each node is a kind of sub-task that maintains a belief state about an abstract representation of part of the robot's environment. It generates behaviour based on the belief state. We are not proposing an expressive language to include probabilities symbolically, but rather that nodes can use different representations to interconnect symbolic or probabilistic models and behaviour.

The two main contributions of this paper are:

1. The formalisation of a general architecture for cognitive robotics and proof that cyclic updates of the hierarchy of nodes are well defined.

2. The instantiation of the architecture with a Baxter robot¹ tasked to build multiple towers. The main features are:

• A symbolic node with a concurrent multi-tasking extension of Nilsson's Teleo-Reactive (TR) rule-based robot agent programming language.

• A spatial node using a rigid-body simulator acting as the "mind's eye" of the robot. The physics simulator introduces common-sense real-world spatial knowledge that would otherwise be cumbersome to represent by a formal symbol system.

• Controller nodes that process robot sensory input and generate robot motor actions.

Figure 1: Baxter in blocks-world. The belief state is reflected in the "mind's eye" (physics simulator). A closeup of a block and the arm end-effector showing the co-location of the gripper and camera.

In the rest of this paper we position our robot architecture in related work and formalise the architectural framework using a motivating example. We instantiate the architecture with a real-world concrete example, namely a two-armed robot building towers in a blocks-world environment. Finally, we discuss robustness, limitations, and future work.

2 Related Work

Several cognitive architectures have been proposed. Prominent ones include SOAR [Laird et al., 1987] and ACT-R [Anderson, 1993].

¹The Baxter robot is built by Rethink Robotics, a company founded by Rodney Brooks.


These architectures are based on the physical symbol system hypothesis, but have been extended to interface with an external environment. ICARUS [Langley and Choi, 2006] has many similarities to SOAR and ACT-R, and is directly grounded in perception and action, with conceptual inference and skill more basic than problem solving. A well-known sub-symbolic reactive architecture is Subsumption [Brooks, 1986], in which higher-level layers can subsume the roles of lower levels by suppressing their outputs.

Other related work includes: GWT [Baars, 1988], a cognitive architecture to account for a multitude of communicating brain processes; RL-TOPS [Ryan and Pendrith, 1998], a hybrid system for combining teleo-reactive planning and reinforcement learning (RL); MAXQ task-hierarchies [Dietterich, 2000]; CRAM [Beetz et al., 2010], a cognitive robot abstract machine; and MALA [Haber and Sammut, 2012], a multi-agent blackboard-based cognitive architecture inspired by Minsky's Society of Mind [Minsky, 1986].

Our architectural commitments are closest to the Robot Control System (RCS) reference model architecture [Albus and Meystel, 2001]. The RCS architecture was also the inspiration for the triple-tower architecture [Nilsson, 2001] used to illustrate the operation of a teleo-reactive program for block-stacking tasks. The triple-tower instantiates the RCS architecture for modelling at the symbolic level. However, it provides no explicit representation of sub-symbolic processes, instead assuming that there simply exists some underlying sensory system from which symbolic facts are generated.

In contrast to all of these existing architectures, which are either described informally or without connection to real-world robotic implementations, we formally axiomatise our model and instantiate it in a real robotic setting.

3 The Architectural Framework

The following formalisation presents a hierarchical system consisting of interconnected abstract nodes, each with their own internal belief state and reasoning mechanism. We use the following simple running example as an explanatory aid.

Example. A self-driving car is tasked to park between 0.2m and 0.3m rear-to-wall from an arbitrary initial position. The car has camera and laser distance sensors providing Gaussian measurements to the wall. At each time-step it is able to move forward or backward by a small displacement, or stop.

3.1 Nodes

Nodes are tasked to achieve a goal, and have two primary functions: world-modelling and behaviour-generation.

World-Modelling. The notion of a world-model appears in many disciplines under different names (e.g., internal state, hidden state, belief state). We adopt the term belief state as the representation of a node's internal world-model. The terminology for updating a world-model similarly varies (e.g., filtering, localisation, tracking, belief revision) and we adopt the generic term: update. Update may depend on sensing observations, or may be an expectation update based on a node's actions. Finally, we distinguish between sensing and observation. Sensing is the process of extracting observations, the latter of which are used to update a belief state.

Behaviour-Generation. The belief state helps the node select the next action. A function that maps states to actions is a policy. Policies may be provided directly or determined by planning or decision-theoretic methods. Typical examples are GOLOG programs [Levesque et al., 1997], Teleo-Reactive Programs [Nilsson, 1994], and RL policies.

Definition 1. A cognitive language is a tuple $\mathcal{L} = (\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{O})$, where $\mathcal{S}$ is a set of belief states, $\mathcal{A}$ is a set of actions, $\mathcal{T}$ is a set of task parameters, and $\mathcal{O}$ is a set of observations. A cognitive node is a tuple $N = (\mathcal{L}, \Pi, \lambda, \tau, \gamma, s_0, \pi_0)$ s.t.:

• $\mathcal{L}$ is the cognitive language for $N$, with initial belief state $s_0 \in \mathcal{S}$.

• $\Pi$ is a set of policies such that for all $\pi \in \Pi$, $\pi : \mathcal{S} \to 2^{\mathcal{A}}$, and an initial policy $\pi_0 \in \Pi$.

• A policy selection function $\lambda : 2^{\mathcal{T}} \to \Pi$, s.t. $\lambda(\{\}) = \pi_0$.

• An observation update operator $\tau : 2^{\mathcal{O}} \times \mathcal{S} \to \mathcal{S}$.

• An action update operator $\gamma : 2^{\mathcal{A}} \times \mathcal{S} \to \mathcal{S}$.

Definition 1 provides a very abstract characterisation of a node, allowing for an arbitrary representation and reasoning mechanism for individual nodes. For example, it can encapsulate both stochastic and symbolic nodes.

Example. Let $E = \{\langle x, \sigma_x \rangle \mid x, \sigma_x \in \mathbb{R},\ 0 \le x < +\infty,\ \text{and } -\infty < \sigma_x < +\infty\}$ consist of position estimates represented as a Gaussian distribution with mean $x$ and standard deviation $\sigma_x$. Stochastic node $N_1$ defines the cognitive language $(\mathcal{S}_1, \mathcal{A}_1, \mathcal{T}_1, \mathcal{O}_1)$ as follows. The set of belief states consists of (car) position estimates: $\mathcal{S}_1 = E$, with the initial belief state $s_{0_1} = \langle 0.8, 0.1 \rangle$ being a known starting position and standard deviation. The set of actions represents small independent moves: $\mathcal{A}_1 = \{\delta_x \in \{-0.01, 0.0, 0.01\}\}$. The task parameters represent the requirement to move forwards or backwards: $\mathcal{T}_1 = \{F_1, B_1\}$. Finally, the observations represent camera-laser pairs of observations: $\mathcal{O}_1 = \{\langle c, l \rangle \mid c, l \in E\}$.

There are three policies: to move forwards, to move backwards, and to stop: $\Pi_1 = \{\pi_F, \pi_B, \pi_S\}$, where $\pi_F(s_x) = \{0.01\}$, $\pi_B(s_x) = \{-0.01\}$, and $\pi_S(s_x) = \{0.0\}$, for every $s_x \in \mathcal{S}_1$. Policy $\pi_S$ is also the initial policy.

The policy selection function $\lambda_1$ respectively maps the sets $\{F_1\}, \{B_1\}, \{\} \in 2^{\mathcal{T}_1}$ to the policies $\pi_F$, $\pi_B$, $\pi_S$, with the action update operator being:

$\gamma_1(\{\delta_x\}, \langle x, \sigma_x \rangle) = \langle x + \delta_x,\ \sigma_x + v \rangle$

where $v$ represents the increase in standard deviation due to process noise during the last time-step.

The observation update operator is:

$\tau_1(\{\langle \langle x_c, \sigma_{x_c} \rangle, \langle x_l, \sigma_{x_l} \rangle \rangle\}, \langle x, \sigma_x \rangle) = \langle x', \sigma_{x'} \rangle$

where $\langle x_c, \sigma_{x_c} \rangle$, $\langle x_l, \sigma_{x_l} \rangle$ are independent camera and laser distance observations, and $\langle x', \sigma_{x'} \rangle$ is the corrected state after the observation update:

$x' = \dfrac{x_c\,\sigma_x^2\,\sigma_{x_l}^2 + x_l\,\sigma_x^2\,\sigma_{x_c}^2 + x\,\sigma_{x_c}^2\,\sigma_{x_l}^2}{\sigma_x^2\,\sigma_{x_l}^2 + \sigma_x^2\,\sigma_{x_c}^2 + \sigma_{x_c}^2\,\sigma_{x_l}^2}$

$\sigma_{x'}^2 = \dfrac{\sigma_{x_c}^2\,\sigma_{x_l}^2\,\sigma_x^2}{\sigma_x^2\,\sigma_{x_l}^2 + \sigma_x^2\,\sigma_{x_c}^2 + \sigma_{x_c}^2\,\sigma_{x_l}^2}$

The action update and observation update operators define a Kalman filter estimating the belief state of the car.
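To make the operators concrete, here is a minimal Python sketch (our illustration; the paper does not prescribe an implementation language) of the action update $\gamma_1$ and observation update $\tau_1$. The process-noise constant v is an assumed value, and fusing the prior with the two observations pairwise reproduces the inverse-variance weighted mean given above.

import math

def gamma_1(belief, dx, v=0.001):
    # Action update: shift the mean by the commanded move and grow the
    # standard deviation by process noise v (an assumed value here).
    x, sx = belief
    return (x + dx, sx + v)

def fuse(a, b):
    # Product of two Gaussians by inverse-variance weighting.
    (xa, sa), (xb, sb) = a, b
    var = 1.0 / (1.0 / sa**2 + 1.0 / sb**2)
    return (var * (xa / sa**2 + xb / sb**2), math.sqrt(var))

def tau_1(obs, belief):
    # Observation update: fuse the camera and laser observations with
    # the prior belief; equivalent to the closed-form x' and sigma_x'.
    camera, laser = obs
    return fuse(fuse(camera, laser), belief)

belief = (0.8, 0.1)                                   # initial belief s0
belief = gamma_1(belief, 0.01)                        # predict after a move
belief = tau_1(((0.78, 0.05), (0.79, 0.02)), belief)  # correct with sensors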

The example shows how a stochastic node can be modelled within our framework, where it is used to provide immediate sensing and control of a car. We now show how to represent the state and behaviour of the car at a symbolic level.

Example. The symbolic node $N_2$ contains the cognitive language $(\mathcal{S}_2, \mathcal{A}_2, \mathcal{T}_2, \mathcal{O}_2)$ as follows. The set of belief states $\mathcal{S}_2 = \{too\ far, too\ close, on\ target\}$, where $on\ target$ is the initial belief state. The set of actions $\mathcal{A}_2 = \{F_2, B_2\}$ yields action sets $\{F_2\}, \{B_2\}, \{\}$, respectively to move the car forwards, backwards, and stop. The task parameter set is empty, $\mathcal{T}_2 = \{\}$, as $N_2$ is a top-level node. Hence, the set of policies $\Pi_2$ consists only of the initial policy $\pi_{0_2} : on\ target \mapsto \{\}$, $too\ far \mapsto \{B_2\}$, $too\ close \mapsto \{F_2\}$. Finally, the set of observations is identical to the set of beliefs: $\mathcal{O}_2 = \mathcal{S}_2$.

The observation update operator, $\tau_2$, replaces the current state with an observation (i.e., for all $x$ and $s$, $\tau_2(\{x\}, s) = x$), while the action update operator, $\gamma_2$, makes no change to the state (i.e., for all $a$ and $s$, $\gamma_2(a, s) = s$).

Note, due to space restrictions, a number of the functions in the example are only partially defined. However, as part of the interconnection of nodes we shall ensure that the inputs to these functions will be constrained to only the defined cases.
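For contrast with the stochastic node, the symbolic node $N_2$ is almost trivial in code. A minimal sketch under the same illustrative assumptions (Python, with our own naming):

def pi_0(s):
    # The single policy: map a symbolic belief state to an action set.
    return {"on target": set(), "too far": {"B2"}, "too close": {"F2"}}[s]

def tau_2(obs, s):
    # Observation update: replace the state with the observation, if any.
    return next(iter(obs)) if obs else s

def gamma_2(actions, s):
    # Action update: acting does not change the symbolic state.
    return s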

3.2 Cognitive Hierarchy

Nodes in the architecture are interlinked in a hierarchy, similar to that of a task-graph [Dietterich, 2000]. Nodes in the hierarchy represent sub-tasks, equipped with their own state representations and reasoning mechanisms. The lowest level node is a proxy for the external world, consisting of physical sensors and actuators. Sensing information is passed up the hierarchy to the top-level nodes, while actions are passed down, eventually resulting in physical actions (Figure 2(a)).

Figure 2: (a) Typical node hierarchy; (b) autonomous parking. Observations are propagated up the hierarchy, revising beliefs. Actions are generated from belief states and translate into task parameters for lower-level nodes. The node with two children (Diagram (a)) could represent concurrently setting behaviours for the two arms of a humanoid robot.

Definition 2. A cognitive hierarchy is a tuple $H = (\mathcal{N}, N_0, F)$ s.t.:

• $\mathcal{N}$ is a set of cognitive nodes; $N_0 \in \mathcal{N}$ is a distinguished node corresponding to the external environment.

• $F$ is a set of function pairs $\langle \phi_{i,j}, \psi_{j,i} \rangle \in F$ that connect nodes $N_i, N_j \in \mathcal{N}$ where:

– $\phi_{i,j} : \mathcal{S}_i \to 2^{\mathcal{O}_j}$ is a sensing function, and

– $\psi_{j,i} : 2^{\mathcal{A}_j} \to 2^{\mathcal{T}_i}$ is a task parameter function.

• Sensing graph: each $\phi_{i,j}$ represents an edge from node $N_i$ to $N_j$ and forms a directed acyclic graph with $N_0$ as the unique source node of the graph.

• Action graph: the set of task parameter functions forms a converse to the sensing graph such that $N_0$ is the unique sink node of the graph.

Definition 2 models the connection between nodes as consisting of pairs of sensing and task parameter functions. The sensing function extracts observations from a lower-level node, while the task parameter function translates a higher-level node's action set to a task parameter set, which is then used to select the active policy for a node. It is worth noting that the pairing of these functions is not a restrictive requirement and it is still possible to model a connection between two nodes with actions but no sensing and vice-versa. This would simply correspond to a sensing, or task parameter, function that always returns the empty set.

Finally, the external world is modelled as a distinguished node, $N_0$. Sensing functions allow other nodes to observe properties of the external world, and task parameter functions allow actuator values to be modified, but $N_0$ does not "sense" properties of other nodes, nor does it generate task parameters for those nodes. Beyond this external interface the internal behaviour of $N_0$ is considered to be opaque.

Continuing the running example, we can now see how the nodes are connected (displayed graphically in Figure 2(b)).

Example. With reference to nodes $N_1$ and $N_2$ defined earlier, and Figure 2(b), we specify a cognitive hierarchy $H = (\mathcal{N}, N_0, F)$ where $\mathcal{N} = \{N_0, N_1, N_2\}$ and $F = \{\langle \phi_{0,1}, \psi_{1,0} \rangle, \langle \phi_{1,2}, \psi_{2,1} \rangle\}$. For the pair $\langle \phi_{1,2}, \psi_{2,1} \rangle \in F$ we define:

$\phi_{1,2} : \{\langle x, \cdot \rangle \mid x < 0.2\} \mapsto \{too\ close\}$, $\{\langle x, \cdot \rangle \mid 0.2 \le x \le 0.3\} \mapsto \{on\ target\}$, $\{\langle x, \cdot \rangle \mid x > 0.3\} \mapsto \{too\ far\}$.

$\psi_{2,1} : \{F_2\} \mapsto \{F_1\}$, $\{B_2\} \mapsto \{B_1\}$, $\{\} \mapsto \{\}$.
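These two functions can be written down directly; a Python sketch with the thresholds from the example (set-valued actions are encoded as frozensets so they can key a dictionary):

def phi_1_2(belief):
    # Sensing function from N1 to N2: abstract a Gaussian position
    # estimate into a symbolic observation.
    x, _sigma = belief
    if x < 0.2:
        return {"too close"}
    if x <= 0.3:
        return {"on target"}
    return {"too far"}

def psi_2_1(actions):
    # Task parameter function from N2 to N1: translate the symbolic
    # node's actions into task parameters for the stochastic node.
    table = {frozenset({"F2"}): {"F1"},
             frozenset({"B2"}): {"B1"},
             frozenset(): set()}
    return table[frozenset(actions)]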

The internal representation of $N_0$ is opaque. Instead we observe that the sensing function from $N_0$ to $N_1$ takes a state of $N_0$ and returns camera and laser state observations. Similarly, the matching task parameter function simply sets a required distance for the vehicle control sub-system:

$\phi_{0,1}(s_0 \in \mathcal{S}_0) = \{\langle \langle x_c, \sigma_{x_c} \rangle, \langle x_l, \sigma_{x_l} \rangle \rangle\}$

$\psi_{1,0} : \{0.01\} \mapsto \{0.01\}$, $\{-0.01\} \mapsto \{-0.01\}$, $\{0\} \mapsto \{0\}$.

3.3 Active Cognitive Hierarchy

A cognitive hierarchy captures only static properties and additional details are required to model dynamic components, such as the active belief state, policy, and actions.

Definition 3. An active cognitive node is a tuple $Q = (N, s, \pi, a)$ where: 1) $N$ is a cognitive node with $\mathcal{S}$, $\Pi$, and $\mathcal{A}$ being the corresponding belief states, policies, and actions respectively, 2) $s \in \mathcal{S}$ is the current belief state, $\pi \in \Pi$ is the current policy, and $a \in 2^{\mathcal{A}}$ is the current set of actions.

Defining an active cognitive node naturally leads to an active cognitive hierarchy as a collection of active nodes.

Definition 4. An active cognitive hierarchy is a tuple $\mathcal{X} = (H, \mathcal{Q})$ where $H$ is a cognitive hierarchy with set of cognitive nodes $\mathcal{N}$ such that for each $N \in \mathcal{N}$ there is a corresponding active cognitive node $Q = (N, s, \pi, a) \in \mathcal{Q}$ and vice-versa.

An active cognitive hierarchy captures the complete state of an operational system at some instant. The initial state of such a system is an initial active cognitive hierarchy such that the active cognitive nodes in the hierarchy are generated from their corresponding cognitive nodes with each node's initial belief state, initial policy, and the empty set of actions.

3.4 Cognitive Process Model

So far we have formalised an abstraction of an active cognitive hierarchy, but it remains to show how this hierarchy can change over time. For this we introduce the notion of a process model. Nodes in an active cognitive hierarchy can be viewed as independent processes operating at synchronised discrete time steps. At each time step the belief state for each node is updated and actions taken. The intention of a process model is that it allows a cognitive robot to operate in a physical environment. Therefore we now construct a model that allows for sensing and actions up and down the cognitive hierarchy. The model operates in two phases: a sensing update phase followed by an action selection phase.

In order to characterise the sensing update for an active cognitive hierarchy we first characterise the process update of the beliefs of a single active cognitive node.

Definition 5. Let $\mathcal{X} = (H, \mathcal{Q})$ be an active cognitive hierarchy with $H = (\mathcal{N}, N_0, F)$. The sensing process update of $\mathcal{X}$ with respect to an active cognitive node $Q_i = (N_i, s_i, \pi_i, a_i) \in \mathcal{Q}$, written as $SensingUpdate'(\mathcal{X}, Q_i)$, is:

• $\mathcal{X}$, if there does not exist a node $N_x$ such that $\langle \phi_{x,i}, \psi_{i,x} \rangle \in F$,

• otherwise, an active cognitive hierarchy $\mathcal{X}' = (H, \mathcal{Q}')$ where $\mathcal{Q}' = \mathcal{Q} \setminus \{Q_i\} \cup \{Q'_i\}$ and $Q'_i = (N_i, \tau_i(O, s_i), \pi_i, a_i)$ s.t.:

$O = \bigcup \{\phi_{x,i}(s_x) \mid \langle \phi_{x,i}, \psi_{i,x} \rangle \in F$ where $Q_x = (N_x, s_x, \pi_x, a_x) \in \mathcal{Q}\}$

While somewhat complicated in structure, Definition 5 is intuitively simple. An active cognitive hierarchy is updated with respect to a particular node by gathering the sensing observations for all connected nodes that are lower in the hierarchy and updating the node's belief state through its observation update function. By updating each node the belief state of the system as a whole is updated.

Definition 6. Let $\mathcal{X} = (H, \mathcal{Q})$ be an active cognitive hierarchy with $H = (\mathcal{N}, N_0, F)$ and $\Phi$ be the sensing graph induced by the sensing functions in $F$. The sensing process update of $\mathcal{X}$, written $SensingUpdate(\mathcal{X})$, is an active cognitive hierarchy:

$\mathcal{X}' = SensingUpdate'(\ldots SensingUpdate'(\mathcal{X}, Q_0), \ldots, Q_n)$

where the sequence $[Q_0, \ldots, Q_n]$ consists of all active cognitive nodes of the set $\mathcal{Q}$ such that the sequence satisfies the partial ordering induced by the sensing graph $\Phi$.

Sensing update (Definition 6) successively updates the hierarchy, ensuring that a node is only updated once the nodes on which its sensing depends are first updated. The order is intuitive but also guarantees a well-defined update process.

Lemma 1. For any active cognitive hierarchy $\mathcal{X}$ the sensing process update of $\mathcal{X}$ is well-defined.

Proof Sketch: Can be proved by induction on the update sequence. Every valid sensing update sequence satisfies the sensing graph's partial ordering, so a node's belief state is updated only after lower level nodes are updated. Hence, each node's update is fully determined.
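In code, the sensing phase is a single pass over any topological order of the sensing graph. A sketch, assuming node objects with belief and tau attributes of our own design (Python 3.9+ for graphlib):

from graphlib import TopologicalSorter

def sensing_update(nodes, edges):
    # nodes: dict mapping a name to an object with .belief and .tau;
    # edges: list of (lower, upper, phi) sensing edges.
    deps = {name: set() for name in nodes}
    for lower, upper, _phi in edges:
        deps[upper].add(lower)  # upper senses from lower (Definition 6)
    for name in TopologicalSorter(deps).static_order():
        # Gather observations over all incoming sensing edges (Definition 5).
        obs = set()
        for lower, upper, phi in edges:
            if upper == name:
                obs |= phi(nodes[lower].belief)
        if obs:
            nodes[name].belief = nodes[name].tau(obs, nodes[name].belief)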

Defining action update follows the same pattern as sensing update by first characterising update for a single node.

Definition 7. Let $\mathcal{X} = (H, \mathcal{Q})$ be an active cognitive hierarchy with $H = (\mathcal{N}, N_0, F)$. The action process update of $\mathcal{X}$ with respect to an active cognitive node $Q_i = (N_i, s_i, \pi_i, a_i) \in \mathcal{Q}$, written as $ActionUpdate'(\mathcal{X}, Q_i)$, is an active cognitive hierarchy $\mathcal{X}' = (H, \mathcal{Q}')$ where $\mathcal{Q}' = \mathcal{Q} \setminus \{Q_i\} \cup \{Q'_i\}$ and $Q'_i = (N_i, \gamma_i(a'_i, s_i), \pi'_i, a'_i)$ s.t.:

• if there does not exist a node $N_x$ such that $\langle \phi_{i,x}, \psi_{x,i} \rangle \in F$ then: $\pi'_i = \pi_i$ and $a'_i = \pi_i(s_i)$,

• else: $\pi'_i = \lambda_i(T)$ and $a'_i = \pi'_i(s_i)$, where

$T = \bigcup \{\psi_{x,i}(a_x) \mid \langle \phi_{i,x}, \psi_{x,i} \rangle \in F$ where $Q_x = (N_x, s_x, \pi_x, a_x) \in \mathcal{Q}\}$

Definition 7 is slightly more involved than Definition 5.

Firstly, the actions of the nodes higher in the hierarchy are used to generate the task parameters of the current node. From this set a new active policy is selected, which in turn is used to generate the current actions. Finally, these actions are used to update the node's belief state. From here, the definition of action update proceeds in the expected manner.

Definition 8. Let $\mathcal{X} = (H, \mathcal{Q})$ be an active cognitive hierarchy with $H = (\mathcal{N}, N_0, F)$ and $\Psi$ be the action graph induced by the task parameter functions in $F$. The action process update of $\mathcal{X}$, written $ActionUpdate(\mathcal{X})$, is an active cognitive hierarchy:

$\mathcal{X}' = ActionUpdate'(\ldots ActionUpdate'(\mathcal{X}, Q_n), \ldots, Q_0)$

where the sequence $[Q_n, \ldots, Q_0]$ consists of all active cognitive nodes of the set $\mathcal{Q}$ such that the sequence satisfies the partial ordering induced by the action graph $\Psi$.

It is worth noting that action update (Definition 8) is almost identical to sensing update except that the update takes place in the opposite order. Naturally, it is also well-defined.

Lemma 2. For any active cognitive hierarchy $\mathcal{X}$ the action process update of $\mathcal{X}$ is well-defined.

Proof. Follows the same pattern as the proof of Lemma 1.

Finally, we can combine the sensing and action process updates for a model into a general process update for the model.


Definition 9. Let $\mathcal{X} = (H, \mathcal{Q})$ be an active cognitive hierarchy with $H = (\mathcal{N}, N_0, F)$. The process update of $\mathcal{X}$, written $Update(\mathcal{X})$, is an active cognitive hierarchy:

$\mathcal{X}' = ActionUpdate(SensingUpdate(\mathcal{X}))$

Theorem 3. The cognitive process model Update is well-defined.

Proof. Follows directly from Lemmas 1 and 2.

The cognitive process model performs what one would intuitively require of an operational cognitive system: updating nodes up the hierarchy and propagating actions back down the hierarchy, resulting in changes to the physical actuators of a robot. Furthermore, the fact that it is well-defined guarantees that there is a single unique update to the hierarchy.
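One synchronous cycle then composes the two passes from Definitions 6 and 8. Sketching the action pass in the same style as the sensing pass above (select, gamma, policy and actions are assumed node attributes standing in for $\lambda_i$, $\gamma_i$, $\pi_i$ and $a_i$):

def action_update(nodes, edges):
    # edges: list of (lower, upper, psi) pairs, psi mapping the upper
    # node's actions to task parameters for the lower node.
    deps = {name: set() for name in nodes}
    for lower, upper, _psi in edges:
        deps[upper].add(lower)
    order = list(TopologicalSorter(deps).static_order())
    for name in reversed(order):  # top-down: converse of the sensing order
        node = nodes[name]
        parents = [(upper, psi) for lower, upper, psi in edges if lower == name]
        if parents:  # Definition 7: select a policy from parents' actions
            T = set().union(*(psi(nodes[upper].actions) for upper, psi in parents))
            node.policy = node.select(T)
        node.actions = node.policy(node.belief)
        node.belief = node.gamma(node.actions, node.belief)

def update(nodes, sensing_edges, action_edges):
    # Definition 9: sense bottom-up, then act top-down.
    sensing_update(nodes, sensing_edges)
    action_update(nodes, action_edges)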

4 Baxter and Blocks-World

We instantiate our architectural framework with an off-the-shelf Baxter robot tasked to build towers of uniquely labelled blocks. Baxter's two arms can be operated concurrently and have an overlapping work-space. Each arm end-effector is equipped with a two-pronged gripper and a camera that points down at the table, as shown in Figure 1. There are three tables. Each arm is able to reach its own table (designated left or right) and a shared table. Baxter's sensor information includes the arm position and arm effort measurements, 2D camera images, and gripper states. Effector actions move the arms to various positions and open or close the gripper.

The framework is instantiated in ROS using five nodes: a Controller node for each arm to sense the world and control arm movement, a Spatial node to model the blocks-world scene and quantitatively specify arm pickup and putdown goals, and a symbolic TR node to build multiple block towers. Baxter is taken to be the external world node.

Arm Controller Nodes

The world model belief state for each arm includes the position, visual persistence and identity of blocks currently in view, the position of the arm, arm effort, and whether the arm is gripping a block. Specifically, the language for each arm node is $\mathcal{L}_{Controller} = \{(blockId, (x, y, z, p)), arm(x, y, z), gripper, effort\}$, where: $blockId \in \{A, B, C, L, T, O\}$; $(x, y, z, p)$ represents the position in metres of the centroid of the block in Baxter's torso coordinate frame, and $p \in \{0, 1, 2, 3, 4\}$ is the visual persistence that is incremented when a block is identified in an image frame and decremented when not; $arm(x, y, z)$ is the position of the arm end-effector in metres, also in the torso frame; $gripper \in \{True, False\}$ indicates whether the grippers are holding a block; and $effort$ measures the push-force of the arm.

The camera is used to locate and identify each block in the visual field. We employ OpenCV to find squares representing blocks and to identify block symbols. The size and location of a square in the image yields its 3D position relative to the camera. Given fixed camera offsets from each arm position we can determine the block position in Baxter's reference frame. Arm position is determined by an action update based on the last commanded pose of the arm. The observation update from the external environment updates the visual persistence to ensure that any block in view is not sensitive to momentary occlusions, mis-identification, passing shadows, etc. Only blocks with a visual persistence of 4 are passed to the Spatial node in a ROS message.
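The persistence mechanism amounts to a small saturating counter per block; a sketch with our own names (the paper fixes only the 0-4 range and the reporting threshold of 4):

def update_persistence(p, detected, lo=0, hi=4):
    # Increment when the block is identified in the current image frame,
    # decrement when it is not, saturating at the range bounds.
    return min(hi, p + 1) if detected else max(lo, p - 1)

def blocks_for_spatial_node(blocks):
    # Only fully persistent blocks are reported to the Spatial node.
    return [b for b in blocks if b["p"] == 4]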

Figure 3: Cognitive graph for Baxter in blocks-world.

The node behaviour-generation consists of picking up and putting down blocks at specific positions determined by the Spatial node. We rely on Baxter's inverse kinematic function to move arms to commanded poses. Baxter's sensors and effectors are not accurate enough to pick up or put down blocks by dead-reckoning. Instead, arm controller nodes use a visual feedback control loop through the environment, adjusting the arm pose effector to target a block in the camera image. In this way grippers visually servo into the correct position for grabbing a block. When putting blocks down, arm effort is used in a feedback control loop to determine when the block has made contact with the table or another block.
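A sketch of that servo loop, with illustrative gain and threshold values of our own (the paper does not give the controller's internals):

def visual_servo_step(offset_px, arm_pose, gain=0.0005):
    # Nudge the arm so the target block drifts toward the image centre;
    # the camera is co-located with the gripper (Figure 1).
    u, v = offset_px           # pixel offset of the block from image centre
    x, y, z = arm_pose
    return (x + gain * u, y + gain * v, z)

def contact_made(effort, threshold=8.0):
    # Putdown: declare table/block contact once arm effort exceeds
    # a threshold (the threshold value here is assumed).
    return effort > threshold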

Spatial Node

The spatial node employs the Gazebo physics simulator to represent the blocks and tables as 3D objects in Baxter's torso coordinate frame. The simulator is used to predict the next belief state based on its implicit knowledge of physics. For example, gravity will ensure that blocks are stacked correctly, even if the gripper releases the block slightly before it makes contact with the block or table below.

The belief state of the spatial node is essentially the scene-graph and includes the identity and quantitative position of each block, the quantitative position of the arm cameras, and the state of the grippers. Each camera position belief state is updated from the belief state of arm positions communicated from the arm controller nodes. The sensing update of the identity and position of blocks is predicated on whether the blocks are expected to be seen by each camera. For example, if a block is placed on a table (or removed) in view of the camera, the block will be created (or deleted) in the simulator, subject to a high visual persistence. Blocks not in the camera field-of-view will not be removed from the simulator, but assumed to have object permanence. A block that is in the field-of-view of a camera, but occluded by another block in the tower, will not be removed. In this way the spatial simulator sensing update operator maintains the belief state of the blocks scene by integrating and reasoning about the sensory information from each of the two arms.

Policies move each arm to pick up or put down blocks at positions specified quantitatively as (x, y, z) locations in Baxter's coordinate frame. After each such action the policy moves the arm to its respective home position over its table to help refresh the model. The task parameter function maps higher-level symbolic TR move commands to quantitative values that select appropriate quantitative move policies. For example, putdown(right, shared table) selects the policy actioning the right arm to place the held block at a clear position on the shared table in the simulator, while pickup(left, A) selects a policy that results in an action for the left arm to pick up block A from its position in the simulator.
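The Spatial node's task parameter function might be sketched as follows; block_position and find_clear_space are hypothetical stand-ins for queries against the simulator's scene-graph:

def spatial_task_parameters(command, scene):
    # Translate a symbolic TR command into a quantitative arm goal.
    op, arm, target = command            # e.g. ("pickup", "left", "A")
    if op == "pickup":
        return (arm, "pickup", block_position(scene, target))    # hypothetical query
    if op == "putdown":
        return (arm, "putdown", find_clear_space(scene, target)) # hypothetical query
    raise ValueError("unknown command: " + op)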

The 3D quantitative representation of the physics simulator can perceptually anchor, relate and remember sensed objects, and provide a basis for abstracting symbolic representations.

Symbolic TR Node

The TR node receives sensed observations of the spatial node's belief state in terms of an abstracted set of logical facts such as on(A, left table) and holding(left, B). These are then used as part of the TR node's symbolic belief state.

A multi-tasking teleo-reactive programming language [Clark and Robinson, 2015] is used as the node's only policy to concurrently build user-specified towers of blocks. The language comprises sequences of Guard → Action rules grouped into parameterised procedures. The Guards are deductive queries to a rapidly changing belief state that is updated through sensing. For multi-tasking, the granularity of interleaving robotic resources is specified by declaring certain procedures as task atomic. A queuing convention for resource acquisition prevents starvation, while tasks co-ordinate their use of resources using the agent's belief state.
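Purely to illustrate the Guard → Action style (TeleoR's actual syntax is given in [Clark and Robinson, 2015]), the classic tower-building regression can be pictured as an ordered rule list whose first satisfied guard fires; the guard and action helpers below are hypothetical queries and commands against the belief state:

def make_tower_rules(tower):
    # Ordered Guard -> Action rules: earlier rules are nearer the goal,
    # so re-evaluating from the top resumes correctly after interference.
    # tower_built, holding_next_block, etc. are hypothetical helpers.
    return [
        (lambda b: tower_built(b, tower),        lambda b: []),
        (lambda b: holding_next_block(b, tower), lambda b: [put_on_tower(b, tower)]),
        (lambda b: next_block_clear(b, tower),   lambda b: [pick_next_block(b, tower)]),
        (lambda b: True,                         lambda b: [unstack_blocker(b, tower)]),
    ]

def fire(rules, belief):
    # Evaluate guards against the current belief state; first hit fires.
    for guard, action in rules:
        if guard(belief):
            return action(belief)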

The multi-tasking features in our example use a single task atomic procedure to build a tower of blocks. It can be used by each task of an agent building any number of block towers using two robot arms in parallel. Both arms may need to be used by each task at different times, since the blocks are distributed over three tables and each arm can only reach two tables. Clashing of arms over the shared table is primarily avoided by making the tables extra resources.

The node sends symbolic commands such as pickup(left, A) and putdown(right, shared table) to the Spatial node, which are interpreted quantitatively in the Spatial node's language.

Results

We conducted several experiments tasking Baxter to build towers of blocks starting from different initial configurations. Our aim was to test the robustness and limits of the instantiated architecture at each level of the abstraction hierarchy.

Persistence of vision keeps the world from going pitch black every time we blink our eyes. In the Controller nodes, the persistence parameter in the belief state plays a similar role. Blocks do not disappear from the belief state when a hand is moved rapidly between the camera and a block, or when a human manually places one block on top of another, momentarily blurring the robot's view. Letting the camera dwell on a position for a duration longer than (an approximate) blink of an eye results in the creation or destruction of blocks in the belief-state of the Controller node.

Object permanence is the understanding that objects continue to exist even when they cannot be observed. The Gazebo simulator in the Spatial node shows object permanence even when objects are outside the visual field of either camera, or occluded by other objects. Figure 1 shows block "B" on the right table in the simulator, even though it is occluded by block "A". Block "C" on the shared table is not seen by either camera, yet is retained in the simulator.

We found that blocks may not be recognised in poor lighting conditions and that grasping may fail if blocks are not oriented orthogonally to the reference frame as assumed. The physics model will incorrectly model block behaviour if supporting blocks have not been previously seen, or are mistakenly seen elsewhere. The TR program fails if unknown block symbols are introduced.

The TR node is demonstrated to handle unforeseen contingencies. A video² of Baxter building the tower "LAB" on the right table and "COT" on the left table shows the TR program coping with both help and hindrance from a human.

5 Conclusion

In this paper we developed a formal framework for the integration of symbolic and sub-symbolic components in a cognitive hierarchy. Formalised at an abstract level, the framework provides both a model for the construction of cognitive robotic systems and a tool for the analysis of existing architectures (e.g., SOAR), for example, in how they integrate different reasoning components. Consequently, the formalism complements, rather than competes with, these existing architectures and systems. Finally, due to its well-defined properties, the framework is an important step towards the development of provable safety guarantees in robotic systems. This is especially important for implementing robots that operate in unstructured real-world environments.

A real-world instantiation of the framework was outlined, consisting of a multi-node Baxter robot capable of building towers of blocks. This implementation was evaluated empirically to determine the robustness and limitations of the implementation at each level of the hierarchy.

Our framework lays the foundation for spatially modelling other physical agents, including humans, to investigate multi-agent systems in adversarial and collaborative settings. Our interests include general game playing, machine learning and autonomous adaptation at both the symbolic and sub-symbolic levels, and trust studies between humans and machines in mixed-task initiatives. The inclusion of a physics simulator provides opportunities for the agent to predict outcomes and explore courses of action without committing to them in the real world. Human modelling also opens the way to study communication via gesturing, and theory-of-mind.

²http://robocup.web.cse.unsw.edu.au/baxter/20160127-LABCOT-HIx4.mp4


Acknowledgments

We thank the anonymous reviewers of an earlier version for their helpful comments and suggestions that led to a more concise formalisation of our model. This material is based upon work supported by the Asian Office of Aerospace Research and Development (AOARD) under Award No. FA2386-15-1-0005. This research was also supported under the Australian Research Council's (ARC) Discovery Projects funding scheme (project number DP 150103035). Michael Thielscher is also affiliated with the University of Western Sydney.

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the AOARD.

References

[Albus and Meystel, 2001] James S. Albus and Alexander M. Meystel. Engineering of Mind: An Introduction to the Science of Intelligent Systems. Wiley-Interscience, 2001.

[Anderson, 1993] John R. Anderson. Rules of the Mind. Lawrence Erlbaum Associates Inc, New Jersey, July 1993.

[Baars, 1988] Bernard J. Baars. A Cognitive Theory of Consciousness. Cambridge University Press, 1988.

[Beetz et al., 2010] Michael Beetz, Lorenz Mosenlechner, and Moritz Tenorth. CRAM – a cognitive robot abstract machine for everyday manipulation in human environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1012–1017, Taipei, Taiwan, October 2010.

[Brooks, 1986] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, 2(1):14–23, March 1986.

[Brooks, 1990] Rodney A. Brooks. Elephants don't play chess. Robotics and Autonomous Systems, 6:3–15, 1990.

[Clark and Robinson, 2015] Keith Clark and Peter Robinson. Robotic agent programming in TeleoR. In IEEE International Conference on Robotics and Automation (ICRA), pages 5040–5047, May 2015.

[Dietterich, 2000] Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research (JAIR), 13:227–303, 2000.

[Haber and Sammut, 2012] Adam Haber and Claude Sammut. A cognitive architecture for autonomous robots. Advances in Cognitive Systems, 2:257–275, December 2012.

[Laird et al., 1987] John E. Laird, Allen Newell, and Paul S. Rosenbloom. SOAR: An architecture for general intelligence. Artificial Intelligence, 33(1):1–64, September 1987.

[Langley and Choi, 2006] Pat Langley and Dongkyu Choi. A unified cognitive architecture for physical agents. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), pages 1469–1474. AAAI Press, 2006.

[Levesque et al., 1997] Hector J. Levesque, Raymond Reiter, Yves Lesperance, Fangzhen Lin, and Richard B. Scherl. GOLOG: A logic programming language for dynamic domains. Journal of Logic Programming, 31:59–84, 1997.

[Minsky, 1986] Marvin Minsky. The Society of Mind. Simon & Schuster, Inc., New York, NY, USA, 1986.

[Newell and Simon, 1976] Allen Newell and Herbert A. Simon. Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3):113–126, March 1976.

[Nilsson, 1994] Nils J. Nilsson. Teleo-reactive programs for agent control. Journal of Artificial Intelligence Research (JAIR), 1:139–158, 1994.

[Nilsson, 2001] Nils J. Nilsson. Teleo-reactive programs and the triple-tower architecture. Electronic Transactions on Artificial Intelligence, 5:99–110, 2001.

[Nilsson, 2006] Nils J. Nilsson. The physical symbol system hypothesis: Status and prospects. In Max Lungarella, Fumiya Iida, Josh C. Bongard, and Rolf Pfeifer, editors, 50 Years of Artificial Intelligence, Essays Dedicated to the 50th Anniversary of Artificial Intelligence, volume 4850 of Lecture Notes in Computer Science, pages 9–17. Springer, 2006.

[Ryan and Pendrith, 1998] Malcolm R. K. Ryan and Mark D. Pendrith. RL-TOPS: An architecture for modularity and re-use in reinforcement learning. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 481–487, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.

[Thrun et al., 2005] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press, 2005.
