
How humans optimize their interaction with the environment:

The impact of action context on human perception

Agnieszka Wykowska 1, Alexis Maldonado 2, Michael Beetz 2, and Anna Schubö 1

1 Department of Experimental Psychology, Ludwig-Maximilians-Universität, München, Germany
2 Computer Science Department, Chair IX, Technische Universität, München, Germany

Abstract. Humans have developed various mechanisms to optimize interaction with the environment. Optimization of action planning requires efficient selection of action-relevant features. Selection might also depend on the environmental context in which an action takes place. The present study investigated how action context influences perceptual processing in action planning. The experimental paradigm comprised two independent tasks: (1) a perceptual visual search task and (2) a grasping or a pointing movement. Reaction times in the visual search task were measured as a function of movement type (grasping vs. pointing) and context complexity (context varying along one dimension vs. context varying along two dimensions). Results showed that action context influenced reaction times, which suggests a close bidirectional link between action and perception as well as an impact of environmental action context on perceptual selection in the course of action planning. Such findings are discussed in the context of applications in robotics.

Key words: Action Context, Action Planning, Human Perception

1 Introduction

When humans perform a particular action such as, for example, reaching for a cup in a cupboard, they need not only to specify movement parameters (e.g., the correct width of the grip aperture) but also to select movement-related information from the perceptual environment (e.g., the size and orientation of the cup's handle, but not its color, are relevant for grasping). Moreover, the context in which the grasping action is performed may also have an impact on both the action performance and the prevailing selection processes of the agent. If the cup is placed among other cups of different sizes and handle orientations, selection might be more difficult than when the cup is placed among plates. In the first case, the context varies along at least two dimensions that are relevant for grasping a cup (size and orientation of handles). In the second case, the cup is embedded in a homogeneous context whose varying dimension (the breadth of the plates) is irrelevant for grasping a cup. Therefore, the two environmental contexts might result in different processing speeds for the environmental characteristics.

Several authors have investigated the influence of intended actions on perceptual processes. For example, Craighero and colleagues [2] showed that when agents were asked to grasp a tilted bar, onset latencies of their movement depended on the characteristics of a visually presented stimulus that signaled when the movement should begin (a so-called "go signal"). If the "go signal" shared action-related features with the to-be-grasped object (e.g., was of the same orientation), the onset of the movement occurred faster than in the condition in which the "go signal" differed from the to-be-grasped object. These results support a close link between perception and action.

Similar results were reported by Tucker and Ellis [11]. The authors conducted a study in which participants were presented with natural objects (e.g., grape, cucumber) or manufactured objects (e.g., screw, hammer). The objects could be smaller (grape, screw) or larger (cucumber, hammer), implying either a precision grip (small objects) or a power grip (larger objects). The task was to categorize the objects as natural or manufactured. Half of the participants had to respond with a power grip to natural objects and a precision grip to manufactured objects; the other half had the opposite instructions. The results showed that although the size of the objects was completely orthogonal to the categorization task, it influenced performance: precision grips were faster to small objects relative to large objects, and power grips were faster to large objects compared to small objects. This suggests that size was implicitly processed as an action-related feature of an object and, as such, had an impact on behavior.

More recent studies demonstrated that visual detection processes are highly dependent on intended action types [1] and that the perceptual system can bias action-relevant dimensions if they are congruent with the performed action [3].


Wykowska, Schubö, and Hommel [14] conducted a series of experiments in which they observed action-related biases of visual perception already at early stages of processing. Importantly, these studies showed action-perception links in a situation where action and perception were completely unrelated and decoupled but had to be performed in close temporal order. Participants' task was to detect a visually presented target while they were preparing a grasping or pointing movement. The target could be an object that deviated in size or in luminance from the other objects. Wykowska and colleagues found that performance in the visual search task was influenced by the intended movement although the movement was not executed but only planned. That is, detection of a perceptual dimension (size or luminance) was better when accompanied by the preparation of a congruent movement (e.g., grasping for size and pointing for luminance) as compared to the preparation of an incongruent movement. These results indicate a close link between action and perception even when they merely coincide in time. Moreover, action preparation affected perceptual processing at the level of early mental representations.

These observations support the Theory of Event Coding [5], which postulates a common representational code for perception and action. We claim that such a common code implies bidirectional links that are used in such a way that action can have an impact on perception no less than perception can have an impact on action, for example, when the missing perceptual features of the cup are being specified for the grasping movement. It is obvious that action planning is informed and controlled by perceptual processes that provide information about the objects we act upon and the changes in our environment that we intend to achieve. Perhaps less obvious is the need for perceptual processes to be constrained and guided by action planning. Successful acting requires focusing on and selecting action-relevant information so as to adapt the action to the environmental conditions. That is, setting up a particular action plan may increase the weights of perceptual dimensions that are relevant for that action, i.e., intentional weighting in the sense of [5]. The aim of the present study was to investigate not only the impact of action planning on perceptual processes but also the impact of the context in which the action occurs. The action context might influence not only the behavior and performance of an agent [12] but also the agent's perceptual stages of processing.
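To make the notion of intentional weighting more concrete, the following is a minimal toy sketch, not taken from the paper or from [5]: planning an action raises the gain of perceptual dimensions that are relevant for that action, so congruent features win the competition for selection more easily. All dimension names and weight values are illustrative assumptions.

```python
# Toy sketch of intentional weighting: the planned action boosts the gain of
# action-relevant perceptual dimensions. Weights and dimensions are made up
# for illustration only.
ACTION_RELEVANCE = {
    "grasp": {"size": 1.0, "orientation": 1.0, "luminance": 0.2},
    "point": {"location": 1.0, "luminance": 0.8, "size": 0.2},
}

def weighted_salience(features, planned_action):
    """Combine raw feature salience, up-weighting dimensions relevant to the planned action."""
    weights = ACTION_RELEVANCE[planned_action]
    return sum(weights.get(dim, 0.1) * value for dim, value in features.items())

# A size/orientation-defined cup handle is favored more while a grasp is prepared:
handle = {"size": 0.8, "orientation": 0.6, "luminance": 0.3}
print(weighted_salience(handle, "grasp"))  # larger: size and orientation count fully
print(weighted_salience(handle, "point"))  # smaller: mainly luminance counts
```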

2 Experimental Paradigm

A paradigm was used in which participants were asked to perform a perceptual task (an efficient visual search task) in close temporal order to a motor task. The visual search task consisted of detecting the presence of either a size or a luminance target (in separate blocks of trials) embedded in a set of other, distracting items. The motor task consisted of grasping or pointing to a specific object on the movement execution device (MED). The motor task was to be planned before the visual search task but executed only after the search task was completed. In other words, while participants were performing the visual search task, the grasping or pointing movement was prepared but not yet executed. The action context consisted of objects of various sizes and surface luminances that were mounted on the MED (see Figure 2). The action context could either be simple, with objects varying along only one dimension (luminance), or complex, with objects varying along two different dimensions: size and luminance.

2.1 Participants

Nineteen paid volunteers (16 women, mean age: 23.3) took part. Participants were asked to perform a practice session on the day preceding the experiment. The practice session consisted of the motor task only, without the visual search task.

2.2 Stimuli and apparatus

The search display contained 28 grey circles positioned on three circular arrays. The target could appear at one of four positions on the middle circular array, at the upper left/right or lower left/right of the middle point. The target was defined either by luminance (lighter grey, cf. Figure 1, left) or by size (smaller circle, cf. Figure 1, right).

Figure 1. Examples of visual search displays that contained a luminance target (left) or a size target (right).

Below the computer screen, the movement execution device (MED) was positioned. It consisted of a box containing eight holes, also positioned on a circular array. Round plastic items that could vary in luminance and size, each covering an LED, could be attached to and detached from the box (cf. Figure 2). For the purpose of this experiment, we used the following combination of objects:

1. The simple action context (luminance dimension only) consisted of four grey, medium-sized objects and four objects that differed in luminance: two being darker and two being lighter than the standard elements (cf. Figure 2a).

2. The complex action context (luminance and size dimensions) consisted of four grey, medium-sized items and four grey items that differed in size, two being smaller and two being larger than the standard elements (cf. Figure 2b).

Figure 2. Movement Execution Device (MED). Simple action context (A), consisting of objects that vary along the luminance dimension (an LED lighting up; the objects on the MED are all defined by luminance), and complex action context (B), consisting of objects that vary along the luminance dimension (an LED lighting up) and the size dimension (items on the MED).

2.3 Procedure

Participants were seated at a distance of 80 cm from the computer screen on which the visual search task was presented. The MED was positioned below the computer screen, in front of the participants (cf. Figure 3).

Figure 3. Experimental setup. Participants were asked to perform the movement task (grasping or pointing) on the MED with their left hand. The movement ended when participants returned with their hand to the starting position (space bar on the keyboard). The search task was to be performed with the right hand on the computer mouse (target present: left mouse key; target absent: right mouse key).

Each experimental trial began with a fixation cross displayed for 500 ms. Subsequently, a movement cue (cf. Figure 4) appeared for 1000 ms. Participants were instructed to prepare the movement but not to execute it until a signal from the MED appeared. The cue was followed by a blank display (500 ms) and, subsequently, by the search display, presented for 100 ms. A blank screen followed the search display and remained on the computer screen until participants responded to the search task. Upon the response to the search task, one of the LEDs on the MED lit up for 300 ms. This event constituted the signal for participants to execute the prepared movement, i.e., to either point to or grasp the object that lit up. After the experimenter registered the participant's movement with a mouse key, a new trial began (cf. Figure 4).

Figure 4. A trial sequence. Each box represents an event of a trial (e.g., presentation of a fixation cross or of a movement cue). The presentation time for each event is shown on the right side of the figure.

Participants were asked to be as fast and as accurate as possible in the search task. Speed was not stressed in the movement task. The experiment consisted of one practice block and 8 experimental blocks with 112 trials each. The two target types (size and luminance) were presented in separate blocks, and the configuration of objects on the MED was also changed block-wise. The order of blocks was balanced across participants. The movement task was randomized within each block of trials.
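For readers who want to follow the timing, the following is a schematic of one trial, reconstructed from the Procedure text above; it is not the original experiment software, and the callback names (show, wait_for_search_response, wait_for_movement) are placeholders.

```python
import time

def run_trial(show, wait_for_search_response, wait_for_movement):
    """One trial of the paradigm; durations (in seconds) follow the text above."""
    show("fixation cross")
    time.sleep(0.5)
    show("movement cue")            # grasp or point; prepare but do not execute
    time.sleep(1.0)
    show("blank screen")
    time.sleep(0.5)
    show("search display")          # size or luminance target among distractors
    time.sleep(0.1)
    show("blank screen")
    search_rt = wait_for_search_response()   # mouse key: target present/absent
    show("LED on MED")              # go-signal for the prepared movement
    time.sleep(0.3)
    wait_for_movement()             # experimenter registers the executed movement
    return search_rt
```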

2.4 Data Analysis

The analysis focused on the impact of context complexity on perceptual processing. Mean reaction times (RTs) and error rates in the search task were submitted to statistical analyses. Trials with incorrect movement execution and incorrect search trials were excluded from the analysis. From the remaining data, individual mean RTs in the search task were submitted to an analysis of variance (ANOVA) with target presence (target present vs. absent), target type (luminance vs. size), movement type (point vs. grasp), and action context (one vs. two dimensions) as within-subject factors.

If the environmental action context of a movement has an impact on the perceptual processing of action-relevant characteristics, then the simpler context (varying along one dimension) should yield better performance in the search task as compared to the context varying along two dimensions.
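For illustration, the repeated-measures ANOVA described above could be run as follows; this is a minimal sketch assuming the per-participant mean RTs are stored in a hypothetical rt_data.csv, with column names of our own choosing.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical file: one row per participant x condition cell.
df = pd.read_csv("rt_data.csv")   # columns: subject, presence, target_type, movement, context, rt

anova = AnovaRM(
    data=df,
    depvar="rt",
    subject="subject",
    within=["presence", "target_type", "movement", "context"],
).fit()
print(anova)   # F and p values for main effects and interactions
```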

3 Results

The statistical analysis showed that action context had a significant effect on search performance, F(1, 18) = 5.5, p < .05, indicating that the complex action context elicited slightly longer RTs in the search task (M = 398 ms, SEM = 15) as compared to the simple context (M = 392 ms, SEM = 16). Moreover, this effect was more pronounced for the grasping movement (simple context: M = 390 ms, SEM = 15; complex context: M = 398 ms, SEM = 15) as compared to the pointing movement (simple context: M = 395 ms, SEM = 15; complex context: M = 398 ms, SEM = 16), as indicated by the significant interaction of movement type and action context, F(1, 18) = 5, p < .05 (cf. Figure 5).

Figure 5. Mean reaction times (ms) in the search task as a function of movement type (point vs. grasp) and action context (simple vs. complex). Error bars represent standard errors of the mean (SEM).

An analogous analysis of error rates in the search task showed no significant effects, all p > .5. However, the small absolute differences were in line with the effects in RTs (cf. Table 1).

Moreover, the present results replicated previous findings [14] by showing an influence of intended movements on perceptual dimensions: the interaction of movement type and target type was significant, F(1, 18) = 5, p < .05, again indicating improved visual search performance for congruent movements as compared to incongruent movements: size detection was faster when a grasping movement was prepared relative to when the size-incongruent movement (pointing) was prepared.

4 Discussion

The aim of the present study was to investigate the influence of action context on perceptual processes during movement preparation. Results showed that action context influences already early stages of processing, namely visual selection. Target detection in the visual search task was faster when the context in which the movements were executed consisted of only one dimension (luminance) as compared to two dimensions (size and luminance). Interestingly, the action context effect was more pronounced for the more complex movement (grasping) as compared to the simpler pointing movement. Importantly, the design of the present experiment made the movement execution (together with its context) completely decoupled from the perceptual task. Therefore, the influences of action context on performance cannot be attributed to late stages of the processing stream, i.e., to perceptual consequences of the movement execution. Rather, the present results show that action context influences already early perceptual processing. Such an influence suggests that the perceptual system makes use of environmental hints when selecting action-relevant features of the environment. It is important to note that the present results were obtained in a rather restricted laboratory setting in which all parameters were strictly controlled, so that other factors such as prior knowledge of or experience with the respective action context could have no influence on the results. This might have resulted in relatively small absolute differences in reaction times. These effects would probably become larger if examined in more natural settings, such as, for example, a real kitchen scenario.

The present findings can be interpreted in a broad, evolutionary context. Humans must have developed mechanisms that optimize their interaction with the environment. When planning actions, humans select specifically action-relevant information from the environment. This might have been achieved by means of developing a common representational code [5] that allows for a fast and efficient action-perception transition. Evidence for such codes comes from behavioral data [5] and from neurophysiology [9]. Moreover, the discovery of mirror neurons, e.g., [7], provided strong support for the coupling of action and perception. Although some authors postulate that the fast "how/where" pathway for action is separate from the slower "what" pathway for perception and conscious processing [4], recent evidence shows much closer interactions [8]. A prominent example of evidence for such a fast neuronal route linking perception with action was described by Humphreys and Riddoch [6].


Table 1. Mean error rates (%) in the search task as a function of movement type (point or grasp) and action context. SEM in brackets.

                     Movement type
Action context       Point          Grasp
Simple context       2.85 (0.5)     2.92 (0.7)
Complex context      2.86 (0.6)     2.96 (0.8)

The authors examined patient MP, who suffered from unilateral neglect. Following damage to the fronto-temporal-parietal regions of the right cerebral hemisphere, the patient was unable to direct his attention to objects presented in his left visual field, a symptom typical for unilateral neglect patients. When presented with a simplified visual search task, he was impaired in finding objects defined by their perceptual features when they were presented in his neglected field. For example, when asked to detect a red object, he was unable to detect a red cup. Importantly, the patient was not impaired in color discrimination in general; that is, patient MP was able to name an object's color when it was presented as a single item, a condition in which neglect symptoms usually do not occur. Interestingly, when the targets were defined by action-related features of the objects (e.g., look for something to drink from), the patient's performance improved markedly also in the neglected field. This indicates that there might be a fast and efficient link between action and perception, a so-called pragmatic processing route that facilitates the extraction of action-relevant features of the environment.

The present results suggest that the interactions between perception and action might be even more complex. Our results showed that not only intended actions play a role in perceptual selection processes but also the environmental contexts in which the actions are performed. If the context varies along several dimensions, perceptual selection is presumably more difficult. When humans are preparing a movement, they try to optimize their performance through efficient selection of the perceptual information that might be relevant for that movement. This optimization process might be especially important for complex movements, as they are more difficult. Therefore, humans might benefit from the environmental setting (i.e., a simple action context) more in the case of a more difficult interaction (a more complex movement) as compared to a movement that is simple enough not to require such a level of optimization.

5 Implications for Robotics

As argued above, humans have developed certain mechanisms that allow for the optimization of action control. When designing a robotic system, one might take into account similar solutions, especially when dealing with fast real-time systems that produce very rich and fast data streams.

Due to the limited computing power available on current robots, it is advantageous to use context information to choose which algorithms process the sensory stream. For example, in the case of the robots that serve as a demonstration scenario in our kitchen environment (cf. Figure 6), the high-level control system informs the camera-based perception system about the action that will be performed and about the needed update speed. For the task of cleaning the table, this activates a system that fits 3D CAD models of cups and other typical objects to the image stream, which is computationally very expensive. After the objects are detected, the processing power is devoted to the sensory information coming from the robotic arm and hand in order to correctly grasp the object. Both data processing systems can still run in parallel, with priority being given to the one that is most important for the task at hand.
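This scheme can be summarized as a small dispatcher in which the high-level controller declares which perception modules are active and which one currently has priority. The sketch below is a simplified illustration of that idea, not the actual robot software; all class, module, and function names are our own.

```python
# Simplified sketch: context-dependent routing of the sensory stream.
# The high-level controller enables only the modules needed for the
# current action and marks one of them as high priority.
class ContextualPerception:
    def __init__(self):
        self.modules = {}       # name -> callable(frame) -> result
        self.active = []        # modules enabled for the current action
        self.priority = None    # module that gets the sensory stream first

    def register(self, name, fn):
        self.modules[name] = fn

    def set_action_context(self, active, priority):
        """Called by the high-level control system when the planned action changes."""
        self.active = list(active)
        self.priority = priority

    def process(self, frame):
        # Run the high-priority module first, then the rest; everything not
        # needed for the current action is skipped to save computing power.
        ordered = sorted(self.active, key=lambda name: name != self.priority)
        return {name: self.modules[name](frame) for name in ordered}

# Usage idea: while cleaning the table, a "fit_cad_models" module has priority;
# once objects are found, priority switches to an "monitor_arm_and_hand" module.
```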

Our current research draws inspiration from the findings described in this paper to design a system for grasping with the multi-finger DLR-HIT hand (cf. Figure 6) that has paired control/perception models. A set of grasping controllers was developed, one for each type of basic grasp movement (e.g., power grasp, precision pinch, and others). We learn an associated perception model on the sensory stream from the fingers (joint position and torque) and train it using examples of correct and incorrect grasps. When executing a grasp, only the associated model is fed with the sensory stream coming from the hand, and the system can monitor the success of the action. This design also complies with the ideas of Wolpert and Kawato of multiple interconnected forward and inverse models [13].
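The pairing of each grasp controller with its own success monitor could be expressed roughly as follows; this is only a sketch of the design idea under assumed interfaces, not the code running on the DLR-HIT hand.

```python
# Sketch: each basic grasp type owns a controller and a success monitor that
# is trained only on that grasp's finger data (joint positions and torques).
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence

@dataclass
class GraspPair:
    controller: Callable[[], Iterable[Sequence[float]]]   # yields finger commands until done
    monitor: Callable[[List[Sequence[float]]], bool]       # sensory window -> grasp succeeded?

def execute_grasp(pair: GraspPair,
                  send_command: Callable[[Sequence[float]], None],
                  read_sensors: Callable[[], Sequence[float]]) -> bool:
    """Run one grasp and feed only its paired monitor with the hand's sensory stream."""
    window = []
    for command in pair.controller():
        send_command(command)          # hand interface, injected by the caller
        window.append(read_sensors())  # joint positions + torques at this step
    return pair.monitor(window)        # online estimate of grasp success
```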


Figure 6. Two robots for manipulation in human environments. Left: a B21r robot with Powercube robotic arms. Right: a Kuka LWR arm with a DLR-HIT hand. Both perform pick-and-place tasks in a kitchen at the Intelligent Autonomous Systems Laboratory, TUM. The high amount of sensory data and the limited computing power on board the robots make it necessary to apply only the specific algorithms that deliver the information needed for the current action.

Finally, it is important to note that successful human-robot interaction requires good temporal synchronization, especially when joint attention might prove beneficial. Imagine the following: a robot assists a human in the kitchen. The human's attention is captured by a falling cup behind the robot's back. In order to rescue the cup, not only must the robot make use of human-specific attention-directing hints (such as, for example, gaze direction) [10], but its processing system also needs to be temporally synchronized (within the milliseconds range) with the human's system. Moreover, in order to optimize the movement, the robot needs to select the most relevant information from the environment for efficient, fast processing and action programming. Therefore, implementing mechanisms analogous to those governing human perception and action planning into artificial systems might prove indispensable for human-robot interaction.

6 Conclusions

The present experiment showed that not only action planning but also the environmental context in which an action takes place affects perceptual processing. The present results have been interpreted in a broad evolutionary context, postulating that humans have developed means of optimizing action planning by efficient selection of relevant information from the environment. Selection depends on the complexity of the task at hand and the congruency between the intended action and the given environmental setting. Artificial systems designed to interact with humans need to take such interactions and constraints into account to allow for temporal compatibility and fluency between human and robot movements in joint action tasks.

Acknowledgements

This study was supported by the Deutsche Forschungsgemeinschaft (Excellence Cluster Cognition for Technical Systems (CoTeSys), Project #301).

References

1. H. Bekkering and S. F. W. Neggers. Visual search is modulated by action intentions. Psychological Science, 13:370–374, 2002.
2. L. Craighero, L. Fadiga, G. Rizzolatti, and C. A. Umiltà. Action for perception: a motor-visual attentional effect. Journal of Experimental Psychology: Human Perception and Performance, 25:1673–1692, 1999.
3. S. Fagioli, B. Hommel, and R. I. Schubotz. Intentional control of attention: Action planning primes action-related stimulus dimensions. Psychological Research, 71:22–29, 2007.
4. M. A. Goodale and A. Milner. Separate visual pathways for perception and action. Trends in Neurosciences, 15:20–25, 1992.
5. B. Hommel, J. Müsseler, G. Aschersleben, and W. Prinz. The theory of event coding (TEC): A framework for perception and action planning. Behavioral and Brain Sciences, 24:849–937, 2001.
6. G. W. Humphreys and M. J. Riddoch. Detection by action: neuropsychological evidence for action-defined templates in search. Nature Neuroscience, 4:84–89, 2001.
7. G. Rizzolatti and L. Craighero. The mirror-neuron system. Annual Review of Neuroscience, 27:169–192, 2004.
8. Y. Rossetti, L. Pisella, and A. Vighetto. Optic ataxia revisited: visually guided action versus immediate visuomotor control. Experimental Brain Research, 153:171–179, 2003.
9. R. I. Schubotz and D. Y. von Cramon. Predicting perceptual events activates corresponding motor schemes in lateral premotor cortex: An fMRI study. NeuroImage, 15:787–796, 2002.
10. N. Sebanz, H. Bekkering, and G. Knoblich. Joint action: bodies and minds moving together. Trends in Cognitive Sciences, 10:70–76, 2006.
11. M. Tucker and R. Ellis. The potentiation of grasp types during visual object categorization. Visual Cognition, 8:769–800, 2001.
12. R. M. Turner. Context-mediated behavior for intelligent agents. International Journal of Human-Computer Studies, 48:307–330, 1998.
13. D. M. Wolpert and M. Kawato. Multiple paired forward and inverse models for motor control. Neural Networks, 11:1317–1329, 1998.
14. A. Wykowska, A. Schubö, and B. Hommel. How you move is what you see: Action planning biases selection in visual search. Journal of Experimental Psychology: Human Perception and Performance, in press.