This article was downloaded by: [175.143.160.248] on 05 January 2013, at 18:12.
Publisher: Routledge. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Digital Creativity. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ndcr20

Retrieval of motion composition in film
Matthias Zeppelzauer, Maia Zaharieva, Dalibor Mitrović and Christian Breiteneder
Interactive Media Systems Group, Vienna University of Technology
Version of record first published: 04 Jan 2012.

To cite this article: Matthias Zeppelzauer, Maia Zaharieva, Dalibor Mitrović & Christian Breiteneder (2011): Retrieval of motion composition in film, Digital Creativity, 22:4, 219-234.
To link to this article: http://dx.doi.org/10.1080/14626268.2011.622282

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty, express or implied, or make any representation that the contents will be complete, accurate, or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Retrieval of Motion Composition in Film




Retrieval of motion composition in film

Matthias Zeppelzauer, Maia Zaharieva, Dalibor Mitrović and Christian Breiteneder

Interactive Media Systems Group, Vienna University of Technology

[email protected]; [email protected]; [email protected]; [email protected]

Abstract

This article presents methods for the automatic retrieval of motion and motion compositions in movies. We introduce a user-friendly sketch-based query interface that enables the user to describe desired motion compositions. Based on this abstract description, a tolerant matching scheme extracts shots from a movie with a similar composition. We investigate and evaluate two application scenarios: the retrieval of motion compositions and the retrieval of matching motions (a technique in continuity editing). Experiments show that the developed methods accurately and promptly retrieve relevant shots. The presented methods enable new ways of searching and investigating movies.

Keywords: motion retrieval, query-by-motion, motion composition, matching action, motion segmentation

1 Introduction

Motion characterises the style of a movie and significantly influences the way it is perceived by viewers. Motion controls tension and tempo in a scene, creates visual rhythm and gives a movie temporal continuity. Furthermore, motion gives evidence about the genre of a movie and about the director's style.

Motion is of special interest to film-makers and film theorists, although from different perspectives. Film-makers employ motion, for example, to increase the tension in a scene. This may be achieved by using a shaky camera (steadicam) and showing fast-moving objects. Film theorists, in contrast, reverse the process of film-making and manually analyse it in detail; for example, the usage of camera and object motion, in order to investigate the progression of tension over an entire movie.

Automatic methods for motion retrieval support both film theorists and film-makers in their creative work by allowing efficient search and retrieval of particular types of motions and motion compositions. In particular, film theorists manually analyse motion shot by shot and in great detail, which is a tedious, time-consuming and error-prone task. Automatic motion retrieval supports the expert in finding typical motion directions, locations and combinations of interest more efficiently and sometimes also more accurately. Film-makers benefit from motion retrieval as it supports editors and directors in searching for alternative ways of motion editing. Motion retrieval may further enhance the post-production of movies by automatically checking the quality of motion continuity and detecting errors and jumps. Additionally, motion analysis and retrieval may be incorporated into production tools which automatically generate movies from large movie and video databases.

Different approaches for the automatic analysis and retrieval of motion have been proposed in the literature. However, they are hardly suitable for the applications of film theorists and film-makers, for different reasons. Most methods are based on the analysis of object motions only. They first segment and track the objects in a shot (Chang et al. 1998, Buzan et al. 2004) and then compute a motion trajectory for each object (Dagtas et al. 2000). Retrieval is then performed by matching the object trajectories with trajectories provided by the user (Bashir et al. 2007). However, some types of motion which are important for motion compositions cannot be represented by these approaches because they are difficult to segment, such as groups of objects (e.g. people, cars), motion of water (e.g. rivers) and smoke. Other approaches skip object segmentation and focus on the retrieval of camera motions only (Ardizzone et al. 1996, Hanis and Sziranyi 2003). Such methods rely on quantitative representations of motion which are too coarse and inaccurate for the retrieval of motion compositions.

This article introduces a more general approach to motion retrieval. The presented methods build upon a technique for motion segmentation that we originally presented in 2010 (Zeppelzauer et al. 2010). The technique clusters previously tracked motion trajectories into meaningful motion segments. Motion segments represent object motions as well as camera motions and are a convenient basis for the retrieval of motion. We define a novel type of query for the description of user-defined motion compositions. Based on the query, a tolerant matching scheme extracts the shots in a movie which are most similar. The method enables searching for typical camera motions, object motions and characteristic motion directions. Finally, we extend the method to the retrieval of matching motion (see Section 3.3) by adapting the query model and refining the matching scheme.

We demonstrate the performance of motion composition retrieval in the context of historical documentaries, since they contain numerous (repeated) complex compositions. The retrieval of matching motion is demonstrated with the contemporary movie Run Lola Run by Tom Tykwer from 1998. Run Lola Run is a thriller in which the director makes extensive use of matching on action throughout the film to create the impression of a seamless storyline.

2 Motion analysis

A movie consists of different hierarchical levels. The basic visual unit in a movie is the frame, a single still image on the strip of film. Several frames make up a shot. In the phase of shooting, a shot is 'one uninterrupted run of the camera to expose a series of frames. Also called a take' (Bordwell and Thompson 2008, p. 8). Later, in the edited film, a shot is 'the length of film from one splice or optical transition to the next' (Beaver 2009, p. 230). The latter definition is relevant for us, since we analyse already edited movies. Two shots (after editing) are connected either by an abrupt transition (cut) or by a gradual transition (e.g. dissolve and fade). The next higher structural level is the scene. A scene is 'a unit of a motion picture usually composed of a number of interrelated shots that are unified by location or dramatic incident' (Beaver 2009, p. 223).

We first temporally partition the movie into its shots by the method proposed in Zeppelzauer et al. (2008). For each shot, motion tracking and motion segmentation are then performed separately.

2.1 Motion tracking

Different methods exist for the estimation and tracking of motion. The basis for motion segmentation is usually a dense motion field obtained from an optical flow estimator (e.g. Horn and Schunck 1981). Optical flow methods estimate the motion between two successive frames at each pixel. This results in a dense motion field for each frame. However, high-quality optical flow estimators are too time-consuming to be applied to a full-length feature film (Brox et al. 2004).

Feature trackers are more suitable for this task since they can operate in real time. Feature trackers first select distinct points in a frame (e.g. corners of objects) and then attempt to trace these points over time in subsequent frames. Points that cannot be tracked any further (e.g. because they move out of the frame or get occluded) are continuously replaced by new points. Figure 1 shows the trajectories of tracked feature points for an example sequence in which two cars move towards each other and finally crash.

Feature tracking generates trajectories that represent the motion performed by the selected feature points. In contrast to optical flow, the resulting motion field is sparse (trajectories exist only for a few points and not for all pixels). We employ the Kanade–Lucas–Tomasi feature tracker (KLT) because of its efficiency and its ability to robustly track feature points across multiple frames (Shi and Tomasi 1994).

2.2 Motion segmentation

Motion segmentation aims at grouping motion trajectories that originate from the same source (an object or a camera movement). This segmentation allows an abstract description of the motion content in a shot and can be used for the retrieval of typical motions and motion compositions (see Section 3).

Segmentation of sparse motion fields obtained from feature tracking is more complex than that of dense fields. The trajectories differ in length and have varying begin and end times. Due to occlusions and tracking errors, the trajectories often break off. Conventional segmentation methods cannot be applied to such heterogeneous data because they cannot cope with missing and incomplete data. This motivated the development of a novel method for the segmentation of fragmented motion fields (Zeppelzauer et al. 2010). The idea is to segment the entire sparse field of trajectories directly by iteratively grouping temporally overlapping trajectories. The process is illustrated in Figure 2. There are two stages. The first stage (iterative clustering) groups temporally overlapping trajectories with similar velocity direction and magnitude. The second stage (merging) connects the segments from the first stage into temporally adjacent segments covering larger time spans.

A basic assumption is that trajectories that perform similar motion at the same time belong to the same motion segment. According to this definition, a segment can represent the motion of a single object, of a group of several similarly moving objects, as well as motions of the camera. Consequently, segmentation is not restricted to a particular type and source of motion, which is important for the retrieval of arbitrary motion compositions.

Figure 1. Motion trajectories obtained from the KLT feature tracker.
Credit: Stadtkino Filmverleih; mark-up: Matthias Zeppelzauer. Copyright by Stadtkino Filmverleih.

The first stage of the approach (iterative clustering) starts with the selection of expressive trajectories (representatives) for the initialisation of new motion segments. A good representative is a trajectory that travels a significant distance. In the next step, all temporally overlapping trajectories which are similar (in terms of velocity direction and magnitude) to the selected representative are added to the new segment. Finally, all trajectories that lie temporally fully inside the new segment are removed from the original motion field. Trajectories that are temporally not fully covered by the segment remain in the motion field. This enables trajectories to be assigned to multiple temporally adjacent segments in further iterations, which is an important prerequisite for the creation of long-term segments in the second stage of the algorithm.

The next iteration starts with the selection of a new representative trajectory from the motion field. The algorithm terminates when no more trajectories are left in the motion field. Iterative clustering generates a set of overlapping motion segments. The temporal extent of the segments tends to be rather short (it is limited by the temporal extent of the representative trajectories). Consequently, the iterative clustering yields an over-segmentation of the motion field, as illustrated in Figure 2.

The goal of merging is to reduce the over-segmentation by connecting temporally adjacent segments that represent the same motion. This is performed by hierarchically merging overlapping segments. Beginning with the smallest segment, we search for the segment which shares the most trajectories with it. If the portion of shared trajectories exceeds a certain threshold, the two are merged. This process is then repeated successively with the next larger segment until all segments have been considered for merging. The result is a smaller set of segments with larger temporal extent. Further iterations (see the arrow labelled 'iterate' in Figure 2) repeatedly merge newly created segments until no segments can be merged any more. The ultimate result of the merging procedure is a small set of segments which represent long-term motion.
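The two stages can be sketched as follows. The data layout (a trajectory as a dict with a start frame and an Nx2 point array) and all thresholds are illustrative assumptions, not the authors' implementation; the sketch only mirrors the structure of the algorithm described above.

```python
import numpy as np

def mean_velocity(traj):
    """Average frame-to-frame displacement (trajectory points: Nx2 array)."""
    return np.diff(traj['points'], axis=0).mean(axis=0)

def travel_distance(traj):
    pts = traj['points']
    return float(np.linalg.norm(pts[-1] - pts[0]))

def overlaps(a, b):
    """True if the temporal spans of two trajectories overlap."""
    a_end = a['start'] + len(a['points']) - 1
    b_end = b['start'] + len(b['points']) - 1
    return a['start'] <= b_end and b['start'] <= a_end

def similar(a, b, cos_tol=0.8, mag_tol=2.0):
    """Similar velocity direction (cosine) and magnitude (absolute diff)."""
    va, vb = mean_velocity(a), mean_velocity(b)
    na, nb = float(np.linalg.norm(va)), float(np.linalg.norm(vb))
    if na == 0.0 or nb == 0.0:
        return na == nb  # treat two static trajectories as similar
    return float(va @ vb) / (na * nb) >= cos_tol and abs(na - nb) <= mag_tol

def fully_inside(t, rep):
    """True if t's temporal span lies fully inside rep's span."""
    return (t['start'] >= rep['start'] and
            t['start'] + len(t['points']) <= rep['start'] + len(rep['points']))

def iterative_clustering(trajectories):
    """Stage 1: grow a segment around the most expressive trajectory."""
    field, segments = list(trajectories), []
    while field:
        rep = max(field, key=travel_distance)  # representative
        seg = [t for t in field if overlaps(rep, t) and similar(rep, t)]
        segments.append(seg)
        seg_ids = {id(t) for t in seg}
        # Trajectories fully covered by the representative's span are
        # consumed; partially covered ones may join later segments.
        field = [t for t in field
                 if t is not rep
                 and not (id(t) in seg_ids and fully_inside(t, rep))]
    return segments

def merge_segments(segments, share_thresh=0.5):
    """Stage 2: merge segments sharing enough trajectories."""
    segs = [{id(t) for t in s} for s in segments]
    merged = True
    while merged:
        merged = False
        segs.sort(key=len)  # start from the smallest segment
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                if len(segs[i] & segs[j]) / max(len(segs[i]), 1) >= share_thresh:
                    segs[j] |= segs[i]
                    del segs[i]
                    merged = True
                    break
            if merged:
                break
    return segs
```

Because a partially covered trajectory stays in the field, it can be claimed by several short segments, and it is exactly this sharing that `merge_segments` exploits to stitch the over-segmented pieces into long-term motion segments.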

Figure 2. The process of motion segmentation.

An example of the segmentation of different object motions is illustrated in Figure 3. The shot shows a group of people walking up a hill. The people in the group first move away from the camera, towards the hill (in the lower right quarter), then turn to the left, walk up the hill and finally vanish behind the hill. At the end of the shot a horse enters at the top of the hill in the opposite direction (annotated with the short arrow in Figure 3(c)). From the three keyframes 3(a)-3(c) we observe a number of artefacts (flicker, scratches and dirt). Since motion tracking is sensitive to intensity variations like flicker, the trajectories are noisy and frequently break off. However, the approach is able to create temporally coherent motion segments. The movement of the group of people away from the camera is represented by segment 1 in Figure 3(d). Segment 2 represents the motion of the people walking up the hill. The third segment represents the horse that enters at the end of the shot from the left. The result shows that the approach is able to segment large groups of objects as well as small individual objects.

3 Motion retrieval

The obtained motion segments allow for a compact and expressive description of the motion contained in a shot. The segments describe diverse types of motion and enable the development of novel applications for the retrieval and analysis of motion in a movie.

We investigate two different application scenarios that build upon the previously extracted motion segments. In the first scenario the goal is the retrieval of shots with a particular motion composition consisting of camera motions, object motions or both (see Section 3.2). The second scenario targets the retrieval of pairs of shots that are connected by matching action (see Section 3.3). In both application scenarios the user has to define a query. The design of the query is a crucial factor, since it defines the way in which the user has to specify the motion content of interest. Different types of queries exist in retrieval, such as textual queries, example-based queries and sketch-based queries. In the following, we define an intuitive type of sketch-based query that enables the user to specify arbitrary search requests.

Figure 3. Motion segmentation results. Images 3(a)-3(c) are keyframes from the beginning, middle and end of the shot (white arrows represent the motion directions). Figure 3(d) shows the clustered trajectories and image 3(e) illustrates the three resulting motion segments (arrows indicate the primary motion direction of a segment).
Source: Austrian Film Museum; mark-up: Matthias Zeppelzauer. Copyright Austrian Film Museum.

3.1 Query formulation

A number of systems incorporating motion for retrieval have been introduced, for example VideoQ (Chang et al. 1998), MovEase (Ahanger et al. 1995), Picturesque (Dagtas et al. 2000) and the system described by Dimitrova and Golshani (1995). These systems require the user to define trajectories that represent the motion of the objects contained in the sequence of interest. Trajectory-based motion queries can become very complex with an increasing number of available degrees of freedom (Ahanger et al. 1995). Each moving object may be described, for example, by its trajectory together with its projected size, direction, speed and acceleration at each time instant. The definition of such a query can become counter-intuitive and time-consuming for complex motion compositions. Tolerant retrieval is difficult due to the large amount of detail contained in the motion description. Furthermore, the definition of such a motion query requires detailed knowledge about the sequences of interest, which reduces the exploratory capabilities of the corresponding systems.

We envision a query that is much easier (and faster) to define and that allows the user to integrate more variability into the motion description. We perform experiments with six test users, all of them film experts. They are asked to sketch the motion content of selected shots on a piece of paper. The resulting sketches reveal that the most important information for the participants is the direction of the motion, followed by its spatial location. Velocity and acceleration are neglected by most users. Even the size of the moving objects plays a secondary role. Furthermore, most test persons sketch motions of groups of objects as one coherent motion.

Based on these findings, we develop a query that enables sketching the motions of interest as vectors in a sketch-pad window. The absolute position and the length of a vector coarsely specify the region where the corresponding motion occurs (see Figure 5 for example queries). For simplicity, we do not consider velocity magnitude and acceleration in the queries. Large moving objects or groups of objects can be specified by drawing several nearby vectors with similar direction. For spatially distributed motion like camera motion, the user simply sketches several arrows with the desired direction(s) distributed over the entire query window.1 The proposed query allows for expressive motion descriptions and at the same time provides enough freedom for tolerant retrieval.

The numerical description of a query contains the directions of all provided query vectors. Additionally, a region (aligned rectangle or ellipse) around each vector is extracted that represents the area covered by the corresponding motion. These two parameters (per query vector) are sufficient to represent motion compositions.
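As a concrete illustration, one sketched query vector might be reduced to these two parameters as follows. The normalised coordinate convention and the `margin` parameter are our own assumptions for illustration; the paper does not specify how large the region around a vector is.

```python
import numpy as np

def describe_query_vector(start, end, margin=0.1):
    """Return (direction, region) for one sketched query vector.

    start/end are (x, y) in normalised [0, 1] sketch-pad coordinates;
    region is an axis-aligned rectangle (xmin, ymin, xmax, ymax)
    expanded by `margin` around the vector.
    """
    start, end = np.asarray(start, float), np.asarray(end, float)
    direction = end - start  # only direction matters, not speed
    xmin, ymin = np.minimum(start, end) - margin
    xmax, ymax = np.maximum(start, end) + margin
    # Clamp the region to the sketch pad.
    xmin, ymin = max(xmin, 0.0), max(ymin, 0.0)
    xmax, ymax = min(xmax, 1.0), min(ymax, 1.0)
    return direction, (xmin, ymin, xmax, ymax)
```

Discarding velocity magnitude and acceleration at this point is what keeps the later matching tolerant: two motions only need to agree in direction and rough location to score.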

3.2 Retrieval of motion compositions

The retrieval of motion compositions requires matching the query against the previously computed motion segments. Therefore, we extract representative information from the motion segments that is structurally similar to the parameters derived from the query vectors. For each motion segment two parameters are computed: a representative motion direction and the spatial region covered by the segment.

The segments obtained from motion segmentation comprise a sparse set of fragmented trajectories. The extraction of representative information for such a segment is difficult, since the trajectories have different begin and end times and are spatially distributed. We extract the median direction of all trajectories, which is a robust estimate of the dominant motion direction of the segment. The second extracted parameter is the region covered by a segment; for this, we compute the polygon that encompasses all trajectories of the segment.
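A minimal sketch of these two per-segment parameters, under stated assumptions: we read "median direction" as the median of the per-trajectory net displacement angles, and we use a convex hull for the enclosing polygon (the paper only says "polygon"). Neither choice is confirmed by the source.

```python
import numpy as np
from scipy.spatial import ConvexHull

def segment_parameters(trajectories):
    """trajectories: list of Nx2 point arrays belonging to one segment.

    Returns (direction, hull): a unit vector for the segment's dominant
    direction and the vertices of the convex polygon enclosing it.
    """
    # Median of the displacement angles; a simplification, since proper
    # circular statistics would be needed near the +/-180 degree wrap.
    angles = [np.arctan2(t[-1][1] - t[0][1], t[-1][0] - t[0][0])
              for t in trajectories]
    median_angle = float(np.median(angles))
    direction = np.array([np.cos(median_angle), np.sin(median_angle)])
    # Region: convex hull over all points of all trajectories.
    pts = np.vstack(trajectories)
    hull = pts[ConvexHull(pts).vertices]
    return direction, hull
```

The median makes the direction estimate robust against the broken-off and noisy trajectories described in Section 2.2, which would skew a mean.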

In the next step, a match between the query and the motion segments is established based on the extracted directional and spatial information. Optionally, temporal parameters (start time and duration of a motion) can easily be incorporated as additional constraints. During matching, all motion segments of a shot are compared with the query. A query $Q$ is defined as a set of $N$ query vectors $q_i$ with directions $u_i$ and assigned regions $R_i$. A shot $S$ contains $M$ motion segments $m_j$ with representative (median) directions $w_j$ and regions (surrounding polygons) $M_j$. Matching between a query and the motion segments of a shot is performed in three stages.

In the first stage, a matching score $s_{i,j}$ between each query vector $q_i$ and each motion segment $m_j$ of the shot is computed as:

$$ s_{i,j} = \frac{u_i \cdot w_j}{\|u_i\| \, \|w_j\|} \cdot \frac{|R_i \cap M_j|}{|R_i|} \cdot \left(1 - \frac{|M_j \setminus R_i|}{|M_j|}\right) \qquad (1) $$

The first term is the cosine similarity of the query vector's direction $u_i$ and the motion segment's median direction $w_j$. The second term is the portion of intersection between the region covered by the query vector, $R_i$, and the motion segment's region, $M_j$. The spatial intersection is negatively weighted (penalised) by the area of the motion segment not covered by the query vector, $|M_j \setminus R_i|$. Matching each query vector with each motion segment yields a set of scores $s_{i,1 \ldots M}$ for each query vector $q_i$. The scores lie in the range $[-1, 1]$, where a score of 1 signifies a perfect match and a score below 0 represents a negative match.

In the second stage, all positive scores for a query vector $q_i$ are summed: $s_i = \sum_{j=1}^{M} \max(s_{i,j}, 0)$. This allows one query vector to score on several motion segments. Negative scores obtained due to negative cosine similarity are ignored. Finally, in the third stage an overall score $s$ is obtained by taking the sum of the scores $s_i$ of all query vectors. The shots with the highest overall scores for the query are returned to the user.
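Putting the three stages together, an illustrative reimplementation using Shapely for the region geometry might look as follows. This is our sketch, not the authors' code; the polygon handling and all names are assumptions.

```python
import numpy as np
from shapely.geometry import Polygon

def pair_score(u, R, w, M):
    """Equation (1): cosine similarity of the directions, weighted by the
    spatial overlap terms. u, w: direction vectors; R, M: Polygons."""
    cos = float(np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w)))
    overlap = R.intersection(M).area / R.area        # |R ∩ M| / |R|
    penalty = 1.0 - M.difference(R).area / M.area    # 1 - |M \ R| / |M|
    return cos * overlap * penalty

def shot_score(query, segments):
    """query: list of (u_i, R_i); segments: list of (w_j, M_j)."""
    total = 0.0
    for u, R in query:
        # Stage 2: one query vector may score on several segments;
        # negative scores (opposite directions) are ignored.
        total += sum(max(pair_score(u, R, w, M), 0.0)
                     for w, M in segments)
    return total  # Stage 3: sum over all query vectors
```

The re-ranking weight $s_{total} = P_q \cdot s$ introduced later for spatially distributed motion could be added on top by counting the query vectors that contribute a positive score.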

The matching procedure is tolerant, since it does not require all query vectors to match with a retrieved shot. Additionally, a query vector accumulates scores from all matching motion segments, which makes matching more robust under noisy conditions, where motions are sometimes split into multiple motion segments. The amount of desired tolerance depends on the application. Matching can be performed more strictly by introducing penalties for non-matched and poorly matched query vectors.

3.2.1 Evaluation

For the evaluation of the proposed method we employ archive film material. The films are historical artistic documentaries from the late 1920s. The films exhibit twofold challenges that originate from their technical and from their artistic nature. From the technical point of view, the film material contains numerous artefacts such as flicker, shaking and dirt that impede the process of motion tracking and segmentation. From an artistic point of view, the applied documentary technique is highly innovative and employs complex motion compositions; for example, hammering, camera travellings and contrapuntal movements, as shown in Figure 4. See Zeppelzauer et al. (2008) for a detailed description of the material.

Figure 4. Typical motion compositions.Source: Austrian Film Museum; mark-up: Maia Zaharieva. Copyright Austrian Film Museum.


We test the proposed method with queries representing camera motions, object motions, motions of groups of objects and combinations thereof. Experiments show that the method performs well for object motions. However, for the retrieval of spatially distributed motion (e.g. camera motions), which is represented by several query vectors, the ranking of the retrieved shots is suboptimal. That means that the most relevant shots are indeed retrieved but do not yield the highest scores. This can be improved by incorporating the total number of scoring query vectors into the matching function from Equation 1. We weight the score $s$ with the number of scoring query vectors $P_q$: $s_{total} = P_q \cdot s$. This increases the score of matches with a larger number of scoring query vectors and generates rankings that better correspond with the user's expectations.

We present the results of three heterogeneous queries in Figure 5. For each query, the figure shows keyframes of four of the returned shots. The first query describes large-scale motion, such as a group of objects moving from right to left or a camera motion (pan or travelling) to the right.2 The best match, Figure 5(b), is a tracking shot where the camera passes under a bridge from left to right. Similarly, the result in Figure 5(c) is a tracking shot where the camera is mounted orthogonally to the direction of travel and captures the passing environment. The third shot, in Figure 5(d), shows a train passing by (from right to left) captured from a static camera. A remarkable result is the fourth shot, Figure 5(e). It captures a large crowd of people disorderly pushing and shoving to the left.

The second query, in Figure 5(f), contains a diagonal motion from left to right which is typical for the film-maker of the analysed films. The best match is a tracking shot where the camera is mounted at approximately 45 degrees to the direction of travel (Figure 5(g)). This yields a dominant motion component in the query vector's direction, resulting in a high matching score. The remaining shots show groups of objects (vehicles, people and horses) moving diagonally towards the static camera (Figures 5(h)-5(j)). The motion directions in the returned shots deviate slightly from the query vector's direction. This demonstrates the tolerance of the matching scheme.

The proposed method is able to retrieve even more complex motion compositions. The third query represents a combination of opposed (possibly cyclic) vertical motions in the upper region of the frame. The best match is a shot of a trumpeter who moves his trumpet up and down while playing (Figure 5(l)). The second retrieved shot shows several workers walking up and down a stairway. The shot in Figure 5(n) shows factory workers moving their arms up and down while they pull and push a rod. The last shot, in Figure 5(o), shows three workers who hammer a pin into a rock.

Figure 5. Three example queries. For each query (first column) keyframes of four top-ranked result shots are shown. Dashed arrows in the keyframes mark camera motions and solid arrows mark object motions.
Source: Austrian Film Museum; mark-up and illustrations: Matthias Zeppelzauer. Copyright Austrian Film Museum.

We perform a quantitative evaluation to obtain representative and objective performance measures. For this purpose, we define seventeen different motion queries. The queries represent typical motion patterns of camera motions, small and large object motions, and contrapuntal and rhythmical motion compositions. For each query, we assess the twelve top-ranked returned shots as either relevant or not relevant. The portion of relevant shots in this result set gives a performance measure for the accuracy (precision) of the method and can be computed for differently sized result sets (e.g. from 1 to 12). The corresponding measures are termed 'prec@1' to 'prec@12'. These measures enable the evaluation of the obtained retrieval performance as well as of the ranking generated by the approach. A good ranking is represented by a high prec@1 (which means that the top-ranked result is most often relevant) and monotonically decreasing precisions for larger result sets (prec@2, prec@3, ...). The average precisions (for result set sizes from 1 to 12) over all seventeen queries are shown in Figure 6. We observe that the precision is generally high and decreases for increasing result set sizes, which indicates that the ranking is reasonable. The first returned shot is relevant in 94% of cases on average. Among the twelve top-ranked shots of the seventeen queries, on average 65% are relevant to the user.
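The prec@k measures can be computed directly from per-query relevance judgements; a minimal sketch (function names are ours):

```python
import numpy as np

def prec_at_k(relevance, k):
    """Precision at k: the fraction of relevant shots among the top k.

    relevance: booleans for one query's ranked results, top-ranked first.
    """
    return sum(relevance[:k]) / k

def mean_prec_at_k(all_relevance, k):
    """Average prec@k over all queries (e.g. the seventeen test queries)."""
    return float(np.mean([prec_at_k(r, k) for r in all_relevance]))
```

Plotting `mean_prec_at_k` for k = 1 to 12 reproduces the kind of curve shown in Figure 6: a high value at k = 1 that decays monotonically for larger result sets when the ranking is good.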

The evaluation confirms that the generated motion segments adequately represent the motion content of the analysed film material. The simple and intuitive queries combined with the tolerant matching scheme enable an efficient search for particular motion compositions. The method can further be employed in post-production and editing for finding shots with particular motion content, e.g. shots with high or low motion activity, or shots with distributed or localised motion. Thus, the method supports the work of both film creators and film theorists.

Figure 6. Precisions for different result set sizes.

3.3 Retrieval of matching actions
Continuity editing plays an important role in film-making. It 'refers to the matching of individual scenic elements from shot to shot so that details and actions, filmed at different times will edit together without error' (Beaver 2009, p. 59). Continuity editing assures that consecutive shots in a scene fit seamlessly together and that the conveyed story is presented consistently to the viewer. An important device for achieving continuity is matching on action (also cutting on action or cutting on motion). Matching on action aims at keeping the screen direction (the motion direction of objects from the perspective of the camera) consistent between successive shots. Directional continuity is important to avoid confusing the observer. In scenes presenting a chase, for example, the motion direction is usually consistent across several shots, e.g. from left to right, to convey the impression of a continuous action. A shot that contains right-to-left motion would confuse the observer's orientation. Two examples of matched actions are shown in Figure 7. The first example, in Figure 7(a) and 7(b), shows a character that turns its head from left to right. During this movement the director cuts to a close-up of the face. The continued motion between both shots makes the transition between the different shot scales appear seamless. Figures 7(c) and 7(d) show an example of the entrance–exit pattern (Beaver 2009). An actor exits the frame at the right side and in the successive shot enters the frame from the left, but perhaps at another time and location. This pattern can be utilised to create a continuous transition between different scenes and locations.

We develop a method for the automatic analy-sis of continuity editing in a movie. The methodfinds shots with matching action based on a user-defined query. For this purpose, we extend thepresented query model and adapt the matchingscheme from Section 3.2. Figure 8 illustrates theretrieval process. First, the query window is splitvertically into two halves: A and B. In query Athe user sketches the motion present at the end ofan arbitrary shot and query B contains the contin-ued motion at the beginning of the next shot. Forthe retrieval of matched motion we define ananalysis window (the grey rectangle in Figure 8)of a few seconds (two seconds in the experiments)around each cut. Query A is then matched only

with the left half of the analysis window (end ofshot N) and query B only with the right half(beginning of shot N + 1). The restriction to thewindow is necessary to get more accurate results.Matching queries A and B with the entire shotsN and N + 1 (as in Section 3.2) could takemotion into account which is temporally notlocated around the cut and thus is not relevantfor continuity. If both queries positively scoreon the according halves of the analysis windowwe combine both scores and return shots N andN + 1 as a result to the user. All returned resultsare then ranked according to their combined score.
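As a sketch of this step (the frame rate and all names below are our assumptions, not the paper's code), the two halves of the analysis window can be derived from a cut position like this:

```python
def analysis_window(cut_frame, half_width_frames):
    """Return (frame range for query A, frame range for query B).

    Query A is matched against the end of shot N (the frames just
    before the cut); query B against the start of shot N + 1.
    """
    a_range = range(cut_frame - half_width_frames, cut_frame)
    b_range = range(cut_frame, cut_frame + half_width_frames)
    return a_range, b_range

# e.g. a two-second window at 25 fps -> D = 25 frames per half
a_frames, b_frames = analysis_window(cut_frame=1200, half_width_frames=25)
```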

For the retrieval of matched motions, a finer matching scheme is necessary that restricts the comparison to the analysis window only. We adapt the matching scheme from Section 3.2 as follows. First, we remove all motion segments that do not coincide with the analysis window.
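This filtering step can be sketched as follows, assuming each motion segment carries its start and end frame (a simplification of the paper's segment representation, for illustration only):

```python
def segments_in_window(segments, win_start, win_end):
    """Keep only motion segments whose frame span overlaps the
    analysis window; each segment is a (start_frame, end_frame) pair."""
    return [s for s in segments if s[0] < win_end and s[1] > win_start]
```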

Figure 7. Two examples of matched action. Source: Krista Zeppelzauer.



For the remaining motion segments we extract directional and spatial parameters. Since the motion segments may be only partially inside the analysis window, the median direction of the entire segment (as used previously) is not a representative parameter with regard to the analysis window. Instead, we compute the median direction of the segment for each frame in the analysis window separately. For an analysis window that covers 2 · D frames, this results in D directions f_{j,1}, ..., f_{j,D} for each segment, which allows for a more precise matching. Directional matching between a segment and a query vector is then performed by matching the direction of the query vector with each median direction of the segment. We compute the mean of the cosine similarities (see the first term in Equation 2) between the query vector's direction u_i and each direction f_{jk} of the segment. Spatial matching is performed as in

Section 3.2. The modified scoring function is defined as:

$$s_{i,j} = \left(\frac{1}{D}\sum_{k=1}^{D}\frac{f_{jk} \cdot u_i}{\|f_{jk}\| \cdot \|u_i\|}\right) \cdot \frac{|R_i \cap M_j|}{|R_i|} \cdot \left(1 - \frac{|M_j \setminus R_i|}{|M_j|}\right). \quad (2)$$

We compute overall scores for both halves of the analysis window, s_A and s_B, and combine them by taking their product. This measure yields a high overall score only when both scores are high. Additionally, the scores are weighted by the portion P_q of scoring query vectors from queries A and B and by the portion P_s of scoring motion segments in the analysis window: s_total = s_A · s_B · P_q · P_s.

Figure 8. Retrieval of matching actions. Each query (A and B) is matched with the corresponding half of the analysis window. In this example the query represents a typical entrance–exit pattern.

The weighting increases the score where the query vectors correspond particularly well with the actual motion content. This weighting improves the ranking of the retrieved sequences.
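Assuming the per-frame directions and region overlaps are already available, the scoring of Equation 2 and the combination of the two window halves can be sketched as follows (all names are illustrative, not the authors' implementation):

```python
from math import hypot

def segment_score(query_dir, seg_dirs, overlap_q, overlap_s):
    """Sketch of Equation 2.

    query_dir: direction u_i of one query vector, as (x, y)
    seg_dirs:  per-frame median directions f_jk of a motion segment
               inside the analysis window, D pairs (x, y)
    overlap_q: |R_i ∩ M_j| / |R_i|, the fraction of the query region
               covered by the segment
    overlap_s: |M_j \\ R_i| / |M_j|, the fraction of the segment lying
               outside the query region
    """
    ux, uy = query_dir
    u_norm = hypot(ux, uy)
    # mean cosine similarity between u_i and each per-frame direction f_jk
    cos_sum = sum((fx * ux + fy * uy) / (hypot(fx, fy) * u_norm)
                  for fx, fy in seg_dirs)
    return (cos_sum / len(seg_dirs)) * overlap_q * (1.0 - overlap_s)

def total_score(s_a, s_b, p_q, p_s):
    """Combined score for a cut: s_total = s_A * s_B * P_q * P_s."""
    return s_a * s_b * p_q * p_s
```

Because the four factors are multiplied, a cut only scores highly when both halves match and the weights P_q and P_s are large, which is what drives the ranking.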

3.3.1 Evaluation
We evaluate the performance of the proposed method with the movie Run Lola Run by Tom Tykwer from 1998. The movie is a thriller that makes extensive use of matching on action. There are numerous scenes where motion is consistently carried across several cuts, such as chase scenes and journeys. Many scenes, for example, show the leading character Lola running through the streets from different viewpoints. They are all connected by matching action to create the impression of a continuous journey. Another frequent pattern is subsequent dolly forward shots joined with matching action, which show Lola's view while she is running. The movie further contains characteristic sequences of shots with discontinuous motion. The director connects contrapuntal motions over consecutive shots, for example by alternating dolly forward and backward movements. The movie is composed of three episodes that show three different versions of the same story. Consequently, for many scenes there exist three different versions with similar motions but varying content. This makes the material well suited for the evaluation of the proposed approach.

Figure 9. Three example queries that describe different scenarios of matched action together with a corresponding result from Run Lola Run. Credit: Stadtkino Filmverleih; mark-up and illustrations: Matthias Zeppelzauer. Copyright Stadtkino Filmverleih.

Prior to the evaluation, film experts manually searched for matching actions in the movie and annotated them. We evaluate the retrieval performance for matched actions with different queries. Figure 9 shows results for three example queries. The first one, in Figure 9(a), describes a local motion in the right half of the frame, directed downwards, that is carried across a cut. This is a variation of the entrance–exit pattern. The first returned result shows Lola's friend Manni in a phone booth talking to Lola. At the end of the shot Manni sinks down in resignation and his head leaves the frame at the bottom. The following shot continues this motion from another camera angle and shows Manni's head moving in from the top of the frame.

The second query (see Figure 9(c)) describes a spatially more distributed motion with a screen direction pointing downwards. This can correspond either to downwards motion or to motion towards the camera. The query matches well with a scene of shots showing an ambulance, which has just crashed into a glass plate, approaching the camera. While the ambulance comes closer to the camera, the director cuts to a wide-angle shot that continues the motion (until the ambulance stops) and shows what has happened in the surroundings of the ambulance after the accident (see Figure 9(d)). This example demonstrates how matching action can be applied to create a seamless transition between two different scales of a shot.

The third example query, in Figure 9(e), represents the motion pattern produced by a zoom in or dolly forward motion. The method returns several pairs of shots with a continued dolly forward motion. One example is shown in Figure 9(f). The first shot is a medium shot of Manni. While the camera slowly moves towards Manni, the director cuts to a medium shot of Lola, where the camera continues its movement at the same speed towards Lola. This transition directs the attention towards the two main characters and increases the tension in the scene.

In most cases the returned results match well with the expert annotations. However, there are also results that do not match the underlying query at first glance. One example is shots captured with a shaky camera (steadicam), which often appear in chase scenes. In some cases the shaking of the camera produces a pattern of continued motion between two consecutive shots. Such sequences of shots usually do not convey the impression of a matched action. Other sources of confusion are background movements that match well with the query (e.g. a car moving in the background). Such results may surprise the user because background motion is often not perceived consciously by the viewer. Consequently, the viewer would not recognise the matching motion in this case. However, the proposed method detects such matching background motions since it does not distinguish between foreground and background motion or between 'salient' and 'non-salient' motion. Although matching background motions may not be recognised by the viewer as examples of matched actions at first glance, the question arises whether or not the movements were matched on purpose by the director.

Figure 10. A matched action recovered by the proposed method. Credit: Stadtkino Filmverleih; mark-up: Matthias Zeppelzauer. Copyright Stadtkino Filmverleih.



We further observe that the proposed method is able to retrieve matching actions that were not recovered by the film experts during annotation. An example is shown in Figure 10. It shows an excerpt of cross-cut shots between Manni and Lola talking on the phone. The cut between the shots in Figure 10 is positioned in such a way that the downwards movement of Lola's head is continued seamlessly by the downwards movement of Manni's head. This generates a transition that leads the viewer smoothly from one shot to the next. The matching action lasts for approximately a second, which makes it difficult to detect manually. This example shows that the proposed method has the potential to detect patterns of interest that human viewers are likely to overlook. In this manner, the method is able to assist the investigation and exploration of continuity editing in a movie.

The analysis of matched motion with continuous screen direction is only one possible application scenario. We can, for example, retrieve sequences of shots with contrapuntal motion, where shots with different (possibly opposing) motion directions alternate. Such sequences can be employed to create the impression of objects or people moving away from each other. Furthermore, contrapuntal motion is a device for the creation of rhythmic motions. In Run Lola Run, for example, dolly forward movements are frequently followed by dolly backward movements and vice versa. Retrieval results show that this combination often appears in the movie in situations where Lola is running (see Figure 11 for an example). The first shot of such a combination typically shows the world from Lola's perspective, translated in a dolly forward, while she is running. In the subsequent shot the camera is positioned in front of Lola and moves backwards, showing her running.

Ultimately, we want to point out that the applicability of the method is not limited to the presented examples. The method can be employed for the retrieval of arbitrary combinations of consecutive motions, since the definition of the query is up to the user. This makes it well suited to the investigations of film theorists and archivists (e.g. for the study of characteristics of different film genres). Furthermore, the method can easily be extended to automatically retrieve arbitrary matching actions without the need for a query. This functionality could be integrated into automatic production tools such as 'nm2', which enables the generation of personalised documentaries and summaries based on predefined montage rules (Alifragkis and Penz 2006). The integration of motion analysis would enable the generation of movies with motion continuity and with preferred motion compositions. The stronger incorporation of motion would improve the narration and probably the aesthetic value of the generated movies.

Figure 11. A typical contrapuntal composition of motion that appears repeatedly in the movie, together with the corresponding query. Credit: Stadtkino Filmverleih; mark-up and illustrations: Matthias Zeppelzauer. Copyright Stadtkino Filmverleih.

4 Conclusion

This article presents novel methods for the retrieval of motion in movies. Starting with the tracking of feature points and the generation of trajectories, motion segments are created that describe the motion content of a shot in a meaningful way. The connection of robust motion segmentation with an intuitive query model and a tolerant matching scheme enables the retrieval of motion compositions and matching actions. The proposed query model is easy to understand and does not restrict the user to a predefined vocabulary. Consequently, it not only supports searching for already known motion patterns but also enables the exploratory search for motion compositions.

The developed methods perform well in the investigated retrieval scenarios. In some cases, we even gain new insights into the analysed movies by experimenting with the retrieval systems. The presented methods have the potential to assist film analysis and to improve search in movie databases.

In the future we plan to extend the evaluation to a heterogeneous set of film genres, for example action movies, which usually contain very fast motions, but also slow-paced genres with less motion activity (e.g. romance films and melodramas). Additionally, we will perform user studies with different user groups (theorists and film-makers) to assess the performance of the proposed methods on a wider basis. A next step on the technical side will be the detection and assembly of arbitrary matching actions (without a query) for the generation of movies with motion continuity.

Acknowledgements

We would like to thank Adelheid Heftberger for providing film annotations and film-theoretic expertise. This work received financial support from the Vienna Science and Technology Fund (WWTF) under grant number CI06 024.

Notes
1. Note that for camera motion the direction of the query vectors has to be reversed, because the vectors always represent motion relative to the image content. A camera pan to the right, for example, is represented by several arrows that point to the left, because the image content on the screen actually moves to the left during a pan to the right.
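Assuming query vectors are stored as 2-D directions, this reversal is a simple negation (an illustrative helper, not part of the described system):

```python
def camera_to_content_direction(dx, dy):
    """A camera pan/tilt by (dx, dy) moves the image content by
    (-dx, -dy); since query vectors describe content motion, the
    camera direction is simply negated."""
    return (-dx, -dy)
```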

2. Note that this query may retrieve camera motions as well as object motions. We do not intend to distinguish between camera and object motion.

References
Ahanger, G., Benson, D., and Little, T.D., 1995. Video query formulation. Storage and Retrieval for Image and Video Databases III, 2420 (1), 280–291.

Alifragkis, S. and Penz, F., 2006. Spatial dialectics: montage and spatially organized narrative in stories without human leads. Digital Creativity, 17 (4), 221–233.

Ardizzone, E., La Cascia, M., and Molinelli, D., 1996. Motion and color-based video indexing and retrieval. In: Proceedings of the International Conference on Pattern Recognition, 25–29 August 1996, Vienna, Austria. IEEE Computer Society, 135–139.

Bashir, F., Khokhar, A., and Schonfeld, D., 2007. Real-time motion trajectory-based indexing and retrieval of video sequences. IEEE Transactions on Multimedia, 9 (1), 58–65.

Beaver, F., 2009. Dictionary of film terms: the aesthetic companion to film art. 4th ed. New York: Peter Lang.

Bordwell, D. and Thompson, K., 2008. Film art: an introduction. 8th ed. New York: McGraw Hill.

Brox, T., et al., 2004. High accuracy optical flow estimation based on a theory for warping. In: Tomas Pajdla and Jiri Matas, eds. Proceedings of the European Conference on Computer Vision, 11–14 May 2004, Prague, Czech Republic. Springer, vol. 3024, 25–36.

Buzan, D., Sclaroff, S., and Kollios, G., 2004. Extraction and clustering of motion trajectories in video. In: Josef Kittler, Maria Petrou and Mark Nixon, eds. Proceedings of the International Conference on Pattern Recognition, 23–26 August 2004, Cambridge, UK. IEEE Computer Society, 521–524.

Chang, S., et al., 1998. A fully automated content-based video search engine supporting spatiotemporal queries. IEEE Transactions on Circuits and Systems for Video Technology, 8 (5), 602–615.

Dagtas, S., et al., 2000. Models for motion-based video indexing and retrieval. IEEE Transactions on Image Processing, 9 (1), 88–101.

Dimitrova, N. and Golshani, F., 1995. Motion recovery for video content classification. ACM Transactions on Information Systems, 13 (4), 408–439.

Hanis, A. and Sziranyi, T., 2003. Measuring the motion similarity in video indexing. In: Mislav Grgic and Sonja Grgic, eds. Proceedings of the EURASIP Conference focused on Video/Image Processing and Multimedia Communications, 2–5 July 2003, Zagreb, Croatia. IEEE Computer Society, vol. 2, 507–512.

Horn, B. and Schunck, B., 1981. Determining optical flow. Artificial Intelligence, 17, 185–203.

Run Lola Run, 1998. Film. Directed by Tom Tykwer. Stadtkino Filmverleih. DVD. X Filme Creative Pool GmbH.

Shi, J. and Tomasi, C., 1994. Good features to track. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 21–23 June 1994, Seattle, USA. IEEE Computer Society, 593–600.

Zeppelzauer, M., Mitrovic, D., and Breiteneder, C., 2008. Analysis of historical artistic documentaries. In: Proceedings of the 9th International Workshop on Image Analysis for Multimedia Interactive Services, 7–9 May 2008, Klagenfurt, Austria. IEEE Computer Society, 201–206.

Zeppelzauer, M., et al., 2010. A novel trajectory clustering approach for motion segmentation. In: Susanne Boll, et al., eds. Proceedings of the IEEE Multimedia Modelling Conference, 6–8 January 2010, Chongqing, China. Springer, 433–443.

Matthias Zeppelzauer is a PhD student at the Interactive Media Systems Group at the Vienna University of Technology. He received the Master of Science degree in Computer Science in 2005 from the Vienna University of Technology. He has been employed in different research projects focusing on content-based audio and video retrieval. His research interests include content-based retrieval of video and sound, time series analysis, data mining and multimodal media understanding.

Maia Zaharieva is pursuing a PhD at the Interactive Media Systems Group at the Vienna University of Technology, Austria. Her research interests include visual media analysis, processing and retrieval. Zaharieva received her MSc in business informatics from the University of Vienna.

Dalibor Mitrovic is a teaching and research associate with the Institute of Software Technology and Interactive Systems at the Vienna University of Technology. He received a Master of Science degree in Computer Science in 2005 from the Vienna University of Technology, Austria. Dalibor Mitrovic is pursuing a PhD in computer science focusing on multimodal information retrieval. His research interests include audio retrieval, real-time feature extraction and computer vision.

Christian Breiteneder is a full professor for Interactive Systems with the Institute of Software Technology and Interactive Systems at the Vienna University of Technology. Christian Breiteneder received the Diploma Engineer degree in computer science from the Johannes Kepler University in Linz in 1978 and a PhD in computer science from the Vienna University of Technology in 1991. Before joining the institute he was associate professor at the University of Vienna and held postdoctoral positions at the University of Geneva, Switzerland, and GMD (now Fraunhofer) in Birlinghoven, Germany. His current research interests include interactive media systems, media processing systems, content-based multimodal information retrieval, 3-D user interaction, and augmented and mixed reality systems.
