Computer Vision based Dance Visualisation

VSMM’05, Belgium


Sanjay Goel, Chirag Gupta, T. Gnana Swaroop, Gaurav Jain, Tarang Gupta and Shoma Chatterjee

Jaypee Institute of Information Technology, Noida, India

Abstract: In this paper we discuss a computer vision based tool for dance scholars. The tool uses computer vision to let the analyst concentrate on body movements. The processed video highlights the main body motion by extracting the body contour. The tool also lets the user add and display textual information with each frame, and it facilitates juxtaposition of the original video with the extracted video.

1. Introduction

The fascination for Indian dance all over the world is indicative of the deep-felt need to use the human body to express and celebrate the great universal truths. It illuminates India's culture in a direct manner, playing on the sensibilities of the onlooker. Dance in India has seeped into several other realms like poetry, sculpture, architecture, literature, music and theatre. All Indian dance forms are structured around the nine rasas or emotions [1]: hasya (happiness), krodha (anger), bhibasta (disgust), bhaya (fear), shoka (sorrow), viram (courage), karuna (compassion), adbhuta (wonder) and shanta (serenity). Apart from video storage and dissemination, fast-growing computer technology has so far contributed very little to the field of dance, and nothing to Indian classical dance. This paper discusses a tool, currently under development, that helps dance scholars analyze solo dance performances.

1.1 Literature Survey

Computers find various uses in dance [5]. Important non-graphical uses include administration, lighting control, and competition scrutineering. Graphical applications include notation, choreography, teaching, and performance. One of the earlier works [2] explores the nature of creative composition, particularly as it applies to dance, and describes the development of interactive computer based tools to assist the composer. The hierarchical nature of the composition process calls for an interface that allows the composer the flexibility to move back and forth between alternate views and conceptual levels of abstraction. COMPOSE, an interactive system for the composition of dance, has been implemented on Silicon Graphics and Apple workstations [2]. The user visually composes in space and in time using menus of postures and sequences. The animation of the dance composition allows the final result to be evaluated.

In one of the first Dance Technology compositions, choreographed in 1994, researchers used Motion Interactive (MINT), a special motion-capture program they developed, to translate dance into computer animation [3]. Two video cameras captured the movement of reflective markers at 27 points on a dancer’s body. The researchers digitized the video and used it to create a computer model of the dancer [3]. For another performance, the researchers employed infrared cameras to track emitters hidden on a dancer’s costume. This data was fed into a high-speed graphics workstation in real time, and the resulting animation showed trails of the dancer’s movements, projected as real-time graphics onto a translucent screen [3].

The Dance Technology Project, a collaboration between the Atlanta Ballet and Georgia Tech’s Interactive Media Technology Center (IMTC), combined ballet and computer animation techniques [3]. The project dealt with video costuming: a camera and computer system track the motions of the dancers on stage while a second graphics computer creates their ‘virtual costumes’, which are projected onto them in exact registration to their body orientations, even as they dance. Other activities were computer-generated dancers intermingling with real dancers, and computer-generated art ‘created’ by the dancers as the performance progressed [3].

The work reported in [4] deals with phrase structure detection in contemporary western dance. Phrases are sequences of movements that exist at a higher semantic abstraction than gestures. The problem is important since phrasal structure in dance plays a key role in communicating meaning [4]. The authors detect fundamental dance structures that form the basis for more complex movement sequences.

Computed dancing figures have also been proposed as an aid in teaching dance [5]. For example, the computer could be used to show idealised slow-motion versions of fast steps that are impossible to demonstrate slowly because of problems with balance or momentum. Computers could also be used as a teaching aid that lets students classify for themselves steps with complex alternatives [5].

The authors of [6] propose an algorithm for synthesizing music that appropriately expresses the emotions in a dance. The algorithm can help compile music suitable for dance movies or animation films, and is also applicable to any entertainment system that uses music or dance. It is composed of three modules: the first computes emotions from an input dance, the second computes emotions from the music in a database, and the last selects music suitable for the input dance via an interface of emotion [6].

An experimental dance performance featuring live-motion capture, real-time computer graphics, and multi-image projection was produced by a cross-departmental team of faculty and students at Purdue University, [7]. Dancers occupied and traversed performance mediums or ‘frames’ including a virtual performance frame occupied by a 3D character, driven by a dancer in motion-capture equipment. Developing and facilitating the relationships between the dancers in various performance frames was a primary focus of the project.

A multimodal presentation method for a basic dance training system is discussed in [8]. The system targets beginners and enables them to learn the basics of dance easily. One of the most effective ways of learning dance is to watch a video showing the performance of dance masters. However, some information cannot be conveyed well through video. One is translational motion, especially in the depth direction: one cannot tell exactly how far the dancers move forward or backward [8]. Another is timing information: although one can tell from video how to move the arms or legs, it is difficult to know when to start moving them. The first issue is solved by introducing an image display on a mobile robot [8]; one can learn the amount of translation just by following the robot. For the second issue, the authors introduced active devices [8]. These devices are composed of vibro-motors and are designed to give action-starting cues through vibration [8].

1.2 Scope of project

The main objective of our project is to exploit the potential of digital image processing and computer vision techniques to serve some of the common and regular needs of scholars and students of Indian classical dance. At present, dance students and scholars learn or analyze dance movements by observing performances of professional dancers. Video recordings of performances are popular for later reference and analysis. Often the dance scholar needs to concentrate on specific aspects such as hand movement. In the absence of any tool to filter out distracting details, such scholarly analysis becomes a tedious task. User-friendly software tools can help dance scholars analyze and annotate recorded performances, add subtitles and annotations to specific frame sequences, and store the annotated video in a regular format viewable on any regular media player. The availability of such software will encourage scholars to add more information to recorded videos, which can then be accessed, understood and appreciated by a lay audience. Such software will also allow scholars to create a well-documented archive of dance videos with searchable annotations.

2. Outline of the Algorithm

This section discusses design of our Computer Aided Dance Analysis and Visualization tool. The tool allows users to view the original video and the processed video simultaneously. It also allows the users to add information to every frame of the dance video. All forms of Indian Classical dance depict a story. Frame specific information can be added with each frame as frame annotation. It allows the users to filter out distracting details by extracting body contour and image skeleton.

We have designed a sample interface using Visual C++. Matlab has been used as an intermediate test environment [14], where we tested the various image processing algorithms, which were later migrated to Visual C++. The phases involved in the design of the dance tool and the results are outlined below. The main processes are segmentation, edge detection and skeletonization.

2.1 Segmentation and Edge Detection

We extract frames from the video, as in Fig. 1, and segment each frame to separate the dancer from the background. We use the Region Growing algorithm [9] for this because we need to separate the dancer on the basis of color as well as region. To apply the multi-pass region growing algorithm, we first convert the image to 256-level grayscale. We initially select all pixels of a frame as seed points. Then we compare alternate seeds (s1) across the height and width of the frame with all four neighboring seeds (north, south, east, west). The initial threshold range is a difference of eight gray levels. If the two compared seeds are found to be similar, i.e. within the threshold range, we mark the neighbor with the value of seed s1. We repeat the same process in subsequent passes, doubling the threshold range in each pass until it reaches 128, as we need a binary output. The output of this process is the image of the dancer separated from the rest of the image, as shown in Fig. 2. We convolve the segmented image with the Laplacian mask for boundary detection [9], as in Fig. 3.
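The segmentation and boundary detection steps described above might be sketched as follows. This is only an illustration under several assumptions that the text does not settle: "alternate seeds" is read as a checkerboard pattern, pixels are updated in place in scan order, a final cut at gray level 128 produces the binary image, the 4-neighbor Laplacian mask is used, and only foreground pixels with a non-zero response are kept as boundary. The function names are hypothetical and the code is plain Python rather than the Visual C++ or Matlab used in the project.

```python
# Illustrative sketch only; names and several details are assumptions.

def region_grow(gray, start=8, stop=128):
    """Multi-pass merge: a seed copies its value to similar 4-neighbours."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # work on a copy of the grayscale frame
    threshold = start
    while threshold <= stop:  # thresholds 8, 16, 32, 64, 128
        for r in range(h):
            # Assumption: "alternate seeds" means a checkerboard pattern.
            for c in range(r % 2, w, 2):
                seed = out[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w:
                        if abs(out[nr][nc] - seed) <= threshold:
                            out[nr][nc] = seed  # neighbour takes seed value
        threshold *= 2
    return out


def binarize(gray, level=128):
    """Assumed final step: cut the merged image into a 0/1 mask."""
    return [[1 if v >= level else 0 for v in row] for row in gray]


# 4-neighbour Laplacian mask.
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian_edges(mask):
    """Mark foreground pixels whose Laplacian response is non-zero."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding outside
                        acc += LAPLACIAN[dr + 1][dc + 1] * mask[rr][cc]
            edges[r][c] = 1 if (mask[r][c] and acc != 0) else 0
    return edges
```

On a small frame with a bright block on a dark background, `laplacian_edges(binarize(region_grow(frame)))` keeps the outline of the block and clears its interior.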

Fig 1. Input Video Frame

Fig 2. Clustered Frame


Fig 3. Edge extracted/Boundary detected

2.2 Skeletonization

The next and most important algorithm is skeletonization. We tried different thinning algorithms such as the Medial Axis Transform [9], but none of them gave satisfactory results. In the Medial Axis Transformation of a region R with border B, for each point p in R we find its closest neighbor in B. If p has more than one such neighbor, it is said to belong to the medial axis of R. The Medial Axis Transformation does not serve our purpose, as the output of this algorithm for an L-shaped figure would be as shown in Fig. 4.

Fig. 4. Skeleton from Medial Axis Transformation and Expected Skeleton

Also, the complexity of the Medial Axis Transformation is very high, as it compares every pixel in the boundary of the region to each of the pixels in the image. This was the reason why we had to modify the algorithm to suit our purpose. In the modified algorithm, instead of calculating the distance of each pixel from the boundary pixels, we calculate the distance of each boundary point from its horizontally opposite boundary point. Unlike the Medial Axis Transform, where all the points (whether boundary or region) are compared with each other, we select only the boundary points. We select a point on the boundary and find its opposite (horizontal) boundary point. One horizontal line may contain portions of more than one body part. On every horizontal line, each odd-numbered boundary point marks the beginning of a body part and the subsequent even-numbered boundary point marks its end. For every such pair of boundary points, we mark the center of the two points as a skeleton point. S represents the set of all skeleton points. The output is not a perfect skeleton in all positions, but together with the boundary it is enough for the user to visualize the movements. This brings us closer to the expected skeleton, as in Fig. 5, at much lower complexity, allowing us to process the same video in less time.
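A minimal sketch of the modified, scanline-based skeletonization is given below. It assumes the dancer is available as a binary mask (1 for body, 0 for background) and finds the paired boundary points of each row directly from that mask rather than from the edge image; the function name `scanline_skeleton` and the (row, column) point convention are choices made here for illustration.

```python
# Illustrative sketch only; the function name and mask input are assumptions.

def scanline_skeleton(mask):
    """Return the set S of skeleton points as (row, col) midpoints."""
    skeleton = set()
    for r, row in enumerate(mask):
        boundaries = []  # alternating start/end columns of body parts
        inside = False
        for c, value in enumerate(row):
            if value and not inside:
                boundaries.append(c)      # odd-numbered point: part begins
                inside = True
            elif not value and inside:
                boundaries.append(c - 1)  # even-numbered point: part ends
                inside = False
        if inside:
            boundaries.append(len(row) - 1)  # part touches the right border
        # Pair each start with the following end and keep the midpoint.
        for start, end in zip(boundaries[0::2], boundaries[1::2]):
            skeleton.add((r, (start + end) // 2))
    return skeleton
```

Each row is visited once, so the cost grows with the number of pixels rather than with the number of boundary-pixel pairs. A limb lying along a single row collapses to one midpoint, which is one way the result can fall short of a perfect skeleton.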

Fig 5. Skeletonized Frame

2.3 Object Tracking

The final image processing algorithm in the project is synchronized tracking of multiple inter-connected objects. The main objective of this step is to mark and track the anchor points of the body. These anchor points comprise the head, neck, shoulders, elbows, palms, waist and one or two points for the legs. The number of points for the legs was kept low keeping in mind the traditional classical dances in which the female dancers wear saris. In our future work, we plan to use the anchor points for creating a vector stick diagram of the dancer.

Traditional object tracking [15,16] failed in our case because such algorithms are made for tracking a simple object in a video. In our case we had to track multiple inter-connected complex objects (body parts) in a video, and in a synchronized way. We have designed our own algorithm for this purpose. This algorithm is based on the principle of pattern matching and tracks objects in the input video using the output of the skeletonizing algorithm. First of all, we mark a point on the object which we want to track as p1. When an object moves in a video, some blurring is caused, which results in slight changes in the color of the object. To correct these errors we quantize the image into 32 gray levels, as they are adequate to track major object motions. Now we take an 11x11 pixel window (w1) on frame (j) with p1 as its center. Then we take 121 search windows of 11x11 pixels from frame (j+1), with centers lying on each pixel of the corresponding 11x11 window (w1) on frame (j+1). We compare the histograms of all these search windows on frame (j+1) with that of (w1) and identify the window of minimum difference as the region (wr) in which motion has taken place. The comparison of histograms is done according to the following formula:

Diff = ∑ |ƒ1(x) - ƒ2(x)|, where x ranges over the 32 gray-level bins (x = 1 to 32)

ƒ1(x): number of pixels in gray-level bin x in the primary window (w1).

ƒ2(x): number of pixels in gray-level bin x in the search window.

Now the problem is to select one point out of the 121 points in the window (wr).
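The histogram comparison that selects (wr) could be sketched as below. The sketch takes the absolute value of each bin difference, which is an assumption: a signed sum over two windows of equal size would always be zero. The function names, the `half` parameter (5 gives an 11x11 window and 121 candidate centers), the skipping of pixels outside the frame, and the choice of the first minimum in scan order on ties are all illustrative choices, not details from the project.

```python
# Illustrative sketch only; names and tie-breaking are assumptions.

def quantize(gray, levels=32):
    """Map 256-level gray values to `levels` bins (0 .. levels-1)."""
    step = 256 // levels
    return [[v // step for v in row] for row in gray]


def window_histogram(q, center, half=5, levels=32):
    """Histogram of the (2*half+1)-square window around center=(row, col)."""
    hist = [0] * levels
    r0, c0 = center
    for r in range(r0 - half, r0 + half + 1):
        for c in range(c0 - half, c0 + half + 1):
            if 0 <= r < len(q) and 0 <= c < len(q[0]):  # skip outside frame
                hist[q[r][c]] += 1
    return hist


def hist_diff(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def best_window(q_prev, q_next, p1, half=5):
    """Center of the search window on the next frame closest to w1."""
    ref = window_histogram(q_prev, p1, half)
    best = None
    for r in range(p1[0] - half, p1[0] + half + 1):
        for c in range(p1[1] - half, p1[1] + half + 1):
            if 0 <= r < len(q_next) and 0 <= c < len(q_next[0]):
                d = hist_diff(ref, window_histogram(q_next, (r, c), half))
                if best is None or d < best[0]:
                    best = (d, (r, c))  # keep the first minimum found
    return best[1]
```

Because a histogram ignores where pixels sit inside a window, several candidate windows can tie, which is one reason a further step is needed to pick a single point inside (wr).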

Firstly, we find the difference along the x and y axis between p1 and the center of wr (center of wr - p1) to identify the quarter as given in Table 1 with (0,0) being considered as the top left corner of the image.

X Difference    Y Difference    Quarter
Positive        Positive        Bottom right
Negative        Positive        Bottom left
Positive        Negative        Top right
Negative        Negative        Top left

Table 1 : Quarter Identification in the search window

Now we search the identified quarter for skeleton points in top-down, left-right order and mark the first skeleton point found. Marking the point on the skeleton makes sure that the point does not move out of the body. If the point is not found in this quarter we scan the full window for skeleton points and mark the first skeleton point found. If there is no point of skeleton in this window we simply mark the center of this window as the corresponding point in frame (j+1). This process is applied to all the consecutive frames with respect to immediate predecessor frame, hence tracking the object as shown in Fig. 6.
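The quarter identification of Table 1 and the search for a skeleton point might look like the sketch below. Points are written as (row, col), so the Y difference is taken on rows and the X difference on columns, with (0,0) at the top left. Treating a zero difference as positive is an assumption, since Table 1 does not cover that case, and the function names and the representation of S as a Python set are illustrative.

```python
# Illustrative sketch only; names and zero-difference handling are assumptions.

def quarter_bounds(p1, center, half=5):
    """Rows and columns of the quarter of wr selected by Table 1."""
    dy = center[0] - p1[0]  # Y difference (rows grow downwards)
    dx = center[1] - p1[1]  # X difference (columns grow rightwards)
    if dy >= 0:  # positive Y: bottom half
        rows = range(center[0], center[0] + half + 1)
    else:        # negative Y: top half
        rows = range(center[0] - half, center[0] + 1)
    if dx >= 0:  # positive X: right half
        cols = range(center[1], center[1] + half + 1)
    else:        # negative X: left half
        cols = range(center[1] - half, center[1] + 1)
    return rows, cols


def next_point(p1, center, skeleton, half=5):
    """Pick the tracked point on frame (j+1) inside the window wr."""
    rows, cols = quarter_bounds(p1, center, half)
    for r in rows:        # top-down
        for c in cols:    # left-right
            if (r, c) in skeleton:
                return (r, c)
    # Fall back to the full window if the quarter has no skeleton point.
    for r in range(center[0] - half, center[0] + half + 1):
        for c in range(center[1] - half, center[1] + half + 1):
            if (r, c) in skeleton:
                return (r, c)
    return center  # no skeleton point at all: use the window center
```

Applying `next_point` to each frame with the point found on its predecessor as `p1` would produce a locus of the kind shown in Fig. 6.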


Fig. 6. Locus of tracked finger

2.4 Interface

The design and interface of the tool were created in Visual C++ following the Document View Architecture. Our tool makes extensive use of multithreading in Visual C++. The processed video highlights the main body motion by extracting the body contour, and the tool lets the user add and display textual information about the dance video with each frame. It facilitates juxtaposition of the original video with the extracted video, as shown in Fig. 7.

2.5 Future Scope

The work on the vectorization of the dancer’s stick diagram has been initiated. We have also realised an interface for easy (and precise) access of dance videos from the Digital Video Archive. The major obstacle is to do this without consuming huge bandwidth. Our interface for Digital Video Archives is based on the skeletonization algorithm reported in this paper. Details of this interface will be discussed in a future paper.


Fig. 7. Screen Shot of the Main Application (labelled areas: Input Video Displayed Here, Processed Output Video Displayed Here, Player Controller, Slider Toolbar, Subtitle Addition)

Acknowledgements

We are extremely thankful to Maria, a dance teacher who runs her own dance school. We had a very fruitful discussion and gained many new and interesting perspectives on our problem. Some of the relevant outcomes of this discussion were: extracting the dancer from the dance video; hiding irrelevant information such as the color of the dress; applying enhancements to the dancer to study the dance movements better; and the ability to compare two dance performances of the same dancer or similar performances by different dancers.


References

[1] Visual dictionary of Hastas for Indian dance - hand gestures of Indian Dance. http://www.kanakasabha.com/hastas/index.htm

[2] T. Schiphorst, et al. Tools for Interaction with the Creative Process of Composition. Centre for Systems Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada. CHI 90 Proceedings, pp. 167 - 174.

[3] Research for the Games. Georgia Tech forged strong ties to Atlanta’s 1996 Olympic Games. Compiled by Lea McLees.

[4] V. M. Dyaberi, et al. Phrase Structure Detection in Dance. Proceedings of the 12th annual ACM international conference on Multimedia, Oct 2004, pp. 332 - 335.

[5] Dance and the Computer: A Potential for Graphic Synergy. Technical Report 422. Basser Department of Computer Science, University of Sydney, Oct 2003.

[6] Hirofumi Morioka, et al. Proposal of an Algorithm to Synthesize Music Suitable for Dance. Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, Sept 2004, pp. 296 - 301.

[7] W. Scott, et al. Mixing Dance Realities: Collaborative Development of Live-Motion Capture in a Performing Arts Environment. ACM Computers in Entertainment (CIE), vol 2, issue 2, April 2004.

[8] Akio Nakamura, et al. Multimodal Presentation Method for a Dance Training System. Saitama University, Japan. CHI 2005, pp. 1685 - 1688.

[9] R.C. Gonzalez and R.E. Woods. Digital Image Processing, second edition, Pearson Education.

[10] The Open Video Project. http://www.Open-Video.org

[11] Digital Video Archives: Managing Through Metadata.

[12] Informedia Digital Video Library System, Carnegie Mellon University.

[13] Fabio Chestani. Video Retrieval Interfaces.

[14] Building GUIs with Matlab version 5, from MathWorks.

[15] D. Demirdjian, T. Ko and T. Darrell. Constraining Human Body Tracking. Artificial Intelligence Laboratory, MIT.

[16] Gary Minden, Doug Niehaus and James Roberts. The Digital Video Library System: Vision and Design.
