
MCML: motion capture markup language for integration of heterogeneous motion capture data


Computer Standards & Interfaces 26 (2004) 113–130


Hyun-Sook Chung*, Yilbyung Lee

AI Laboratory Department of Computer Science, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea

Received 23 February 2003; received in revised form 27 May 2003; accepted 31 May 2003

Abstract

Motion capture technology is widely used for manufacturing animation since it produces high-quality character motion similar to the actual motion of the human body. However, motion capture has a significant weakness due to the lack of an industry-wide standard for archiving and exchanging motion capture data. It is difficult for animators to reuse and exchange motion capture data with each other. In this paper, we propose a standard format for integrating different motion capture file formats. Our standard format is called Motion Capture Markup Language (MCML). It is a markup language based on eXtensible Markup Language (XML). The purpose of MCML is not only to facilitate the conversion or integration of different formats, but also to allow for greater reusability of motion capture data, through the construction of a motion database storing the MCML documents.

© 2003 Elsevier B.V. All rights reserved.

Keywords: Motion capture file format; MCML; Markup language; XML

1. Introduction

Motion capture technology is frequently used to solve problems in real-time animation, because it enables motion data to be made easily and creates physically perfect motions. These days, motion capture plays an important role in the making of movies or games by the larger production or game companies.

However, motion capture has a significant weakness. Firstly, it has low flexibility. The high-quality motion data created with motion capture technology are solidified data designed for specific characters or circumstances, so that it is not easy to edit or modify them for other purposes. Secondly, the captured motion data can have different data formats depending on the motion capture system which was employed. Each capture system defines its own data format to express the captured contents, ranging from a simple format in segment form to a complex format in hierarchical structure form. Thirdly, commercially available motion capture libraries are difficult to use, as they often include hundreds of examples, which can only be browsed by using the names of the actions they contain [3].

In this paper, we define a standard format for integrating motion capture data with different formats. Our standard format for motion capture data is a markup language that can express motion capture data based on eXtensible Markup Language (XML) [11], and is called Motion Capture Markup Language (MCML). MCML defines a set of tags to integrate Acclaim Skeleton File (ASF)/Acclaim Motion Capture data (AMC) [1], Biovision Hierarchical data (BVH) [2] and Hierarchical Translation-Rotation (HTR) [6]. These three motion capture data formats are the most popular formats and have recently become supported by many kinds of motion software. MCML has an extensible structure, by means of which new capture file formats can be easily added.

A motion capture library is a collection of motion capture files. It consists of a corpus of motion capture files and descriptions of the actions contained in these files. If there is a motion capture library, an animator can search files with action names and navigate files according to the category information. However, a motion capture library is not as good at storing and retrieving motion capture files as a database. The method of searching a motion capture library is such that only the action name can be specified, and the user does not have the ability to retrieve specific frames or motion clips within a set of motion capture files containing similar motions. Furthermore, the size of the motion capture library may become very large due to the duplication of files containing the same capture data. This problem occurs because the same capture data is stored in the same library in different formats.

We solve these problems by defining a standard motion capture data format, MCML, which can be used to store motion capture files in a database and retrieve the motion clips from the database using a query expression. By having a standard format, we can eliminate the duplication of motion capture files and create a compact-sized motion database.

The structure of this paper is as follows. In Section 2, we look at other, related studies. Section 3 summarizes the structure and contents of the different motion capture data formats. Section 4 describes the design goals and scope of MCML. In Section 5, the structure and contents of MCML are explained in detail. Section 6 describes the design and implementation of the core modules of the MCML-based motion capture data management system. Finally, Section 7 concludes this paper with future research directions.

2. Related works

In this section, we summarize the related studies dealing with new markup language development using XML as the method of representing data in character animation, virtual reality and other fields.

Morales [3] proposed a motion capture data storage method based on XML. In this method, motion capture data is stored by being converted into XML data format, in order for animation staff to be able to access the data in a mutually cooperative environment, e.g. in a web-based environment. The system was designed in such a way that the motion capture data could easily be used, and XML and Active Server Page (ASP) technologies were used for this purpose. In contrast to our study, Morales [3] dealt only with motion capture data stored in a simple format based on segments and did not consider hierarchical structure. Moreover, it did not suggest the use of a standard markup language for motion capture data, such as the MCML language proposed in this paper, but only alluded to the possibility of data conversion using XML.

Secondly, the Virtual Human Markup Language (VHML) [10], which builds on existing standards such as those specified by the W3C Voice Browser Activity, is based on XML/XSL (XML Stylesheet Language). The intent of VHML is to facilitate the realistic and natural interaction of a Talking Head/Talking Human with a user. VHML is not directly related to motion capture data, but allows for the emotional expression of characters using facial expressions, gestures and body language, by means of a markup language based on XML.

In addition, Virtual Reality Modeling Language (VRML) [12] is a language designed to simulate three-dimensional environments on the web, and H-ANIM [13] is a standard specification established by the Web3D Consortium, which describes the structure to be used for the three-dimensional modeling of an Avatar. The specification of humanoids in H-ANIM follows the standard method of representing humanoids used in VRML 2.0. The structure of the Humanoid node of H-ANIM is similar to the structure of motion capture data, because that node serves as the overall container for the Joint, Segment, Site and Viewpoint nodes, which define the skeleton, geometry and landmarks of the human figure. The particular interest of our system is in the archiving and exchanging of motion capture files in different formats. H-ANIM is a good language for representing human beings in an online virtual environment, but it is too complex to be a standard motion capture format, because it has too many additional features.

3. Overview of motion capture file formats

Motion capture file formats can be roughly divided into two kinds, the Tracker Format and the Skeleton Format, according to the method used for processing the motion capture data. The former only has three-dimensional location values and accepts the Adaptive Optics Associates (AOA), Coordinate 3D (C3D) and Tracked Row Column (TRC) formats. The latter has skeleton information as well as three-dimensional location values and accepts the BVH, Biovision data (BVA), HTR, ASF/AMC, Lamsoft magnetic format BRD, Polhemus DAT and Ascension ASC file formats [7,9]. These file formats have varying data file structures; therefore, only the most commonly used file structures are taken into account in this paper.

The .bvh file format (Biovision Hierarchical data, BVH) was developed by Biovision, a motion capture data service company, and this format also provides skeleton hierarchy information as well as motion data. Motion Analysis Corporation's motion capture file has the .trc format. Acclaim Motion Capture System's file format, .asf, is based on the definition of a skeleton in the form of joints and bones, and a hierarchical structure and features based upon joint rotation data. These data file structures provide skeleton hierarchy information as well as motion data. These files contain the three-dimensional coordinate values of all the markers corresponding to the frames, and the human body hierarchy in the motion consists of a 23-segment system.

The BVH format of Biovision is divided into the HIERARCHY section and the MOTION section. The HIERARCHY section defines the skeleton structure of the avatar. The skeleton defined in BVH is made up of a total of 18 joints. It is composed in such a way that the hip assumes the role of the root and each segment is jointed toward the left lower part, right lower part and upper part, in that order. The MOTION section is structured with Euler angles applied to each joint [7].

4. Overview of MCML

4.1. Weakness of motion capture

Motion capture seems to provide the best way of inputting realistic, natural motion into a computer when a skilled animator is not available. However, motion capture has one major weakness, in that it is very difficult to edit the captured motion without degrading its quality. Because every frame is a keyframe, any change made by the animator results in jerky motion unless the motion curves are rebuilt. Commercially available motion capture libraries are difficult to use, as they often include hundreds of examples which can only be browsed by using the names of the actions they contain. The motion capture library is not a database but just a collection of motion capture files [3,5,8].

A significant problem which arises when using motion capture in the production environment stems from the lack of integration among motion capture hardware/software developers and animation package developers. For example, Maya requires the use of a third party MEL script to import BVH or ASF/AMC files. Thus, an animator must try to get the program to work properly with the supplied motion files. It also restricts the animator to using a specific program once he or she has decided to use motion capture [3].

4.2. Goals of MCML

The purpose of MCML is not only to facilitate the conversion or integration of different formats, but also to allow for greater reusability of motion capture data, through the construction of a motion database storing the MCML documents. To construct a motion database based on a relational or XML database, the motion capture files must first be converted into the corresponding MCML documents. Thus, the primary goal of MCML is to facilitate the storage and retrieval of motion capture data to/from a database.

The second goal of MCML is to provide a common format which enables the exchange of motion capture files among animators. If commercially available animation software packages provide support for MCML, animators will not need to worry about whether or not they have the appropriate plug-ins for their animation software.

4.3. Scope of MCML

Motion capture formats are divided into ASCII data type and binary data type. ASCII files can readily include descriptions associated with parameters, and are readily manipulated by the use of common text editors. However, they are very inefficient for storage and access, and very large files may pose problems for some editors. Also, ASCII files must generally be accessed sequentially and are very inefficient if the files need to be read non-sequentially.

Binary files are efficient in terms of data storage and access and may also contain parameters and associated descriptions, but not in a form casually accessible to the user. Also, the file organization is specific to the type of data stored, i.e. the data and any associated parameters may only be accessed by specifically written programs which have a detailed knowledge of the file structure.

To design MCML, we chose to analyze the structure of the three most popular types of ASCII files, ASF/AMC, BVH and HTR, which are industry standard formats and are supported by many kinds of motion software. These formats contain skeleton information and are superior to segment-based formats such as BVA and TRC. In addition, most motion capture files contained in commercially available motion capture libraries are made with these three formats. Although the C3D format is a binary format, we are able to handle C3D files, because these files can be converted into BVH or ASF/AMC files.

5. MCML DTD specifications

The MCML Document Type Definition (DTD) defines the logical structure of an MCML document. The DTD defines the elements which are allowed, and a validating parser compares the DTD rules against a given document to determine the validity of the document.

In this section, we describe the tags and element structure of MCML. First, we define the tag names after analyzing the bone names and keywords contained in the motion capture file formats. Second, we define the logical structure of an MCML document.

5.1. Tags of MCML

5.1.1. Tags for header data

The motion data file has a header data area containing supplementary information, such as the file type, version, comments, etc. Of course, some files, such as BVH files, do not have header information. MCML provides a set of tags so that it can include all the header information of these different kinds of files. Table 1 shows the mapping of the header data.

MCML defines a set of tags for the header information, in order to maintain all of this information, and new header information can be additionally defined.
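As a rough illustration only, a header derived from an HTR file might look like the following sketch; the element names are taken from Table 1, while the sample values and the assumption that each subelement carries its value as text content (with the units values as attributes, as in Table 1) are ours:

  <!-- Hypothetical MCML header fragment; element names follow Table 1, values are illustrative. -->
  <header>
    <filetype>htr</filetype>
    <datatype>HTRS</datatype>
    <filename>walk01.htr</filename>
    <version>1.0</version>
    <skeleton_name>actor01</skeleton_name>
    <units mass="kg" length="cm" angle="deg"/>
    <num_segments>23</num_segments>
    <num_frames>180</num_frames>
    <dataframe_rate>60</dataframe_rate>
    <euler_rotation_order>ZYX</euler_rotation_order>
    <calibration_unit>mm</calibration_unit>
    <rotation_unit>degrees</rotation_unit>
    <global_axis_of_gravity>Y</global_axis_of_gravity>
    <bone_length_axis>Y</bone_length_axis>
    <scale_factor>1.0</scale_factor>
  </header>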

5.1.2. Bone names of skeleton

ASF/AMC, BVH and HTR formats describe the skeleton, which is composed of a number of bones, usually in a hierarchical structure. The bone is the basic entity used when representing a skeleton. Each bone represents the smallest segment within the motion that is subject to individual translation and orientation changes during the animation. These three formats have different bone names and different hierarchical structures for the skeleton. To integrate these into one unified format, we create a more detailed hierarchical structure for the skeleton and define the bone names. Table 2 shows the MCML bone names and the corresponding names of these three file formats.

ASF/AMC file formats use the names of the human bones and the BVH file format uses the names of marker locations to represent the joints of the body. Although HTR uses the names of the human bones, it has only a few names since it is a format released in an initial stage. In order to be able to integrate these different types of files, MCML has extended power of expression so that it can contain all of these three formats. The contents of Table 2 are used for mapping between MCML and the various capture formats in the system implemented in this paper.

Table 1
Mapping between the tags of MCML and the keywords of BVH, ASF/AMC and HTR files, in order to represent the header information of the motion capture data

     MCML                    ASF/AMC      BVH(1)     BVH(2)     HTR
1    filetype                undefined    undefined  undefined  FileType
2    datatype                undefined    undefined  undefined  DataType
3    filename                undefined    undefined  undefined  undefined
4    version                 version      undefined  undefined  FileVersion
5    skeleton_name           name         undefined  undefined  undefined
6    units                   units        undefined  undefined  undefined
     (Attribute: mass,       (Attribute: mass,
      length, angle)          length, angle)
7    num_segments            undefined    undefined  undefined  NumSegments
8    num_frames              undefined    undefined  undefined  NumFrames
9    dataframe_rate          undefined    undefined  undefined  DataFrameRate
10   euler_rotation_order    undefined    undefined  undefined  EulerRotationOrder
11   calibration_unit        undefined    undefined  undefined  CalibrationUnits
12   rotation_unit           undefined    undefined  undefined  RotationUnits
13   global_axis_of_gravity  undefined    undefined  undefined  GlobalAxisofGravity
14   bone_length_axis        undefined    undefined  undefined  BoneLengthAxis
15   scale_factor            undefined    undefined  undefined  ScaleFactor

5.1.3. Tags for character skeleton

Motion capture data contains hierarchy information about the modeled character. That is, the character to which the motion will be applied is defined in the same file. The ASF/AMC motion capture file of Acclaim manages the character and motion separately. The ASF file contains a hierarchy and initial location information for the character, and the AMC file contains the motion information for the character. The advantages of this separation are the possibility to apply the same motion to other characters of similar size and skeleton, and the potential reuse of characters. On the other hand, the BVH and HTR file formats contain character information and motion information in the same file, so that they have low reuse rates. MCML can separate human body joint information from header information, joint hierarchy information and motion information. MCML can also mix the various parts and make them into one file. Therefore, animators can attain high reuse rates and avoid duplication.

Body hierarchy information includes the corresponding location and joint angle information of each joint bone, in order to be able to perform modeling of the human body from the root. If there is any error in this hierarchy information, the modeled character will have a strange appearance.

MCML has element and attribute sets that can represent the hierarchical structure, joint length and comparative distance from the root and each joint for the modeling of a human character. The name element of MCML in Table 3 has the name of each joint, and the value of this name element should be one of the bone names defined in Table 2.

The MCML document of Fig. 1 shows an example of the hierarchical structure of a character. The root element designates the initial location of the character and the skeleton designates the hierarchy structure, location and size of each joint. The hierarchy structure can be expressed using the containment (nesting) relation of the bone elements.

Fig. 1. An example of a character skeleton described with MCML.
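As a rough sketch only of what a fragment of such a skeleton could look like (the element and attribute names follow Table 3, the bone names follow Table 2, while the particular nesting, the numeric values and the omission of most subelements are illustrative assumptions):

  <!-- Hypothetical MCML skeleton fragment; element/attribute names follow Table 3,
       bone names follow Table 2, values are illustrative. -->
  <skeleton>
    <bone id="1">
      <name>hips</name>
      <direction>0 1 0</direction>
      <length>4.5</length>
      <bone id="2">
        <name>left_up_leg</name>
        <direction>0 -1 0</direction>
        <length>18.2</length>
        <bone id="3">
          <name>left_low_leg</name>
          <direction>0 -1 0</direction>
          <length>17.6</length>
        </bone>
      </bone>
    </bone>
  </skeleton>

A root element, carrying the order, axis, position and orientation attributes of Table 3, gives the initial location and orientation of the character.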

Table 2
Mapping between the bone names of MCML and the bone names of the ASF/AMC, BVH and HTR file formats expressing human joints

MCML ASF/AMC BVH(1) BVH(2) HTR

1 root h_root root root undefined

2 head h_head(head) head head head

3 neck1 h_neck1(upperneck) neck neck undefined

4 neck2 h_neck2 undefined undefined undefined

5 left_shoulder h_left_shoulder(lclavicle) leftcollar lshoulderjoint undefined

6 left_up_arm h_left_up_arm(lhumerus) leftuparm lhumerus lupperarm

7 left_low_arm h_left_low_arm(lradius) leftlowarm lradius llowarm

8 left_wrist (lwrist) undefined undefined undefined

9 left_hand h_left_hand(lhand) lefthand lwrist lhand

10 left_fingers h_left_fingers(lfingers) undefined undefined undefined

11 left_finger_one h_left_finger_one(lthumb) undefined undefined undefined

12 left_finger_two h_left_finger_two undefined undefined undefined

13 left_finger_three h_left_finger_three undefined undefined undefined

14 left_finger_four h_left_finger_four undefined undefined undefined

15 left_finger_five h_left_finger_five undefined undefined undefined

16 right_shoulder h_right_shoulder(rclavicle) rightcollar rshoulderjoint undefined

17 right_up_arm h_right_up_arm(rhumerus) rightuparm rhumerus rupperarm

18 right_low_arm h_right_low_arm(rradius) rightlowarm rradius rlowarm

19 right_wrist (rwrist) undefined undefined undefined

20 right_hand h_right_hand(rhand) righthand rwrist rhand

21 right_fingers h_right_fingers(rfingers) undefined undefined undefined

22 right_finger_one h_right_finger_one(rthumb) undefined undefined undefined

23 right_finger_two h_right_finger_two undefined undefined undefined

24 right_finger_three h_right_finger_three undefined undefined undefined

25 right_finger_four h_right_finger_four undefined undefined undefined

26 right_finger_five h_right_finger_five undefined undefined undefined

27 torso_1 h_torso_1(upperback) chest1 upperback torso

28 torso_2 h_torso_2(thorax) chest2 thorax undefined

29 torso_3 h_torso_3(lowerback) undefined undefined undefined

30 torso_4 h_torso_4 undefined undefined undefined

31 torso_5 h_torso_5 undefined undefined undefined

32 waist h_waist undefined undefined undefined

33 hips undefined hips hips undefined

34 left_hip h_left_hip(lhipjoint) undefined undefined undefined

35 left_up_leg h_left_up_leg(lfemur) leftupleg lfemur lthight

36 left_low_leg h_left_low_leg(ltibia) leftlowleg ltibia llowleg

37 left_foot h_left_foot(lfoot) leftfoot lfoot lfoot

38 left_toes h_left_toes(ltoes) undefined undefined undefined

39 left_toe_one h_left_toe_one undefined undefined undefined

40 left_toe_two h_left_toe_two undefined undefined undefined

41 left_toe_three h_left_toe_three undefined undefined undefined

42 left_toe_four h_left_toe_four undefined undefined undefined

43 left_toe_five h_left_toe_five undefined undefined undefined

44 right_hip h_right_hip(rhipjoint) undefined undefined undefined

45 right_up_leg h_right_up_leg(rfemur) rightupleg rfemur rthight

46 right_low_leg h_right_low_leg(rtibia) rightlowleg rtibia rlowleg

47 right_foot h_right_foot(rfoot) rightfoot rfoot rfoot

48 right_toes h_right_toes(rtoes) undefined undefined undefined

49 right_toe_one h_right_toe_one undefined undefined undefined

50 right_toe_two h_right_toe_two undefined undefined undefined

51 right_toe_three h_right_toe_three undefined undefined undefined

52 right_toe_four h_right_toe_four undefined undefined undefined

53 right_toe_five h_right_toe_five undefined undefined undefined


Table 3
The mapping between the tags of MCML and the keywords of ASF/AMC, BVH and HTR files, in order to represent the character skeleton of the motion capture data

     MCML          ASF/AMC      BVH           HTR
1    root          root         root          undefined
     Attribute:    Attribute:   Attribute:
     order         order        channels
     axis          axis         channels
     position      position     offset
     orientation   orientation
2    skeleton      bonedata     hierarchy
3    bone          bonedata     hierarchy     SegmentName and Hierarchy
     Attribute:    Attribute:
     id            id
4    name          name         undefined     undefined
5    direction     direction    undefined     undefined
6    length        length       undefined     BoneLength
7    position      position     undefined     undefined
8    axis          axis         undefined     undefined
9    order         order        undefined     undefined
10   dof           dof          undefined     undefined
11   limits        limits       undefined     undefined
12   bodymass      bodymass     undefined     undefined
13   cofmass       cofmass      undefined     undefined
14   offset        undefined    offset        undefined
15   channels      undefined    channels      undefined


5.1.4. Tags for motion data

Motion data is composed of the total number of frames, the time per frame, number of translations per frame, number of rotations per frame, etc. Table 4 shows the mapping of the motion data and Fig. 2 shows an example of the motion element in an MCML document. Motion data describes the animation of each bone over a period of time. We can examine a series of lines pertaining to a frame of animation for each of the segments defined in the skeleton in motion capture files. The animator may not understand the movement of the character in each frame when examining the frame data in the files, because motion capture files contain a large number of frame lines and the structure of frame data is complex. The separation and dislocation of a specific frame and the readjustment of the frame lines are difficult tasks to perform manually.

It is relatively easy to pick out a specific frame or specific joint motion, and also possible to perform dislocation, combination or separation, by separating each frame and the joint of each frame in MCML. The frame_bone, the subelement of the frame element of MCML, has translation and rotation values for each joint angle in a particular frame. The motion_name describes the motion shown in this particular frame for each joint. Even though motion capture data is saved in permanent storage, it is difficult to find specific motions. Especially, if a motion is stored in a motion capture file format other than MCML, it is difficult to determine the motion of specific motion data without checking it with a viewer.

Table 4
The mapping between the names of MCML and the names of ASF/AMC, BVH and HTR files, in order to represent the motion information of the motion capture data

     MCML          ASF/AMC      BVH          HTR
1    motion                     motion
2    frames                     frames
3    frametime                  frametime
4    frame         frame        frame        frame
     Attribute:    Attribute:   Attribute:   Attribute:
     id            –            #Fr          frame#
                   Tx           Tx           Tx
                   Ty           Ty           Ty
                   Tz           Tz           Tz
                   Rx           Rx           Rx
                   Ry           Ry           Ry
                   Rz           Rz           Rz
5    frame_bone
     Attribute: name, Tx, Ty, Tz, Rx, Ry, Rz
6    frame_name
7    motion_name
     Attribute: start_frame, end_frame

Fig. 2. Part of an MCML document which shows an example of a character's motion.
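A hedged sketch of such a motion fragment, using the element and attribute names of Table 4 (the frame counts, the channel values and the placement of frame_name inside frame are our assumptions), might read:

  <!-- Hypothetical MCML motion fragment; names follow Table 4, values are illustrative. -->
  <motion>
    <frames>180</frames>
    <frametime>0.016667</frametime>
    <frame id="1">
      <frame_name>turning</frame_name>
      <frame_bone name="root" Tx="0.0" Ty="95.3" Tz="0.0" Rx="0.0" Ry="1.2" Rz="0.0"/>
      <frame_bone name="left_up_leg" Tx="0.0" Ty="0.0" Tz="0.0" Rx="12.4" Ry="0.8" Rz="-3.1"/>
      <!-- one frame_bone element for each bone defined in the skeleton -->
    </frame>
    <!-- frames 2 to 180 follow -->
    <motion_name start_frame="1" end_frame="12">turning</motion_name>
  </motion>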

If a set of motion data is stored after being converted into MCML, individual frames can be retrieved through the use of an XML query language, such as XQuery, or the use of a regular path expression language, such as XPath. To make searching for specific motions possible, motion names corresponding to each motion are stored in each frame so that searching with the motion name can be performed.

An animator does not need to perform motion capture for his or her new character; rather, he or she can search for the desired motion among the existing, stored motion data by means of the motion name (e.g. left-arm_abduction for the opening motion of the left arm) and apply existing motion data to the new character easily by modifying it.

The standard terms used to represent the motion of the human body are as follows:

- flexion: bending motion, bending finger, bending oneself
- extension: stretching motion, stretching finger, stretching oneself
- abduction: opening, opening arms, opening legs
- adduction: closing, closing arms, closing legs
- medial (internal) rotation: turning to inside
- lateral (external) rotation: turning to outside
- left or right rotation: turning neck or body to left or right

Other motions of every part of the human body, such as the legs, body, head and shoulders, are also defined. At this time, impossible motions of the human body should be avoided. For example, a motion such as left-shoulder-abduction (opening shoulder) cannot be made, so the corresponding motion name should not be defined.

5.2. MCML document structure

The mcml element, which is the root element of MCML, is composed of the meta element, which expresses the metadata, the header element, the skeleton element and the motion element.
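As a rough sketch only of how the top level of the MCML DTD might read (the occurrence indicators and the exact placement of elements such as root and frame_name are our assumptions, based on the descriptions in Sections 5.2.1-5.2.4):

  <!-- Hypothetical sketch of the top-level MCML content models; not the actual DTD. -->
  <!ELEMENT mcml        (meta, header, skeleton, motion)>
  <!ELEMENT meta        (title, creator, subject, description, date, format, duration, category)>
  <!ELEMENT skeleton    (root?, bone+)>
  <!ELEMENT bone        (name, direction?, length?, position?, axis?, order?, dof?,
                         limits?, bodymass?, cofmass?, offset?, channels?, bone*)>
  <!ATTLIST bone        id CDATA #REQUIRED>
  <!ELEMENT motion      (frames, frametime, frame+, motion_name*)>
  <!ELEMENT frame       (frame_name?, frame_bone+)>
  <!ATTLIST frame       id CDATA #IMPLIED>
  <!ELEMENT motion_name (#PCDATA)>
  <!ATTLIST motion_name start_frame CDATA #IMPLIED
                        end_frame   CDATA #IMPLIED>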

5.2.1. Meta element

MCML metadata is based on eight elements and describes the contents of an MCML document. This is depicted in Fig. 3.

The title element is the name given to the MCML document by the creator. The creator element is the person(s) or organization(s) which created the original motion capture file. The subject element is the topic of the actions contained in the motion capture file (for example, ballet, dance, etc.). The description element is a textual description of the actions contained in the motion capture file. The date element is the creation date of the motion capture file. The format element is the data representation of the motion capture file, such as ASF/AMC, BVH or HTR. The duration element is the playing time of the frames contained in the motion capture file; it is equal to the number of frames multiplied by the time per frame. The category element is used to classify the type of motion capture data (for example, sports, locomotion, human interaction, etc.).

Fig. 3. The structure of MCML metadata.
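As a small illustration, a populated meta element might read as follows (the element names are those listed above; every value is hypothetical):

  <!-- Hypothetical meta element; element names follow Section 5.2.1, values are illustrative. -->
  <meta>
    <title>turning_and_walking_01</title>
    <creator>Example Motion Capture Studio</creator>
    <subject>dance</subject>
    <description>A dancer turns in place and then walks forward.</description>
    <date>2003-02-23</date>
    <format>BVH</format>
    <duration>3.0</duration>
    <category>locomotion</category>
  </meta>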

5.2.2. Header element

The header element is composed of 15 subelements, which are depicted in Fig. 4. These elements are relevant to the HTR file format. The header element is used to convert an MCML document into an HTR file. However, we can specify the values of the filetype, filename, num_frames and dataframe_rate elements during the conversion process between ASF/AMC and BVH files and MCML documents.

Fig. 4. The structure of the MCML header element.

5.2.3. Skeleton element

The MCML skeleton element represents the hierarchical structure of a human figure. In Fig. 5, we examine the logical structure of the skeleton element.

The root element describes the parent of the hierarchy. The axis and order attributes describe the order of operations for the initial offset and root node transformation. The position attribute describes the root translation of the skeleton and the orientation attribute defines the rotation.

The skeleton element has one or more bone elements. The skeleton element may have one bone element as its direct child in the case of the hierarchical structure formats such as ASF/AMC, BVH and HTR. For any formats which do not contain skeleton information, such as the BVA and TRC formats, the skeleton element may directly contain the various bone elements. The BVA file format created by Biovision is very simple because it lists all nine possible transformations without allowing for any changes in the order. The TRC file format is generated by Motion Analysis optical motion capture systems and contains translational data only, without a hierarchy definition.

The hierarchical structure of the bone element may be recursive to represent the skeleton information. The PCDATA of the name element is the bone name according to the bone naming rule shown in Table 2. There is no nesting structure for bone elements in the case of simple formats such as BVA and TRC.

Fig. 5. The structure of the MCML skeleton element.

5.2.4. Motion element

The MCML motion element is composed of one or more frame elements and zero or more motion_name elements. This element actually describes the animation of each bone over time. The logical structure of the motion element is shown in Fig. 6.

The frames element is the number of frames and the frametime element is the playing time for each frame. The frame element has one or more frame_bone elements to represent the actions of each bone defined in the skeleton element. One frame_bone element represents one frame in the motion capture files.

The frame_name and motion_name elements are used to specify the names of the actions contained in the motion data. The frame_name is the name of the primary action for each frame and the motion_name is the name of the motion sequence. The start_frame attribute points to the start frame of the motion sequence. Also, the end_frame attribute points to the end frame of the motion sequence (for example, <motion_name start_frame="5" end_frame="24">sitting after walking</motion_name>). We can use these elements to retrieve the specific motion clips in a query expression if the motion capture data is stored in a database.

Fig. 6. The structure of the MCML motion element.

6. Motion data management based on MCML

6.1. System architecture

The system referred to in this paper is composed of a front-end and a back-end. The front-end has some modules which are used to process conversion between motion capture data files and MCML documents. The back-end is a storage server used to store and retrieve MCML documents. We can construct a database of motion capture data using this storage server.

The front-end of the system consists of the Mocap Syntax Analyzer, the Mapping Manager, the MCML Converter, the MCML Editor, and the Motion Viewer. The Mocap Syntax Analyzer analyzes the syntax of the imported motion capture data file and generates tokens that are stored in the token table. The Mapping Manager manages the mapping table, which has two kinds of mapping information and which arbitrates between the motion capture data formats and the MCML tag set. One kind of information is the mapping information between the joint names of the motion capture data formats and the joint names of MCML. The other is the mapping information between the keywords of the motion capture data formats and the tags of MCML. The MCML Converter takes charge of the conversion between the motion capture data files and the MCML documents. The MCML Editor provides functions to edit the MCML documents and the Motion Viewer provides the animated motion from the motion capture data files.

The back-end of the system consists of the MCML Storage Wrapper and the database of MCML documents, and provides services to store and retrieve MCML documents (Fig. 7).

Fig. 7. System architecture for converting, retrieving, editing, reprocessing and retargeting motion capture data based on MCML.

The main goal of our system is not to improve on the motion editing functionality of commercial animation software, but to enhance the reusability of motion capture data through the use of an XML-based motion database. So, the core functions of our system are the automatic generation of MCML documents from motion capture data files and the storage and retrieval of MCML documents contained in the motion database.

6.2. Core modules

6.2.1. Mocap syntax analyzer

If a motion capture data file of ASF/AMC, BVH or HTR format is imported, the Mocap Syntax Analyzer starts syntax analysis. The syntax analyzer extracts tokens and values while scanning the file and stores the pairs of tokens and values in a token table. A token table is composed of header tokens, skeleton tokens and motion tokens. The MCML Converter uses this token table and the mapping table described below to generate the MCML documents.

The Mocap Syntax Analyzer is implemented using component-based programming for extensibility. Each of the above data formats, ASF/AMC, BVH and HTR, has its own syntax analyzer. The Mocap Syntax Analyzer incorporates these three syntax analyzers. When a motion capture data file is imported into the system, the Mocap Syntax Analyzer checks the data format of the file and invokes the appropriate syntax analyzer to process this particular format. If a new motion capture data format needs to be added, we can implement the requisite syntax analyzer which can interpret this new format without changing the system and simply add it to the Mocap Syntax Analyzer. The Mocap Syntax Analyzer provides the common interface required to implement syntax analyzer classes.

6.2.2. Mapping manager

The mapping table consists of the tag mapping table and the joint mapping table. The joint mapping table stores the information required to provide mapping between the MCML joint (bone) names and the joint (bone) names of the various motion capture data formats, as depicted in Table 2. The tag mapping table stores the information required to provide mapping between the MCML tags and the keywords of the various motion capture data formats, as depicted in Tables 1 and 3. These two tables constitute the dictionary which the MCML Converter refers to during the document conversion process. The Mapping Manager provides a management function which enables the system administrator to manage these mapping tables.


6.2.3. MCML converter

MCML document conversion is divided into forward conversion and reverse conversion. Forward conversion consists of receiving a token table which is generated by the Mocap Syntax Analyzer, the mapping table, and the imported motion capture data file, and generating an MCML document corresponding to that file.

Reverse conversion involves converting an MCML document, which is loaded from the MCML repository or created from the results of a query, to a specific motion capture data format.

The MCML Converter is implemented with component-based programming, as in the case of the Mocap Syntax Analyzer, and consists of various subcomponents: ASF/AMC_to_MCML, BVH_to_MCML, HTR_to_MCML, MCML_to_ASF/AMC, MCML_to_BVH, and MCML_to_HTR. If a new motion capture data format is introduced, we can implement a new component which handles the conversion between the new format and MCML documents and add it to the MCML Converter.

Fig. 8 depicts a conversion process between motion capture data files and MCML documents. When the MCML Converter receives the tokens from the Mocap Syntax Analyzer, it checks which section includes these tokens and finds the corresponding MCML tags in the tag mapping table. Then, the MCML Converter creates the MCML document. A DTD document called mcml.dtd is used to create the MCML documents.

Fig. 8. Process to convert a motion capture data file in BVH format into the corresponding MCML document.

Some empty frame_name and motion_name elements are created during the conversion process. The animator uses the MCML Editor after the generation of the MCML document to assign values to these empty elements. We can describe the movement in a specific frame using frame_name elements. Also, we can describe motion sequences using motion_name elements, which have start_frame and end_frame attributes. These attributes point to the start frame and end frame of the motion sequence.
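As a small illustration (the attribute names follow Table 4; the placement of frame_name inside frame and all values are our assumptions), a freshly converted document might therefore contain fragments such as:

  <!-- Hypothetical result of forward conversion: name elements are emitted empty. -->
  <frame id="5">
    <frame_name></frame_name>
    <frame_bone name="root" Tx="0.0" Ty="95.1" Tz="4.2" Rx="0.0" Ry="2.3" Rz="0.0"/>
    <!-- remaining frame_bone elements omitted -->
  </frame>
  <motion_name start_frame="" end_frame=""></motion_name>

The animator would later replace the empty content and attribute values with action names and a frame range, in the manner of the sitting-after-walking example of Section 5.2.4.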

6.2.4. MCML Editor

An animator can edit the contents of an MCML document using the MCML Editor. The MCML Editor is not designed for editing the motion of a character, but for editing MCML documents. We can merge MCML fragments contained in multiple MCML documents into a new MCML document. The MCML Editor also provides the same functionality that ordinary XML editors have, i.e. open, save, and print document, and cut, copy, and paste block.

The MCML Editor shows the contents of an MCML document either as a tree structure or in text form. Using the tree structure allows the skeleton of a character and the frames representing the motion to be easily understood.

6.2.5. Motion viewer

We can animate the motion of a character using the Motion Viewer. The Motion Viewer can only process motion capture data files. When a motion capture data file is imported or generated by reverse conversion from an MCML document, we check the motion using this module. The Motion Viewer shows the markers and joints of a character and controls the playing of frames.

6.2.6. MCML Repository

The MCML Repository takes charge of the storage and retrieval of MCML documents. We can implement the MCML Repository using a relational database, an object-oriented database, or an XML database. In this study, we used both a relational database and an XML database as the MCML Repository. The eXcelon DXE Manager [4] is used for storing the MCML documents. The eXcelon DXE Manager supports searching for XML documents, as well as the storage of XML documents. To handle the problems derived from the difference between the relational data model and the MCML data model, we used the MCML Storage Wrapper, which is located between the MCML Converter and the MCML Repository. The MCML Storage Wrapper provides storage independence for the MCML management system.

6.3. Storage and retrieval of MCML documents

To create the motion database, we convert motion capture data files to MCML documents and store them in a relational database or XML database. Fig. 9 shows a schema used to store MCML documents in a relational database. The DocInfo table is a master table that contains information about an MCML document. The SkeletonRoot and Skeleton tables contain the skeleton information for each character. The FrameInfo and Frames tables contain the frame information for each character. The JointMap table contains the mapping information between the MCML joint names and the joint names of the motion capture data formats. The TagMap table contains mapping information between the MCML tags and the keywords of the motion capture data formats.

Fig. 9. The relational database schema for storing MCML documents.

An MCML document is a subset of an XML document and its tags are fixed. An animator cannot define additional tags or modify the MCML tags. So we propose a schema which can be used to store MCML structures in a relational data model directly, without having to map the tree structure to the relational data model.

To store MCML documents in a relational database, the MCML Storage Wrapper extracts tags and values when parsing MCML documents and creates SQL statements which are used to insert the MCML documents into the relational tables.

Fig. 10 shows a list of MCML documents and the contents of a selected document stored in an XML database. We use the DXE Manager of eXcelon as the MCML Repository in this study. A stored MCML document can be viewed either in the form of text or as a hierarchy structure in a database. Therefore, the animator can examine the contents of an MCML document without loading it into main memory.

Fig. 10. MCML documents are stored in the XML database, eXcelon DXE Manager.

SQL, XPath or XQuery can be used to retrieve motion data from the motion database. SQL is used to retrieve the motion data stored in the relational database. XPath and XQuery are used to retrieve the motion data stored in the XML database.

An animator can specify constraints on the query statements used to retrieve the desired motion data, such as the length of motion, where the body or individual joints should be, or what the body needs to be doing at particular times. An animator can also specify the scope of the search as being either from one document or multiple documents.

For example, query 1 is the XPath expression used to retrieve motion data whose length is less than 180 frames and for which the character motion looks like "walking". The num_frames element contains the number of frames in the MCML document, and the motion_name element has two attributes which specify the start frame and end frame of a specific motion sequence, together with the motion name relating to that motion sequence.
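A hypothetical XPath formulation of these constraints might look roughly as follows (the exact expression used as query 1, and the syntax accepted by the eXcelon DXE Manager, may differ):

  /mcml[header/num_frames < 180 and .//motion_name[contains(., "walking")]]//motion_name

Each motion_name element returned carries its start_frame and end_frame attributes, which is how a result such as Result 1 can report several motion sequences of the matching documents.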

After the execution of query 1, the result is returned. Part of this result is depicted in Result 1. It tells us that there is a motion sequence from the 1st frame to the 12th frame which is named the 'turning' motion, and a motion sequence from the 13th frame to the 91st frame which is named the 'walking' motion.

Query 2 is the XPath expression used to retrieve the skeleton data of the character that satisfies the constraints related to this skeleton. Result 2 shows a part of the result obtained after the execution of query 2.

7. Conclusion and future works

The motion capture method, as one of the motion-creating technologies employed in three-dimensional animation, is widely used for manufacturing animation since it produces high-quality character motion similar to the actual motion of the human body. However, motion capture has a significant weakness due to the lack of an industry-wide standard for archiving and exchanging motion capture data. It has low flexibility. Creating all of the required motions using capture is nonproductive and sometimes impossible due to the difficulties and costs involved. Therefore, animators frequently try to create slightly different motions, by reusing previously captured motion data, and to create new composite motions which are the synthesis of various individual motions. However, it is very difficult for the animator to obtain motions that do exactly what he or she wants, because commercially available motion capture libraries are not databases, but just a collection of motion capture files.

In order to solve this problem, we propose a standard format based on XML called MCML. We also propose a system framework based on this standard format for motion storage and retrieval. The purpose of MCML is not only to facilitate the conversion or integration of different formats, but also to allow for greater reusability of motion capture data, through the construction of a motion database based on MCML.

There are standard ways of representing human beings in online virtual environments, such as H-ANIM and VRML. However, these languages do not process motion capture files and their structure is too complex to use as a standard motion capture format. So far, no standard format for integrating, storing and retrieving motion capture data with relational databases or XML databases has been defined.

If the MCML documents are stored in a motion database, it is easy for the animator to obtain motions that do exactly what he or she wants by using a database query language such as SQL or XQuery. This offers many advantages for motion synthesis or motion editing applications. Also, in order to provide increased security for the data and more convenient data management, commercial animation software could be used in conjunction with a database for the storage of the motion capture data. Thus, MCML can improve the reusability of the motion capture data.

We propose a system framework that can be used to manage the MCML documents, and a motion database, which is based on a relational database or XML database. Our system has many core modules for dealing with motion capture files and MCML documents: the MoCap Syntax Analyzer, the Mapping Manager, the MCML Converter, the Motion Viewer, etc.

The design of MCML does not end with this initial version. We plan to develop future versions with enhanced functionality and to provide maintenance which allows for other motion capture formats to be used. Above all, we will endeavor to support compatibility between MCML and the H-ANIM humanoid format, because H-ANIM is the upcoming standard way of representing humanoids in the web-based environment.

Acknowledgements

This work was supported in part by BERC/KOSEF and in part by the Brain Neuroinformatics Program sponsored by KMST.

References

[1] Acclaim, ASF/AMC File Specifications page, http://www.darwin3d.com/gamedev/acclaim.zip.
[2] Biovision, BVH Specifications page, http://www.cs.wisc.edu/graphics/courses/cs-838-1999/Jeff/BVH.html.
[3] C.R. Morales, Development of an XML web based motion capture data warehousing and translation system for collaborative animation projects, Proceedings of the 9th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2001.
[4] eXcelon DXE Manager, http://www.excelon.com.
[5] L.M. Tanco, A. Hilton, Realistic synthesis of novel human movements from a database of motion, Proceedings of the IEEE Workshop on Human Motion (HUMO), 2000.
[6] Motion Analysis, HTR Specification page, http://www.cs.wisc.edu/graphics/Courses/cs-838-1999/Jeff/HTR.html.
[7] M. Meredith, S. Maddock, Motion capture file formats explained, Department of Computer Science Technical Report CS-01-11.
[8] O. Arikan, D.A. Forsyth, Interactive motion generation from examples, Proceedings of SIGGRAPH, 2002.
[9] S. Rosenthal, B. Bodenheimer, C. Rose, J. Pella, The process of motion capture: dealing with the data, Proceedings of the 8th Eurographics Workshop on Animation and Simulation, 1997.
[10] A. Marriott, VHML - virtual human markup language, Proceedings of the Talking Head Technology Workshop, 2001.
[11] W3C, Extensible Markup Language (XML) 1.0, http://www.w3c.org/XML, 1998.
[12] Web3D Consortium, VRML International Standard, http://www.web3d.org/technicalinfo/specifications/ISO_IEC_14772-All/index.html.
[13] Web3D Consortium, H-ANIM 2001 Specification, http://www.h-anim.org/Specifications/H-Anim2001/.

Hyun-Sook Chung received her BS degree in physics from the Catholic University of Daegu, Korea in 1993 and her MS degree in computer science from the Catholic University of Daegu, Korea in 1995. She is currently working towards her PhD degree in computer science at Yonsei University, Korea. She worked as a research scientist at the CAD/CAM Research Center of KIST (Korea Institute of Science and Technology), Seoul, South Korea, from 1997 to 1999. Her research interests include multimedia document engineering, human-computer interaction, multimedia systems, and XML. She is a member of the Korea Information Science Society, the Korea Information Processing Society, and the Korea Multimedia Society.

Yillbyung Lee has been a professor in the Department of Computer Science, Yonsei University since 1986. He received his BE degree in Electronic Engineering from Yonsei University, Korea in 1976, his MS degree in computer science from the University of Illinois, USA in 1980 and his PhD degree in computer science from the University of Massachusetts, USA in 1985. At present, he leads the Artificial Intelligence Lab at Yonsei University. He is the president of the Korean Cognitive Science Society and a vice president of the Korean Data Mining Society. His main areas of interest are Document Recognition, Data Mining, multimedia, and Computational Models of Vision and Biometrics.