www.elsevier.com/locate/csi
Computer Standards & Interfaces 26 (2004) 113–130
MCML: motion capture markup language for integration of
heterogeneous motion capture data
Hyun-Sook Chung*, Yilbyung Lee
AI Laboratory Department of Computer Science, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea
Received 23 February 2003; received in revised form 27 May 2003; accepted 31 May 2003
Abstract
Motion capture technology is widely used in animation production since it yields high-quality character motion similar
to the actual motion of the human body. However, motion capture has a significant weakness due to the lack of an industry-wide
standard for archiving and exchanging motion capture data. It is difficult for animators to reuse and exchange motion capture data
with each other. In this paper, we propose a standard format for integrating different motion capture file formats. Our standard
format is called Motion Capture Markup Language (MCML). It is a markup language based on eXtensible Markup Language
(XML). The purpose of MCML is not only to facilitate the conversion or integration of different formats, but also to allow for
greater reusability of motion capture data, through the construction of a motion database storing the MCML documents.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Motion capture file format; MCML; Markup language; XML
0920-5489/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0920-5489(03)00071-0
* Corresponding author. Tel.: +82-2365-4598; fax: +82-2365-2579.
E-mail address: [email protected] (H.-S. Chung).

1. Introduction

Motion capture technology is frequently used to
solve problems in real-time animation, because it
enables motion data to be produced easily and yields
physically accurate motions. These days, motion cap-
ture plays an important role in the making of movies
or games by the larger production or game companies.

However, motion capture has significant weak-
nesses. Firstly, it has low flexibility. The high-quality
motion data created with motion capture technology
are solidified data designed for specific characters or
circumstances, so that it is not easy to edit or modify
them for other purposes. Secondly, the captured
motion data can have different data formats depending
on the motion capture system which was employed.
Each capture system defines its own data format to
express the captured contents, ranging from a simple
format in segment form to a complex format in
hierarchical structure form. Thirdly, commercially
available motion capture libraries are difficult to use,
as they often include hundreds of examples, which
can only be browsed by using the names of the actions
they contain [3].
In this paper, we define a standard format for
integrating motion capture data with different for-
mats. Our standard format for motion capture data
is a markup language that can express motion
capture data based on eXtensible Markup Language
(XML) [11], and is called Motion Capture Markup
Language (MCML). MCML defines a set of tags to
integrate Acclaim Skeleton File (ASF)/Acclaim Mo-
tion Capture data (AMC) [1], Biovision Hierarchi-
cal data (BVH) [2] and Hierarchical Translation-
Rotation (HTR) [6]. These three motion capture
data formats are the most popular formats and have
recently become supported by many kinds of mo-
tion software. MCML has an extensible structure,
by means of which new capture file formats can be
easily added.
A motion capture library is a collection of motion
capture files. It consists of a corpus of motion
capture files and descriptions of the actions contained
in these files. If there is a motion capture library, an
animator can search files with action names and
navigate files according to the category information.
However, a motion capture library is not as good at
storing and retrieving motion capture files as a data-
base. When searching a motion capture library, only
the action name can be specified; the user cannot
retrieve specific frames or motion clips from within a
set of motion capture files containing similar motions.
Furthermore, the size
of the motion capture library may become very large
due to the duplication of files containing the same
capture data. This problem occurs because the same
motion is stored in the library several times, once in
each format.
We solve these problems by defining a standard
motion capture data format, MCML, which can be
used to store motion capture files in a database
and retrieve the motion clips from the database
using a query expression. By having a standard
format, we can eliminate the duplication of motion
capture files and create a compact-sized motion
database.
The structure of this paper is as follows. In
Section 2, we look at other, related studies. Section
3 summarizes the structure and contents of the
different motion capture data formats. Section 4
describes the design goals and scope of MCML.
In Section 5, the structure and contents of MCML
are explained in detail. Section 6 describes the
design and implementation of the core modules of
the MCML-based motion capture data management
system. Finally, Section 7 concludes this paper with
future research directions.
2. Related works
In this section, we summarize the related stud-
ies dealing with new markup language develop-
ment using XML as the method of representing
data in character animation, virtual reality and
other fields.
Morales [3] proposed a motion capture data
storage method based on XML. In this method,
motion capture data is stored by being converted
into XML data format, in order for animation staff
to be able to access the data in a mutually cooper-
ative environment, e.g. in the web-based environ-
ment. The system was designed in such a way that
the motion capture data could easily be used, and
XML and Active Server Page (ASP) technologies
were used for this purpose. In contrast to our study,
Morales [3] dealt only with motion capture data
stored in a simple format based on segments and did
not consider hierarchical structure. Moreover, it did
not suggest the use of a standard markup language
for motion capture data, such as the MCML lan-
guage proposed in this paper, but only alluded to the
possibility of data conversion using XML.
Secondly, the Virtual Human Markup Language
(VHML) [10], which builds on existing standards
such as those specified by the W3C Voice Browser
Activity, is based on XML/XSL (Extensible Stylesheet
Language). The intent of VHML is to facilitate the
realistic and natural interaction of a Talking Head/
Talking Human with a user. VHML is not directly
related with motion capture data, but allows for the
emotional expression of characters using facial
expressions, gestures and body language, by means
of a markup language based on XML.
In addition, Virtual Reality Modeling Language
(VRML) [12] is a language designed to simulate
three-dimensional environments on the web, and H-
ANIM [13] is a standard specification established
by the Web3D Consortium, and which describes the
structure to be used for the three-dimensional mod-
eling of an Avatar. The specification of humanoids
in H-ANIM follows the standard method of repre-
senting humanoids used in VRML 2.0. The structure
of the Humanoid node of H-ANIM is similar
to the structure of motion capture data, because that
node serves as the overall container for the Joint,
Segment, Site and Viewpoint nodes, which define
the skeleton, geometry and landmarks of the human
figure. The particular interest of our system is in
the archiving and exchanging of motion capture
files in different formats. H-ANIM is a good
language for representing human beings in an
online virtual environment, but it is too complex
to be a standard motion capture format, because it
has too many additional features.
3. Overview of motion capture file formats
Motion capture file formats can be roughly divided
into two kinds, the Tracker Format and the Skeleton
Format, according to the method used for processing
the motion capture data. The former contains only
three-dimensional location values and includes the
Adaptive Optics Associates (AOA), Coordinate 3D
(C3D) and Tracked Row Column (TRC) formats. The
latter has skeleton information as well as three-
dimensional location values and includes the BVH,
Biovision data (BVA), HTR, ASF/AMC, Lamsoft
magnetic BRD, Polhemus DAT and Ascension ASC file
formats [7,9]. These file formats have varying data file
structures, therefore, only the most commonly used
file structures are taken into account in this paper.
The Biovision hierarchical (.bvh) file format was
developed by Biovision, a motion capture data service
company, and provides skeleton hierarchy information
as well as motion data. Motion Analysis Corporation's
motion capture files use the .trc format. The Acclaim
Motion Capture System's file format, .asf, is based on
the definition of a skeleton in the form of joints and
bones, with a hierarchical structure and features based
upon joint rotation data. These file structures provide
skeleton hierarchy information as well as motion data.
The files contain the three-dimensional coordinate
values of all the markers corresponding to the frames,
and the human body hierarchy in the motion consists
of a 23-segment system.
The BVH Format of Biovision is divided into the
HIERARCHY SECTION and the MOTION SEC-
TION. The HIERARCHY SECTION defines the skel-
eton structure of the Avatar. The skeleton defined in
BVH is made up of a total of 18 joints. It is composed in
such a way that the hip assumes the role of the root and
each segment is jointed toward the left lower part, right
lower part and upper part, in that order. The MOTION
SECTION is structured with Euler angles applied to
each joint [7].
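The HIERARCHY/MOTION split described above can be sketched in a few lines of Python. The sample BVH text and the helper function are invented for illustration; a real BVH reader must also parse the nested joint blocks and channel lists:

```python
# Minimal sketch: split a (hypothetical) BVH document into its
# HIERARCHY and MOTION sections, as described in Section 3.
bvh_text = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  End Site
  {
    OFFSET 0.0 5.0 0.0
  }
}
MOTION
Frames: 2
Frame Time: 0.033333
0.0 0.0 0.0 0.0 0.0 0.0
1.0 0.0 0.0 10.0 0.0 0.0"""

def split_bvh(text):
    """Return (hierarchy_lines, motion_lines) for a BVH document."""
    lines = text.splitlines()
    idx = lines.index("MOTION")        # the MOTION keyword separates the sections
    return lines[1:idx], lines[idx + 1:]

hierarchy, motion = split_bvh(bvh_text)
num_frames = int(motion[0].split(":")[1])   # "Frames: 2" -> 2
print(num_frames)
```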
4. Overview of MCML
4.1. Weakness of motion capture
Motion capture seems to provide the best way of
inputting realistic, natural motion into a computer
when a skilled animator is not available. However,
motion capture has one major weakness, in that it is
very difficult to edit the captured motion without
degrading its quality. Because each frame is a
keyframe, any change made by the animator results in
jerky motion unless the animator rebuilds the motion
curves. Commercially available motion
capture libraries are difficult to use as they often
include hundreds of examples which can only be
browsed by using the name of the actions they
contain. The motion capture library is not a database
but just a collection of motion capture files [3,5,8].
A significant problem which arises when using
motion capture in the production environment stems
from the lack of integration among motion capture
hardware/software developers and animation package
developers. For example, Maya requires the use of a
third party MEL script to import BVH or ASF/AMC
files. Thus, an animator must try to get the program to
work properly with the supplied motion files. It also
restricts the animator to using a specific program once
he or she has decided to use motion capture [3].
4.2. Goals of MCML
The purpose of MCML is not only to facilitate the
conversion or integration of different formats, but also
to allow for greater reusability of motion capture data,
through the construction of a motion database storing
the MCML documents. To construct a motion data-
base based on a relational or XML database, the
motion capture files must first be converted into the
corresponding MCML documents. Thus, the primary
goal of MCML is to facilitate the storage and retrieval
of motion capture data to/from a database.
The second goal of MCML is to provide a common
format which enables the exchange of motion capture
files among animators. If commercially available
animation software packages provide support for
MCML, animators will not need to worry about
whether or not they have the appropriate plug-ins
for their animation software.
4.3. Scope of MCML
Motion capture formats are divided into ASCII
data type and binary data type. ASCII files can readily
include descriptions associated with parameters, and
are readily manipulated by the use of common text
editors. However, they are very inefficient for storage
and access, and very large files may pose problems for
some editors. Also, ASCII files must generally be
accessed sequentially and are very inefficient if the
files need to be read non-sequentially.
Binary files are efficient in terms of data storage
and access and may also contain parameters and
associated descriptions, but not in a form casually
accessible to the user. Also, the file organization is
specific to the type of data stored, i.e. the data and any
associated parameters may only be accessed by spe-
cifically written programs which have a detailed
knowledge of the file structure.
To design MCML, we chose to analyze the
structure of the three most popular types of ASCII
files, ASF/AMC, BVH and HTR, which are industry
standard formats and are supported by many kinds of
motion software. These formats contain skeleton
information and are superior to segment-based for-
mats such as BVA and TRC. In addition, most
motion capture files contained in commercially avail-
able motion capture libraries are made with these
three formats. Although the C3D format is a binary
format, we are able to handle C3D files, because
these files can be converted into BVH or ASF/AMC
files.
5. MCML DTD specifications
The MCML Document Type Definition (DTD) defines
the logical structure of an MCML document. DTD
defines the elements which are allowed, and a
validating parser compares the DTD rules against a
given document to determine the validity of the
document.
In this section, we describe the tags and element
structure of MCML. First, we define the tag names after
analyzing the bone names and keywords contained in
the motion capture file formats. Second, we define the
logical structure of an MCML document.
5.1. Tags of MCML
5.1.1. Tags for header data
The motion data file has a header data area
containing supplementary information, such as the
file type, version, comments, etc. Of course, some
files, such as BVH files, do not have header infor-
mation. MCML provides a set of tags so that it can
include all the header information of these different
kinds of files. Table 1 shows the mapping of the
header data.
MCML defines a set of tags for the header infor-
mation, in order to maintain all of this information,
and new header information can be additionally
defined.
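As a sketch of how such a mapping might be applied in code, the dictionary below reproduces the HTR column of Table 1. The helper function is a hypothetical illustration, not the authors' implementation:

```python
# Header-keyword mapping of Table 1 (HTR column): rename HTR header
# keywords to MCML tag names. Keyword spellings follow Table 1.
HTR_TO_MCML = {
    "FileType": "filetype",
    "DataType": "datatype",
    "FileVersion": "version",
    "NumSegments": "num_segments",
    "NumFrames": "num_frames",
    "DataFrameRate": "dataframe_rate",
    "EulerRotationOrder": "euler_rotation_order",
    "CalibrationUnits": "calibration_unit",
    "RotationUnits": "rotation_unit",
    "GlobalAxisofGravity": "global_axis_of_gravity",
    "BoneLengthAxis": "bone_length_axis",
    "ScaleFactor": "scale_factor",
}

def header_to_mcml(htr_header):
    """Map an HTR header (keyword -> value) onto MCML tag names,
    dropping any keywords that Table 1 does not cover."""
    return {HTR_TO_MCML[k]: v for k, v in htr_header.items() if k in HTR_TO_MCML}

print(header_to_mcml({"FileType": "htr", "NumFrames": "24"}))
```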
5.1.2. Bone names of skeleton
ASF/AMC, BVH and HTR formats describe the
skeleton which is composed of a number of bones,
usually in a hierarchical structure. The bone is the
basic entity used when representing a skeleton. Each
bone represents the smallest segment within the mo-
tion that is subject to individual translation and
orientation changes during the animation. These three
formats have different bone names and different
hierarchical structures for the skeleton. To integrate
these into one unified format, we create a more
detailed hierarchical structure for the skeleton and
define the bone names. Table 2 shows the MCML
bone names and the corresponding names of these
three file formats.
ASF/AMC file formats use the names of the
human bones and the BVH file format uses the
names of marker locations to represent the joints of
the body. Although HTR uses the names of the
human bones, it has only a few names since it is a
format released in an initial stage. In order to be able
to integrate these different types of files, MCML has
extended power of expression so that it can contain
all three of these formats. The contents of Table 2
are used for mapping between MCML and the
various capture formats in the system implemented
in this paper.

Table 1
Mapping between the tags of MCML and the keywords of BVH, ASF/AMC and HTR files, in order to represent the header information of the
motion capture data

   MCML                    ASF/AMC     BVH(1)     BVH(2)     HTR
1  filetype                undefined   undefined  undefined  FileType
2  datatype                undefined   undefined  undefined  DataType
3  filename                undefined   undefined  undefined  undefined
4  version                 version     undefined  undefined  FileVersion
5  skeleton_name           name        undefined  undefined  undefined
6  units                   units       undefined  undefined  undefined
   (Attributes: mass, length, angle for both MCML and ASF/AMC)
7  num_segments            undefined   undefined  undefined  NumSegments
8  num_frames              undefined   undefined  undefined  NumFrames
9  dataframe_rate          undefined   undefined  undefined  DataFrameRate
10 euler_rotation_order    undefined   undefined  undefined  EulerRotationOrder
11 calibration_unit        undefined   undefined  undefined  CalibrationUnits
12 rotation_unit           undefined   undefined  undefined  RotationUnits
13 global_axis_of_gravity  undefined   undefined  undefined  GlobalAxisofGravity
14 bone_length_axis        undefined   undefined  undefined  BoneLengthAxis
15 scale_factor            undefined   undefined  undefined  ScaleFactor
5.1.3. Tags for character skeleton
Motion capture data contains hierarchy informa-
tion about the modeled character. That is, the
character to which the motion will be applied is
defined in the same file. The ASF/AMC motion
capture file of Acclaim manages the character and
motion separately. The ASF file contains a hierar-
chy and initial location information for the charac-
ter, and the AMC file contains the motion
information for the character. The advantages of
this separation are the possibility to apply the same
motion to other characters of similar size and
skeleton, and the potential reuse of characters. On
the other hand, the BVH and HTR file formats
contain character information and motion informa-
tion in the same file, so that they have low reuse
rates. MCML can separate human body joint infor-
mation from header information, joint hierarchy
information and motion information. MCML can
also mix the various parts and make them into one
file. Therefore, animators can attain high reuse rates
and avoid duplication.
Body hierarchy information includes the cor-
responding location and joint angle information of
each joint bone, in order to be able to perform
modeling of the human body from the root. If there
is any error in this hierarchy information, the modeled
character will have a strange appearance.
MCML has element and attribute sets that can
represent the hierarchical structure, joint length and
comparative distance from the root and each joint for
the modeling of a human character. The name element
of MCML in Table 3 has the name of each joint, and
the value of this name element should be one of the
bone names defined in Table 2.
The MCML document of Fig. 1 shows an example
of the hierarchical structure of a character. The root
element designates the initial location of the character
and the skeleton designates the hierarchy structure,
location and size of each joint. The hierarchy structure
can be expressed through the nesting (containment)
relation of the bone elements.
Table 2
Mapping between the bone names of MCML and the bone names of ASF/AMC, BVH and HTR File formats expressing human joints
MCML ASF/AMC BVH(1) BVH(2) HTR
1 root h_root root root undefined
2 head h_head(head) head head head
3 neck1 h_neck1(upperneck) neck neck undefined
4 neck2 h_neck2 undefined undefined undefined
5 left_shoulder h_left_shoulder(lclavicle) leftcollar lshoulderjoint undefined
6 left_up_arm h_left_up_arm(lhumerus) leftuparm lhumerus lupperarm
7 left_low_arm h_left_low_arm(lradius) leftlowarm lradius llowarm
8 left_wrist (lwrist) undefined undefined undefined
9 left_hand h_left_hand(lhand) lefthand lwrist lhand
10 left_fingers h_left_fingers(lfingers) undefined undefined undefined
11 left_finger_one h_left_finger_one(lthumb) undefined undefined undefined
12 left_finger_two h_left_finger_two undefined undefined undefined
13 left_finger_three h_left_finger_three undefined undefined undefined
14 left_finger_four h_left_finger_four undefined undefined undefined
15 left_finger_five h_left_finger_five undefined undefined undefined
16 right_shoulder h_right_shoulder(rclavicle) rightcollar rshoulderjoint undefined
17 right_up_arm h_right_up_arm(rhumerus) rightuparm rhumerus rupperarm
18 right_low_arm h_right_low_arm(rradius) rightlowarm rradius rlowarm
19 right_wrist (rwrist) undefined undefined undefined
20 right_hand h_right_hand(rhand) righthand rwrist rhand
21 right_fingers h_right_fingers(rfingers) undefined undefined undefined
22 right_finger_one h_right_finger_one(rthumb) undefined undefined undefined
23 right_finger_two h_right_finger_two undefined undefined undefined
24 right_finger_three h_right_finger_three undefined undefined undefined
25 right_finger_four h_right_finger_four undefined undefined undefined
26 right_finger_five h_right_finger_five undefined undefined undefined
27 torso_1 h_torso_1(upperback) chest1 upperback torso
28 torso_2 h_torso_2(thorax) chest2 thorax undefined
29 torso_3 h_torso_3(lowerback) undefined undefined undefined
30 torso_4 h_torso_4 undefined undefined undefined
31 torso_5 h_torso_5 undefined undefined undefined
32 waist h_waist undefined undefined undefined
33 hips undefined hips hips undefined
34 left_hip h_left_hip(lhipjoint) undefined undefined undefined
35 left_up_leg h_left_up_leg(lfemur) leftupleg lfemur lthigh
36 left_low_leg h_left_low_leg(ltibia) leftlowleg ltibia llowleg
37 left_foot h_left_foot(lfoot) leftfoot lfoot lfoot
38 left_toes h_left_toes(ltoes) undefined undefined undefined
39 left_toe_one h_left_toe_one undefined undefined undefined
40 left_toe_two h_left_toe_two undefined undefined undefined
41 left_toe_three h_left_toe_three undefined undefined undefined
42 left_toe_four h_left_toe_four undefined undefined undefined
43 left_toe_five h_left_toe_five undefined undefined undefined
44 right_hip h_right_hip(rhipjoint) undefined undefined undefined
45 right_up_leg h_right_up_leg(rfemur) rightupleg rfemur rthigh
46 right_low_leg h_right_low_leg(rtibia) rightlowleg rtibia rlowleg
47 right_foot h_right_foot(rfoot) rightfoot rfoot rfoot
48 right_toes h_right_toes(rtoes) undefined undefined undefined
49 right_toe_one h_right_toe_one undefined undefined undefined
50 right_toe_two h_right_toe_two undefined undefined undefined
51 right_toe_three h_right_toe_three undefined undefined undefined
52 right_toe_four h_right_toe_four undefined undefined undefined
53 right_toe_five h_right_toe_five undefined undefined undefined
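A converter can treat Table 2 as a simple lookup table. The sketch below reproduces a few rows of the BVH(1) column; the helper function is hypothetical, and a complete converter would carry all 53 rows and the other columns as well:

```python
# A few rows of Table 2 as a lookup, mapping BVH bone names
# to MCML bone names (sketch; the full table has 53 rows).
BVH_TO_MCML = {
    "hips": "hips",
    "chest1": "torso_1",
    "neck": "neck1",
    "head": "head",
    "leftcollar": "left_shoulder",
    "leftuparm": "left_up_arm",
    "leftlowarm": "left_low_arm",
    "lefthand": "left_hand",
    "leftupleg": "left_up_leg",
    "leftlowleg": "left_low_leg",
    "leftfoot": "left_foot",
}

def map_bone(bvh_name):
    """Return the MCML bone name, or None when Table 2 lists it as undefined."""
    return BVH_TO_MCML.get(bvh_name.lower())

print(map_bone("LeftUpArm"))   # -> left_up_arm
```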
Table 3
The mapping between the tags of MCML and the keywords of ASF/
AMC, BVH and HTR files, in order to represent the character
skeleton of the motion capture data

   MCML       ASF/AMC    BVH        HTR
1  root       root       root       undefined
   (Attributes: order, axis, position, orientation for MCML and
   ASF/AMC; channels, offset for BVH)
2  skeleton   bonedata   hierarchy  undefined
3  bone       bonedata   hierarchy  SegmentName and Hierarchy
   (Attribute: id for MCML and ASF/AMC)
4  name       name       undefined  undefined
5  direction  direction  undefined  undefined
6  length     length     undefined  BoneLength
7  position   position   undefined  undefined
8  axis       axis       undefined  undefined
9  order      order      undefined  undefined
10 dof        dof        undefined  undefined
11 limits     limits     undefined  undefined
12 bodymass   bodymass   undefined  undefined
13 cofmass    cofmass    undefined  undefined
14 offset     undefined  offset     undefined
15 channels   undefined  channels   undefined
5.1.4. Tags for motion data
Motion data is composed of the total number of
frames, the time per frame, number of translations
per frame, number of rotations per frame, etc.
Table 4 shows the mapping of the motion data
and Fig. 2 shows an example of the motion
element in a MCML document. Motion data
describes the animation of each bone over a period
of time. We can examine a series of lines pertain-
ing to a frame of animation for each of the
segments defined in the skeleton in motion capture
files. The animator may not understand the move-
ment of the character in each frame when exam-
ining the frame data in the files, because motion
capture files contain a large number of frame lines
and the structure of frame data is complex. The
separation and dislocation of a specific frame and
the readjustment of the frame lines are difficult
tasks to perform manually.
It is relatively easy to pick out a specific frame or
specific joint motion and also possible to perform
dislocation, combination or separation by separating
each frame and the joint of each frame in MCML.
The frame_bone, the subelement of the frame ele-
ment of MCML, has translation and rotation values
for each joint angle in a particular frame. The
motion_name describes the motion shown in this
particular frame for each joint. Even though motion
capture data is saved in permanent storage, it is
difficult to find specific motions. In particular, if a
motion is stored in a motion capture file format other
than MCML, it is difficult to determine the motion of
specific motion data without checking it with a
viewer.

Fig. 1. An example of a character skeleton described with MCML.

Table 4
The mapping between the tags of MCML and the keywords of ASF/
AMC, BVH and HTR files, in order to represent the motion
information of the motion capture data

   MCML         ASF/AMC    BVH        HTR
1  motion       undefined  motion     undefined
2  frames       undefined  frames     undefined
3  frametime    undefined  frametime  undefined
4  frame        frame      frame      frame
   (Attributes: id for MCML; - for ASF/AMC; #Fr for BVH;
   frame# for HTR; Tx, Ty, Tz, Rx, Ry, Rz for ASF/AMC,
   BVH and HTR)
5  frame_bone   undefined  undefined  undefined
   (Attributes: name, Tx, Ty, Tz, Rx, Ry, Rz)
6  frame_name   undefined  undefined  undefined
7  motion_name  undefined  undefined  undefined
   (Attributes: start_frame, end_frame)
If a set of motion data is stored after being
converted into MCML, individual frames can be
retrieved through the use of an XML query language,
such as XQuery, or the use of a regular path ex-
pression language, such as XPath. To make searching
for specific motions possible, motion names cor-
responding to each motion are stored in each frame
so that searching with the motion name can be
performed.
An animator does not need to perform motion
capture for his or her new character, rather he or she
can search for the desired motion among the existing,
stored motion data by means of the motion name (e.g.
left-arm_abduction for the opening motion of the left
arm) and apply existing motion data to the new
character easily by modifying it.
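This kind of retrieval can be sketched with Python's xml.etree rather than a full XQuery/XPath engine. The document fragment and helper below are invented for illustration; the element and attribute names (frame, motion_name, start_frame, end_frame) follow the paper:

```python
import xml.etree.ElementTree as ET

# Sketch: retrieving a named motion clip from an MCML motion
# element. The fragment is hypothetical.
motion = ET.fromstring("""
<motion>
  <frame id="1"/><frame id="2"/><frame id="3"/>
  <motion_name start_frame="2" end_frame="3">left-arm_abduction</motion_name>
</motion>
""")

def find_clip(motion_elem, name):
    """Return the frame elements covered by the motion_name entry `name`."""
    for mn in motion_elem.findall("motion_name"):
        if mn.text == name:
            lo, hi = int(mn.get("start_frame")), int(mn.get("end_frame"))
            return [f for f in motion_elem.findall("frame")
                    if lo <= int(f.get("id")) <= hi]
    return []

clip = find_clip(motion, "left-arm_abduction")
print([f.get("id") for f in clip])   # -> ['2', '3']
```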
The standard terms used to represent the motion of
the human body are as follows:
- flexion: bending motion, bending finger, bending oneself
- extension: stretching motion, stretching finger, stretching oneself
- abduction: opening, opening arms, opening legs
- adduction: closing, closing arms, closing legs
- medial (internal) rotation: turning to the inside
- lateral (external) rotation: turning to the outside
- left or right rotation: turning the neck or body to the left or right
Other motions of every part of the human body,
such as the legs, body, head and shoulders are also
defined. At this time, impossible motions of the
human body should be avoided. For example, a
motion such as left-shoulder-abduction (opening
shoulder) cannot be made, so the corresponding
motion name should not be defined.
5.2. MCML document structure
The mcml element, which is the root element of
MCML, is composed of the meta element, which
expresses the metadata, the header element, the skel-
eton element and the motion element.
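The top-level structure just described, together with the element descriptions elsewhere in Section 5, can be summarized as a DTD fragment. The following is a hypothetical sketch, not the authors' actual MCML DTD; in particular, the bone content model omits the subelements listed in Table 3 (direction, length, position, etc.):

```dtd
<!ELEMENT mcml     (meta, header, skeleton, motion)>
<!ELEMENT skeleton (bone+)>
<!ELEMENT bone     (name, bone*)>
<!ATTLIST bone     id ID #REQUIRED>
<!ELEMENT motion   (frames, frametime, frame+, motion_name*)>
<!ELEMENT frame    (frame_bone+)>
<!ATTLIST motion_name start_frame CDATA #REQUIRED
                      end_frame   CDATA #REQUIRED>
```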
5.2.1. Meta element
MCML metadata is based on eight elements and
describes the contents of an MCML document. This is
depicted in Fig. 3.
The title element is the name given to the MCML
document by the creator. The creator element is the
person(s) or organization(s) which created the origi-
nal motion capture file. The subject element is the
topic of the actions contained in the motion capture
file (for example, ballet, dance, etc.). The description
element is a textual description of the actions
contained in the motion capture file. The date element
is the creation date of the motion capture file. The
format element is the data representation of the
motion capture file, such as ASF/AMC, BVH or
HTR. The duration element is the playing time of
the frames contained in the motion capture file. It is
equal to the number of frames multiplied by the time
per frame. The category is used to classify the type of
motion capture data (for example, sports, locomotion,
human interaction, etc.).

Fig. 2. Part of an MCML document which shows an example of a character's motion.
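Assembling the eight metadata elements is straightforward with a generic XML library. The following is a minimal sketch with invented field values; the tag names follow the paper:

```python
import xml.etree.ElementTree as ET

# Sketch: building the meta element from the eight fields described
# above (the field values are hypothetical).
fields = {
    "title": "ballet_clip_01",
    "creator": "Example Studio",
    "subject": "ballet",
    "description": "A short ballet turn",
    "date": "2003-02-23",
    "format": "BVH",
    "duration": "4.2",
    "category": "sports",
}

meta = ET.Element("meta")
for tag, value in fields.items():
    ET.SubElement(meta, tag).text = value

print(len(meta))  # -> 8
```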
5.2.2. Header element
The header element is composed of 15 subele-
ments which are depicted in Fig. 4. These elements
are relevant to the HTR file format. The header
element is used to convert an MCML document into
an HTR file. However, we can specify the values of
the filetype, filename, num_frames, dataframe_rate
elements during the conversion process between
ASF/AMC and BVH files and MCML documents.
Fig. 3. The structure of MCML metadata.
Fig. 4. The structure of the MCML header element.
5.2.3. Skeleton element
The MCML skeleton element represents the hier-
archical structure of a human figure. In Fig. 5, we
examine the logical structure of the skeleton element.
The root element describes the parent of the
hierarchy. The axis and order attributes describe the
order of operations for the initial offset and root node
transformation. The position attribute describes the
root translation of the skeleton and the orientation
attribute defines the rotation.
The skeleton element has one or more bone ele-
ments. The skeleton element may have one bone
element as its direct children in the case of the
hierarchical structure formats such as ASF/AMC,
BVH and HTR. For any formats which do not contain
skeleton information, such as the BVA and TRC
formats, the skeleton element may directly contain
the various bone elements. The BVA file format
created by Biovision is very simple because it lists
all nine possible transformations without allowing for
any changes in the order. The TRC file format is
generated by Motion Analysis optical motion capture
systems and contains translational data only without
hierarchy definition.
The hierarchical structure of the bone element may
be recursive to represent the skeleton information. The
PCDATA of the name element is the bone name
according to the bone naming rule shown in Table
2. There is no nesting structure for bone elements in
the case of simple formats such as BVA and TRC.
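The recursive nesting of bone elements can be traversed with a correspondingly recursive function. In the sketch below, the element and attribute names (skeleton, bone, name, id) follow the paper, while the skeleton values are invented:

```python
import xml.etree.ElementTree as ET

# Sketch: a recursively nested MCML skeleton, traversed depth-first.
doc = ET.fromstring("""
<skeleton>
  <bone id="1">
    <name>hips</name>
    <bone id="2">
      <name>left_up_leg</name>
      <bone id="3"><name>left_low_leg</name></bone>
    </bone>
  </bone>
</skeleton>
""")

def bone_names(bone):
    """Depth-first list of bone names in a nested bone element."""
    names = [bone.findtext("name")]
    for child in bone.findall("bone"):
        names += bone_names(child)
    return names

print(bone_names(doc.find("bone")))   # -> ['hips', 'left_up_leg', 'left_low_leg']
```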
5.2.4. Motion element
The MCML motion element is composed of one or
more frame elements and zero or more motion_name
elements. This element actually describes the anima-
tion of each bone over time. The logical structure of
the motion element is shown in Fig. 6.
The frames element is the number of frames and the
frametime element is the playing time for each frame.
The frame element has one or more frame_bone
elements to represent the actions of each bone defined
in the skeleton element. One frame_bone element
represents one frame in the motion capture files.
The frame_name and motion_name elements are
used to specify the names of the actions contained in the
motion data. The frame_name is the name of the
primary action for each frame and the motion_name
is the name of the motion sequence. The start_frame
attribute points to the start frame of the motion se-
quence. Also, the end_frame attribute points to the end
frame of the motion sequence (for example, <motion_name start_frame="5" end_frame="24">sitting after walking</motion_name>). We can use these elements to retrieve specific motion clips in a query expression if the motion capture data is stored in a database.

Fig. 6. The structure of the MCML motion element.

H.-S. Chung, Y. Lee / Computer Standards & Interfaces 26 (2004) 113–130
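The frame labeling described above can be sketched as follows (the motion_name element and its start_frame/end_frame attributes follow the paper; the surrounding document fragment and the length computation are illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical motion element: motion_name elements label sub-sequences
# of frames, as in the paper's example.
MOTION = """
<motion>
  <motion_name start_frame="5" end_frame="24">sitting after walking</motion_name>
  <motion_name start_frame="25" end_frame="90">running</motion_name>
</motion>
"""

motion = ET.fromstring(MOTION)
# Compute the length of each named motion sequence from its attributes.
lengths = {
    m.text: int(m.get("end_frame")) - int(m.get("start_frame")) + 1
    for m in motion.findall("motion_name")
}
print(lengths)
# {'sitting after walking': 20, 'running': 66}
```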
6. Motion data management based on MCML
6.1. System architecture
The system described in this paper is composed of a front-end and a back-end. The front-end contains the modules used to convert between motion capture data files and MCML documents. The back-end is a storage server used to store and retrieve MCML documents. We can construct a database of motion capture data using this storage server.
The front-end of the system consists of the Mocap
Syntax Analyzer, the Mapping Manager, the MCML
Converter, the MCML Editor, and the Motion View-
er. The Mocap Syntax Analyzer analyzes the syntax
of the imported motion capture data file and gener-
ates tokens that are stored in the token table. The
Mapping Manager manages the mapping table that
has two kinds of mapping information, and which
arbitrates between the motion capture data formats
and the MCML tag set. One kind of information is
the mapping information between the joint names of
the motion capture data formats and the joint names
of MCML. The other is the mapping information
between the keywords of the motion capture data
formats and the tags of MCML. The MCML Con-
verter takes charge of the conversion between the
motion capture data files and the MCML documents.
The MCML Editor provides functions to edit the
MCML documents and the Motion Viewer provides
the animated motion from the motion capture data
files.
The back-end of the system consists of the MCML
Storage Wrapper and the database of MCML docu-
ments and provides services to store and retrieve
MCML documents (Fig. 7).
The main goal of our system is not to improve on
the motion editing functionality of commercial ani-
mation software, but to enhance the reusability of
motion capture data through the use of an XML-
based motion database. So, the core functions of our
system are the automatic generation of MCML docu-
ments from motion capture data files and the storage
and retrieval of MCML documents contained in the
motion database.
6.2. Core modules
Fig. 7. System architecture for converting, retrieving, editing, reprocessing and retargeting motion capture data based on MCML.

6.2.1. Mocap syntax analyzer

If a motion capture data file in ASF/AMC, BVH or HTR format is imported, the Mocap Syntax Analyzer starts syntax analysis. The syntax analyzer extracts
tokens and values while scanning the file and stores the token-value pairs in a token table. A token
table is composed of header tokens, skeleton tokens
and motion tokens. The MCML Converter uses this
token table and the mapping table described below to
generate the MCML documents.
The Mocap Syntax Analyzer is implemented using
component-based programming for extensibility.
Each of the above data formats, ASF/AMC, BVH
and HTR, has its own syntax analyzer. The Mocap
Syntax Analyzer incorporates these three syntax
analyzers. When a motion capture data file is
imported into the system, the Mocap Syntax Analyz-
er checks the data format of the file and invokes the
appropriate syntax analyzer to process this particular
format. If a new motion capture data format needs to
be added, we can implement the requisite syntax
analyzer which can interpret this new format without
changing the system and simply add it to the Mocap
Syntax Analyzer. The Mocap Syntax Analyzer pro-
vides the common interface required to implement
syntax analyzer classes.
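A minimal sketch of this component-based design follows; the class and method names are invented for illustration, since the paper does not give the actual common interface:

```python
class FormatSyntaxAnalyzer:
    """Common interface that every per-format syntax analyzer implements."""
    extension = None

    def analyze(self, text):
        """Scan the file text and return a token table (token-value pairs)."""
        raise NotImplementedError

class BVHAnalyzer(FormatSyntaxAnalyzer):
    extension = ".bvh"
    def analyze(self, text):
        return [("HIERARCHY", None)]  # placeholder tokenization

class HTRAnalyzer(FormatSyntaxAnalyzer):
    extension = ".htr"
    def analyze(self, text):
        return [("Header", None)]     # placeholder tokenization

class MocapSyntaxAnalyzer:
    """Dispatcher: checks the file's format and invokes the matching analyzer.
    A new format is supported by registering one more analyzer class."""
    def __init__(self, analyzers):
        self._by_ext = {a.extension: a for a in analyzers}

    def analyze_file(self, filename, text):
        ext = filename[filename.rfind("."):].lower()
        return self._by_ext[ext].analyze(text)

mocap = MocapSyntaxAnalyzer([BVHAnalyzer(), HTRAnalyzer()])
print(mocap.analyze_file("walk.bvh", "HIERARCHY ..."))
# [('HIERARCHY', None)]
```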
6.2.2. Mapping manager
The mapping table consists of the tag mapping
table and the joint mapping table. The joint mapping
table stores the information required to provide map-
ping between the MCML joint (bone) names and the
joint (bone) names of the various motion capture data
formats depicted in Table 2. The tag mapping table
stores the information required to provide mapping
between the MCML tags and the keywords of the
various motion capture data formats depicted in
Tables 1 and 3. These two tables constitute the
dictionary which the MCML Converter refers to
during the document conversion process. The Map-
ping Manager provides a management function which
enables the system administrator to manage these
mapping tables.
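A sketch of the two mapping tables, with hypothetical entries (the actual joint and tag mappings come from Tables 1-3 of the paper, which are not reproduced here):

```python
# Hypothetical excerpts of the two mapping tables managed by the
# Mapping Manager. The real entries come from Tables 1-3 of the paper.
JOINT_MAP = {
    # (source format, source joint name) -> MCML joint (bone) name
    ("BVH", "Hips"): "root",
    ("HTR", "LowerTorso"): "root",
}
TAG_MAP = {
    # (source format, source keyword) -> MCML tag
    ("BVH", "HIERARCHY"): "skeleton",
    ("BVH", "MOTION"): "motion",
}

def to_mcml_joint(fmt, joint):
    """Look up the MCML bone name for a joint of a given source format."""
    return JOINT_MAP[(fmt, joint)]

print(to_mcml_joint("BVH", "Hips"))   # root
print(TAG_MAP[("BVH", "MOTION")])     # motion
```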
6.2.3. MCML converter
MCML document conversion is divided into for-
ward conversion and reverse conversion. Forward
conversion consists of receiving a token table which
is generated by the Mocap Syntax Analyzer, the map-
ping table, and the imported motion capture data file
and generating an MCML document corresponding to
that file.
Reverse conversion involves converting an MCML
document which is loaded from the MCML repository
or created from the results of a query to a specific
motion capture data format.
The MCML Converter is implemented using component-based programming, as in the case of the Mocap Syntax Analyzer, and consists of various subcomponents: ASF/AMC_to_MCML, BVH_to_MCML,
HTR_to_MCML, MCML_to_ASF/AMC, MCML_
to_BVH, and MCML_to_HTR. If a new motion cap-
ture data format is introduced, we can implement a
new component which handles the conversion be-
tween the new format and MCML documents and
add it to the MCML Converter.
Fig. 8 depicts a conversion process between motion
capture data files and MCML documents. When the
MCML Converter receives the tokens from the Mocap
Syntax Analyzer, it checks which section includes
these tokens and finds the corresponding MCML tags in the tag mapping table. Then, the MCML Converter creates the MCML document. A DTD document called mcml.dtd is used to create the MCML documents.

Fig. 8. Process to convert a motion capture data file in BVH format into the corresponding MCML document.
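Forward conversion can be sketched as follows. This is a deliberately simplified version: the token names, tag mappings and resulting document structure are illustrative and do not reproduce the actual mcml.dtd:

```python
import xml.etree.ElementTree as ET

# Simplified forward conversion: a token table produced by the syntax
# analyzer is turned into MCML elements via the tag mapping table.
TAG_MAP = {"Frames": "frames", "Frame Time": "frametime"}

def forward_convert(token_table):
    """Build an MCML document (root element) from token-value pairs."""
    mcml = ET.Element("mcml")
    motion = ET.SubElement(mcml, "motion")
    for token, value in token_table:
        elem = ET.SubElement(motion, TAG_MAP[token])
        elem.text = str(value)
    return mcml

doc = forward_convert([("Frames", 120), ("Frame Time", 0.033333)])
print(ET.tostring(doc, encoding="unicode"))
# <mcml><motion><frames>120</frames><frametime>0.033333</frametime></motion></mcml>
```

Reverse conversion would walk the MCML tree in the opposite direction, emitting the keywords of the target format.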
Some empty frame_name and motion_name ele-
ments are created during the conversion process. The
animator uses the MCML Editor after the generation
of the MCML document to assign values to these
empty elements. We can describe the movement in a
specific frame using frame_name elements. Also, we
can describe motion sequences using motion_name
elements, which have start_frame and end_frame
attributes. These attributes point to the start frame
and end frame of the motion sequence.
6.2.4. MCML Editor
An animator can edit the contents of an MCML
document using the MCML Editor. The MCML
Editor is not designed for editing the motion of a
character, but for editing MCML documents. We
can merge MCML fragments contained in multiple
MCML documents into a new MCML document.
The MCML Editor also provides the same function-
ality that ordinary XML editors have, i.e. open,
save, and print document and cut, copy, and paste
block.
The MCML Editor shows the contents of an
MCML document either as a tree structure or in text form. Using the tree structure allows the skeleton of a
character and the frames representing the motion to be
easily understood.
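Fragment merging of the kind the editor supports can be sketched as follows (a minimal illustration using Python's ElementTree; the element structure is simplified relative to the full MCML DTD):

```python
import xml.etree.ElementTree as ET

# Merge fragments from two MCML documents into a new document:
# take the skeleton from one and the motion from another.
doc_a = ET.fromstring("<mcml><skeleton><bone><name>root</name></bone></skeleton></mcml>")
doc_b = ET.fromstring("<mcml><motion><frames>10</frames></motion></mcml>")

merged = ET.Element("mcml")
merged.append(doc_a.find("skeleton"))
merged.append(doc_b.find("motion"))
print([child.tag for child in merged])
# ['skeleton', 'motion']
```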
6.2.5. Motion viewer
We can animate the motion of a character using the
Motion Viewer. The Motion Viewer can only process
motion capture data files. When a motion capture data
file is imported or generated by reverse conversion
from an MCML document, we check the motion
using this module. The Motion Viewer shows the
markers and joints of a character and controls the
playing of frames.
6.2.6. MCML Repository
The MCML Repository takes charge of the storage
and retrieval of an MCML document. We can imple-
ment the MCML repository using a relational data-
base, an object-oriented database, or an XML
database. In this study, we used both a relational
database and an XML database as the MCML Repos-
itory. The eXcelon DXE Manager [4] is used for
storing the MCML documents. The eXcelon DXE
Manager supports searching for XML documents, as
well as the storage of XML documents. To handle the
problems derived from the difference between the
relational data model and the MCML data model,
we used the MCML Storage Wrapper, which is
located between the MCML Converter and the
MCML Repository. The MCML Storage Wrapper
provides storage independence for the MCML man-
agement system.
6.3. Storage and retrieval of MCML documents
To create the motion database, we convert motion
capture data files to MCML documents and store
them in a relational database or XML database. Fig.
9 shows a schema used to store MCML documents
in a relational database. The DocInfo table is a
master table that contains information about an
MCML document. The SkeletonRoot and Skeleton
tables contain the skeleton information for each
character. The FrameInfo and Frames tables contain
for storing MCML documents.
Fig. 10. MCML documents are stored in the XML database, eXcelon DXE Manager.
H.-S. Chung, Y. Lee / Computer Standards & Interfaces 26 (2004) 113–130 127
the frame information for each character. The Join-
tMap table contains the mapping information be-
tween the MCML joint names and the joint names
of the motion capture data formats. The TagMap
table contains mapping information between the
MCML tags and keywords of the motion capture
data formats.
An MCML document is a restricted form of XML document whose tags are fixed. An animator cannot define additional tags or modify the MCML tags. We therefore propose a schema which stores the MCML structure directly in a relational data model, without having to map a generic tree structure to the relational model.
To store MCML documents in a relational database, the MCML Storage Wrapper extracts tags and values while parsing the MCML documents and creates the SQL statements used to insert them into the relational tables.
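The storage-wrapper step can be sketched with an in-memory SQLite database. The table and column names below are guesses at the DocInfo schema of Fig. 9, not the published schema:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Parse an MCML document, extract tags and values, and insert them
# into a relational table, as the MCML Storage Wrapper does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DocInfo (doc_id INTEGER PRIMARY KEY, "
             "doc_name TEXT, num_frames INTEGER)")

mcml = ET.fromstring(
    "<mcml><header><name>walk01</name></header>"
    "<motion><frames>120</frames></motion></mcml>")

conn.execute("INSERT INTO DocInfo (doc_name, num_frames) VALUES (?, ?)",
             (mcml.findtext("header/name"),
              int(mcml.findtext("motion/frames"))))

print(conn.execute("SELECT doc_name, num_frames FROM DocInfo").fetchone())
# ('walk01', 120)
```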
Fig. 10 shows a list of MCML documents and
the contents of a selected document stored in an
XML database. We use DXE Manager of eXcelon
as the MCML Repository in this study. A stored
MCML document can be viewed either as text or as a hierarchical structure in the database.
Therefore, the animator can examine the contents of
an MCML document without loading it into main
memory.
SQL, XPath or XQuery can be used to retrieve
motion data from the motion database. SQL is
used to retrieve the motion data stored in the
relational database. XPath and XQuery are used
to retrieve the motion data stored in the XML
database.
An animator can specify constraints on the
query statements used to retrieve the desired mo-
tion data, such as the length of motion, where the
body or individual joints should be or what the
body needs to be doing at particular times. An
animator can also specify the scope of the search
as being either from one document or multiple
documents.
For example, query 1 is the XPath expression used
to retrieve motion data whose length is less than 180
frames and for which the character motion looks like
‘‘walking’’. The num_frames element contains the
number of frames in the MCML document and the
motion_name element has two attributes which spec-
ify the start frame and end frame of a specific motion
sequence and the motion name relating to that motion
sequence.
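Since the text of query 1 is not reproduced here, the following Python sketch only approximates it (the num_frames and motion_name element names follow the paper; the document fragment mirrors Result 1):

```python
import xml.etree.ElementTree as ET

# Approximation of query 1: find motion sequences named "walking" in a
# document shorter than 180 frames. The paper issues this as an XPath
# expression against the XML database.
DOC = """
<mcml>
  <motion>
    <num_frames>91</num_frames>
    <motion_name start_frame="1" end_frame="12">turning</motion_name>
    <motion_name start_frame="13" end_frame="91">walking</motion_name>
  </motion>
</mcml>
"""

doc = ET.fromstring(DOC)
matches = [
    m.attrib
    for m in doc.findall(".//motion_name")
    if m.text == "walking" and int(doc.findtext(".//num_frames")) < 180
]
print(matches)
# [{'start_frame': '13', 'end_frame': '91'}]
```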
After the execution of query 1, the result is returned. Part of this result is depicted in Result 1. It tells us that there is a motion sequence named 'turning' spanning frames 1 to 12, and a motion sequence named 'walking' spanning frames 13 to 91.
Query 2 is the XPath expression used to retrieve the
skeleton data of the character that satisfies the con-
straints related to this skeleton. Result 2 shows a part of
the result obtained after the execution of query 2.
7. Conclusion and future work
The motion capture method, as one of the motion-
creating technologies employed in three-dimensional
animation, is widely used for manufacturing animation
since it produces high-quality character motion similar
to the actual motion of the human body. However,
motion capture has significant weaknesses: the lack of an industry-wide standard for archiving and exchanging motion capture data, and the low flexibility of the captured data. Creating all of the required motions by capture is unproductive and sometimes impossible, due to the difficulties and costs involved.
frequently try to create slightly different motions, by
reusing previously captured motion data, and to create
new composite motions which are the synthesis of various individual motions. However, it is very difficult
for the animator to obtain motions that do exactly what
he or she wants, because commercially available mo-
tion capture libraries are not databases, but just a
collection of motion capture files.
In order to solve this problem, we propose a stan-
dard format based on XML called MCML. We also
propose a system framework based on this standard
format for motion storage and retrieval. The purpose of
MCML is not only to facilitate the conversion or
integration of different formats, but also to allow for
greater reusability of motion capture data, through the
construction of a motion database based on MCML.
There are standard ways of representing human beings in online virtual environments, such as H-ANIM and VRML. However, these standards do not handle motion capture files, and their structure is too complex for them to serve as a standard motion capture format. So far, no standard format for integrating, storing and retrieving motion capture data with relational or XML databases has been defined.
If the MCML documents are stored in a motion
database, it is easy for the animator to obtain motions
that do exactly what he or she wants by using a database
query language such as SQL or XQuery. This offers
many advantages for motion synthesis or motion edit-
ing applications. Also, in order to provide increased
security for the data and more convenient data man-
agement, commercial animation software could be
used in conjunction with a database for the storage of
the motion capture data. Thus, MCML can improve the
reusability of the motion capture data.
We propose a system framework that can be used to manage the MCML documents, together with a motion database based on a relational database or an XML database. Our system has several core modules for dealing with motion capture files and MCML documents: the Mocap Syntax Analyzer, the Mapping Manager, the MCML Converter, the Motion Viewer, etc.
The design of MCML does not end with this initial version. We plan to develop future versions with enhanced functionality and to add support for other motion capture formats. Above all, we will endeavor to support compatibility between MCML and the H-ANIM humanoid format, because H-ANIM is the upcoming standard for representing humanoids in web-based environments.
Acknowledgements
This work was supported in part by BERC/KOSEF
and in part by the Brain Neuroinformatics Program sponsored by KMST.
References

[1] Acclaim, ASF/AMC File Specifications page, http://www.darwin3d.com/gamedev/acclaim.zip.
[2] Biovision, BVH Specifications page, http://www.cs.wisc.edu/graphics/courses/cs-838-1999/Jeff/BVH.html.
[3] C.R. Morales, Development of an XML web based motion capture data warehousing and translation system for collaborative animation projects, Proceedings of the 9th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, 2001.
[4] eXcelon DXE Manager, http://www.excelon.com.
[5] L.M. Tanco, A. Hilton, Realistic synthesis of novel human movements from a database of motion, Proceedings of the IEEE Workshop on Human Motion HUMO, 2000.
[6] Motion Analysis, HTR Specification page, http://www.cs.wisc.edu/graphics/Courses/cs-838-1999/Jeff/HTR.html.
[7] M. Meredith, S. Maddock, Motion capture file formats explained, Department of Computer Science Technical Report CS-01-11.
[8] O. Arikan, D.A. Forsyth, Interactive motion generation from examples, Proceedings of SIGGRAPH, 2002.
[9] S. Rosenthal, B. Bodenheimer, C. Rose, J. Pella, The process of motion capture: dealing with the data, Proceedings of the 8th Eurographics Workshop on Animation and Simulation, 1997.
[10] A. Marriott, VHML—virtual human markup language, Proceedings of Talking Head Technology Workshop, 2001.
[11] W3C, Extensible Markup Language (XML) 1.0, http://www.w3c.org/XML, 1998.
[12] Web3D Consortium, VRML International Standard, http://www.web3d.org/technicalinfo/specifications/ISO_IEC_14772-All/index.html.
[13] Web3D Consortium, H-ANIM 2001 Specification, http://www.h-anim.org/Specifications/H-Anim2001/.
Hyun-Sook Chung received her BS degree in physics from the Catholic University of Daegu, Korea in 1993 and her MS degree in computer science from the Catholic University of Daegu, Korea in 1995. She is currently working towards her PhD degree in computer science at Yonsei University, Korea. She worked as a research scientist at the CAD/CAM Research Center of KIST (Korea Institute of Science and Technology), Seoul, South Korea, from 1997 to 1999. Her research interests include multimedia document engineering, human–computer interaction, multimedia systems, and XML. She is a member of the Korea Information Science Society, the Korea Information Processing Society, and the Korea Multimedia Society.
Yillbyung Lee has been a professor in the
Department of Computer Science, Yonsei
University since 1986. He received his BE
degree in Electronic Engineering from
Yonsei University, Korea in 1976 and his
MS degree in computer science from the
University of Illinois, USA in 1980 and his
PhD degree in computer science from the
University of Massachusetts, USA in 1985.
At present, he leads the Artificial Intelli-
gence Lab at Yonsei University. He is the
president of the Korean Cognitive Science Society and a vice
president of the Korean Data Mining Society. His main areas of
interest are Document Recognition, Data Mining, multimedia, and Computational Models of Vision and Biometrics.