Annotation-Based Multimedia Summarization and Translation

Katashi Nagao
Dept. of Information Engineering
Nagoya University and CREST, JST
Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
[email protected]

Shigeki Ohira
School of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
[email protected]

Mitsuhiro Yoneoka
Dept. of Computer Science
Tokyo Institute of Technology
2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan
[email protected]

Abstract

This paper presents techniques for multimedia annotation and their application to video summarization and translation. Our tool for annotation allows users to easily create annotation including voice transcripts, video scene descriptions, and visual/auditory object descriptions. The module for voice transcription is capable of multilingual spoken language identification and recognition. A video scene description consists of semi-automatically detected keyframes of each scene in a video clip and time codes of scenes. A visual object description is created by tracking and interactive naming of people and objects in video scenes. The text data in the multimedia annotation are syntactically and semantically structured using linguistic annotation. The proposed multimedia summarization works upon a multimodal document that consists of a video, keyframes of scenes, and transcripts of the scenes. The multimedia translation automatically generates several versions of multimedia content in different languages.

1 Introduction

Multimedia content such as digital video is becoming a prevalent information source. Since the volume of such content is growing to huge numbers of hours, summarization is required to effectively browse video segments in a short time without missing significant content. Annotating multimedia content with semantic information such as scene/segment structures and metadata about visual/auditory objects is necessary for advanced multimedia content services. Since natural language text such as a voice transcript is highly manageable, speech and natural language processing techniques have an essential role in our multimedia annotation.

We have developed techniques for semi-automatic video annotation integrating a multilingual voice transcription method, some video analysis methods, and an interactive visual/auditory annotation method. The video analysis methods include automatic color change detection, characterization of frames, and scene recognition using similarity between frame attributes.

There are related approaches to video annotation. For example, MPEG-7 is an effort within the Moving Picture Experts Group (MPEG) of ISO/IEC that deals with multimedia content description (MPEG, 2002). MPEG-7 can describe indices, notes, and so on, so that necessary parts of content can be retrieved speedily. However, adding these descriptions by hand is costly, so methods for extracting them automatically through video/audio analysis are vitally important. Our method can be integrated into tools for authoring MPEG-7 data. The linguistic description scheme, which will be a part of the amendment to MPEG-7, should play a major role in this integration.

Using such annotation data, we have also developed a system for advanced multimedia processing such as video summarization and translation. Our video summary is not just a shorter version of the original video clip, but an interactive multimedia presentation that shows keyframes of important scenes and their transcripts in Web pages and allows users to interactively modify the summary. The video summarization is customizable according to the user's preferred size and keywords. When a user's client device is not capable of playing video, our system transforms the video into an ordinary Web document in HTML format.

The multimedia annotation can make delivery of multimedia content to different devices very effective. Dissemination of multimedia content will be facilitated by annotations describing how the content is to be used for different purposes, client devices, and so forth. The annotation also provides an object-level description of multimedia content, which allows a higher granularity of retrieval and presentation in which individual regions, segments, objects, and events in image, audio, and video data can be differentially accessed depending on publisher and user preferences, network bandwidth, and client capabilities.

2 Multimedia Annotation

Multimedia annotation is an extension of document annotation such as GDA (Global Document Annotation) (Hasida, 2002). Since natural language text is more tractable and meaningful than binary data of visual (image and moving picture) and auditory (sound and voice) content, we associate text with multimedia content in several ways. Since most video clips contain spoken narrations, our system converts them into text and integrates them into video annotation data. The text in the multimedia annotation is linguistically annotated based on GDA.

2.1 Multimedia Annotation Editor

We developed an authoring tool called the multimedia annotation editor, which is capable of video scene change detection, multilingual voice transcription, syntactic and semantic analysis of transcripts, and correlation of visual/auditory segments and text.

Figure 1: Multimedia Annotation Editor

An example screen of the editor is shown in Figure 1. The editor screen consists of three windows. One window (top) shows the video content, automatically detected keyframes in the video, and an automatically generated voice transcript. The second window (left bottom) enables the user to edit the transcript and modify an automatically analyzed linguistic markup structure. The third window (right bottom) graphically shows the linguistic structure of the sentence selected in the second window.

The editor is capable of basic natural language processing and interactive disambiguation. The user can modify the results of the automatically analyzed multimedia and linguistic (syntactic and semantic) structures.

2.2 Linguistic Annotation

Linguistic annotation has been used to make digital documents machine-understandable, and to develop content-based presentation, retrieval, question-answering, summarization, and translation systems with much higher quality than is currently available. We have employed the GDA tagset as a basic framework to describe linguistic and semantic features of documents. The GDA tagset is based on XML (Extensible Markup Language) (W3C, 2002), and is designed to be as compatible as possible with TEI (TEI, 2002), CES (CES, 2002), and EAGLES (EAGLES, 2002).

An example of a GDA-tagged sentence follows:

<su><np opr="agt" sem="time0">Time</np>
<v sem="fly1">flies</v>
<adp opr="eg"><ad sem="like0">like</ad>
<np>an <n sem="arrow0">arrow</n></np></adp>.</su>

The <su> element is a sentential unit. The other tags above, <n>, <np>, <v>, <ad>, and <adp>, mean noun, noun phrase, verb, adnoun or adverb (including preposition and postposition), and adnominal or adverbial phrase, respectively.

The opr attribute encodes a relationship in which the current element stands with respect to the element that it semantically depends on. Its value denotes a binary relation, which may be a thematic role such as agent, patient, recipient, etc., or a rhetorical relation such as cause, concession, etc. For instance, in the above sentence, <np opr="agt" sem="time0">Time</np> depends on the second element <v sem="fly1">flies</v>. opr="agt" means that Time has the agent role with respect to the event denoted by flies. The sem attribute encodes a word sense.

Linguistic annotation is generated by automatic morphological analysis, interactive sentence parsing, and word sense disambiguation by selecting the most appropriate item in the domain ontology. Some research issues on linguistic annotation concern how the annotation cost can be reduced to feasible levels. We have been developing machine-guided annotation interfaces to simplify the annotation work. Machine learning mechanisms also contribute to reducing the cost because they can gradually increase the accuracy of automatic annotation.

In principle, the tag set does not depend on language, but as a first step we implemented a semi-automatic tagging system for English and Japanese.

2.3 Video Annotation

The linguistic annotation technique has an important role in multimedia annotation. Our video annotation consists of creation of text data related to video content, linguistic annotation of the text data, automatic segmentation of video, semi-automatic linking of video segments with corresponding text data, and interactive naming of people and objects in video scenes.

To be more precise, video annotation is performed through the following three steps.

First, for each video clip, the annotation system creates text corresponding to its content. We developed a method for creating voice transcripts using speech recognition engines; this multilingual voice transcription method is described later.

Second, some video analysis techniques are applied to characterization of visual segments (i.e., scenes) and individual video frames. For example, by detecting significant changes in the color histogram of successive frames, frame sequences can be separated into scenes.
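The histogram comparison can be pictured with a sketch like the following. This is a simplified illustration assuming OpenCV (cv2); the color space, bin count, distance measure, and threshold are placeholder choices, not the editor's actual detector:

# Sketch of scene-change detection by color-histogram differences between
# successive frames. Assumes OpenCV; threshold and bins are placeholders.
import cv2

def detect_scene_changes(video_path, threshold=0.4, bins=32):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    prev_hist = None
    changes = []          # time codes (seconds) where a new scene starts
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        if prev_hist is not None:
            # A large histogram distance suggests a significant color change,
            # i.e., a scene cut.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if dist > threshold:
                changes.append(frame_index / fps)
        prev_hist = hist
        frame_index += 1
    cap.release()
    return changes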

Also, by matching prepared templates to individual regions in the frame, the annotation system identifies objects. The user can specify significant objects in a scene in order to reduce the time to identify target objects and to obtain a higher recognition accuracy. The user can name objects in a frame simply by selecting words in the corresponding text.

Third, the user relates video segments to text segments such as paragraphs, sentences, and phrases, based on scene structures and object-name correspondences. The system helps the user select appropriate segments by prioritizing them based on the number of detected objects, camera motion, and the representative frames.
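Such prioritization might be approximated by a simple weighted score; the field names and weights below are illustrative assumptions, not the system's actual heuristics:

# Illustrative scoring of video segments for annotator assistance.
# Fields and weights are assumptions, not the system's actual heuristics.
def prioritize_segments(segments):
    """segments: list of dicts with 'object_count', 'camera_motion' (0..1),
    and 'has_keyframe'; returns them ordered by descending score."""
    def score(seg):
        return (2.0 * seg.get("object_count", 0)
                - 1.0 * seg.get("camera_motion", 0.0)   # prefer steady shots
                + 0.5 * (1 if seg.get("has_keyframe") else 0))
    return sorted(segments, key=score, reverse=True)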

2.4 Multilingual Voice Transcription

The multimedia annotation editor first extracts the audio data from a target video clip. Then, the extracted audio data is divided into left and right channels. If the average difference between the audio signals of the two channels exceeds a certain threshold, they are considered different and both are transferred to the multilingual speech identification and recognition module. The output of the module is a structured transcript containing time codes, word sequences, and language information. It is described in XML format as shown in Figure 2.

<transcript lang="en" channel="l"><w in="20.264000" out="20.663000">Web grabber </w><w in="20.663000" out="21.072000">is a </w><w in="21.072000" out="21.611000">very simple </w><w in="21.611000" out="22.180000">utility </w><w in="22.180000" out="22.778000">that is </w><w in="22.778000" out="23.856000">attached to </w><w in="23.856000" out="24.215000">Netscape </w><w in="24.215000" out="24.934000">as a pull down menu </w><w in="24.934000" out="25.153000">and </w><w in="25.153000" out="25.462000">allows you </w><w in="25.462000" out="25.802000">to take </w><w in="25.802000" out="26.191000">your Web content </w><w in="26.191000" out="27.039000">whether it’s a </w><w in="27.039000" out="27.538000">MPEG file </w>...</transcript>

Figure 2: Transcript Data
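The channel comparison described above (routing each channel to recognition only when the two differ) can be sketched roughly as follows; the use of Python's wave module and NumPy, the 16-bit PCM assumption, and the threshold value are all illustrative:

# Sketch: decide whether left/right audio channels carry different programs
# (e.g., a bilingual broadcast). Assumes 16-bit stereo PCM; the threshold
# is an arbitrary placeholder.
import wave
import numpy as np

def channels_differ(wav_path, threshold=500.0):
    with wave.open(wav_path, "rb") as wav:
        assert wav.getnchannels() == 2, "expects a stereo file"
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    left, right = samples[0::2], samples[1::2]
    # Average absolute difference between the two channel signals
    mean_diff = np.mean(np.abs(left.astype(np.int32) - right.astype(np.int32)))
    return mean_diff > threshold

# If the channels differ, each one would be passed separately to the
# multilingual speech identification and recognition module.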

Our multilingual video transcriptor automatically generates transcripts with time codes and provides a reusable data structure which allows easy manual correction. An example screen of the multilingual voice transcriptor is shown in Figure 3.


Figure 3: Multilingual Voice Transcriptor (left and right channel panels)

2.4.1 Multilingual Speech Identification and Recognition

The progress of speech recognition technology makes it comparatively easy to transform speech into text, but spoken language identification is needed for processing multilingual speech, because speech recognition technology assumes that the language used is known. While researchers have been working on multilingual speech identification, few applications based on this technology have actually been used, except for telephony speech translation systems. In the case of telephone translation, the language being used is self-evident (at least the speaker knows it), so there is little need for or advantage in developing a multilingual speech identification system.

On the other hand, speech data in video do not always carry information about the language used. Due to the recent progress of digital broadcasting and signal compression technology, information about the language is expected to accompany the content in the future. But most of the data available now do not have it, so a large amount of labor is needed to identify the language. Therefore, multilingual speech identification has a large part to play with unknown-language speech input.

A process of multilingual speech identification is shown in Figure 4. Our method determines the language of input speech using a simple discriminant function based on relative scores obtained from multiple speech recognizers working in parallel (Ohira et al., 2001).

Figure 4: Configuration of Spoken Language Identification Unit

Multiple speech recognition engines work simultaneously on the input speech. It is assumed that each speech recognition engine has a speaker-independent model, and that each recognized word has a score within a constant range dependent on each engine.

When speech comes in, each recognition engine outputs a word sequence with scores. The discriminant unit calculates a value of a discriminant function from the scores for every language. The engine with the highest average discriminant value is selected, the language is determined accordingly, and that engine's recognition result is accepted as the transcript. If there is no distinct difference between discriminant values, that is, if the difference is not higher than a certain threshold, the judgment is entrusted to the user.
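The selection step can be sketched as follows; the recognizer result layout, the score normalization, and the margin value are hypothetical, since the actual discriminant function is defined in (Ohira et al., 2001):

# Sketch of language selection from parallel recognizers. The result layout
# and normalization are assumptions; see Ohira et al. (2001) for the real
# discriminant function.
def identify_language(results, margin=0.05):
    """results: {language: (word_sequence, [normalized word scores in 0..1])}.
    Returns (language, transcript), or (None, None) when the decision is
    entrusted to the user."""
    averages = {lang: sum(scores) / len(scores)
                for lang, (words, scores) in results.items() if scores}
    ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) >= 2 and ranked[0][1] - ranked[1][1] < margin:
        return None, None            # no distinct difference: ask the user
    best_lang = ranked[0][0]
    return best_lang, results[best_lang][0]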

Our technique is simple: it uses existing speech recognition engines tuned for each language, without a special model for language identification or special acoustic features.

Combining the voice transcription and the video image analysis, our tool enables users to create and edit video annotation data semi-automatically. The entire process is shown in Figure 5.

Figure 5: Multilingual Video Data Analysis

Our system drastically reduces the overhead on the user who analyzes and manages a large collection of video content. Furthermore, it makes conventional natural language processing techniques applicable to multimedia processing.

2.5 Scene Detection and Visual Object Tracking

As mentioned earlier, visual scene changes are detected by searching for significant changes in the color histogram of successive frames. Then, frame sequences can be divided into scenes. The scene description consists of time codes of the start and end frames, a keyframe (image data in JPEG format) filename, a scene title, and some text representing topics. Additionally, when the user specifies a particular object in a frame by mouse-dragging a rectangular region, automatic object tracking is executed and time codes and motion trails in the frame (series of coordinates for interpolation of object movement) are recorded. The user can name the detected visual objects interactively. The visual object description includes the object name, the related URL, time codes, and motion trails in the frame.
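A recorded motion trail allows the object's region to be reconstructed at any intermediate time. The sketch below uses linear interpolation between time-coded rectangles, mirroring the <area> elements shown later in Figure 7; linear interpolation is an assumption about how trails are replayed, not a statement of the system's actual method:

# Sketch: linearly interpolate a tracked object's rectangle between the
# time-coded trail points (cf. the <area> elements in Figure 7).
def interpolate_region(trail, t):
    """trail: list of (time, (top, left, width, height)) sorted by time."""
    if t <= trail[0][0]:
        return trail[0][1]
    if t >= trail[-1][0]:
        return trail[-1][1]
    for (t0, r0), (t1, r1) in zip(trail, trail[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return tuple(round((1 - a) * v0 + a * v1)
                         for v0, v1 in zip(r0, r1))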

Our multimedia annotation also contains descriptions of auditory objects in video. The auditory objects can be detected by acoustic analysis of a user-specified sound sequence visualized in waveform. An example scene description in XML format is shown in Figure 6, and an example object description in Figure 7.

<scene>
<seg in="0.066733" out="11.945279" keyframe="0.187643"/>
<seg in="11.945279" out="14.447781" keyframe="12.004385"/>
<seg in="14.447781" out="18.685352" keyframe="14.447781"/>
...
</scene>

Figure 6: Scene Description

<object>
<vobj begin="1.668335" end="4.671338" name="David"
      description="anchor" img="o0000.jpg" link="http://...">
<area time="1.668335" top="82" left="34" width="156" height="145"/>
<area ... />
</vobj>
...
</object>

Figure 7: Object Description

3 Multimedia Summarization and Translation

Based on multimedia annotation, we have developed a system for multimedia (especially video) summarization and translation. One of the main functions of the system is to generate an interactive HTML (HyperText Markup Language) document from multimedia content with annotation data for interactive multimedia presentation, which consists of an embedded video player, hyperlinked keyframe images, and linguistically annotated transcripts. Our summarization and translation techniques are applied to the generated document, called a multimodal document.

There is previous work on multimedia summarization, such as Informedia (Smith and Kanade, 1995) and CueVideo (Amir et al., 1999). These systems create a video summary based on automatically extracted features in video such as scene changes, speech, text and human faces in frames, and closed captions. They can process video data without annotations. However, the accuracy of their summarization is currently not sufficient for practical use because of failures in automatic video analysis. Our approach to multimedia summarization attains sufficient quality for use if the data has enough semantic information. As mentioned earlier, we have developed a tool to help annotators create multimedia annotation data. Since our annotation data is declarative, and hence task-independent and versatile, the annotations are worth creating if the multimedia content will be frequently used in different applications such as automatic editing and information extraction.

3.1 Multimodal Document

Video transformation is an initial process of multimedia summarization and translation. The transformation module retrieves the annotation data accumulated in an annotation repository (XML database) and extracts the information necessary to generate a multimodal document. The multimodal document consists of an embedded video window, keyframes of scenes, and transcripts aligned with the scenes, as shown in Figure 8. The resulting document can be summarized and translated by the modules explained later.

Figure 8: Multimodal Document
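A much simplified version of this transformation could look like the following; the scene and transcript fields echo Figures 2 and 6, while the HTML layout and the helper name are invented for illustration:

# Sketch: build a bare-bones multimodal document (keyframes + transcripts)
# from scene and transcript annotations. Layout and helper are illustrative.
from xml.sax.saxutils import escape

def build_multimodal_document(scenes, transcript_words, video_url):
    """scenes: list of dicts with 'in', 'out', 'keyframe_img';
    transcript_words: list of dicts with 'in', 'out', 'text'."""
    parts = [f'<html><body><video src="{escape(video_url)}" controls></video>']
    for scene in scenes:
        # Gather the transcript words whose time codes fall inside the scene
        words = [w["text"] for w in transcript_words
                 if scene["in"] <= w["in"] < scene["out"]]
        parts.append('<div class="scene">')
        parts.append(f'<img src="{escape(scene["keyframe_img"])}">')
        parts.append(f'<p>{escape(" ".join(words))}</p>')
        parts.append('</div>')
    parts.append('</body></html>')
    return "\n".join(parts)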

This operation is also beneficial for people with devices without video playing capability. In this case, the system creates a simplified version of the multimodal document containing only keyframe images of important scenes and summarized transcripts related to the selected scenes.

3.2 Video Summarization

The proposed video summarization is performed as a by-product of text summarization. The text summarization is an application of linguistic annotation. The method is cohesion-based and employs spreading activation to calculate the importance values of words and phrases in the document (Nagao and Hasida, 1998).
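The cohesion-based idea can be pictured with a toy sketch like the one below, in which activation repeatedly spreads over links between related words; the graph construction, decay factor, and iteration count are illustrative choices, not the actual algorithm of (Nagao and Hasida, 1998):

# Toy spreading activation over a word graph: repeatedly propagate part of
# each node's activation to its neighbours, then read off importance values.
def spread_activation(links, seeds, decay=0.5, iterations=10):
    """links: {word: [related words]} built from linguistic annotation;
    seeds: initial activation, e.g. for keywords chosen by the user."""
    activation = dict(seeds)
    for _ in range(iterations):
        nxt = dict(activation)
        for word, neighbours in links.items():
            share = decay * activation.get(word, 0.0) / max(len(neighbours), 1)
            for n in neighbours:
                nxt[n] = nxt.get(n, 0.0) + share
        activation = nxt
    return activation   # higher value = more important word/phrase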

Thus, the video summarization works in terms of summarization of a transcript from the multimedia annotation data and extraction of the video scenes related to the summary. Since a summarized transcript contains important words and phrases, the corresponding video sequences yield a collection of significant scenes in the video. The summarization results in a revised version of the multimodal document that contains keyframe images and summarized transcripts of the selected important scenes. Keyframes of less important scenes are shown in a smaller size. An example screen of a summarized multimodal document is shown in Figure 9.

Figure 9: Summarized Multimodal Document
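The mapping from the summarized transcript back to video scenes can be sketched as follows; the record shapes (time-coded scenes and retained sentences) are simplified stand-ins for the annotation structures described earlier:

# Sketch: keep the scenes whose time spans overlap sentences retained by the
# text summarizer. Record shapes are simplified stand-ins for the annotations.
def select_important_scenes(scenes, summary_sentences):
    """scenes: list of dicts with 'in'/'out' time codes;
    summary_sentences: list of dicts with 'in'/'out' time codes of the
    sentences kept in the summarized transcript."""
    selected = []
    for scene in scenes:
        overlaps = any(s["in"] < scene["out"] and s["out"] > scene["in"]
                       for s in summary_sentences)
        if overlaps:
            selected.append(scene)
    return selected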

The vertical time bar in the middle of the multimodal document screen represents the scene segments, whose color indicates whether each segment is included in the summary or not. The keyframe images are linked with their corresponding scenes so that the user can see a scene by just clicking its related image. The user can also access information about objects such as people in a keyframe by dragging a rectangular region enclosing them. The information appears in external windows. In the case of auditory objects, the user can select them by clicking any point in the time bar.

3.3 Video Translation

One type of our video translation is achieved through the following procedure. First, transcripts in the annotation data are translated into different languages according to the user's choice, and then the results are shown as subtitles synchronized with the video. The video translation module invokes an annotation-based text translation mechanism. Text translation is also greatly improved by using linguistic annotation (Watanabe et al., 2002).

The other type of translation is performed in terms of synchronization of video playing and speech synthesis of the translation results. This translation produces a version of the original video clip in another language. If comments, notes, or keywords are included in the annotation data on visual/auditory objects, they are also translated and shown in a popup window.

In the case of bilingual broadcasting, since our annotation system generates transcripts for every audio channel, multimodal documents can be created from both channels. The user can easily select a preferred multimodal document created from one of the channels. We have also developed a mechanism to change the playback language depending on a user profile that describes the user's native language.

4 Concluding Remarks

We have developed a tool to create multimedia annotation data and a mechanism to apply such data to multimedia summarization and translation. The main component of the annotation tool is a multilingual voice transcriptor to generate transcripts from multilingual speech in video clips. The tool also extracts scene and object information semi-automatically, describes the data in XML format, and associates the data with the content.

We also presented some advanced applications on multimedia content based on annotation. We have implemented video-to-document transformation that generates interactive multimodal documents, video summarization using a text summarization technique, and video translation.

Linguistic processing is an essential task in those applications, so natural language technologies remain very important in processing multimedia content.

Our future work includes more efficient and flexible retrieval of multimedia content in response to requests in spoken and written natural language. The retrieval of spoken documents has also been evaluated in the SDR (Spoken Document Retrieval) track at TREC (Text REtrieval Conference) (TREC, 2002). Johnson (Johnson, 2001) suggested, from his group's experience at TREC-9, that new challenges such as the use of non-lexical information derived directly from the audio and integration with video data are significant for improving retrieval performance and usefulness. We therefore believe that our research has significant impact and potential for content technology.

References

A. Amir, S. Srinivasan, D. Ponceleon, and D. Petkovic. 1999. CueVideo: Automated indexing of video for searching and browsing. In Proceedings of SIGIR'99.

CES. 2002. Corpus Encoding Standard. http://www.cs.vassar.edu/CES/.

EAGLES. 2002. EAGLES online. http://www.ilc.pi.cnr.it/EAGLES/home.html.

Koiti Hasida. 2002. Global Document Annotation. http://i-content.org/GDA/.

S. E. Johnson. 2001. Spoken document retrieval for TREC-9 at Cambridge University. In Proceedings of the Text REtrieval Conference (TREC-9).

MPEG. 2002. MPEG-7 context and objectives. http://drogo.cselt.stet.it/mpeg/standards/mpeg-7/mpeg-7.htm.

Katashi Nagao and Koiti Hasida. 1998. Automatic text summarization based on the Global Document Annotation. In Proceedings of the Seventeenth International Conference on Computational Linguistics (COLING-98), pages 917–921.

Shigeki Ohira, Mitsuhiro Yoneoka, and Katashi Nagao. 2001. A multilingual video transcriptor and annotation-based video transcoding. In Proceedings of the Second International Workshop on Content-Based Multimedia Indexing (CBMI-01).

Michael A. Smith and Takeo Kanade. 1995. Video skimming for quick browsing based on audio and image characterization. Technical Report CMU-CS-95-186, School of Computer Science, Carnegie Mellon University.

TEI. 2002. Text Encoding Initiative. http://www.uic.edu/orgs/tei/.

TREC. 2002. Text REtrieval Conference home page. http://trec.nist.gov/.

W3C. 2002. Extensible Markup Language (XML). http://www.w3.org/XML/.

Hideo Watanabe, Katashi Nagao, Michael C. McCord, and Arendse Bernth. 2002. An annotation system for enhancing quality of natural language processing. In Proceedings of the Nineteenth International Conference on Computational Linguistics (COLING-2002).