LYU0102: XML for Interoperable Digital Video Library

Supervised by Prof. LYU, Rung Tsong Michael

Prepared by: Chan Pik Wah, Pat; Ngai Cheuk Han, Table

Department of Computer Science & Engineering
The Chinese University of Hong Kong

Outline

  • Introduction to XVIP
  • Overview of Project
  • Extraction Techniques
      • Face Detection
      • Speech Recognition
  • Multimedia Transformation & Presentation
      • XSL
      • SMIL
      • Transformation
  • Problems & Solutions
  • Conclusion

Motivations

  • Rapid increase in the usage of multimedia information
  • New approach: DIGITAL VIDEO LIBRARY

Motivations

  • Little attention has been paid to video information extraction and storage
  • Scalability of the system when adding new extraction components
  • Lack of a generic framework for presentation and visualization of video information

Overview of XVIP

Achievements in Last Semester

  • 2 extraction techniques
      • Scene Change
      • VOCR
  • Integrate data into XML
  • XML Editor
  • Knowledge Enrichment

Achievements in This Semester

  • 2 more extraction techniques
      • Face Detection
      • Speech Recognition
  • New data integrated into XML
  • XML to SMIL Transformer

Extraction Techniques

[Diagram: Video is processed by Scene Change, VOCD, Face Detection, and Speech Recognition; the results are stored as XML.]

Face Detection

  • Object-presence detection is also an important technique.
  • It identifies and indexes features to support image similarity matching.
  • Face detection is a good example.

Face Detection

  • Names of people appearing in the video
  • How they interact with the environment
  • Makes the video more searchable

Face Detection

  • Neural Network-Based Algorithm
      • The basic algorithm used for face detection

Face Detection

  • Face Recognition
  • Facial Expression Analysis
  • Enrich the XML
  • Easier for users to search the content of the video

Speech Recognition

  • Speech recognition technology can make any spoken data useful for library indexing and retrieval.

Speech Recognition Engine

Speech Recognition

  • ViaVoice: error rate > 50%

Usage of XML

  • Indexing & Searching
  • Combining with other XML for Knowledge Enrichment
  • Presentation
  • Exchanging data with different applications

Presentation of the Video Data

  • XML is not presentable without processing.
  • HTML with images is presentable, but static.
  • SMIL is good for multimedia presentation.
  • No existing tools integrate different XML data into a SMIL presentation.
  • Current transformation languages have many limitations in transforming XML to SMIL.

SMIL

  • SMIL stands for Synchronized Multimedia Integration Language; it is currently a W3C Recommendation.
  • It is a markup language that can synchronize and integrate multimedia.
  • It enables authors to specify what should be presented and when.
  • Supported by RealPlayer, QuickTime, and IE.

Advantages

  • SMIL is text-based
      • Easy to develop with a text editor
  • Generate customized presentations
      • Generate a customized SMIL file based on preferences recorded in the visitor's browser
  • The SMIL effort is led by the W3C
      • The W3C tries to shape a specification that is beneficial to all parties involved.
  • Avoid using container formats
      • SMIL can stream many media formats, so there is no need to merge clips into a single streaming file.

Timing and Synchronization

Sequence element:

<seq>
  <img src="pix/0.jpg" dur="15" region="scene"/>
  <img src="pix/15.jpg" dur="5" region="scene"/>
  <img src="pix/20.jpg" dur="7" region="scene"/>
  <img src="pix/27.jpg" dur="4" region="scene"/>
  ……
</seq>

Parallel element:

<par>
  <text src="text/transcript.rt" region="transcript"/>
  <text src="text/mapdetail.rt" region="mapdetail"/>
  <video src="news.mpg" region="video" fill="freeze"/>
</par>
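
The durations in the sequence element appear to follow from the scene-change times: each image is named after the second at which its scene starts (0, 15, 20, 27) and is shown until the next scene begins. A minimal Python sketch of that idea is given here. The function name `build_seq`, the `pix/<start>.jpg` naming rule, and the total length of 31 seconds are assumptions made for illustration; the project's own transformer used XSLT and VC++, not Python.

```python
from xml.sax.saxutils import quoteattr


def build_seq(scene_starts, video_length, region="scene"):
    """Return a SMIL <seq> block with one <img> per scene.

    scene_starts: sorted scene start times in seconds (assumed input).
    video_length: total length in seconds, used to close the last scene.
    """
    lines = ["<seq>"]
    # Each scene ends where the next one starts; the last ends with the video.
    scene_ends = list(scene_starts[1:]) + [video_length]
    for start, end in zip(scene_starts, scene_ends):
        src = quoteattr(f"pix/{start}.jpg")  # hypothetical naming rule
        lines.append(
            f'  <img src={src} dur="{end - start}" region={quoteattr(region)}/>'
        )
    lines.append("</seq>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_seq([0, 15, 20, 27], 31))
```

Because the images sit inside `<seq>`, the player shows them one after another, so only the durations need to be computed; no explicit begin times are required.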

XSL

  • Stands for “Extensible Stylesheet Language”
  • XSL is the language defined by the W3C to add formatting information to XML data.
  • XSLT: the most commonly used XSL standard
      • Transforms one XML document into another.
      • Used in our FYP.

XSL: Working Principle

[Diagram: the XSL stylesheet is applied to the source tree to produce the output.]

Transformation Process

  • Input files
      • XML file generated by XVIP
      • XML files of additional information
  • Output files
      • A SMIL file
      • Some RealText files

Design 1

  • Build with VC++ solely
      • Read all the input files and get the information.
      • Create the output files for the SMIL presentation.
  • Disadvantages
      • The layout of the SMIL presentation needs to be hard-coded in the VC++ program.
      • The layout becomes hard to change and the transformer becomes hard to extend.

Design 1 with Modification

  • Modification
      • Provide an additional file or interface as a template for the user to define the layout of the SMIL presentation.
  • Disadvantages
      • The flexibility provided is still limited.
      • Not a standard way to define a template.

Design 2

  • Use XSLT to assist the transformation.
      • The user can define their own template with XSL.
  • Advantages
      • Program-independent
      • Extensible
      • Standard templates
  • Limitations of XSLT
      • It can only read one input data file and one XSL file, then generate one output.
      • It cannot combine files.

Design 2

  • Solutions:
      • Knowledge Enrichment
          • Combine additional information with the XML file from XVIP before converting to SMIL.
      • Creating output files
          • Use separate XSL files to generate the RealText files.
          • Use separate XSL files to generate the layout of the presentation and the display order of objects in different regions, then combine them into a SMIL file.

Knowledge Enrichment

[Diagram: the information of major cities and the XML file from XVIP are merged into the combined XML file.]

Combined XML file

  • The XML file contains information about major cities that are related to the video.

<COMBINE>
  <TIME begin="10" dur="11">
    <NAME>香港</NAME>
    <DETAIL>中國南部一個沿海城市</DETAIL>
    <AREA>China</AREA>
  </TIME>
  <TIME begin="21" dur="20">
    <NAME>紐約</NAME>
    <DETAIL>隸屬美國紐約州的城市</DETAIL>
    <AREA>America</AREA>
  </TIME>
</COMBINE>
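
One way the knowledge enrichment step could produce such a combined file is sketched here in Python. This is only an illustration under stated assumptions: the slides do not show the format of the XVIP output or of the city information file, so the `<VIDEO>`/`<SEGMENT>`/`<TEXT>` and `<CITIES>`/`<CITY key="...">` structures, the substring matching rule, and the function name `combine` are all hypothetical. Only the `<COMBINE>`/`<TIME>`/`<NAME>`/`<DETAIL>`/`<AREA>` output shape is taken from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XVIP output: timed text segments.
XVIP_XML = """<VIDEO>
  <SEGMENT begin="10" dur="11"><TEXT>Report from Hong Kong</TEXT></SEGMENT>
  <SEGMENT begin="21" dur="20"><TEXT>Markets in New York</TEXT></SEGMENT>
  <SEGMENT begin="41" dur="5"><TEXT>Weather</TEXT></SEGMENT>
</VIDEO>"""

# Hypothetical additional-information file about major cities.
CITY_XML = """<CITIES>
  <CITY key="Hong Kong"><NAME>香港</NAME><DETAIL>中國南部一個沿海城市</DETAIL><AREA>China</AREA></CITY>
  <CITY key="New York"><NAME>紐約</NAME><DETAIL>隸屬美國紐約州的城市</DETAIL><AREA>America</AREA></CITY>
</CITIES>"""


def combine(xvip_xml, city_xml):
    """Build a <COMBINE> element with one <TIME> per segment that mentions a city."""
    cities = ET.fromstring(city_xml).findall("CITY")
    combined = ET.Element("COMBINE")
    for segment in ET.fromstring(xvip_xml).iter("SEGMENT"):
        text = segment.findtext("TEXT", default="")
        for city in cities:
            if city.get("key") in text:  # naive substring match (assumption)
                entry = ET.SubElement(
                    combined, "TIME",
                    begin=segment.get("begin"), dur=segment.get("dur"),
                )
                entry.extend(list(city))  # attach NAME, DETAIL, AREA
    return combined


if __name__ == "__main__":
    print(ET.tostring(combine(XVIP_XML, CITY_XML), encoding="unicode"))
```

Merging the inputs first means the later XSLT step sees a single XML document, which matches the one-input limitation described in the Design 2 slides.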

Create RealText files

  • Geographical Information
  • Biographical Information
  • Video Transcript

Create SMIL file

  • Layout
  • Displaying order

Create SMIL file

[Diagram: the temporary files are combined into the SMIL presentation.]

Problems & Solutions

  • Problem 1
      • The result from the XSLT processor is in UTF-8 encoding, but SMIL needs ANSI encoding.
  • Solution
      • Write a function “UTF8toANSI” for conversion.
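
The slides do not show the body of “UTF8toANSI”, and the project's version was presumably written in VC++. As a rough Python sketch of the same conversion, the function below decodes the UTF-8 bytes and re-encodes them in a Windows code page. The name `utf8_to_ansi`, the choice of `cp950` (the Traditional Chinese "ANSI" code page) as the default, and the fallback to XML numeric character references are all assumptions for illustration.

```python
def utf8_to_ansi(data: bytes, codepage: str = "cp950") -> bytes:
    """Re-encode UTF-8 bytes into a Windows "ANSI" code page.

    cp950 is an assumed default; the right code page depends on the
    language of the text and the player's locale.
    """
    text = data.decode("utf-8")
    # Characters missing from the code page are written as XML numeric
    # references (for example &#128512;) instead of raising an error.
    return text.encode(codepage, errors="xmlcharrefreplace")


if __name__ == "__main__":
    utf8_bytes = "<NAME>香港</NAME>".encode("utf-8")
    ansi_bytes = utf8_to_ansi(utf8_bytes)
    print(len(utf8_bytes), len(ansi_bytes))
    print(ansi_bytes.decode("cp950"))
```

Numeric references are a reasonable fallback for the SMIL file because it is XML; whether a RealText file renders them correctly is not something the slides confirm.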

Problems & Solutions

  • Problem 2
      • XSLT has a limitation: it can only read one XML file and one XSL file, and generate one output file.
      • Our transformation process has more than one input file.
  • Solution
      • Do knowledge enrichment and produce a combined XML result file before creating the output files.

Conclusion

XVIP contains:

  • Four video information modalities
      • Scene change detection
      • VOCD
      • Speech recognition
      • Face detection
  • Information integration module with XML
      • For storing the extracted video data in XML format

Conclusion

  • XML editor
      • For editing the generated XML file
  • Knowledge enrichment component
      • For adding additional information to the XML-based video data
  • XML to SMIL transformer
      • For converting the XML-based video data into a SMIL presentation

Conclusion

XVIP:

  • provides multiple functions for extracting video information
  • stores video information in a flexible and scalable way
  • comprises a transformer to generate presentations of the information

The paper “XVIP: An XML-Based Video Information Processing System” by Michael Lyu, Edward Yau, C.H. Ngai, and P.W. Chan was accepted by COMPSAC 2002.

Q & A