Department of Computer Science and Engineering, CUHK 1 Final Year Project 2003/2004 LYU0302 PVCAIS – Personal Video Conference Archives Indexing System

Department of Computer Science and Engineering, CUHK

Final Year Project 2003/2004 LYU0302

PVCAIS – Personal Video Conference Archives Indexing System

Supervisor: Prof Michael Lyu

Presented by: Lewis Ng, Philip Chan

25 November 2003

Outline

Introduction
Motivation
Architecture of PVCAIS
- Media Acquisition Module
- Archive Indexing Module
- Videoconference Accessing Module
Implementation in First Term
Future Work
Conclusion

Introduction

PVCAIS stands for Personal Video Conference Archives Indexing System

A system that provides convenient searching and browsing of past videoconference archives for videoconferencing users

Introduction

What is a videoconference?

A real-time communication technology which combines different media:

audio, video, text chat, file transfer, whiteboard and shared communications

- More precisely, it is a “multimedia conference”

Motivation

– Videoconferencing is becoming popular in education, business and personal communication
– Participants wish to keep videoconference archives for later reference
– Normal video and audio files are neither searchable nor helpful for recalling their contents
– Indexing of videoconference archives has not been investigated until now

Architecture of PVCAIS

Consists of 3 modules:

- Media Acquisition Module

- Archive Indexing Module

- Videoconference Accessing Module

[Diagram: Media acquisition → Raw videoconference archives → Archive indexing → Indexed videoconference archives → Videoconference accessing]

Architecture of PVCAIS

Media Acquisition

Extracts channel data and forms media files

Videoconferencing physically contains 4 types of channels: Audio, Video, Data and Control

Audio and Video channels: transmit incoming/outgoing audio and video information

Data channel: carries information for user applications such as Text Chat, Whiteboard and File Transfer

Control channel: transmits system control information such as Member Information

Media Acquisition

Video-in and Video-out channel
– Reduce redundancy: store only key frames
– Detect scene changes in real time
– Each key frame picture is stored with a timestamp
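The key-frame idea above can be illustrated with a minimal sketch. The slides do not say how scene changes are detected, so the mean absolute pixel difference, the `threshold` value, the flattened grayscale frame representation and all names below are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Frame = Sequence[int]  # flattened grayscale pixels, 0-255 (assumed representation)


@dataclass
class KeyFrame:
    timestamp: float  # seconds since the start of the conference
    pixels: Frame


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames: Iterable[Tuple[float, Frame]],
                      threshold: float = 30.0) -> List[KeyFrame]:
    """Keep the first frame, then every frame that differs enough from the last key frame."""
    key_frames: List[KeyFrame] = []
    for timestamp, pixels in frames:
        if not key_frames or mean_abs_diff(key_frames[-1].pixels, pixels) > threshold:
            key_frames.append(KeyFrame(timestamp, pixels))
    return key_frames


# Hypothetical 4-pixel frames, one per second.
stream = [
    (0.0, [10, 10, 10, 10]),
    (1.0, [12, 11, 10, 9]),        # nearly identical to the key frame: skipped
    (2.0, [200, 210, 190, 205]),   # large difference: treated as a scene change
    (3.0, [201, 209, 191, 204]),   # nearly identical to the new key frame: skipped
]
key_frames = select_key_frames(stream)
print([kf.timestamp for kf in key_frames])  # [0.0, 2.0]
```

Comparing against the last stored key frame, rather than the previous frame, keeps slow gradual changes from slipping through unnoticed; a real detector would likely need a more robust measure.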

Media Acquisition

Audio-in and Audio-out channel
– Mixed into one stream after the videoconference
– Will be used for Speech Recognition

Text Chat channel
– Sender, receiver
– Message
– Stored with a timestamp

Media Acquisition

Whiteboard channel
– Consists of a text-based index file and a number of snapshot pictures
– Index file records the timestamp of each whiteboard update event and the path of the corresponding snapshot picture
– An update on this channel takes place over a period of time, so the beginning and end of each update must be detected by monitoring data transfer on the channel
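One way to find where a whiteboard update begins and ends is to group data-transfer events by idle time. The sketch below assumes that packet arrival times are available as sorted timestamps and that a silence longer than `idle_gap` seconds separates two updates; the gap value, the tab-separated index format and the snapshot file names are hypothetical.

```python
from typing import List, Tuple


def group_updates(packet_times: List[float],
                  idle_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Group sorted packet timestamps into (begin, end) update intervals.

    A new update begins whenever the channel has been silent
    for longer than idle_gap seconds.
    """
    updates: List[Tuple[float, float]] = []
    for t in packet_times:
        if updates and t - updates[-1][1] <= idle_gap:
            # Still inside the current update: extend its end time.
            updates[-1] = (updates[-1][0], t)
        else:
            updates.append((t, t))
    return updates


def index_lines(updates: List[Tuple[float, float]],
                snapshot_dir: str = "whiteboard") -> List[str]:
    """One index line per update: end timestamp and snapshot path (assumed format)."""
    return [
        f"{end:.1f}\t{snapshot_dir}/snapshot_{i:03d}.png"
        for i, (_, end) in enumerate(updates, start=1)
    ]


packets = [5.0, 5.4, 6.1, 20.0, 20.3, 45.0]
updates = group_updates(packets)
print(updates)  # [(5.0, 6.1), (20.0, 20.3), (45.0, 45.0)]
for line in index_lines(updates):
    print(line)
```

Taking the snapshot at the end of each interval means the stored picture reflects the whiteboard after the whole update has arrived.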

Media Acquisition

File Transfer channel
– A copy of each sent/received file is kept in the archive directory, together with an index file
– Index file includes the sender’s and recipient’s user names and the paths of the files

Control channel
– Contains the timestamp and information of each event, such as member joined and member left

Media Acquisition

Paradigm of storing the videoconference archives.

[Diagram: the channels Video_in, Video_out, Audio_in, Audio_out, Text_chat, Whiteboard, File_in, File_out and Control are shown against a timeline starting at 0:00:00 and are stored as the Video_in, Video_out, Audio, Text chat, Whiteboard, Document and Control archives]

Archive Indexing

7 raw files are extracted in the Media Acquisition Module

Some indexing functions need to be implemented to retrieve more information

These include: Face Detection, Face Recognition, Speech Recognition, OCR, Time-based Text Merging, Keyword Selection, Title Generation

Archive Indexing

Face Detection
- Distinguish between slides and faces
- If a face is detected, find the face region

[Example images: Slide, Face]

Archive Indexing

Face Recognition

- Associate human faces in Video-in with names

- Need to keep a face base

- If no match in the face base, ask remote user to enter the name

Archive Indexing

Speech Recognition

- Generates a speech transcript from the audio archive

- The speech in a videoconference contains the most information

- Can use a commercial library: Microsoft SAPI, IBM ViaVoice

OCR

- Takes the slide archive as input and recognizes the text in it

- Needs to identify and localize text on complex backgrounds

Archive Indexing

Time-based Text Merging
- Merges the Speech transcript, Chat script, Whiteboard script and slide text archive into a single Text source according to their timestamps
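Merging by timestamp can be sketched as a k-way merge of streams that are each already in time order. The entry layout `(timestamp, source, text)` and the sample data are assumptions for illustration; `heapq.merge` is the standard-library routine that does the merging.

```python
import heapq
from typing import Iterable, List, Tuple

Entry = Tuple[float, str, str]  # (timestamp in seconds, source name, text)


def merge_text_sources(*sources: Iterable[Entry]) -> List[Entry]:
    """Merge several time-sorted text streams into one time-sorted Text source."""
    return list(heapq.merge(*sources, key=lambda entry: entry[0]))


# Hypothetical entries; each list is already sorted by timestamp.
speech = [(3.0, "speech", "welcome everyone"),
          (42.0, "speech", "let us review the schedule")]
chat = [(10.0, "chat", "can you hear me?")]
whiteboard = [(30.0, "whiteboard", "milestone 1")]
slides = [(5.0, "slide", "Project Overview")]

text_source = merge_text_sources(speech, chat, whiteboard, slides)
for timestamp, source, text in text_source:
    print(f"{timestamp:6.1f}  {source:<10}  {text}")
```

Because each input is assumed to be sorted already, the merge avoids re-sorting the combined data and works on streams of any length.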

Keyword Selection
- Takes the Text source as input
- Generates keywords for the videoconference
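The slides do not state how keywords are chosen. As a stand-in, the sketch below uses plain term frequency with a small stop-word list; the technique, the stop-word set and the sample text are all assumptions, not the project's method.

```python
import re
from collections import Counter
from typing import List

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
              "for", "on", "we", "will", "be"}


def select_keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the top_n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


text = ("We will index the archive. The archive is stored on disk. "
        "Searching the archive needs an index.")
print(select_keywords(text, top_n=2))  # ['archive', 'index']
```

Raw frequency favours words that are common everywhere, so a weighting such as TF-IDF over several conferences may give more distinctive keywords.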

Archive Indexing

Title Generation
- Takes the Text source as input
- Automatically generates a title for the videoconference

Generate XML index file
- Integrates all the archives
- Stores all the related files of a videoconference in a single directory
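An XML index that ties the archives together might look like the sketch below. The slides do not give the schema, so every element name, attribute name, file name and member name here is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_index(title: str, keywords: List[str], members: List[str],
                archives: Dict[str, str]) -> ET.Element:
    """Build an index tree; all element and attribute names are hypothetical."""
    root = ET.Element("videoconference")
    ET.SubElement(root, "title").text = title
    keywords_el = ET.SubElement(root, "keywords")
    for keyword in keywords:
        ET.SubElement(keywords_el, "keyword").text = keyword
    members_el = ET.SubElement(root, "members")
    for member in members:
        ET.SubElement(members_el, "member").text = member
    archives_el = ET.SubElement(root, "archives")
    for kind, path in archives.items():
        # Paths are relative to the single directory that holds the conference.
        ET.SubElement(archives_el, "archive", {"type": kind, "path": path})
    return root


index = build_index(
    title="Project meeting",
    keywords=["indexing", "archive"],
    members=["Alice", "Bob"],
    archives={"audio": "audio.wav", "text_chat": "chat.txt", "control": "control.txt"},
)
xml_text = ET.tostring(index, encoding="unicode")
print(xml_text)
```

Keeping the paths relative lets the whole conference directory be moved or copied without breaking the index.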

Videoconference Accessing

Provides an interface for the user to manage, search and review all indexed conferences.

Allows the user to modify the content of a conference, such as editing the title or keywords, or to delete a conference.

Allows the user to search for a conference by different criteria, such as member name or keyword.

Allows the user to review a conference by playing back the audio or the key frames.
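Searching by member name or keyword can be sketched as a filter over index records. The `Conference` record, its fields and the sample data are assumptions for illustration; the actual system would read them from its index files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conference:
    title: str
    members: List[str]
    keywords: List[str]


def search(conferences: List[Conference],
           member: Optional[str] = None,
           keyword: Optional[str] = None) -> List[Conference]:
    """Return conferences matching every criterion that is given (case-insensitive)."""
    results = []
    for conf in conferences:
        if member is not None and member.lower() not in [m.lower() for m in conf.members]:
            continue
        if keyword is not None and keyword.lower() not in [k.lower() for k in conf.keywords]:
            continue
        results.append(conf)
    return results


# Hypothetical indexed conferences.
archive = [
    Conference("Project meeting", ["Alice", "Bob"], ["indexing", "archive"]),
    Conference("Budget review", ["Bob", "Carol"], ["budget"]),
]
print([c.title for c in search(archive, member="bob")])
print([c.title for c in search(archive, keyword="Budget")])
```

Criteria that are left as `None` are ignored, so the same function covers a search by member, by keyword, or by both.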

Implementation

NetMeeting 3.0
– A Windows feature that provides Internet conferencing functions
– Supports video, audio and data conferencing, including application sharing, chat, whiteboard and file transfer
– Other features include remote desktop sharing

Implementation

NetMeeting 3.0 SDK
– An extension of NetMeeting that provides an interface for programmers and Web developers to integrate conferencing capabilities into their applications
– The API is in the form of COM interfaces and functions

Implementation

A simple NetMeeting-compatible videoconference program built on top of the NetMeeting 3.0 SDK.

Supports:
– Video
– Audio
– Text message
– File Transfer
– Whiteboard

Implementation

By directly using the functions of the API, the following raw data can be obtained:
– Member information
– File transfer records
– Text message records

Video, audio and whiteboard data cannot be obtained directly.

Implementation

Video
– Create a thread to check the display of the video windows
– If a scene change is detected, the video is captured and stored as a still image
– The stored images are the key frames of the conference and will be used for face detection and recognition after the conference

Implementation

Audio
– Create a thread to record the local audio from the microphone
– When a certain amount of audio data has been recorded, send it to all members of the conference
– All received audio files and locally recorded audio files are combined to generate a single audio file
– The final audio file is used for voice recognition; the voice engine used is Microsoft SAPI
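Combining the local and received audio into one stream can be sketched as sample-wise mixing. The slides do not describe the audio format or how the files are combined, so the sketch assumes time-aligned 16-bit signed PCM tracks at the same sample rate; the function name and sample values are hypothetical.

```python
from array import array
from itertools import zip_longest


def mix_pcm16(*tracks: array) -> array:
    """Mix 16-bit signed PCM tracks by summing samples and clipping to the valid range."""
    mixed = array("h")
    # Shorter tracks are padded with silence (0) so every sample is kept.
    for samples in zip_longest(*tracks, fillvalue=0):
        total = sum(samples)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


# Hypothetical sample data for a local and a remote participant.
local = array("h", [1000, -2000, 30000])
remote = array("h", [500, 500, 10000, 42])
mixed = mix_pcm16(local, remote)
print(list(mixed))  # [1500, -1500, 32767, 42]
```

Clipping keeps the sum inside the 16-bit range; without it, loud overlapping speech would overflow and the `array` append would raise an error.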

Implementation

Whiteboard
– The NetMeeting whiteboard information cannot be captured because the format of the data is not stated in the API
– Solution: create our own whiteboard function and data format

Conclusion

We developed a videoconferencing agent

All channel data except whiteboard can be collected

Speech Recognition and Face Detection & Recognition are integrated into the system, but accuracy needs to be improved

Simple searching can be performed on stored archives

Future Work

Whiteboard
Improve accuracy of Voice Recognition
XML
Better searching method
OCR for slides in video
Improve User Interface

Q & A Session
