PGNET, Liverpool JMU, June 2005

MediaHub: An Intelligent MultiMedia

Distributed Platform Hub

Glenn Campbell, Tom Lunney, Paul Mc Kevitt

School of Computing and Intelligent Systems

Faculty of Engineering

University of Ulster, Magee Campus

Northland Road, Derry

{Campbell-g8, TF.Lunney, P.McKevitt} @ulster.ac.uk


Outline

Goals and objectives

Key research problems

Distributed Processing

Distributed Platforms

Architecture of MediaHub

Decision making in MediaHub

Comparison to related research

Tools and future development


Goals

The primary objectives of MediaHub are to:

Interpret/generate semantic representations of multimodal input/output

Perform fusion and synchronisation of multimodal data (decision-making)

Implement and evaluate a multimodal platform hub (MediaHub)


Goals

Research questions:

• Semantic representation?

• Communication with other elements of a platform?

• Decision-making?


Key research problems

Semantic Representation

• Represent language and vision

• Frames or XML?

Semantic Storage

• Blackboard model?

• Non-blackboard model?

Decision-making

• Fusion and synchronisation

• AI technique


Frames (CHAMELEON)

(Brøndsted et al. 1998, 2001)

[MODULE
INPUT: input
INTENTION: intention-type
TIME: timestamp]

[SPEECH-RECOGNISER
UTTERANCE: (Point to Hanne’s office)
INTENTION: instruction!
TIME: timestamp]

[GESTURE
GESTURE: coordinates (3, 2)
INTENTION: pointing
TIME: timestamp]

XML (M3L, SmartKom)

(Bühler et al. 2002, Wahlster et al. 2001)

<presentationTask>
  <presentationGoal>
    <inform>
      <informFocus>
        <RealizationType>list</RealizationType>
      </informFocus>
    </inform>
    <abstractPresentationContent>
      <discourseTopic>
        <goal>epg_browse</goal>
      </discourseTopic>
      <informationSearch id="dim24">
        <tvProgram id="dim23">
          <broadcast>
            <timeDeictic id="dim16">now</timeDeictic>
            <between>2003-03-20T19:42:32 2003-03-20T22:00:00</between>
            <channel><channel id="dim13"/></channel>
          </broadcast>
        </tvProgram>
      </informationSearch>
      <result>
        <event>
          <pieceOfInformation>
            <tvProgram id="ap_3">
              <broadcast>
                <beginTime>2003-03-20T19:50:00</beginTime>
                <endTime>2003-03-20T19:55:00</endTime>
                <avMedium>
                  <title>Today’s Stock News</title>
                </avMedium>
                <channel>ARD</channel>
              </broadcast>
              ……..
        </event>
      </result>
  </presentationGoal>
</presentationTask>

Semantic representation
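
To make the frames-versus-XML question concrete, here is a minimal Python sketch that holds one piece of semantic content in a frame-like structure and renders it both ways. It is not CHAMELEON or SmartKom code: the `Frame` class, its slot layout, and the XML element names (`frame`, `slot`) are hypothetical choices made for illustration, and the XML produced is not M3L.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Frame:
    """A frame: a module name plus named slots (hypothetical structure)."""
    module: str
    slots: dict = field(default_factory=dict)

    def to_text(self) -> str:
        # Bracketed, one-slot-per-line rendering in the style of the frames above.
        lines = [self.module] + [f"{k}: {v}" for k, v in self.slots.items()]
        return "[" + "\n".join(lines) + "]"

    def to_xml(self) -> str:
        # Same content as XML; the element names are illustrative only.
        root = ET.Element("frame", {"module": self.module})
        for key, value in self.slots.items():
            ET.SubElement(root, "slot", {"name": key}).text = str(value)
        return ET.tostring(root, encoding="unicode")


gesture = Frame("GESTURE", {
    "GESTURE": "coordinates (3, 2)",
    "INTENTION": "pointing",
    "TIME": "timestamp",
})
print(gesture.to_text())
print(gesture.to_xml())
```

Both renderings carry the same slots, so under these assumptions the choice between them is mainly about tooling and interchange rather than expressive power.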


Semantic storage

Blackboard or Non-blackboard?

• High coupling: blackboard?

• Low coupling: distributed architecture?

Communication

• Via central blackboard?

• Message passing between modules?
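
The two communication options can be contrasted with a small Python sketch. This is an illustration under stated assumptions, not the design of MediaHub or of any platform cited here: the `Blackboard` and `MessageBus` classes and the module names ("speech", "gesture", "fusion") are hypothetical.

```python
from collections import defaultdict


class Blackboard:
    """Central store: modules post entries and read what others have posted."""

    def __init__(self):
        self.entries = []

    def post(self, source, content):
        self.entries.append((source, content))

    def read(self, source):
        # Return everything a given module has posted so far.
        return [c for s, c in self.entries if s == source]


class MessageBus:
    """Non-blackboard alternative: messages go straight to a named module's inbox."""

    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, sender, receiver, content):
        self.inboxes[receiver].append((sender, content))

    def receive(self, receiver):
        # Hand over and clear the pending messages for this module.
        return self.inboxes.pop(receiver, [])


board = Blackboard()
board.post("speech", "Point to Hanne's office")
board.post("gesture", "coordinates (3, 2)")

bus = MessageBus()
bus.send("gesture", "fusion", "coordinates (3, 2)")
```

With the blackboard, any module can later read the gesture entry. With message passing, only the addressed module sees it, which lowers coupling but requires the sender to know its receivers.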


Decision-making (fusion & synchronisation)

Rule-based

Potential for other AI techniques

• Fuzzy Logic

• Neural Networks

• Genetic Algorithms

• Bayesian Networks (CPNs)


Distributed processing

DACS (Fink et al. 1995, 1996)

Open Agent Architecture (OAA) (Cheyer et al. 1998, OAA 2004)

JATLite (Kristensen 2001, Jeon et al. 2000)

JavaSpaces (Freeman 2004)

CORBA (Vinoski 1993)

.NET (Fay 2003)


Intelligent Multimedia Distributed Platforms

Blackboard Model:

Ymir (Thórisson 1999)

CHAMELEON (Brøndsted et al. 1998, 2001)

SmartKom (Bühler et al. 2002, Wahlster et al. 2001, SmartKom 2004)

DARBS (Nolle et al. 2001)

DARPA Galaxy Communicator (Bayer et al. 2001)

Psyclone (Psyclone 2004)

Spoken Image/SONAS (Ó Nualláin et al. 1994, Ó Nualláin & Smith 1994, Kelleher et al. 2000)


Intelligent Multimedia Distributed Platforms

Non-blackboard Model:

WAXHOLM (Carlson et al. 1996)

AESOPWORLD (Okada 1996)

COLLAGEN (Rich et al. 1997)

INTERACT (Waibel et al. 1996)

Oxygen (Oxygen 2004)

EMBASSI (Kirste 2001, EMBASSI 2004)

MIAMM (MIAMM 2004)


CHAMELEON

Language & vision integration system consists of ten modules, mostly programmed in C and C++

DACS communication system used for communication

Blackboard stores semantic representations produced by other modules

Communication between modules achieved by exchanging semantic representations between themselves or the blackboard

Semantic representation in form of input, output and integration frames


Architecture of CHAMELEON


SmartKom

User adaptive interface for human-computer interaction

Scenarios: Mobile, Public, Home/Office

Accepts speech, gesture and facial expression input

XML-based mark-up language, M3L, used for semantic representation

Distributed multiple blackboard model


Architecture of SmartKom


Architecture of MediaHub

Dialogue Manager

• Acts as a blackboard module

• Facilitates communication between other modules

• Synchronisation

Semantic Representation Database

• Provides semantic representation of language and vision data

Decision Making Module

• AI technique for a unique form of decision-making

• Bayesian Networks (CPNs), Neural Networks, Genetic Algorithms, Fuzzy Logic


Architecture of MediaHub


Decision Making Module


Decision making in MediaHub

Decisions at Input:

• Determining semantic content of input

• Fusing semantics of input (into frames/XML)

• Resolving ambiguity at input

Decisions at Output:

• Synchronising language with visual output

• Best modality for output (i.e. language or vision)


Input example

“Copy all files from the ‘process control’ folder of this computer to a new folder called ‘check data’ on that computer”.
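
In this utterance, "this computer" and "that computer" can only be resolved with help from another modality such as pointing. The following Python sketch shows one simple rule-based way to do so, pairing each deictic phrase with the pointing gesture closest to it in time. It is an assumption-laden illustration rather than MediaHub's method: the `Deictic` and `Pointing` types, the one-second window, the timestamps, and the target names `computer_A` and `computer_B` are all invented.

```python
from dataclasses import dataclass


@dataclass
class Deictic:
    phrase: str   # e.g. "this computer"
    time: float   # seconds from utterance start (invented values)


@dataclass
class Pointing:
    target: str   # object the gesture was resolved to (hypothetical names)
    time: float


def resolve(deictics, gestures, window=1.0):
    """Pair each deictic phrase with the closest unused gesture within `window` seconds."""
    unused = list(gestures)
    resolved = {}
    for d in sorted(deictics, key=lambda d: d.time):
        candidates = [g for g in unused if abs(g.time - d.time) <= window]
        if not candidates:
            resolved[d.phrase] = None  # no gesture nearby: ambiguity stays unresolved
            continue
        best = min(candidates, key=lambda g: abs(g.time - d.time))
        unused.remove(best)
        resolved[d.phrase] = best.target
    return resolved


deictics = [Deictic("this computer", 2.1), Deictic("that computer", 5.4)]
gestures = [Pointing("computer_A", 2.3), Pointing("computer_B", 5.0)]
print(resolve(deictics, gestures))
```

A fixed time window is the simplest possible rule. A probabilistic technique such as a Bayesian network could instead weigh timing against other evidence.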


Output Example

[Figure: route between points labelled P and T]

“This is the best route from Paul’s office to Tom’s office”.


Comparison to related research


Potential Tools

Main Programming Language

• Java

• C++

Communication

• .NET

• DACS

Semantic Representation

• XML

• XHTML + Voice

• SMIL

• RDF Schema

• MPEG-7

• EMMA


Potential Tools

Decision Making Tools

HUGIN GUI / API (Hugin 2004)

Microsoft MSBNx / MSBN3 (Kadie et al. 2001)

GeNIe/SMILE (Genie 2005)

Netica (Norsys 2005)

Bayes Net Toolbox (BNT 2005)

BUGS (BUGS 2005)


Hugin

Tool for implementing Bayesian Networks as CPNs (Causal Probabilistic Networks)

Hugin GUI

• Graphical user interface to Hugin decision engine

Hugin API

• Library available for C, C++ and Java

• Allows programs to implement Bayesian Networks for decision making


Bayesian Networks

Also known as Bayes nets, Causal Probabilistic Networks (CPNs), Bayesian Belief Networks

Consist of nodes and directed edges between nodes

• Node represents a variable

• Edge represents cause-effect relationship

• An edge connecting two nodes A and B indicates that a direct influence exists between the state of A and the state of B


Simple Bayesian Network

‘Diet’ and ‘Exercise’ nodes have influence over ‘Weight Loss’ node
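
The following Python sketch works through this three-node network by direct enumeration. It does not use the Hugin API, and every probability in it is invented purely for illustration. It shows how the joint distribution factorises as P(Diet) P(Exercise) P(Weight Loss | Diet, Exercise), and how a marginal and a posterior follow from that.

```python
from itertools import product

# Prior probabilities for the two parent nodes (invented for illustration).
p_diet = {True: 0.4, False: 0.6}
p_exercise = {True: 0.3, False: 0.7}

# P(weight_loss=True | diet, exercise), also invented.
p_loss_given = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}


def joint(diet, exercise, loss):
    """Joint probability from the network factorisation P(D) P(E) P(L | D, E)."""
    p_l = p_loss_given[(diet, exercise)]
    return p_diet[diet] * p_exercise[exercise] * (p_l if loss else 1.0 - p_l)


def prob_loss():
    """Marginal P(weight_loss=True), summing out both parents."""
    return sum(joint(d, e, True) for d, e in product([True, False], repeat=2))


def prob_diet_given_loss():
    """Posterior P(diet=True | weight_loss=True) by Bayes' rule."""
    numerator = sum(joint(True, e, True) for e in [True, False])
    return numerator / prob_loss()


print(prob_loss())
print(prob_diet_given_loss())
```

Enumeration is only practical for very small networks. Tools such as those listed earlier use more efficient inference algorithms.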


Future development

Define necessary decisions

Develop Bayesian decision making using Hugin API for Java

Semantic storage

Communication

Semantic representation scheme

Semantic representation database

Acquire multimodal corpora for testing

Test MediaHub in an existing multimodal platform, e.g. CONFUCIUS (Ma & Mc Kevitt 2003)


Conclusion

An intelligent multimodal distributed platform hub called MediaHub will be developed

MediaHub will interpret and generate semantic representations of multimodal input and output

MediaHub will perform fusion and synchronisation of language and vision data

MediaHub will provide a new method of decision making within a distributed platform hub

MediaHub will be tested within an existing multimodal platform (e.g. CONFUCIUS)
