REVEAL THIS and the COSMOROE cross-media relations framework

Katerina Pastra
Language Technology Applications Department
Institute for Language and Speech Processing (ILSP)
“Athena” Research Centre
Athens, Greece


ILSP/LTA Goals

• Basic and applied research in the field of Natural Language Processing, focusing on the design of computational models for natural language recognition and understanding, with application to three interwoven tracks:
- information processing, extraction & retrieval
- multilingual information processing (multilingual applications & translation systems)
- multimedia information processing (fusion of language with other modalities)


Research and Development directions

• Aiming at enhancing the capacity of processing multilingual multimedia content

• Enabling fusion of unimodal (text, speech, image) processing results in order to better understand the workings of language, information access and communication phenomena

• Preparing for the important role of language technologies in the forthcoming full-fledged convergence of information and edutainment channels (tv, radio, web)


Research and Development projects (1)

• Multilingual Information Processing
- Use parallel corpora for automatically acquiring bilingual lexica in EN – EL
- Employ contextual information for lexical transfer selection
- Use the annotated parallel corpus and the automatically extracted lexica to build a statistical machine translation infrastructure

TRAID translation memory – Machine Translation Toolkit


Research and Development projects (2)

• Improving machine-assisted subtitling in a universal access framework
- Investigate the cognitive models underlying human subtitling and implement the appropriate computational architectures
- Integrate image processing to improve video segmentation and recognise subtitle units
- Investigate the extent to which existing subtitle generation methods are portable and can be parameterised across special classes of viewers, e.g. children

Projects: MUSA/IST


Research and Development projects (3)

• Multimedia indexing and retrieval
- Augment the content of multimedia documents with high-level semantic indexical information (e.g. names of entities, terms, topics, facts)
- Develop cross-media and cross-language representations to enable linking of topically relevant video programmes, web texts and images
- Build high-level functionalities like semantic search, retrieval, filtering, categorization, translation, summarization

Projects: CIMWOS/IST, REVEAL THIS/IST


Research Focus on Multimedia

• Multimedia discourse relations (the COSMOROE framework)
Applications: cross-media indexing and retrieval, segmentation of audiovisual data, multimedia summarization

• Sensorimotor & Symbolic Integration Resources
Ongoing work on building an extensible computational resource which associates symbolic representations (words/concepts) with corresponding sensorimotor representations, enriched with patterns of combinations among these representations for forming conceptual structures at different levels of abstraction; focus on human action and interaction in everyday life.

Going bottom-up in the resource (from sensorimotor representations to concepts) one will get a hierarchical composition of human behaviour, while going top-down (from concepts to sensorimotor representations) one will get intentionally-laden interpretations of those structures.


Cross-Media Decision Mechanisms

Mechanisms that decide on the relation that holds between medium-specific pieces of information:
- across documents (Boll et al. 1999)
- within documents (Pastra 2006)

The mechanisms decide whether medium-specific pieces of information within the same multimedia document are:
- associated (multimedia integration)
- complementary
- semantically compatible/incompatible

Diagram labels: equivalence, complementarity, independence


Cross-media Relation Examples

Equivalence: “the yellow taxi-boats…”

Essential complementarity: “…[pollution has taken its toll] on that..”

Non-essential complementarity: “…we are heading to Patmos…”

Independence: “…I have finally found a place that’s not overrun by tourists…”


Cross-media relations

• Equivalence: info expressed by different media refers to the same entity (object, state, event or property)
• Complementarity: info in one medium is an (essential or not) complement of the info expressed in another
- Essential complementarity: usually indicated through association signals (e.g. indexicals)
- Non-essential complementarity: info in one medium is a modifier/adjunct of info expressed in another
• Independence: each medium carries an independent (but coherent) part of the MM message

Incoherence: due to errors in medium-specific processing or to artistic/editorial reasons


Application example: a cross-media indexer’s decisions

Equivalence: “the yellow taxi-boat…”

Essential complementarity: “…[pollution has taken its toll] on this..”

Non-essential complementarity: “…we are heading to Patmos…”

Independence: “…I have finally found a place that’s not overrun by tourists…”

[Figure: indexing decisions for the examples above. Recoverable labels: “1) Landscape – sea/coast”, “2) Landscape – people”; connectives: “and”, “or/and”, “choice”]


Cross-Media Interaction Relations

• Intelligent multimedia systems (IMMS) need mechanisms for analysing and generating semantic links between different modalities (Andre and Rist94, Feiner and McKeown93, Green02, Gut et al. 02, Martin and Kipp04 etc.)
• Focus: either image-language or gesture-language
• Semiotics: seminal analysis by Barthes84 (image-text) and Kendon04 (gesture-language)
• Automation of relation identification is restricted to equivalence/association relations (cf. e.g. Barnard et al.03), mainly between images and text
• Criticism: beyond different wording, different perspectives and different (or a lack of clear) criteria, all attempts to define cross-media relations incorporate a qualitative notion of the “contribution” of each medium to the message; some of them employ Rhetorical Structure Theory (Mann & Thompson87)


The case against RST for describing multimedia discourse (1)

Inappropriate nucleus vs. satellite distinction (and the related notion of “contribution”) because:

• it relies on a single, unique message reading directionality
- language manifests itself linearly in time and space vs.
- dynamic multimedia that are parallel in space and time (cf. AV data) vs.
- static multimedia that are perceived linearly but not in a strictly pre-determined, unique order (cf. illustrated documents)

• its identification usually relies on lexical cues and syntactic patterns
Such subtle cues are abundant in language to denote relations between text segments; only very few denote relations between language and other modalities

• it presumes that segments are comparable in size
Interacting modality units are not comparable (e.g. sentence – image region, word – sequence of frames etc.)


Example 1:

getting around the island…

Diagram labels: S, N, Means; N, S, Purpose

RST Relation | Nucleus | Satellite
Purpose | I drove a moped | for getting around the island
Means | I got around the island | by driving a moped


Example 2:

I got around the island by driving a moped


The case against RST for describing multimedia discourse (2)

• No compliance with media characteristics
- image characteristics: specificity, lack of subtle focus/salience indicators and of explicit abstraction mechanisms
- language characteristics: abstraction, meta-language functions, lack of direct access to sensorimotor entities
cf. the following RST relation: “Elaboration” = the satellite presents additional detail about the content of the nucleus (e.g. a member of a set, an instance of an abstraction, an attribute of an object, something specific in a generalisation)
But images always present more details…

• Lack of descriptive power and computational applicability
- mutual exclusiveness of RST relation categories
- inappropriate for capturing intentionality (Moore & Pollack92)
- fuzzy definitions of relations make the manual annotation of data for training systems to identify the relations automatically problematic (low inter-annotator agreement – cf. Carlson et al.03)
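The slide's point about low inter-annotator agreement can be made concrete with a small worked calculation. The sketch below assumes Cohen's kappa as the agreement measure, which the slides do not specify; the function name `cohen_kappa` and the six example labels are hypothetical and are not data from any annotation study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical RST labels for six segment pairs (illustration only).
annotator_1 = ["Elaboration", "Purpose", "Means", "Elaboration", "Purpose", "Means"]
annotator_2 = ["Elaboration", "Means", "Means", "Elaboration", "Purpose", "Elaboration"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on four of six items, but a third of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate. Fuzzy relation definitions tend to show up as exactly this kind of gap.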


Refining the relation set

COSMOROE Cross-Media Interaction Relations

Equivalence
  Literal
    Token-Token
    Type-Token
  Figurative
    Metonymy
    Metaphor
Complementarity
  Essential
    Equivalence Signal
    Defining Apposition
    Exophora
  Non-Essential
    Non-Defining Apposition
    Adjunct
Independence
  Contradiction
  Symbiosis
  Meta-Information
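For annotation tooling or classifier label sets, the relation hierarchy on this slide could be held as a small nested structure. The sketch below is only one possible encoding: the relation names come from the slide, while the grouping of leaves under each branch is a reading of the flattened diagram, and the names `COSMOROE`, `relation_paths` and `find_path` are hypothetical.

```python
# Relation names are from the slide; the nesting is an assumed reading
# of the diagram, not a structure confirmed by the source.
COSMOROE = {
    "Equivalence": {
        "Literal": ["Token-Token", "Type-Token"],
        "Figurative": ["Metonymy", "Metaphor"],
    },
    "Complementarity": {
        "Essential": ["Equivalence Signal", "Defining Apposition", "Exophora"],
        "Non-Essential": ["Non-Defining Apposition", "Adjunct"],
    },
    "Independence": ["Contradiction", "Symbiosis", "Meta-Information"],
}


def relation_paths(node, prefix=()):
    """Yield every leaf relation as a tuple path starting at the root."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from relation_paths(child, prefix + (name,))
    else:  # a list of leaf relation names
        for leaf in node:
            yield prefix + (leaf,)


def find_path(leaf_name):
    """Return the path to a leaf relation, or None if it is not in the tree."""
    for path in relation_paths(COSMOROE):
        if path[-1] == leaf_name:
            return path
    return None


all_paths = list(relation_paths(COSMOROE))
adjunct_path = find_path("Adjunct")
```

Keeping full paths rather than bare leaf names lets a system back off to the coarser level (for instance, just Equivalence vs. Complementarity vs. Independence) when a fine-grained decision is unreliable.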


Token-token equivalence

Semantic equivalence in which one modality refers to exactly the same entity that the other also refers to.

“… helmet for safety...”


Type-token equivalence

Semantic equivalence in which one modality refers to a class of entities and the other to one or more representative members of the class.

“…the ever increasing population of Athens has the city bursting at the seams and has created a vast concrete sprawl of housing ...”


Metonymy

The two referents come from the same domain and have the same array of associations; there is no transfer of qualities from one to another. The two modalities refer to different entities, but the user intends the two modalities to be considered semantically equivalent.

“The city, of course, is Athens, and it is here that I will begin my exploration of modern Greece.”


Metaphor

The two modalities refer to different entities of different domains; the user intends the two modalities to be considered semantically equivalent, and there is a transfer of qualities.

“It’s very serene …”


Equivalence Signal

Equivalence signals present in discourse indicate that one modality is essentially complemented by the other.

“Do you see the black at the top of the ceiling there?”


Defining Apposition

One modality provides extra information to another, information that identifies or describes something/someone and which, when vital for the clear comprehension of the message, is defining.

Non-Defining Apposition

One modality provides extra information to another, information that identifies or describes something/someone and which is not vital for the clear comprehension of the message.

Note: Apposition is different from the type:token equivalence relation! (e.g. Bush is an instance of a president not generally, but in a certain time and space)

“the president…”

“the deceased handcuffed”

Apart from a type:token equivalence relation (“deceased” – image of victim; only part of the image/body is shown here), one may identify an apposition relation too: e.g. the tattoo on the hand of the man is extra, descriptive information, complementary to the textual discourse, but not necessarily vital for comprehension.

By nature, images will usually give such info. Some applications rely on identifying extra info that originally seemed unimportant (and was therefore not present in the textual discourse) but was later considered significant, e.g. crime scene investigation applications.


Exophora

A pragmatic “anaphora” case

“…The city is a jumble of the ancient and the modern ...”


Adjunct

Non-essential complementarity – one modality functions as an adjunct to the other (place-position, place-direction, time, manner)

OCR: “Acropolis”

“…we are heading to Patmos…”


Symbiosis

Each modality expresses different pieces of information, the conjunction (in time) of which serves phatic communication (visual fillers – speech fillers)

Athens has been described as the last city in the West and the first city in the East. It's a place that is rich and spectacular in its history.

Many empires have held it in their sway: Romans, Venetians, Turks and Byzantines, and the result is a cosmopolitan city of three and a half million people.


Meta-information

One modality expresses information that comments on aspects of what the other expresses, going beyond the message communicated to creation-related comments (who created the message, when, why, how – cf. typical archival metadata)

“an aerial view of Athens….”


Annotating Corpora with COSMOROE (2)

• Tool = ANVIL (Kipp00)
• Levels of association (local context – diff. granularity)
• Annotation levels
- Audiovisual Topic
- Transcript (manual SR, subtitles, manual OCR)
- Body movement, with indication of:
  body part: hands, head, legs, whole body
  type: deictic, iconic, emblem, beat, metaphoric
- Images
  Frame-Sequence: foreground, background, both
  Keyframe-region: bounding box, free-text label, moving vs. static object indication, corresponding FrameSequence
- Relations binding AnchorText entities with movement(s), image(s) etc.
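To make the annotation levels easier to picture, they can be sketched as plain record types. This is an illustrative data model only, not ANVIL's file format or API: every class and field name below is hypothetical, the bounding-box convention is an assumption, and the example values (timings, coordinates, identifiers) are invented. Only the allowed body-part and movement-type values come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Allowed values as listed on the slide.
BODY_PARTS = {"hands", "head", "legs", "whole-body"}
MOVEMENT_TYPES = {"deictic", "iconic", "emblem", "beat", "metaphoric"}


@dataclass
class AnchorText:
    """A span of the transcript that takes part in a relation."""
    text: str
    start_s: float
    end_s: float


@dataclass
class BodyMovement:
    body_part: str
    movement_type: str
    start_s: float
    end_s: float

    def __post_init__(self):
        # Reject values outside the slide's controlled vocabularies.
        if self.body_part not in BODY_PARTS:
            raise ValueError(f"unknown body part: {self.body_part}")
        if self.movement_type not in MOVEMENT_TYPES:
            raise ValueError(f"unknown movement type: {self.movement_type}")


@dataclass
class KeyframeRegion:
    bounding_box: Tuple[int, int, int, int]  # x, y, width, height (assumed)
    label: str                               # free-text label
    moving: bool                             # moving vs. static object
    frame_sequence_id: Optional[str] = None  # corresponding FrameSequence


@dataclass
class Relation:
    """Binds an AnchorText entity to movement(s) and/or image region(s)."""
    relation_type: str
    anchor: AnchorText
    movements: List[BodyMovement] = field(default_factory=list)
    regions: List[KeyframeRegion] = field(default_factory=list)


# Invented example values, for illustration only.
boat = KeyframeRegion((40, 60, 200, 120), "taxi-boat", moving=True,
                      frame_sequence_id="fs-1")
relation = Relation("Equivalence",
                    AnchorText("the yellow taxi-boats", 12.0, 13.5),
                    regions=[boat])
```

Validating the controlled vocabularies at construction time is a design choice that catches typos in labels early, which matters when the annotations are later used as training data.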


Annotation Objectives

• To test the theory for coverage and applicability
• To answer questions on the semantics of multimedia discourse, e.g.
- In which cases does “what one sees is not what one hears” hold in discourse?
- Which concepts are usually visualised in accompanying images or expressed through gestures in discourse, and what is their level of abstraction?
- How is the interaction between modalities signalled?
- Which concepts are usually complemented with visual or gestural adjuncts? Could one predict the selectional restrictions for the arguments of a predicate when knowing its visual/gestural complements (and vice versa)?
- How is exophora realised? Could one use anaphora resolution mechanisms to resolve exophora?
• To use machine learning (ML) for automating relation identification for different applications


Future Work

• First phase: 5 hours Greek, 5 hours English (to be reached by July 07)
• Investigation of the phenomenon of “entailment” in multimedia discourse in the above dataset (internal collaboration with Stelios Piperidis, ILSP)
• Cognitive experimentation on coherence relations in multimedia discourse: notion of degree of fit between modalities as an indicator of coherence (collaboration with Carl Vogel, Trinity College Dublin)
• Machine learning for automatic identification of relations for indexing of audiovisual files in an extended dataset