

Sofia April 27. 2006

Language Technology in the Information Society

Hot issues and open questions

Walther v.Hahn

University of Hamburg • Computer Science Department

Natural Language Systems Group

WWW: http://nats-www.informatik.uni-hamburg.de/view/User/WaltherVHahn

E-Mail: vhahn@informatik.uni-hamburg.de


Language Processing is more than a Text Processor

[Figure: Text Processing]


What is "Real" Text Technology?

[Figure: Text Processing and related aspects: Gestures, Corpora, Images, Tools, Web Applications, Cultures, Linguistics, Workflow, Technologies, Domains, User profiles, Ontologies, Languages, Models and Society, System Type, Methods]


Language Technology is only useful if …

• "text" comprises both the spoken and the written utterances of a language (the linguistic definition):

– The borders between the two types blur in WWW texts.

• the processes have access to the semantics of texts:

– the meaning structure, with lexical semantics and syntax, including reference;

• the processes have access to the pragmatics of texts:

– the intended action (plan) behind the utterance.


Focus of Interest in Texts

• We are interested in the semantics and pragmatics of utterances, not only in linguistic features or logical descriptions.

• Semantics and pragmatics depend on the users' intentions, not only on literal, word-centered contents.

• The contents count, not the wording: paraphrases are equivalent.

• Utterances are planned, hearer-dependent actions: only helpful answers count.

• Utterances are situated: reference is crucial.

• The coherence of utterances reveals the whole truth.

Without deictic resolution, a system is blind.


Comparing and Translating Languages

Multilingual text processes of any sort are never a linguistic problem alone: you can write rules about languages, but not about reference, ontologies or workflows. Many processing problems are still unsolved:

• Language is ambiguous: "I saw a man with a hat" vs. "I saw a man with a telescope".

• Languages differ: Няколко момчета и момичета отидоха на кино, но само на двама от тях им хареса филма. ("Some boys and girls went to the cinema, but only two of them (the boys!) liked the movie.")

• Writing: it is even difficult to recognize names across different spellings (Galja or Galia?), depending on the transliteration or conceptualization.
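As a minimal sketch of the name-recognition problem, the code below maps spelling variants such as Galja, Galia, Galya and Cyrillic Галя onto one matching key before comparing them. The letter table and the variant rules are illustrative assumptions, not any official transliteration standard, and the helper names `match_key` and `same_name` are hypothetical; only `unicodedata.normalize` and `str.casefold` come from Python's standard library.

```python
import unicodedata

# Illustrative subset of Bulgarian Cyrillic letters mapped to Latin (assumed,
# not a standard scheme); a real system would cover the whole alphabet.
CYRILLIC_TO_LATIN = {"а": "a", "г": "g", "л": "l", "я": "ia", "ю": "iu"}

# Latin spellings that different conventions produce for the same sound;
# each variant is rewritten to one canonical spelling (assumed rules).
LATIN_VARIANTS = [("ja", "ia"), ("ya", "ia"), ("ju", "iu"), ("yu", "iu")]


def match_key(name: str) -> str:
    """Return a normalized key so that spelling variants of a name compare equal."""
    key = unicodedata.normalize("NFC", name).casefold()
    key = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in key)
    for variant, canonical in LATIN_VARIANTS:
        key = key.replace(variant, canonical)
    return key


def same_name(a: str, b: str) -> bool:
    """True if two written forms are treated as the same name."""
    return match_key(a) == match_key(b)


print(same_name("Galja", "Galia"))  # True
print(same_name("Galya", "Галя"))   # True
print(same_name("Galia", "Maria"))  # False
```

Rules like these raise recall but can also merge names that should stay distinct (every Latin "ja" is rewritten, wherever it occurs), so in practice they would need to be combined with context.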



Multilingual Communication

10 Emerging Technologies That Will Change Your World (Technology Review, February 2004):

– Universal Translation
– Synthetic Biology
– Nanowires
– Bayesian Machine Learning
– T-Rays
– Distributed Storage
– RNA Interference
– Power Grid Control
– Microfluidic Optical Fibers
– Personal Genomics

It is hopeless to provide FAHQMT (fully automatic high-quality machine translation) of any text, any text type, for any purpose, in any situation and for any person with one single system, even if only for the 380 European language pairs. But the need is real. What is the solution? Who is responsible for what?



Modern Methods

The case of Machine Translation shows:

• Homogeneous systems have systematic problems:
– Rule-based (problem: they fail on unknown phenomena of any sort)
– Statistical (problem: the millions of rare examples, extralinguistic parameters)

• Hybrid systems:
– A) hybrid technologies: statistical / example-based / rule-based / menu-based etc. translation
– B) hybrid input/output channels: text, sound, images, gestures, lip reading
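One simple way to combine technologies in the sense of A) is a fallback cascade: each engine either returns a translation or declines, and the next engine is tried. The sketch below assumes three toy German-to-English engines (`rule_based`, `example_based`, `word_by_word`) with hard-coded data; they are hypothetical stand-ins for real rule-based, example-based and statistical components, not a description of any particular system.

```python
from typing import Callable, Optional, Sequence, Tuple

# A translation engine returns a translation, or None when it cannot handle the input.
Engine = Callable[[str], Optional[str]]


def rule_based(text: str) -> Optional[str]:
    """Toy stand-in for a rule-based component: covers a single fixed phrase."""
    rules = {"guten morgen": "good morning"}
    return rules.get(text.lower())


def example_based(text: str) -> Optional[str]:
    """Toy stand-in for an example-based component: a tiny translation memory."""
    memory = {"wie geht es dir?": "how are you?"}
    return memory.get(text.lower())


def word_by_word(text: str) -> Optional[str]:
    """Last-resort fallback: word-for-word lookup in a toy lexicon."""
    lexicon = {"das": "the", "haus": "house", "ist": "is", "alt": "old"}
    words = text.lower().split()
    if words and all(w in lexicon for w in words):
        return " ".join(lexicon[w] for w in words)
    return None  # declines as soon as one word is unknown


DEFAULT_ENGINES: Tuple[Engine, ...] = (rule_based, example_based, word_by_word)


def hybrid_translate(text: str, engines: Sequence[Engine] = DEFAULT_ENGINES) -> Tuple[str, Optional[str]]:
    """Try each engine in order; return the name and output of the first that succeeds."""
    for engine in engines:
        result = engine(text)
        if result is not None:
            return engine.__name__, result
    return "none", None


print(hybrid_translate("Guten Morgen"))      # ('rule_based', 'good morning')
print(hybrid_translate("Das Haus ist alt"))  # ('word_by_word', 'the house is old')
```

A cascade is only one hybrid design; engines can also run in parallel, with their outputs compared, scored or combined.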


Real-Life Background of Texts

• Text analysis is crucial for cooperation in a vast variety of communication types:

– Oral communication (politics, economy, tourism, private interests, ...)

– Written communication (publications, lyrics, e-mail, tourism, bills, legal texts, ...)

– Text and images (WWW, catalogues, museums, menus, advertising of any type, ...)

– Text and gestures (road information, slander, tourism, explanations of any sort, ...)

– Text and facial expressions (slander, irony, first aid, ...)


Why natural language at all?

1. Coverage

NL is complete. Whatever you want to express can, in principle, be expressed in NL.

2. Vagueness

The use of vague expressions is a highly efficient method in human interaction and the basis for innovative thinking. Language technology must tackle vagueness pragmatically, not only with logical mechanisms.

3. Abbreviations

In most realistic and complex query settings, NL is shorter than formal languages. NL stays coherent over whole paragraphs without repeating given information again and again. Elliptical expressions are unambiguous by virtue of the situation and shared knowledge.


The Value of "Soft" Fields

• In some areas, computer applications (and thus computer science) are shifting from numerical operations to the intentional support of human action. The specification of this support, however, is conveyed exclusively through texts.

• This requires understanding utterances as communicative and cooperative problem solving (speech act theory: J. L. Austin's "How to Do Things with Words", later developed by John Searle).

• Introducing computer technology into real life increasingly requires psychology, linguistics, sociology and other humanities rather than mathematics, logic and statistics, which today are built into tools, class libraries or plug-ins anyway.


Metalanguage and Evaluation

• Any occurrence of natural language is a mixture of object language and metalanguage. Error messages, whether from the machine or from the user, have to cope with both levels. This is why users cannot correct a system by giving natural-language examples or helpful hints. Moreover, the level distinction is very often implicit ("rubbish!").

• Evaluation modules for quality checks within hybrid systems cannot be better than the system's own modules; otherwise a programmer would change the system's behavior directly.


Verbmobil


International Cooperation

• Language processing always involves a mixture of transnational interests, languages and (commercial, political, ethical, ...) agents, depending on the import and export of knowledge.

• Some tasks are international (political and economic organizations);

• others are national (language description, analysis of utterances, corpora, culture, ethics).

• In the future, the EU will support pivot languages instead of 380 language pairs, and every country will need to define its budget for knowledge import and export and who pays for it (taxpayers, companies, foundations, sponsors).
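The figure of 380 follows from counting translation directions: with n languages and one system per ordered (source, target) pair, there are n(n-1) directions, and 380 = 20 × 19 corresponds to 20 languages. If one of those languages serves as pivot, each of the other n-1 languages only needs a system into and one out of the pivot, that is 2(n-1) directions. The function names below are illustrative.

```python
def direct_directions(n: int) -> int:
    """Systems needed if every ordered language pair (source, target) gets its own."""
    return n * (n - 1)


def pivot_directions(n: int) -> int:
    """Systems needed if one of the n languages serves as pivot: each of the
    other n - 1 languages is translated into and out of the pivot."""
    return 2 * (n - 1)


print(direct_directions(20), pivot_directions(20))  # 380 38
```

The price of the pivot is that a translation between two non-pivot languages takes two steps, so errors from the first step carry into the second.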


Obsolete Approaches

• FAHQ Analysis

• Isolated Systems

• Systems for all Languages

• Systems for all Domains

• Monomethodic Systems

• "The unknown trick"


The Sustainability Issue

Example:

• The digital satellite images of the South American rain forest from the 1970s are no longer readable and cannot be reconstructed by any method. They are lost forever.


Maintenance and Conservation of Digital Documents

This means:

• "Refreshing": copying the document from medium to medium to preserve the bit sequence. This is technically trivial.

• "Migration of content": conserving the contents independently of their original perception.

• "Migration of perception": conserving an identical or very similar visual/acoustic surface.
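Refreshing is trivial, but the copy is still worth verifying. The sketch below copies a file and compares SHA-256 digests of source and copy to confirm that the bit sequence is unchanged. The function names `sha256_of` and `refresh` are hypothetical; the calls used come from Python's standard `hashlib`, `shutil`, `tempfile` and `pathlib` modules.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy source to target (e.g. onto a new medium) and verify the bits are unchanged.

    Returns the digest shared by both files; raises OSError if they differ."""
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    if sha256_of(target) != expected:
        raise OSError(f"refresh failed: {target} differs from {source}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "scan.img"
        original.write_bytes(b"legacy image data")
        print(refresh(original, Path(tmp) / "scan_copy.img"))
```

A successful refresh only preserves the bits; it says nothing about whether they can still be interpreted, which is what the two kinds of migration address.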


Maintenance of Perception

[Figure: Processing Environment 1 (Document, Machine, Operating System and Display of 2006) = Processing Environment 2 (Document, Machine, Operating System and Display of 2106)]


Possible Solutions

• Producing paper copies, but losing programs, games, multimedia presentations, etc.

• Museum approach: keeping a machine of every new type, with its OS, programming languages etc. Not realistically viable (spare parts, operating knowledge, ...) and extremely expensive.

• Emulation approach: emulating the old software and OS in software; a long-term task.

• Universal virtual approach: constructing a virtual machine that simulates all existing machines and operating systems; extremely expensive.

• "Rosetta Stone" approach: keeping the bit sequence and producing an exact specification of the formal features of the document, the operating system and the machine. The local effort is small, but reconstruction is extremely expensive.
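The "Rosetta Stone" approach pairs the preserved bits with a description of how to read them. A minimal sketch, assuming a JSON manifest with hypothetical fields: it records a digest and the size of the bits plus free-text specifications of the document format, operating system and machine, so that a future reader can at least check that the bits are intact before attempting the expensive reconstruction. The record layout and the names `RosettaRecord`, `describe` and `bits_intact` are illustrative assumptions, not an archival standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class RosettaRecord:
    """Hypothetical manifest stored next to a preserved bit sequence."""
    sha256: str            # digest of the preserved bits, to detect corruption
    size_bytes: int        # length of the bit sequence in bytes
    document_format: str   # specification (or reference to one) of the file format
    operating_system: str  # environment the document was produced for
    machine: str           # hardware that environment ran on


def describe(bits: bytes, document_format: str, operating_system: str, machine: str) -> str:
    """Build the manifest for a bit sequence and serialize it as JSON."""
    record = RosettaRecord(
        sha256=hashlib.sha256(bits).hexdigest(),
        size_bytes=len(bits),
        document_format=document_format,
        operating_system=operating_system,
        machine=machine,
    )
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def bits_intact(bits: bytes, manifest: str) -> bool:
    """Check preserved bits against the digest and size recorded in the manifest."""
    record = RosettaRecord(**json.loads(manifest))
    return record.size_bytes == len(bits) and record.sha256 == hashlib.sha256(bits).hexdigest()


manifest = describe(b"document bits", "format spec v1 (hypothetical)", "OS 2006", "Machine 2006")
print(manifest)
```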


Emulation Approach

[Figure: Processing Environment 1 (Document 2006 on Machine 2006, Operating System 2006, Display 2006) = Processing Environment 2 (Document 2006 with Operating System 2006 and Display 2006, running in a 2006-Emulator on OS 2106 and Machine 2106)]
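At its core, emulation means that the new host executes the old machine's instructions in software. A minimal sketch, assuming a made-up stack-machine instruction set for the "Machine 2006" (the opcodes `PUSH`, `ADD`, `MUL`, `PRINT`, `HALT` and the function `emulate` are hypothetical): the host never runs the legacy program natively but reproduces its behavior one instruction at a time. A real emulator must also reproduce devices, timing and the display, which is part of why emulation is a long-term task.

```python
from typing import List, Tuple

# One instruction of the hypothetical legacy machine: (opcode, operand).
Instruction = Tuple[str, int]


def emulate(program: List[Instruction]) -> List[int]:
    """Run a program for a hypothetical legacy stack machine on the current host.

    The host does not execute the legacy code natively; it reproduces the
    legacy machine's behavior one instruction at a time."""
    stack: List[int] = []
    output: List[int] = []  # stands in for the legacy machine's display
    pc = 0  # program counter of the emulated machine
    while pc < len(program):
        op, arg = program[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        elif op == "PRINT":
            output.append(stack[-1])
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc}")
        pc += 1
    return output


# (2 + 3) * 4, written for the legacy machine
LEGACY_PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0), ("PRINT", 0), ("HALT", 0)]
print(emulate(LEGACY_PROGRAM))  # [20]
```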


Conclusion

• We have gained a lot of partial knowledge about language and language use, which may be sufficient for rather specific applications; however,

• we are taking the parts for the whole,

• we expect users to adapt to our technologies,

• we define the processing tasks by the cases we can handle,

• we still define text analysis in the paradigm of the 1980s.