Postgraduate Diploma
in Translation
Lecture 1
Computers and Language
Feb 2005 -- MR Diploma in Translation - Lecture 1 2
Course Information
Web: http://www.cs.um.edu.mt/~mros/diptran
[email protected]@um.edu.mt
D. Arnold et al (1994) Machine Translation: an Introductory Guide. See website.
H. Somers (2003). Computers and Translation, a Translator’s Guide. See website.
Computers and Language
Computational Linguistics: Emphasis on mechanised linguistic theories. Grew out of early Machine Translation efforts.
Natural Language Processing: Computational models of language analysis, interpretation, and generation.
Language Engineering: Emphasis on large-scale performance. Example: Google.
CL: Two Main Disciplines
Computer Science
Linguistics
Linguistics
Phonetics: The study of speech sounds
Phonology: The study of sound systems
Morphology: The study of word structure
Syntax: The study of sentence structure
Semantics: The study of meaning
Pragmatics: The study of language use
Grammar Rules: Prescriptive versus Descriptive
Prescriptive Grammar
Rules for and against certain uses
Proscribed forms that are in current use
“don’t end a sentence with a preposition”
Subjective
Descriptive Grammar
Rules characterizing what people actually say
Goal: to characterize all and only that which speakers find acceptable
Objective
Noam Chomsky
Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.
Generative Grammar: Key Points
A language is a (possibly infinite) set of sentences.
Grammar is finite.
The grammar of a particular language expresses linguistic knowledge of that language.
The Theory of Grammar includes a mathematical definition of what a grammar is.
The “Theory of Grammar” is a theory of human linguistic abilities.
[source: Sag & Wasow]
Theories of Sentence and Word Structure: Rewrite Rules
Rules can be used to specify the sentences of a language.
Rules have the form LHS → RHS
LHS may be a sequence of symbols.
RHS may be a sequence of symbols or words.
Lexicon specifies words and their categories
A Simple Grammar/Lexicon
grammar:
S → NP VP
NP → N
VP → V NP
lexicon:
V → kicks
N → John
N → Bill
Parse tree for “John kicks Bill”:
[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
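The grammar and lexicon above are small enough to try out directly. The following is a minimal sketch in Python, assuming the rewrite rules are stored as dictionaries; the names RULES, LEXICON, and generate are illustrative and not part of the lecture. It rewrites S top-down and lists every sentence the rules allow.

```python
from itertools import product

# Rewrite rules: each left-hand side maps to a list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}

# Lexicon: each category maps to the words that belong to it.
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def generate(symbol):
    """Return every word sequence that `symbol` can be rewritten into."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for rhs in RULES[symbol]:
        # Expand each symbol on the right-hand side, then combine the pieces.
        expansions = [generate(child) for child in rhs]
        for combination in product(*expansions):
            results.append([word for part in combination for word in part])
    return results


sentences = [" ".join(words) for words in generate("S")]
print(sentences)
```

This enumeration terminates only because the example grammar has no recursive rules. A grammar that can rewrite a symbol into itself describes an infinite set of sentences and would need a depth limit.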
Formal v. Natural Languages
Formal Languages
Arithmetic: 3290 1 1010101
Logic: ∀x man(x) → mortal(x)
URL: http://www.cs.um.edu.mt
Natural Languages
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Ġianni ra kelb
Points of Similarity
A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of words.
Rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.
Points of Difference
Formal Languages
The grammar defines the language
Restricted application
Non-ambiguous
Natural Languages
The language defines the grammar
Universal application
Highly ambiguous
Ambiguity
Morphological Ambiguity: en-large-ment
Lexical Ambiguity: the sheep is in the pen
Syntactic Ambiguity: small animals and children laugh
Semantic Ambiguity: every girl loves a sailor
Pragmatic Ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL in general and MT in particular.
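The syntactic example can be made concrete by counting the structures a grammar assigns to “small animals and children”. The sketch below assumes a hypothetical three-rule noun phrase grammar (NP → N, NP → Adj NP, NP → NP and NP) that is not given in the lecture, and lists every bracketing those rules produce.

```python
from functools import lru_cache

WORDS = ("small", "animals", "and", "children")
ADJECTIVES = {"small"}
NOUNS = {"animals", "children"}


@lru_cache(maxsize=None)
def parses(start, end):
    """Return all bracketings of WORDS[start:end] as a noun phrase."""
    span = WORDS[start:end]
    found = []
    # NP -> N
    if len(span) == 1 and span[0] in NOUNS:
        found.append(span[0])
    # NP -> Adj NP
    if len(span) >= 2 and span[0] in ADJECTIVES:
        for rest in parses(start + 1, end):
            found.append(f"[{span[0]} {rest}]")
    # NP -> NP and NP: try every "and" as the coordination point.
    for mid in range(start + 1, end - 1):
        if WORDS[mid] == "and":
            for left in parses(start, mid):
                for right in parses(mid + 1, end):
                    found.append(f"[{left} and {right}]")
    return tuple(found)


for structure in parses(0, len(WORDS)):
    print(structure)
```

One structure makes both the animals and the children small, the other only the animals. A translation system has to choose between them whenever the target language marks the difference.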
Computer Science
The study of basic concepts:
Information
Data
Algorithm
Program
The application of these concepts to practical tasks.
Implementation of computational models from other fields.
Information
Information is a theoretical concept introduced by Shannon in 1948 to measure uncertainty.
The units of this measure are called bits.
Length – metres
Weight – kilos
Information – bits
1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely outcomes.
Example: for breakfast I will have coffee or I will have tea (nothing else).
When I tell you that I have tea, I have conveyed one bit of information.
The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.
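The relation between outcomes and bits can be computed directly. This is a small sketch, assuming all outcomes are equally likely (a simplification the slide does not state); in that case the information is the base-2 logarithm of the number of outcomes. The function name bits is illustrative.

```python
import math


def bits(outcomes):
    """Bits needed to single out one of `outcomes` equally likely outcomes."""
    if outcomes < 1:
        raise ValueError("need at least one possible outcome")
    return math.log2(outcomes)


print(bits(2))   # coffee or tea: two outcomes
print(bits(8))   # eight possible breakfast drinks
```

Doubling the number of outcomes adds exactly one bit, which is why two outcomes carry one bit and eight outcomes carry three.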
Data
A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
Example: a telephone directory.
Unlike information, which is abstract, data is concrete.
Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.
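The directory structure described here, a list of entries with three fields each, might be written down as follows. This is only a sketch: the class name Entry, the function lookup, and all names, addresses, and numbers are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One line of the directory: the three fields named in the slide."""
    name: str
    address: str
    number: str


# Placeholder data, made up for illustration only.
directory = [
    Entry("A. Borg", "1 Example Street", "0000 0001"),
    Entry("M. Vella", "2 Sample Road", "0000 0002"),
]


def lookup(entries, name):
    """Return the numbers of all entries with the given name."""
    return [entry.number for entry in entries if entry.name == name]


print(lookup(directory, "M. Vella"))
```

The structure is what makes the lookup possible: the program can ask for the name field of each entry without interpreting the text of the whole directory.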
Algorithm
A well-defined procedure for the solution of a given problem in a finite number of steps.
Abstract.
Designed to perform a well-defined task.
Finite description length.
Guaranteed to terminate.
Algorithm for Chocolate Cake
Program to Add X and Y
Read X and Y (example: X = 2, Y = 3)
X = 0?
yes: Output Y
no: subtract 1 from X, add 1 to Y, then test X = 0 again
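The flowchart can be read as a loop that moves one unit at a time from X to Y until X reaches zero. A minimal Python sketch, assuming X is a non-negative integer and that the test X = 0 is made before each subtraction (the function name add_by_counting is illustrative):

```python
def add_by_counting(x, y):
    """Add x to y using only 'subtract 1', 'add 1' and a test for zero."""
    if x < 0:
        # With a negative X the loop would never reach zero.
        raise ValueError("x must be a non-negative integer")
    while x != 0:
        x -= 1  # subtract 1 from X
        y += 1  # add 1 to Y
    return y    # output Y


print(add_by_counting(2, 3))
```

With the example values X = 2 and Y = 3 the loop body runs twice and the output is 5.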
Computer Program
A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
Concrete.
A program can implement an algorithm.
More than one program may implement the same algorithm.
Not all programs express good algorithms!
Instructions vs. Execution Steps
1. Read X
2. Read Y
3. X = X-1
4. Y = Y+1
5. If X = 0 then Print(Y) else goto 3
How many instructions?
How many execution steps?
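One way to explore the two questions is to simulate the program and count. The sketch below assumes that the program's output is the sum held in Y, that each instruction counts as one execution step each time it runs, and that X is at least 1 (with X = 0 the program as written would never stop). The function name run is illustrative.

```python
def run(x, y):
    """Simulate the five-instruction program; return (output, steps executed)."""
    if x < 1:
        # Instruction 3 subtracts before the test, so X = 0 would loop forever.
        raise ValueError("x must be at least 1")
    steps = 2           # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1          # instruction 3
        y += 1          # instruction 4
        steps += 3      # instructions 3 and 4, plus the test in instruction 5
        if x == 0:      # instruction 5: print the result or go back to 3
            return y, steps


output, steps = run(2, 3)
print(output, steps)
```

The program text always has five instructions, but the number of execution steps grows with X: each pass through instructions 3 to 5 adds three steps.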
Algorithms and Linguistics
Does linguistic theory make sense without implementing the concepts?
Linguistic theory provides linguistic knowledge in the form of:
grammar rules
theories about grammar rules
Putting knowledge to some use involves processing issues:
parsing
generation
Computational Linguistics – Issues
How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?
Non-computational theories can be misleading
Representational details omitted.
Computer memory requirements omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable.
Example of a Non-Computational Model
Computers and Language: Twin Goals
Scientific Goal: Contribute to Linguistics by adding a computational dimension.
Technological Goal: Develop machinery capable of handling human language that can support “language engineering”.
Computers and Language: Tools & Resources
Grammar Formalisms, e.g. Definite Clause Grammars
Parsing Algorithms: sentence → structure
Generation Algorithms: structure → sentence
Statistical Methods
Linguistic Corpora
Computers and Language: Applications
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Integrated Multimodal Tasks
Machine Translation