Stephen Doherty, CNGL/SALIS stephen.doherty2@mail.dcu.ie

Past Research
Readability & Comprehensibility
Controlled Language
Research Proposal (Methodology)
Evaluation (Eye Tracking)
Conclusion

Translating Versus Post-Editing: A Segmentation Comparison Based on Pauses (B.A. Dissertation)

Think-Aloud Protocols in Translation Studies (Interests of Cognitively Oriented Translation Studies)

CNGL Work Package: ILT1.8 Controlled Language:

Supervisors – Dr. Sharon O’Brien, Dr. Dorothy Kenny

“adapt the systems developed by other ILT WPs to deal with in-house data which conforms to both source and target controlled language guidelines”

What is readability?

(Gray 1935: “In the reader, those features affecting readability are 1. prior knowledge, 2. reading skill, 3. interest, and 4. motivation. In the text, those features are 1. content, 2. style, 3. design, and 4. structure”.)

What is comprehensibility?

Metrics: (Reading scores, recall tests...)

E.g. Flesch Reading Ease:

Gunning-Fog Index; SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969)
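
The three readability scores named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the syllable counter is a crude vowel-group heuristic (not a dictionary lookup), "complex" words for Gunning-Fog are approximated as any word with three or more syllables, and the helper names (`count_syllables`, `text_stats`, and so on) are hypothetical. The constants are those of the published formulas.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_stats(text: str) -> dict:
    """Count sentences, words, syllables and polysyllabic words (3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables),
        "polysyllables": sum(1 for n in syllables if n >= 3),
    }


def flesch_reading_ease(s: dict) -> float:
    # Higher scores indicate easier text.
    return (206.835
            - 1.015 * (s["words"] / s["sentences"])
            - 84.6 * (s["syllables"] / s["words"]))


def gunning_fog(s: dict) -> float:
    # Simplification: every 3+ syllable word is treated as "complex".
    return 0.4 * ((s["words"] / s["sentences"])
                  + 100 * (s["polysyllables"] / s["words"]))


def smog(s: dict) -> float:
    # Polysyllable count is normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(s["polysyllables"] * 30 / s["sentences"]) + 3.1291
```

A call such as `flesch_reading_ease(text_stats(sample))` returns the score for a string `sample`; the functions assume the text contains at least one sentence and one word.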

What is controlled language?

“an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style”

(Huijsen, 1998)

Types of CL:

Human-Orientated Controlled Language (HOCL): readability & comprehensibility e.g. AECMA Simplified English

Machine-Orientated Controlled Language (MOCL): improved translatability, MT system specific

(Huijsen, 1998)

Examples of CLs: AECMA Simplified English, Sun Microsystems' Controlled English, IBM Easy English, Caterpillar Technical English, GM...

Usage (mostly English, but…)

Symantec (CNGL Industry Partner)

Roturier (2006):

Consistent spelling (54)
Do not use pronouns that have no specific referent (19)
Avoid unusual punctuation (35)
Avoid embedded clauses introduced by commas or dashes (41)
Do not use more than 25 words per sentence (5)
Use a question mark only at the end of a direct question (48)
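
To illustrate how a rule of this kind could be checked automatically, here is a minimal Python sketch of the 25-word sentence limit. It is not the checker used in the cited work: the sentence splitter is a naive regular expression, words are counted by whitespace, and the names `split_sentences` and `check_sentence_length` are hypothetical. Rules such as pronoun reference or embedded clauses would need real linguistic analysis and are not attempted here.

```python
import re

MAX_WORDS = 25  # limit taken from the rule listed above


def split_sentences(text: str) -> list:
    """Naive splitter (an assumption): break after ., ! or ? followed by space."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def check_sentence_length(text: str, max_words: int = MAX_WORDS) -> list:
    """Return (sentence number, word count, sentence) for each violation."""
    violations = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        word_count = len(sentence.split())
        if word_count > max_words:
            violations.append((number, word_count, sentence))
    return violations
```

Calling `check_sentence_length(document_text)` on a string returns an empty list when every sentence is within the limit.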

O’Brien (2003) - three types of rule categories:

Lexical (e.g. Rules that allow or rule out the use of specific acronyms or abbreviations)

Syntactic (e.g. specifying when and where past participles can be used and avoiding the present participle)

Textual:
Text Structure (e.g. specifying admissible sentence length)
Pragmatic (e.g. using certain verb forms for specific text purposes, such as the imperative for instructions)

A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input

Both automatic and human evaluation (focus)

Automatic evaluation (BLEU…)

Human evaluation: eye tracking & retrospective protocols (recall tests & interviews)
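
For the automatic side of the evaluation, and assuming the metric meant above is BLEU, the core idea can be sketched as follows. This is a simplified single-reference, sentence-level variant without smoothing, intended only to show the clipped n-gram precision and brevity penalty. Real BLEU is normally computed at corpus level with an established toolkit, and the function names here are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU: one reference, whitespace tokens, no smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Because there is no smoothing, any candidate shorter than four tokens, or one with no matching 4-gram, scores 0. This is one reason sentence-level scores are usually reported with a smoothed variant.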

Eye Tracking:

What is it exactly? (background)

Successful application in this research area

Tobii Eye Tracker & ClearView software

Additional video recording, keystroke & mouse logging

Tobii 1750 Eye Tracker (www.tobii.se)

Recall tests (comprehensibility)

Retrospective interviews (generation of additional data & resolving possible issues)
