2
Agenda
• Definition of a sub-segment
  – As a text element
  – As placeable/automatically substitutable elements
  – As terminology
  – In concordance searching (source and target language)
  – In source/target language matching (sub-segment translation)
• Technologies involved
3
Definition of sub-segments
• As a text element…
  – Separately translatable elements that are embedded in or attached to other elements (segments) in a text
    • Footnotes
    • Index entries
  – Text within elements like tags that are individual units of meaning
    • Text within translatable attributes of tags
4
Footnotes inside and at the end
Across 5
The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that contain several segments are treated as flowing text, and each segment is translated individually.
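The placeholder behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline markup `[[fn: …]]`, the `{1}` placeholder format, and the function names are hypothetical and do not reflect how any of the tools shown store footnotes internally.

```python
import re

# Hypothetical inline footnote markup, used only for this illustration.
FOOTNOTE = re.compile(r"\[\[fn:(.*?)\]\]")

def extract_footnotes(text):
    """Replace each footnote with a numbered placeholder; return the footnote texts separately."""
    footnotes = []

    def to_placeholder(match):
        footnotes.append(match.group(1).strip())
        return "{%d}" % len(footnotes)

    main_text = FOOTNOTE.sub(to_placeholder, text)
    return main_text, footnotes

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

main, notes = extract_footnotes(
    "Press the button[[fn: See chapter 2. It lists all buttons.]] to start."
)
print(main)                       # main segment with a placeholder at the footnote position
print(split_sentences(notes[0]))  # the footnote, split into its own segments
```

The main segment keeps only the placeholder, while the footnote text is segmented on its own, so each footnote sentence can be translated individually.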
5
Footnotes inside and at the end
memoQ 4.5
The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that contain several segments are treated as flowing text, and each segment is translated individually.
7
Footnotes inside and at the end
SDL Trados 2009
The footnote position is identified by a placeholder; the actual text of the footnote is a separate segment. Footnotes that contain several segments are treated as flowing text, and each segment is translated individually.
8
Index entries examples SDL Trados 2009 and memoQ
memoQ 4.5
The start and end of the index entry are identified by placeholders; the actual text of the index entry has to be translated inside the segment.
SDL Trados 2009 Studio
The index entry position is identified by a placeholder; the actual text of the index entry is a separate segment.
9
Index entries examples: Across 5
The index entry position is identified by a placeholder; the actual text of the index entry is a separate segment, to be translated in a separate window.
10
Text inside a tag (attribute)
The text within the tag has to be translated inside the segment.
Across 5
Title of a graphic in an HTML file (pop-up during mouse-over)
11
Text inside a tag (attribute)
SDL Trados 2007
The text within the tag has to be translated inside the segment.
The text within the tag is translated as a separate segment.
memoQ 4.5
12
As placeables
• Automatically substitutable elements
  – Dates
  – Times
  – Numbers
  – Measurements
  – Variables
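As a rough illustration of automatic substitution, the following Python sketch adapts a TM translation when the new source segment differs from the stored one only in its numbers. The function name, the number pattern, and the example sentences are assumptions made for this sketch; real tools also handle dates, times, units, and locale-specific number formats, which are not covered here.

```python
import re

# Simplified number pattern: digits with optional '.' or ',' groups (an assumption).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def substitute_numbers(new_source, tm_source, tm_target):
    """Return tm_target with numbers swapped in from new_source, or None if the
    two source segments differ in anything other than their numbers."""
    if NUMBER.sub("#", new_source) != NUMBER.sub("#", tm_source):
        return None
    # Map each old number to the new number at the same position in the source.
    # If the same old number occurs twice with different new values, the last one wins.
    mapping = dict(zip(NUMBER.findall(tm_source), NUMBER.findall(new_source)))
    return NUMBER.sub(lambda m: mapping.get(m.group(0), m.group(0)), tm_target)

print(substitute_numbers(
    "Tighten 4 screws to 25 Nm.",
    "Tighten 6 screws to 30 Nm.",
    "Ziehen Sie 6 Schrauben mit 30 Nm an.",
))
```

Because the mapping is keyed on the old number rather than on its position, the sketch also works when the target language reorders the numbers.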
16
As terminology
• Terminology search in one or more term bases
• Search will also find fuzzy matches
• Terms and translations are shown in a separate window
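A fuzzy term lookup of this kind can be approximated with Python's standard `difflib` module. The sketch below is an assumption-laden simplification: it handles single-word terms only, the term base is a plain dictionary, and the similarity cutoff of 0.8 is an arbitrary illustrative value, not a setting taken from any of the tools discussed.

```python
import difflib

def find_terms(segment, term_base, cutoff=0.8):
    """Look up each word of the segment in the term base, allowing fuzzy matches.

    Returns (word in segment, matched term, translation) tuples."""
    hits = []
    for word in segment.lower().split():
        word = word.strip(".,;:!?()")
        # get_close_matches returns the best candidates above the similarity cutoff.
        for term in difflib.get_close_matches(word, list(term_base), n=1, cutoff=cutoff):
            hits.append((word, term, term_base[term]))
    return hits

term_base = {"printer": "Drucker", "cartridge": "Patrone"}
for hit in find_terms("Replace the cartridges in both printers.", term_base):
    print(hit)
```

Here the plural forms "cartridges" and "printers" are found although only the singular forms are stored, which is the practical benefit of fuzzy matching in term recognition.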
18
Term recognition examples
• memoQ 4.5
Allowed (blue) and forbidden (black) terms are listed, with more term information at the bottom
21
Sub-segments in source language: Concordance
• Whenever the TM system does not find a match for the whole segment, a search for parts of the segment can be initiated: the so-called concordance search
• A concordance is a sorted list of words and phrases, associated with the sentences they appear in.
• Concordance search will often also find similar fragments (fuzzy search)
• Results appear highlighted in a list of the sentence pairs from the TM that contain them
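A minimal concordance search over a translation memory might look like the following Python sketch. It assumes the TM is a simple list of (source, target) pairs and performs exact, case-insensitive substring matching only; the fuzzy variant mentioned above is not implemented, and the function name and bracket highlighting are illustrative choices.

```python
def concordance(query, tm):
    """Return (source with the hit in [brackets], target) pairs, sorted by source text."""
    needle = query.lower()
    hits = []
    for source, target in tm:
        # Assumes lowercasing does not change string length (true for the examples here).
        pos = source.lower().find(needle)
        if pos != -1:
            end = pos + len(query)
            marked = source[:pos] + "[" + source[pos:end] + "]" + source[end:]
            hits.append((marked, target))
    return sorted(hits)

tm = [
    ("Open the paper tray.", "Öffnen Sie das Papierfach."),
    ("Close the cover.", "Schließen Sie die Abdeckung."),
    ("Remove the paper tray carefully.", "Entfernen Sie das Papierfach vorsichtig."),
]
for source, target in concordance("paper tray", tm):
    print(source, "|", target)
```

The translator still has to locate the corresponding translation in the target sentence by eye; the search only narrows down where to look.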
29
Automatic concordance search
• Concordance searching is often initiated manually, when you know that part of the segment has been translated before
• Some tools also automate the concordance search in the source language: the translator is told that the TM contains segment pairs with a certain sub-segment of the current source segment, without having to search for them explicitly
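One conceivable way to automate this is to test every word sequence of the current segment against the TM, longest first. The Python sketch below does exactly that under simplifying assumptions: whitespace tokenisation, character-based substring matching, and a hypothetical minimum phrase length of two words. It is not a description of how any particular tool implements the feature.

```python
def auto_concordance(segment, tm, min_words=2):
    """Find the longest word sequences of the segment that occur in TM source segments.

    Returns a dict mapping each found phrase to the TM pairs that contain it."""
    words = segment.lower().rstrip(".!?").split()
    found = {}
    for n in range(len(words), min_words - 1, -1):      # longest phrases first
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            # Skip phrases already covered by a longer hit.
            if any(phrase in longer for longer in found):
                continue
            pairs = [(s, t) for s, t in tm if phrase in s.lower()]
            if pairs:
                found[phrase] = pairs
    return found

tm = [
    ("Open the paper tray.", "Öffnen Sie das Papierfach."),
    ("Close the cover.", "Schließen Sie die Abdeckung."),
    ("Remove the paper tray carefully.", "Entfernen Sie das Papierfach vorsichtig."),
]
for phrase, pairs in auto_concordance("Clean the paper tray weekly.", tm).items():
    print(phrase, "->", len(pairs), "segment pairs")
```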
30
Automatic sub-segment translation
• Concordance search can speed up the translation process
  – by showing the translator sentence pairs that contain the term or phrase from the concordance search
• It would be even better if the system could find the translation for that term or phrase and offer it to the translator automatically
• As linguistic analysis would be too difficult to implement for each and every language pair, most tools today work with a statistics-based approach.
31
Automatic sub-segment translation
Fragment Assembly
• Segments can be assembled out of known parts, such as terms from the term base or smaller segments in the TM.
• The translations of those sub-segments are embedded into the source language segment.
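Fragment assembly in this sense can be sketched as a single-pass replacement of known fragments inside the source segment. In the Python sketch below, the `fragments` dictionary stands in for entries drawn from the term base and the TM; its contents, the function name, and the case-sensitive matching are assumptions for illustration.

```python
import re

def assemble(source, fragments):
    """Embed known translations into the source segment, preferring longer fragments."""
    if not fragments:
        return source
    # Longest fragments first, so "paper tray" wins over "paper" at the same position.
    ordered = sorted(fragments, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(f) for f in ordered))
    # One pass over the source, so inserted translations are never re-translated.
    return pattern.sub(lambda m: fragments[m.group(0)], source)

fragments = {"paper tray": "Papierfach", "paper": "Papier", "printer": "Drucker"}
print(assemble("Open the paper tray of the printer.", fragments))
```

The result is a mixed-language segment: the known parts are already translated, and the translator completes the rest around them.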
32
Automatic sub-segment translation
Database of fragment pairs
• The tool creates a list of sub-segments in the source language that appear frequently.
• Then, using a statistical approach also known from terminology extraction, it searches for recurring fragments in the target-language parts of the segment pairs and thus selects possible translations of the fragments.
• This list is a third database besides the TM and the term base
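The statistical idea can be illustrated with a simple co-occurrence measure. The Python sketch below scores each target-language word by the Dice coefficient between "segments whose source contains the fragment" and "segments whose target contains the word". The choice of the Dice coefficient, the restriction to single-word target candidates, and all names are assumptions for this sketch; the slides do not say which statistic the tools actually use.

```python
from collections import Counter

def best_translation(fragment, tm):
    """Pick the target word that co-occurs most strongly with the source fragment."""
    frag = fragment.lower()
    with_fragment = 0          # number of segment pairs whose source contains the fragment
    cooccurrence = Counter()   # target word -> count within those pairs
    total = Counter()          # target word -> count over all pairs
    for source, target in tm:
        words = set(target.lower().rstrip(".!?").split())
        total.update(words)
        if frag in source.lower():
            with_fragment += 1
            cooccurrence.update(words)
    if with_fragment == 0:
        return None
    dice = {w: 2 * c / (with_fragment + total[w]) for w, c in cooccurrence.items()}
    # Iterate in sorted order so ties are resolved deterministically.
    return max(sorted(dice), key=dice.get)

tm = [
    ("Open the paper tray.", "Öffnen Sie das Papierfach."),
    ("Remove the paper tray.", "Entfernen Sie das Papierfach."),
    ("Open the manual.", "Öffnen Sie das Handbuch."),
]
print(best_translation("paper tray", tm))
```

Frequent function words such as "Sie" or "das" co-occur with the fragment as well, but they also appear in segments without it, which lowers their score relative to the real translation.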
33
Automatic sub-segment translation: Examples
• Across 5: Auto-completion
Auto-text suggestions from a lexicon created with crossMining (statistical extraction)
34
Automatic sub-segment translation: Examples
• Déjà Vu: AutoAssemble
Known fragments from the term base and TM are inserted into the translation field
35
Automatic sub-segment translation: Examples
• memoQ: Fragment Assembly
Known elements (from the TM and term base) are embedded into the source sentence
36
Automatic sub-segment translation: Examples
• SDL Trados 2007
Pre-translation with insertion of terms from the term base as annotations
37
Automatic sub-segment translation: Examples
• Déjà Vu AutoAssemble
  – Lexicon (extracted terms of the source language text)
  – User adds translations to the terms in the lexicon list
  – Assembly of segments during translation
• Translations of similar sentences are filled with terms from the lexicon
39
Automatic sub-segment translation: Examples
• SDL Trados AutoSuggest
An AutoSuggest database is created by statistical means, extracting frequent source language fragments and their (statistical) counterparts in the target language from a TM. When the translator starts to type, the suggested translations are displayed in a list.
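The typing-time behaviour can be sketched as a prefix lookup over phrase pairs that are relevant to the current source segment. This Python sketch is a generic illustration, not the AutoSuggest implementation: the list-of-pairs data structure, the function name, and the `limit` parameter are all assumptions.

```python
def suggest(typed, source_segment, phrase_pairs, limit=5):
    """Suggest target phrases that start with what was typed and whose
    source phrase occurs in the current source segment."""
    prefix = typed.lower()
    source = source_segment.lower()
    candidates = [
        target
        for src, target in phrase_pairs
        if src.lower() in source and target.lower().startswith(prefix)
    ]
    return sorted(candidates)[:limit]

phrase_pairs = [
    ("paper tray", "Papierfach"),
    ("paper", "Papier"),
    ("printer", "Drucker"),
    ("power cable", "Netzkabel"),
]
print(suggest("Pap", "Open the paper tray.", phrase_pairs))
```

Restricting the candidates to phrases whose source side occurs in the current segment keeps the list short; "Drucker" is never offered here because "printer" does not appear in the source.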
41
Automatic sub-segment translation: Examples
• MultiTrans WordAlign
From the database of bi-texts, a list of phrase and term pairs is created, and the statistics for the terms in the translation can be shown to the translator
43
Sub-Segment Suggestions
• For sub-segment matching in the target language
  – Tools extract phrase pairs (from 1 to n words per phrase) by statistical means from a bilingual source (translation memory, bi-text corpus)
  – A list or database of these phrase pairs provides suggestions for the translation of those phrases as auto-completion
  – Linguistic analysis is not (yet) a part of this phrase matching process
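The first step of such an extraction, collecting recurring phrases of 1 to n words, can be sketched as simple n-gram counting. The Python sketch below covers one language side only and uses whitespace tokenisation; the parameter names `max_words` and `min_count` and their default values are assumptions for illustration. Pairing the phrases with their target-language counterparts would be a separate statistical step.

```python
from collections import Counter

def frequent_phrases(sentences, max_words=3, min_count=2):
    """Count all phrases of 1..max_words words and keep those seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().rstrip(".!?").split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

sources = ["Open the paper tray.", "Remove the paper tray."]
for phrase, count in sorted(frequent_phrases(sources, max_words=2).items()):
    print(count, phrase)
```

No linguistic knowledge is involved: a phrase qualifies purely because it recurs, which is why such lists typically also contain fragments like "the paper" that are not meaningful units on their own.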