Linguistic Representation of Finnish in the Medical Domain Spoken Language Translation System Marianne Santaholma, University of Geneva, TIM/ISSCO




Outline
• MedSLT system overview
• MedSLT Finnish language resources:
  - Corpora
  - Generation grammar
  - Lexicon
  - Interlingua to Finnish mapping rules
• Initial evaluation results
• Summary


MedSLT system (1)
• Open source medical domain SLT system
• Diagnosis tool for doctors
• One-way dialog
• Multilingual
• Coverage: medical sub-domains
• Architecture: based on general linguistic resources


[Figure: MedSLT architecture diagram. Labelled components: Speech Platform, Interface Process, Translation Server, Unification Grammar, Database, Recognition Package, Nuance Voice Platform (recognition, playback), Application Specific Data, Regulus Run-Time System, Regulus Compile-Time Component, Generation Grammar.]




Finnish corpora (1)
• Headache and chest pain sub-domains
• Created by translating the original English corpora
• Serve as the primary source for deciding which structure rules and vocabulary need to be introduced into the Finnish language module


Finnish corpora (2)

Concepts covered: frequency of pain, duration of pain, location of pain, etc.

Examples:
• Do you have headaches in the morning?
  - In the evening?
• Is your headache stabbing?
  - Severe?
• Are your headaches caused by coffee?
  - By cheese?




FIN generation grammar (1)
• Specialized grammar for spoken language
• Reflects the specific text type and discourse of the domain
• 57 grammar rules
• Unification formalism
• Developed on the Regulus platform: https://sourceforge.net/projects/regulus/


FIN generation grammar (2)
• The FIN grammar was developed by manually adapting the Regulus general English grammar (Rayner et al., 2000. Spoken Language Translator)
• The Finnish structure rules are highly similar to their English counterparts
• In Finnish, more phenomena are resolved at the morphological level than in syntax


FIN generation grammar (3)

English rule: 'How frequent are your headaches?'

    s:[sem=@fronting_sem(Adj, S), wh=y\/rel, wh=Wh, vform=VForm,
       inv=Inv, whmoved=y, takes_adv_type=none, gapsin=null, gapsout=null] -->
        adjp:[sem=Adj, wh=Wh, adjpos=pred, gapsin=null, gapsout=null],
        s:[sem=S, wh=n, vform=VForm, inv=Inv, whmoved=n, gapsin=adjp_gap, gapsout=null].

Finnish rule: 'Kuinka yleisiä päänsärkynne ovat?' (word for word: *how frequent your_headaches are?)

    s:[sem=@fronting_sem(Adj, S), wh=y, inv=n, vform=inf,
       whmoved=y, takes_adv_type=none, gapsin=null, gapsout=null] -->
        adjp:[sem=Adj, wh=y, agr=Agr, adj_pos=pred, adj_case=Case, adj_degr=positive,
              gapsin=null, gapsout=null],
        s:[sem=S, wh=n, agr=Agr, vform=inf, inv=n, whmoved=n, gapsin=adjp_gap, gapsout=null].
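The shared variable agr=Agr in the Finnish rule is what forces the fronted adjective phrase and the clause to agree in number. The following is a minimal Python sketch of that idea only. It assumes flat feature dictionaries and a hypothetical `unify` helper, and the feature values are illustrative; it is not how Regulus itself processes the grammar.

```python
from typing import Optional

Features = dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature sets; return None if a shared feature clashes."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values: unification fails
        merged[key] = value
    return merged


def agree(adjp: Features, s: Features) -> bool:
    """Check the shared-variable constraint agr=Agr between adjp and s."""
    return unify({"agr": adjp["agr"]}, {"agr": s["agr"]}) is not None


# Illustrative (assumed) feature values for 'Kuinka yleisiä päänsärkynne ovat?'
adjp_pl = {"wh": "y", "agr": "pl", "adj_pos": "pred", "adj_degr": "positive"}
s_pl = {"wh": "n", "agr": "pl", "vform": "inf"}
s_sg = {"wh": "n", "agr": "sg", "vform": "inf"}

print(agree(adjp_pl, s_pl))  # plural adjective phrase with a plural clause
print(agree(adjp_pl, s_sg))  # plural adjective phrase with a singular clause
```

The English rule has no agr feature on adjp, which is one place where the Finnish adaptation pushes extra information into the features rather than adding new structure rules.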




Finnish Lexicon (1)
• Domain specific
• ~530 lexical entries
• Difficulty: enumeration of all word forms

Example: lievittää, 'to relieve', question form, 3rd person singular, present tense.

    verb:[sem=[[event, lievittää], [tense, present]], vform=q_ko, agr=sg,
          subcat=trans, subj_n_case=nom, subj_sem_n_type=(cause\/activity),
          obj_sem_n_type=perception_body, obj_case=ptv,
          takes_adv_type=frequency] --> lievittääkö.


Finnish Lexicon (2)

Use of macros in lexical entries:

    macro(noun_perception_body([SgNom, PlNom, SgPtv, PlPtv], Sem),
          (noun:[sem=[Sem], sem_n_type=perception_body, agr=sg, case=nom] --> SgNom)).
    macro(noun_perception_body([SgNom, PlNom, SgPtv, PlPtv], Sem),
          (noun:[sem=[Sem], sem_n_type=perception_body, agr=pl, case=nom] --> PlNom)).
    macro(noun_perception_body([SgNom, PlNom, SgPtv, PlPtv], Sem),
          (noun:[sem=[Sem], sem_n_type=perception_body, agr=sg, case=ptv] --> SgPtv)).
    macro(noun_perception_body([SgNom, PlNom, SgPtv, PlPtv], Sem),
          (noun:[sem=[Sem], sem_n_type=perception_body, agr=pl, case=ptv] --> PlPtv)).

    @noun_perception_body([särky, säryt, särkyä, särkyjä], [symptom, särky]).
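To make the effect of the macro call concrete, here is a small Python sketch of the expansion: one call with four surface forms yields four lexical entries that differ only in number and case. The function name and the dictionary layout are hypothetical and only mirror the feature names on the slide; the actual expansion is done by Regulus.

```python
# Order of the forms in the macro argument: SgNom, PlNom, SgPtv, PlPtv.
CASE_SLOTS = [("sg", "nom"), ("pl", "nom"), ("sg", "ptv"), ("pl", "ptv")]


def expand_noun_perception_body(forms, sem):
    """Expand [SgNom, PlNom, SgPtv, PlPtv] into one lexical entry per form."""
    if len(forms) != len(CASE_SLOTS):
        raise ValueError("expected four forms: SgNom, PlNom, SgPtv, PlPtv")
    return [
        {
            "cat": "noun",
            "sem": [sem],
            "sem_n_type": "perception_body",
            "agr": agr,
            "case": case,
            "surface": form,
        }
        for (agr, case), form in zip(CASE_SLOTS, forms)
    ]


entries = expand_noun_perception_body(
    ["särky", "säryt", "särkyä", "särkyjä"], ["symptom", "särky"]
)
for entry in entries:
    print(entry["agr"], entry["case"], "-->", entry["surface"])
```

The lexicon writer still has to supply every inflected form by hand; the macro only removes the need to repeat the shared features for each one.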




Interlingua to FIN mapping
• MedSLT interlingua
  - interlingua_constant([<key>, <value>])
  - 'interlingua_constant([symptom, headache])'
• Interlingua mapping rules
  - Transformation: Source → Interlingua, Interlingua → Target
• Two types of rules:
  - Simple: interlingua transfer_lexicon entries
  - Complex: interlingua transfer_rules


SOURCE → INTERLINGUA → TARGET

ENG: Does the red wine make your headache worse?
FIN: Pahentaako punaviini päänsärkyä?

Source representation:

    [[adj,worse], [cause,red_wine], [event,make_adj], [prep,subj],
     [secondary_symptom,headache], [spec,the_sing], [tense,present],
     [utterance_type,ynq], [voice,active]]

Interlingua:

    [[sc,when],
     [clause, [[utterance_type,dcl], [pronoun,you], [tense,present],
               [voice,active], [action,drink], [cause,red_wine]]],
     [event,become_worse], [symptom,headache], [tense,present],
     [utterance_type,ynq], [voice,active]]

Target representation:

    [[cause,punaviini], [event,pahentaa], [symptom,päänsärky],
     [tense,present], [utterance_type,ynq]]

Interlingua to FIN rules used:

    transfer_rule([[sc,when],
                   [clause, [[utterance_type,dcl], [pronoun,you], [tense,present],
                             [voice,active], [action,drink], ECause]],
                   [event,become_worse], [voice,active]],
                  [[event,pahentaa], @efin_cause(ECause)]).

    transfer_lexicon([symptom,headache], [symptom,päänsärky]).
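The interplay of the two rule types can be sketched in Python as follows. This is an illustration under stated assumptions, not the MedSLT transfer engine: the function names are hypothetical, the complex rule is hard-coded rather than pattern-matched generically, and the mapping of red_wine to punaviini stands in for whatever the @efin_cause macro does.

```python
# Simple one-to-one entries, in the spirit of transfer_lexicon.
TRANSFER_LEXICON = {
    ("symptom", "headache"): ("symptom", "päänsärky"),
    ("cause", "red_wine"): ("cause", "punaviini"),  # assumed result of @efin_cause
}

# Fixed part of the embedded clause in the complex rule; the cause is the variable.
CLAUSE_FIXED = [
    ("utterance_type", "dcl"), ("pronoun", "you"), ("tense", "present"),
    ("voice", "active"), ("action", "drink"),
]
# Top-level elements the complex rule consumes besides the clause.
TOP_FIXED = [("sc", "when"), ("event", "become_worse"), ("voice", "active")]


def apply_transfer_rule(interlingua):
    """Rewrite 'when you drink X, ... become worse' to [event, pahentaa] plus X."""
    clause = next((v for k, v in interlingua if k == "clause"), None)
    if clause is None or any(item not in interlingua for item in TOP_FIXED):
        return interlingua  # rule does not apply
    if any(item not in clause for item in CLAUSE_FIXED):
        return interlingua
    causes = [item for item in clause if item[0] == "cause"]
    if len(causes) != 1:
        return interlingua
    consumed = TOP_FIXED + [("clause", clause)]
    rest = [item for item in interlingua if item not in consumed]
    return [("event", "pahentaa"), causes[0]] + rest


def translate_item(item):
    """Look up a key/value pair in the lexicon; unknown pairs pass through."""
    if isinstance(item[1], list):
        return item  # nested structure, not a lexicon key
    return TRANSFER_LEXICON.get(item, item)


def to_target(interlingua):
    """Apply the complex rule first, then the simple lexicon entries."""
    rewritten = apply_transfer_rule(interlingua)
    return sorted((translate_item(i) for i in rewritten), key=lambda i: i[0])


interlingua = [
    ("sc", "when"),
    ("clause", [("utterance_type", "dcl"), ("pronoun", "you"), ("tense", "present"),
                ("voice", "active"), ("action", "drink"), ("cause", "red_wine")]),
    ("event", "become_worse"), ("symptom", "headache"), ("tense", "present"),
    ("utterance_type", "ynq"), ("voice", "active"),
]
print(to_target(interlingua))
```

Elements that neither rule type touches, here [tense,present] and [utterance_type,ynq], are carried over unchanged, which matches the target representation shown on the slide.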




Evaluation (1)
• Evaluation of Eng-Fin translation performance on the headache sub-domain corpus of 870 utterances
• Comparison with Eng-Fre translation performance
• Evaluation in two phases:
  1. Judging of speech recognition: good vs. bad
  2. Judging of translations: good/acceptable/bad


Evaluation (2)

Translation results for FIN and FRE (bar chart values, in percent):

         Good translation   Acceptable translation   Bad translation   No translation
    FIN  60                 4.4                      0.5               35
    FRE  75.8               19.2                     0.7               4.4
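One way to compare the two language pairs is to add up the good and acceptable shares. The short Python sketch below does that arithmetic on the chart values; grouping the two categories as "usable" is an assumption made here for illustration, not a measure reported on the slides.

```python
# Chart values in percent, as given on the slide.
RESULTS = {
    "FIN": {"good": 60.0, "acceptable": 4.4, "bad": 0.5, "none": 35.0},
    "FRE": {"good": 75.8, "acceptable": 19.2, "bad": 0.7, "none": 4.4},
}


def usable_share(scores):
    """Good plus acceptable translations (hypothetical grouping), in percent."""
    return round(scores["good"] + scores["acceptable"], 1)


def total_share(scores):
    """Sum of all four categories; close to 100 up to rounding on the chart."""
    return round(sum(scores.values()), 1)


for language, scores in RESULTS.items():
    print(language, usable_share(scores), total_share(scores))
```

Under this grouping the gap between Finnish and French comes almost entirely from the "No translation" category rather than from bad translations.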


Evaluation (3)

Lexical gaps

Example:
• “Does the pain radiate to the neck?” (in-coverage sentence)
• “Is the pain in the neck?” (out-of-coverage sentence)
  - Finnish allative vs. adessive case: ‘kaulalle’ vs. ‘kaulalla’


Summary
• Development of the MedSLT Finnish language module by partly adapting existing resources
• English and Finnish grammar rules are highly similar despite the differences between the languages
• Difficulty: the rich morphology of Finnish, which can however be handled to some degree by using macros in the lexicon
• Initial evaluation of translation performance


References
• MedSLT: http://sourceforge.net/projects/medslt/ and http://www.issco.unige.ch/projects/medslt
• Regulus: https://sourceforge.net/projects/regulus/ and http://www.issco.unige.ch/projects/regulus
