Using corpora in contrastive studies Hilde Hasselgård University of Oslo

Using corpora in contrastive studies

Hilde Hasselgård

University of Oslo

Contrastive analysis

“Contrastive analysis is the systematic comparison of two or more languages, with the aim of describing their similarities and differences.” (Johansson 2007: 1)

CA [contrastive analysis] is a linguistic enterprise aimed at producing inverted (i.e. contrastive, not comparative) two-valued typologies (a CA is always concerned with a pair of languages), and founded on the assumption that languages can be compared. James (1980: 3)

Corpora in linguistic analysis

Corpus: a (large) structured, machine-readable collection of texts, prepared for use in linguistic research.

Benefits of corpora:

•Empirical basis for claims: material for studying language in use (“parole”)

•(Relatively) easy access to material

•(Usually) shared resource

–Enhances scientific quality, in that studies can be replicated and claims can be validated

Some characteristics of corpus-linguistic studies

Insistence on authentic material as basis of research.

Strong empirical and descriptive focus: Attention to patterns of use rather than grammaticality and acceptability.

Quantitative investigations (often to back up qualitative ones)

– Frequency and distribution are seen as important features of words and constructions.

– Corpus studies often aim to be exhaustive of the material investigated, i.e. to account for all the occurrences of a particular word / construction in the corpus.

– Therefore both precision and recall are important in searching the corpus: precision means limiting the output of the search to relevant constructions, and recall means casting the net wide enough to catch all of them.

Example of precision and recall

Topic: English N + N combinations involving the noun head.

Search for head: The ENPC (fiction) returned 332 hits

–Good recall – will certainly catch all relevant phrases.

–Bad precision – most of the hits will not be relevant.

Precision can be improved if the corpus has PoS-information, so we can search for head preceded or followed by a noun.

–The ENPC (fiction) returned 12 hits of head + N: head waiter; head Aristotle; head, bit by bit; head, dad; head teacher; head, soup; head trimmer; head curtain; head honcho; head office; head floor nurse (twice). It also returned 5 hits of N + head, two of which are compounds: section head, deputy head.
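The trade-off between precision and recall can be made concrete with a small calculation. The following is a minimal sketch in Python; the hit IDs and the set of relevant hits are hypothetical toy data, not figures from the ENPC.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Share of the retrieved hits that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set, relevant: set) -> float:
    """Share of the relevant items that the search managed to retrieve."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical toy data: 20 hits from a plain word search, of which only
# hits 1-4 are genuine N + N combinations.
relevant = {1, 2, 3, 4}
plain_search = set(range(1, 21))   # catches everything, mostly noise
pos_search = {1, 2, 3, 7}          # PoS-restricted search: less noise, one miss

plain_p = precision(plain_search, relevant)
plain_r = recall(plain_search, relevant)
pos_p = precision(pos_search, relevant)
pos_r = recall(pos_search, relevant)

print(f"plain search: precision={plain_p:.2f}, recall={plain_r:.2f}")
print(f"PoS search:   precision={pos_p:.2f}, recall={pos_r:.2f}")
```

In this toy setting the plain search has perfect recall but low precision, while the restricted search gains precision at the cost of missing one relevant hit.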

Corpora in contrastive analysis

Monolingual corpora = corpora that contain texts in one language only

Bilingual/Multilingual corpora = corpora that contain texts in two or more languages.

For a collection of texts in different languages to be called a “parallel corpus”, the texts should be in some way related to each other:

– Translation corpora (through translation)

– Comparable corpora (through text comparability)

– Bidirectional translation corpora

Translation corpus

A corpus that contains the ‘same’ texts in more than one language; in other words, a corpus with both original and translated texts.

Original text(s)

Translation, language 1

(Translation, language 2)

(Translation, language 3)

Comparable corpus

A corpus that contains original texts in more than one language, where the texts in each language have been selected according to the same criteria (genre, content, publication date etc.).

Language 1: Genre A, Genre B, Genre C, Genre D

Language 2: Genre A, Genre B, Genre C, Genre D

Language 3: Genre A, Genre B, Genre C, Genre D

Bidirectional translation corpus (ENPC model)

Combination of translation and comparable corpus

The original texts are comparable (genre, number of words)

The translations go in both directions – a truly parallel corpus

The English-Norwegian Parallel Corpus (ENPC) – Some facts

Started as a research project at the University of Oslo in 1994 and completed in 1997. Prof. Stig Johansson initiated and directed the project.

Original texts with authentic translations (English-Norwegian and Norwegian-English); Fictional and non-fictional texts.

Compiled for use in applied and theoretical linguistic research

Development of software for alignment of the texts (Knut Hofland, UiB) and for searching the corpus (Jarle Ebeling, UiO)

Sister projects: The English-Swedish Parallel Corpus (Lund/Göteborg), English-Finnish Parallel Corpus (Jyväskylä/Savonlinna/Tampere) – same principle of compilation; to some extent also shared texts.

Later developments: The French-Norwegian Parallel Corpus; the German-Norwegian Parallel Corpus.

Other corpora built on the ENPC model in Germany (Chemnitz), France/Belgium (Poitiers/Louvain-la-Neuve: the PLECI corpus), Spain (University of León).

Important features of the ENPC

The originals and the translations are aligned at sentence level (“s-unit”).

– Thus, searches in one language will return hits with the linked-up sentences in the other language.

A browser for searching a bilingual corpus was developed alongside the corpus.

Searches can be made in both originals and translations.

Searches are made in fiction and non-fiction separately.

– Thus, findings on the basis of translated language can always be checked against originals within the same genre.
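How sentence alignment makes such searches possible can be illustrated with a minimal Python sketch. The data structure and the `search` function are hypothetical illustrations, not the ENPC browser or its file format; the two sentence pairs are taken from the corpus examples shown elsewhere in this presentation.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedUnit:
    """One aligned s-unit: an original sentence linked to its translation."""
    text_id: str
    original: str
    translation: str


# Two Norwegian originals with their English translations.
corpus = [
    AlignedUnit("TB1",
                "Dessuten skulle jeg slukke hvert øyeblikk...",
                "Besides, I was just about to put out the light anyway...."),
    AlignedUnit("HW2",
                "Til gjengjeld fikk hun flere timer på seg.",
                "On the other hand, she got several extra hours to do the work."),
]


def search(units, word, side="original"):
    """Return all aligned units whose chosen side contains `word` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [u for u in units if pattern.search(getattr(u, side))]


# Searching one language returns the linked sentence in the other language too.
for hit in search(corpus, "dessuten"):
    print(hit.text_id, "|", hit.original, "|", hit.translation)
```

Because each hit carries both sides of the alignment, the same function can search the translations instead by passing `side="translation"`.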

Searching the parallel corpus

Output of the search

From ENPC to OMC under the SPRIK umbrella (SPRåk I Kontrast)

New languages were added, first (mainly) German, then French

Focus on English – Norwegian – German in the first phase of the SPRIK-project: original texts in each language with translations into the other two.

Same principles for text selection, text sampling and preparation as for the ENPC (exception: even more biased towards fiction because of the lack of translated non-fiction). Same (or later versions of same) software for alignment, searching etc.

Expanded search facilities and research possibilities:

– Three-way comparison of translations and originals

– Possibilities of investigating two different translations of the same text (translation strategies, translationese)

Latest development at UiO (not part of OMC): Russian-Norwegian (RuN)

Trilingual parallel corpus model

Searching in the OMC (En-Ge-No): dessuten in Norwegian originals & translations, im übrigen in German originals

Dessuten skulle jeg slukke hvert øyeblikk... (TB1)

Außerdem wollte ich gerade das Licht ausmachen... (TB1TD)

Besides, I was just about to put out the light anyway.... (TB1TE)

And anyway, Mathilda had been taking them for years, they were commonly prescribed once. (MW1)

Og dessuten hadde Mathilda tatt dem i årevis, det var i sin tid vanlig å foreskrive dem. (MW1TN)

Im übrigen nahm Mathilda sie seit Jahren, sie wurden früher allgemein verschrieben. (MW1TD)

Im übrigen gerät sie nach ihrem Vater. (ERH1)

She took after her father, at any rate. (ERH1TE)

For øvrig lignet hun på sin far. (ERH1TN)

Translation corpus with four languages: No-En-Fr-Ge

Norwegian originals

English translation

French translation

German translation

Examples of search output

Dessuten hadde hun fått høvelig opplæring i historie, både den nye og den gamle tid. (HW2)

Außerdem habe sie eine gute Unterweisung in Geschichte, sowohl in der alten wie auch in der neuen, bekommen. (HW2TD)

It stated that Dina had received suitable instruction in both modern and ancient history. (HW2TE)

En plus, elle avait reçu un enseignement convenable en histoire, ancienne comme moderne. (HW2TF)

En contrepartie, elle eut plusieurs heures en plus devant elle. (HW2TF)

Til gjengjeld fikk hun flere timer på seg. (HW2)

Dafür hatte sie ein paar Stunden für sich. (HW2TD)

On the other hand, she got several extra hours to do the work. (HW2TE)

Methodology: Classifying correspondences

Correspondence: expressed or zero

– Expressed: congruent (same realisation type) or divergent (different realisation type)

– Zero: no expressed counterpart

Example: French correspondences of however in a small En-Fr translation corpus:

However, this is likely to be a gross underestimate. (WHO1)

Toutefois, leur nombre pourrait être largement sous-estimé.

However, this essential function of social integration is today under threat ... (LB1)

Mais, aujourd'hui, cette fonction essentielle est menacée ...

However, because of the cultural and social setting, … (WHO1)

En raison du contexte culturel et social …
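The three-way classification can be sketched in Python as follows. This is only an illustration: the word-class labels assigned to however, toutefois and mais are assumptions made for the example, and in a real study the realisation type of each rendering has to be decided by the analyst.

```python
from collections import Counter
from typing import Optional


def classify(source_category: str, target_category: Optional[str]) -> str:
    """Label a correspondence as congruent, divergent or zero."""
    if target_category is None:          # nothing in the translation
        return "zero"
    if target_category == source_category:
        return "congruent"               # same realisation type
    return "divergent"                   # different realisation type


# Renderings of English "however" (assumed here to be an adverb) in the three
# examples above; the category labels are illustrative assumptions.
renderings = [
    ("Toutefois", "adverb"),
    ("Mais", "conjunction"),
    (None, None),                        # however left untranslated
]

labels = [classify("adverb", category) for _, category in renderings]
counts = Counter(labels)
print(labels)
```

Counting the labels over all hits gives the distribution of correspondence types for the item under study.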

What can multilingual corpora contribute?

They give insights into the languages compared – insights that are likely to be unnoticed in studies of monolingual corpora.

They can be used for a range of comparative purposes and increase our understanding of language-specific, typological and cultural differences, as well as of universal features.

They illuminate differences between source texts and translations, and between native and non-native texts.

They can be used for a number of practical applications, e.g. in lexicography, language teaching, and translation.

(Aijmer & Altenberg 1996: 12)

Filipović’s arguments for using corpora in contrastive analysis

a) a valid contrastive project cannot be considered complete before its results have been verified and completed with the help of some representative corpus;

b) only a corpus can verify certain cases of doubtful grammaticality;

c) frequency and distribution can be established only on the basis of a corpus;

d) without a corpus we could not analyse the stylistic value, i.e. stylistic levels and registers, of certain forms;

e) the corpus is necessary for the component of ‘use’. (Filipović 1984: 114)

Methodology in contrastive analysis

A CA presupposes a tertium comparationis, i.e. a measure by which we can be fairly certain we are comparing like with like.

The items to be compared across languages are selected on the basis of perceived similarity (Chesterman 1998), such as translation equivalence, semantic/etymological similarity, grammatical or functional categories.

A frequently suggested tertium comparationis is translation equivalence, which implies that the items in the two languages convey (more or less) the same meaning.

Chesterman’s suggestion (1998: 60)

1. Collecting primary data against which hypotheses are to be tested. Primary data involve all instances of language use, utterances that speakers of the languages in question produce.

2. Establishing a comparability criterion based on a perceived similarity of any kind.

3. Defining the nature of similarity and formulating the initial hypothesis.

4. Hypothesis testing: determining the conditions under which the initial hypothesis can be accepted or rejected.

5. Formulating the revised hypothesis.

6. Testing of the revised hypothesis, and so on.

Some benefits of a bidirectional translation corpus such as the ENPC

Comparable original and translated texts in both languages: control for translation bias

In-built tertium comparationis through translation equivalence and text comparability

“…with the help of a corpus we get unprecedented opportunities to study and contrast languages in use, including frequency distributions and stylistic preferences. Corpora are absolutely essential for macrolinguistic studies, but they will also enrich studies of lexical and grammatical patterns.” (Johansson 2000)

And an important drawback: such corpora will always be limited

– In size, because of the work involved (copyrights, processing…)

– In coverage, because not all types of texts are translated.

Limitations

(As with corpus linguistics in general:) you can only search for something that is explicit in the text

The size of the corpus restricts studies of less frequent lexis / constructions

The corpus has not been parsed (syntactically annotated), i.e. it is not possible to search for grammatical constructions, patterns of word order etc.

Faulty and less successful translations

Tagging errors

How can we retrieve the relevant constructions from the corpus?

The answer depends on the research question and on the corpus, i.e. whether the corpus has annotation (information about PoS, morphology, etc.)

– Research questions that can take lexical words as a starting point: relatively easy; all you need to remember is to search for all possible forms of a word.

– Research questions that focus on a grammar feature (e.g. progressive aspect) are trickier:

– Can be solved by identifying one or more lexical starting points

– Can be carried out if the corpus is tagged.
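Searching for all possible forms of a word can be sketched with a regular expression. The example below is a minimal Python illustration; the form list for take and the sample sentence are made up for the example, and an untagged search like this cannot tell a verb from a homographic noun.

```python
import re


def forms_pattern(forms):
    """Build a whole-word, case-insensitive pattern matching any listed form."""
    # Longer forms first, so that alternation never stops at a shorter prefix.
    alternatives = "|".join(
        re.escape(f) for f in sorted(forms, key=len, reverse=True)
    )
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# All inflected forms of the English verb "take".
TAKE_FORMS = ["take", "takes", "took", "taken", "taking"]
pattern = forms_pattern(TAKE_FORMS)

# Made-up sentence; "mistaken" must not match, thanks to the word boundaries.
sentence = "She took the bus, but he had taken a mistaken turn while taking notes."
found = pattern.findall(sentence)
print(found)
```

The word boundaries keep precision up (no hit inside mistaken), while listing every inflected form keeps recall up.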

How do we know what words/constructions to compare?

Not as trivial as it sounds

– For a suggestion of a method for discovering semantic relations across languages on the basis of a bidirectional translation corpus, see DYVIK, Helge: Translations as Semantic Mirrors: From Parallel Corpus to Wordnet. (2004)

– Basically, the method involves taking advantage of the bidirectional translation corpus, starting from a word in one of the languages, finding its translations and in the next step see how these are translated in the other direction.

Example: The Norwegian connector dessuten and its English and French correspondences.
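The two-step procedure (translations of a word, then translations of those translations in the other direction) can be sketched in Python as below. This is a drastically simplified illustration and not Dyvik's actual method, which involves further steps; the dictionaries hold only a small subset of the correspondences of dessuten reported later in this presentation.

```python
# Step 1 data: Norwegian -> English correspondences (subset, for illustration).
no_to_en = {"dessuten": {"besides", "also", "moreover"}}

# Step 2 data: English -> Norwegian correspondences of those items (subset).
en_to_no = {
    "besides": {"dessuten", "forresten", "også"},
    "also": {"også", "og", "dessuten"},
    "moreover": {"til og med"},
}


def first_image(word, forward):
    """All translations of `word` found in the corpus."""
    return set(forward.get(word, set()))


def inverse_image(word, forward, backward):
    """All back-translations of the translations of `word`."""
    mirror = set()
    for translation in first_image(word, forward):
        mirror |= backward.get(translation, set())
    return mirror


mirror = inverse_image("dessuten", no_to_en, en_to_no)
print(sorted(mirror))
```

The words that come back alongside dessuten itself are candidates for semantically related expressions in the starting language.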

DESSUTEN in Norwegian originals – a paradigm of correspondences

English translation ENPC fiction (N=132)

moreover, besides, even, what BE more, also, and, in addition, as well, too, nor, Ø, others (only once)

French translation: FNPC (N=35)

en outre, d’ailleurs, aussi, de plus, en plus, de même, par ailleurs, et puis

… so which ones of these should be selected for further contrastive study?

DESSUTEN – frequencies of correspondences

ENPC fiction (N=132)

moreover 7; besides 27 (20.5%); even 2; what be more 8; also 14; and 1; in addition 7; as well 2; too 2; nor 2; Ø 19; others (only once) 11

FNPC (N=35)

en outre 8 (22.9%); d’ailleurs 2; aussi 4; de plus 6; en plus 1; de même 1; par ailleurs 1; et puis 1; Ø 11 (31.4%)

Mutual correspondence (MC) (Altenberg 1999)

The frequency with which different (grammatical, semantic and lexical) expressions are translated into each other.

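Calculated and expressed as a percentage by means of the formula

$$\mathrm{MC} = \frac{(A_t + B_t) \times 100}{A_s + B_s}$$

where $A_t$ and $B_t$ are the frequencies of the compared items in the translations, and $A_s$ and $B_s$ their frequencies in the source texts.

The MC of dessuten and besides in the ENPC (fiction) is thus

$$\frac{(15 + 27) \times 100}{107 + 24} = 32.1$$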
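The calculation is easy to reproduce. The following is a minimal Python sketch; the function and parameter names are chosen for illustration, and the figures are those given above for dessuten and besides.

```python
def mutual_correspondence(a_t: int, b_t: int, a_s: int, b_s: int) -> float:
    """Mutual correspondence as a percentage: (At + Bt) * 100 / (As + Bs)."""
    return (a_t + b_t) * 100 / (a_s + b_s)


# Figures as given above for dessuten and besides in the ENPC (fiction).
mc = mutual_correspondence(a_t=15, b_t=27, a_s=107, b_s=24)
print(round(mc, 1))
```

A value of 100 would mean that the two items are always translated by each other, and 0 that they never are.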

Using the ENPC/OMC for research

Particularly well suited for studies of lexis / lexico-grammar (or phenomena that can take lexis as their starting point)

A broad range of phenomena have been (are being) investigated, e.g. the use of individual verbs (bli, få, take, give, see), modality, particular syntactic constructions, connectives, sentence openings and other discourse phenomena. (For a good overview up to 2007, see Johansson 2007).

The methodology is not tied to any particular theoretical approach

“The material from the corpus serves first to verify the conclusions based on the theory and second, to provide a means to collect data in areas where the theory is inadequate.” (Filipović 1984: 113)

Lexicogrammar

Paradigms of correspondence highlight the fuzzy borderlines between lexis and grammar, and between grammar and discourse.

Example: A modal verb will have a wide range of correspondences

Norwegian kan

Mette seier han kan gå igjen, kaféen er alt stengd. (EH1)

Metta tells him to go away; the cafe is already closed.

Modal aux: can, could, may, might, ‘ll, will, would, should

Other verbs: know, enable, have, have to, had better

Adjectives: possible, able, capable

Adverb: perhaps

Suffix: -able

(Løken 2007)

Formulating research questions in corpus-based CA

The research question needs to be one that can be investigated with the material (and the annotation) in the corpus.

We may need to be prepared to either discard or reformulate the original research question following initial corpus searches.

– Example: contrastive study of the Norwegian verb gjøre and English make. The translation correspondences suggested that they were not each other’s main correspondences.

Important to know the corpus in order to identify fruitful research questions:

– ENPC translations were carried out by professional translators, so they are probably not good material for studying errors (e.g. wrong use of “false friends”)

– The limited size and range of text types in the corpus limit the types of claims that can be made.

Case study 1: lexis

RQ: To what extent can the following words be characterized as false friends?

– Norwegian eventuell(e)/eventuelt

– English eventual/eventually

– French éventuel(le)/éventuellement

Material: OMC (No-En-Fr) + FNPC + ENPC

Hypothesis: Norwegian and French are good friends, but English eventual(ly) has a different meaning.

For more studies of false friends, see Languages in Contrast 10:2 (2010), Special issue: Pragmatic markers and pragmaticalization: Lessons from false friends.

Basis for hypothesis: a dictionary of false friends

Éventuel (F) vs Eventual (E)

Éventuel (F) means possible: le résultat éventuel - the possible outcome.

Eventual (E) describes something that will happen at some unspecified point in the future; it can be translated by a relative clause like qui s'ensuit or qui a résulté or by an adverb like finalement.

Éventuellement (F) vs Eventually (E)

Éventuellement (F) means possibly, if need be, or even: Vous pouvez éventuellement prendre ma voiture - You can even take my car / You can take my car if need be.

Eventually (E) indicates that an action will occur at a later time; it can be translated by finalement, à la longue, or tôt ou tard: I will eventually do it - Je le ferai finalement / tôt ou tard.

(http://french.about.com/od/vocabulary/a/fauxamis-e_2.htm)

Some corpus examples

…for å hindre å vekke oppmerksomhet til eventuelle naboer (KF1)

…in order not to alert the neighbors

…pour ne pas éveiller l'attention d’éventuels voisins

…før vi eventuelt innledet nye forhold. (JG3)

…before embarking on any new relationships.

…avant de nous engager éventuellement dans une autre relation.

Renderings of eventuell / eventuelt (N=11 in No-En-Fr)

English

Adj: possible: 1; Ø: 2

Adv: no congruent rendering.

All cases of eventuelt are either rephrased or omitted in the translation.

Traces of the modal meaning of eventuelt in the use of modal verbs (might, would), two cases of any, and one or (where eventuelt is used almost as a conjunction).

French

Adj: éventuel(le/s): 2; Ø: 1

Adv: éventuellement: 1; plutôt: 1; Ø: 6

Traces of the modal meaning of eventuelt in the choice of verb forms: il aurait été, il faudrait savoir…

(One case of éventuellement in the French translation does not come from Norwegian eventuelt.)

Éventuel(le) / éventuellement in French originals (FNPC)

Only 3 examples found in fiction and non-fiction; 2 adjectives, 1 adverb.

1. En réalité, c'est moins la satisfaction d'un besoin réel qui peut faire la beauté d'une chose utile, que la satisfaction possible d'un besoin éventuel. (JLA1)

I virkeligheten er det mindre tilfredsstillelsen av et reelt behov som ser skjønnheten i en nyttig ting, enn den mulige tilfredsstillelse av et eventuelt behov.

2. …ce qu'ils auraient éventuellement à nous reprocher, … (CC1)

…det de måtte ha å bebreide oss,… (Lit: ’what they might have to reproach us’)

Any conclusions?

1. The French and Norwegian meanings seem to be closely similar, with no overlap with the meaning of English eventual/eventually.

2. The Norwegian and French adjectives EVENTUEL* seem to correspond closely to each other in meaning and use.

3. The Norwegian and French adverbs eventuelt/éventuellement seem to differ in frequency, with the Norwegian word being more frequent. Difference of style level??

4. The Norwegian and French adverbs, in spite of similar meanings, thus seem to have different distributional patterns.

5. (Possible translation effect: eventuelt/éventuellement may be perceived as redundant and therefore omitted by the translator.)

6. More material is needed! E.g. from comparable corpora. (Monolingual corpora seem to confirm the distributional difference: French newspapers had c. 21 éventuellement per million words, a Norwegian corpus had c. 154 eventuelt pmw.)
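Frequencies from corpora of different sizes only become comparable once they are normalised, here per million words (pmw). The sketch below shows the arithmetic in Python; the raw counts and corpus sizes are hypothetical, chosen only so that the results match the pmw figures quoted above, since the underlying numbers are not given.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


# HYPOTHETICAL raw counts and corpus sizes (not from the source).
fr_pmw = per_million(raw_count=42, corpus_size=2_000_000)   # French newspapers
no_pmw = per_million(raw_count=77, corpus_size=500_000)     # Norwegian corpus

print(fr_pmw, no_pmw)
print(f"eventuelt is about {no_pmw / fr_pmw:.1f} times as frequent")
```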

Case study 2: grammar

Presentative constructions in English and Norwegian (Ebeling 1999); see also Ebeling (1998).

1. Common basic formula: dummy subject + presentative verb + ’existent’ + place adjunct

There was a client at the counter. / Det var en kunde ved skranken.

Alternative expressions:

2. No dummy subject: A client was at the counter; En kunde var ved skranken.

3. Verb other than BE: Det stod en kunde ved skranken. ?There stood a client at the counter. ?A client stood at the counter. En kunde stod ved skranken.

4. No adjunct: There’s been a robbery / Det har vært et bankran.

5. With definite NP: There was the radio in the kitchen / Det var radioen på kjøkkenet. (AT1)

Some of Ebeling’s findings

Presentative constructions with det/there are much more frequent in Norwegian than in English.

While the English there-construction is virtually restricted to the verb be, the Norwegian det-construction can include a number of intransitive verbs, e.g. finnes (‘exist’), bli (‘become’), bo (‘live’), gå (‘go’), komme (‘come’) and verbs of posture.

The Norwegian det-construction also admits lexical verbs in the passive, e.g. ... det var blitt begått et mord like i nærheten. (FC1) [‘there had been committed a murder just nearby’]

In translating a Norwegian det-clause with a verb other than være or finnes, constructions without there are often chosen. (Presentatives with posture verbs are often rendered by there + BE.)

Presentative constructions with fronted adjunct and without there (Behind the shed was a bicycle) occur almost exclusively in written, often literary, texts, and they introduce something perceivable or concrete.

Case study 3: discourse

RQ: What are the correspondences of the Norwegian connective dessuten in English and French? In what contexts are they used? Can we create a semantic map of this type of discourse relation?

Material: ENPC + FNPC

Search procedure: start from Norwegian dessuten to map out its correspondences, then choose the top four correspondences (except Ø) and investigate their correspondences in the other direction.

DESSUTEN – frequencies of correspondences

ENPC fiction (N=132)

besides 27 (20.5%); also 14 (10.6%); what be more 8 (6.1%); moreover 7 (5.3%); in addition 7 (5.3%); even 2; as well 2; too 2; nor 2; and 1; Ø 19; others (only once) 11

FNPC (N=35)

en outre 8 (22.9%); de plus 6 (17.1%); aussi 4 (11.4%); d’ailleurs 2 (5.7%); en plus 1; de même 1; par ailleurs 1; et puis 1; Ø 11 (31.4%)
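The percentages above are simply each correspondence's share of the total number of instances of dessuten (N). A minimal Python sketch of the calculation follows; the function name is illustrative, and N is taken as stated on the slide rather than summed from the listed counts.

```python
def shares(counts: dict, n: int) -> dict:
    """Percentage share of each correspondence out of n source instances."""
    return {item: round(count * 100 / n, 1) for item, count in counts.items()}


# Top correspondences of dessuten, with counts as listed above.
enpc = {"besides": 27, "also": 14, "what be more": 8,
        "moreover": 7, "in addition": 7}
fnpc = {"en outre": 8, "de plus": 6, "aussi": 4, "d'ailleurs": 2, "Ø": 11}

enpc_pct = shares(enpc, n=132)   # ENPC fiction
fnpc_pct = shares(fnpc, n=35)    # FNPC

for item, pct in {**enpc_pct, **fnpc_pct}.items():
    print(f"{item}: {pct}%")
```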

Besides, also, what’s more, in addition and moreover in Eng. orig. fiction

•Besides (adv.): 19 hits. dessuten (15), forresten (3), også (1), Ø (1)

•Also: 186 hits. også (129), Ø (26), og (9), dessuten (6), heller ikke (5), så (3), og så (3), other (5)

•What’s more: 6 hits. attpåtil (2), dessuten (2), Ø (1)

•In addition: 3 hits. i tillegg (2), Ø (1)

•Moreover: 1 hit. til og med (1)

[Chart (scale 0–100 %): besides, also, what's more, in addition, moreover; series: orig, tr from dessuten]

En outre, de plus, aussi and d’ailleurs in French original fiction & non-fiction

En outre (8 hits): dessuten (4), i tillegg (1), også (1), Ø (2)

Aussi (134 hits): også (93), dessuten (5), og (2), i tillegg (2), Ø (25), other (7: ikke minst, selv, ikke-heller, både-og, samt, likevel, dertil) [only fiction; only connector]

De plus (8 hits): dessuten (2), i tillegg (3), enn videre (1), også (1), Ø (1)

D’ailleurs (51 hits): for øvrig (16), forresten (11), dessuten (9), other (3) [egentlig, faktisk, heller ikke]

[Chart (scale 0–100 %): en outre, aussi, de plus, d'ailleurs; series: orig, tr from dessuten]

Some observations

Dessuten is more frequent than any of its correspondences except also/aussi. Stylistically neutral; some of the English correspondences are not. French??

Both also and aussi have high percentages of Ø correspondences; they are apparently often perceived by translators to be redundant.

Besides, moreover, and in addition are all more frequent in translations of dessuten than they are in original English fiction. A translation effect?

De plus and en outre have about the same frequencies as translations of dessuten as they have in French originals.

En outre in translation also has other sources, and is thus slightly more frequent in translation from Norwegian than in original French.

The correspondences of d’ailleurs suggest that this connector signals the addition of something slightly more peripheral (more like incidentally)

Translation into and/et and also/aussi is a kind of ‘normalization’: the choice of a more general term in the target language. For English, a wish to avoid formal connectives in fiction?

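A map of additive relations emerging from dessuten

[Diagram: dessuten at the centre, linked to its French and English correspondences and, through them, back to Norwegian expressions.]

French correspondences: aussi, en outre, de plus, d’ailleurs

Norwegian correspondences of the French items: også, dessuten, i tillegg, for øvrig, forresten

English correspondences: also, in addition, moreover, besides, what’s more

Norwegian correspondences of the English items: også, heller ikke, dessuten, i tillegg, attpåtil, forresten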

Conclusions

The recurrent correspondences, except aussi/also/også, mark an additive relation that is explicitly “on top of”; some of them are even incidental.

The most “literal” expressions of this meaning are probably in addition to, i tillegg til, de plus.

The strongest mutual correspondences of dessuten are with besides for English (32.1) and en outre for French (33.3). Both MCs are asymmetrical; en outre and besides are translated into dessuten more often than the other way round.

The most general expressions of addition (aussi/also) get high percentages of Ø correspondences (19% for aussi; 14% for also)

A hypothesis about discourse relations that needs to be tested further: Of the three languages, French seems to mark the additive relation most frequently.

Summing up

Parallel corpora enhance contrastive studies in a number of ways

– by ensuring that observations are based on authentic language use

– by yielding paradigms of correspondences

– thus often revealing meanings and nuances we might not have thought of

– and showing how the same meaning may be expressed by means of different linguistic categories

– by providing quantitative data

– … thus also giving insights into ‘preferred ways of putting things’

– (if the corpus is bidirectional) by providing control for translation bias and by allowing for ‘reverse’ investigations

– (if the corpus is representative) by controlling for the idiosyncrasies of individual authors/translators

Why undertake corpus-based contrastive investigations?

The importance of multilingual corpora extends beyond contrastive studies. It is up to the user to define fruitful research questions and use the corpora creatively. In this process we learn not only about individual languages and their relationships, about translation and foreign-language acquisition, but also about language in general – provided that the study becomes truly multilingual. Seeing through corpora we can see through language.

Stig Johansson (2007: 316)

Information on the ENPC / OMC

About the corpora:

ENPC: http://www.hf.uio.no/ilos/english/services/omc/enpc/

www.helsinki.fi/varieng/CoRD/corpora/ENPC/

OMC: www.hf.uio.no/ilos/english/services/omc/

About publications based on the OMC (up to 2006):

www.hf.uio.no/ilos/forskning/prosjekter/sprik/english/publications/

Very small, freely available translation corpus: http://khnt.hit.uib.no/webtce.htm

References

Aijmer, K. & B. Altenberg. 1996. Introduction. In K. Aijmer, B. Altenberg, M. Johansson (eds.) Languages in Contrast. Lund University Press, 11-16.

Altenberg, B. 1999. Adverbial connectors in English and Swedish: Semantic and lexical correspondences. In Hasselgård & Oksefjell (eds.) Out of Corpora. Amsterdam: Rodopi, 249-268.

Chesterman, A. 1998. Contrastive Functional Analysis. Amsterdam/Philadelphia: John Benjamins Publishing Company.

Dyvik, Helge. 2004. Translations as Semantic Mirrors: From Parallel Corpus to Wordnet. In Aijmer, K. and B. Altenberg (eds) Advances in Corpus Linguistics. Amsterdam/New York: Rodopi, 311-326. (/www.ingentaconnect.com/content/rodopi/lang/2004/00000049/00000001/art00018)

Ebeling, J. 1998. Contrastive Linguistics, Translation, and Parallel Corpora. Meta 43:4, 602-615. http://www.erudit.org/revue/META/1998/v43/n4/002692ar.pdf

Ebeling, J. 1999. Presentative Constructions in English and Norwegian: A corpus-based contrastive study. Acta Humaniora 68. Oslo: Unipub forlag.

Filipović, R. 1984. What are the primary data for contrastive analysis? In Fisiak J. (ed.), Contrastive linguistics. Prospects and Problems. Berlin/New York/Amsterdam: Mouton Publishers, 107-118.

James, C. 1980. Contrastive Analysis. London: Longman.

Johansson, S. 2000. Contrastive Linguistics and Corpora. University of Oslo, SPRIK reports 3: www.hf.uio.no/ilos/forskning/prosjekter/sprik/docs/pdf/sj/johansson2.pdf

Johansson, S. 2007. Seeing through multilingual corpora. Amsterdam: Benjamins.