27_Euralex_Ulrich Heid - On Ways Words Work Together - Topics in Lexical Combinatorics


  • 7/23/2019 27_Euralex_Ulrich Heid - On Ways Words Work Together - Topics in Lexical Combinatorics


    Ulrich Heid, Institut für maschinelle Sprachverarbeitung, Universität Stuttgart

    On Ways Words Work Together - Topics in Lexical Combinatorics

    1. Introduction

    1.1 Broad areas of combinatory phenomena

    The domain of lexical combinatorics has received much interest in recent years, in syntax, lexical semantics and lexicology, but also in lexicography, terminology, terminography and Natural Language Processing (NLP). While the field of combinatorics can, perhaps trivially, be defined by the fact that it deals with syntagmatic combination phenomena involving two or more lexemes, it is much harder to come up with any reasonable internal subdivision of the field.

    Phenomena which are usually described as belonging to the domain of combinatorics include, among others:

    • selectional properties of lexical items: for example, the English verb to grow has, broadly speaking, two French equivalents, pousser and grandir. And most dictionaries would state that pousser is preferred if the subject noun denotes a plant, grandir if it denotes a human being.¹ The classical example of German essen ↔ fressen, for English to eat (depending on the distinction between human being and animal) is another instance of this phenomenon.

    • collocations: according to many linguists and lexicographers,² collocations are combinations of exactly two lexemes (of category noun, verb, adjective or adverb), realizing two concepts, where the choice of one of them depends on (or: is restricted by) the other. Typical examples which are often cited are FR un célibataire endurci, DE eingefleischter Junggeselle, EN pay attention, FR pousser un cri, etc. Usually, some sort of determination relation between the two items can be found.³ Other lexicographers and NLP researchers have a wider notion of "collocation" which subsumes any kind of combination of two words, as it occurs (adjacently) in a text. Such a wider view is not uncommon in work on statistical tools (where e.g. also the combination with closed class items may be regarded as a collocation, and where frequency, e.g. of co-occurrence, is the main definition criterion).

    Much of the discussion at this conference will be devoted to collocations; this is one of the reasons why we have chosen to discuss collocations in some more detail in this paper, taking them as a paradigmatic example of some of the research topics in the linguistic and lexicographic description of combinatory phenomena.

    The way words work together / combinatorics 227

    • idioms: the common view on idioms is that they are multiword expressions (more than two items) which have an en bloc-meaning opaque with respect to the usual meaning of the words making up the combination. In examples like DE das Kind mit dem Bade ausschütten, we do not say anything about a child or a bath; somebody who FR a(voir) une araignée au plafond may also have other trouble than just with a spider.

    As soon as we look at data from text corpora, cases come up where it is not easy to determine clearly whether to treat a given item as idiomatic or as collocational: DE eine Frage stellen is usually classified and described as a collocation (a support verb construction almost synonymous with DE fragen), whereas DE in Frage stellen is less clear: should it be treated as an idiom roughly equivalent to DE anzweifeln or as a collocation?

    1.2 Structure of this paper

    The purpose of this paper is to give an overview of some research topics in the field of lexical combinatorics. This includes a presentation of the main approaches, methods and strands of research, as well as of open issues and lines to be followed, in particular those discussed at the Euralex-94 conference. Such an overview is bound to be partial, in both senses of the word: it is impossible to select all (and only the) relevant topics, and the selection is of course biased towards the preferences of the author. Nevertheless, selecting collocations as a prototypical phenomenon seems to make sense from a more general point of view as well: collocational phenomena are central to lexicographers, corpus linguists and terminologists; evidence: the sheer number of papers on this topic submitted to the Euralex-94 conference.

    Moreover, the description and lexicographic modeling and representation of collocations is not at all an easy task: a few properties of collocations are well-known and easily reproducible, but others are controversial or not easy to consistently verify on data.

    The problems which need to be addressed and which will be to some extent discussed in this paper fall into the following areas:

    • defining the notion of collocation, delimiting it with respect to other combinatory phenomena and identifying criteria that allow the definitions to be operationalized to some extent;

    • describing syntactic, semantic and pragmatic properties of collocations and other combinatory phenomena, both within descriptive linguistic and lexicographic work (the latter including, in addition to linguistic description, also issues of the presentation of the descriptive results);


    228 Euralex 1994

    • getting, by means of computational tools for lexical acquisition, material potentially relevant for collocational description: techniques, methods and tools for extracting collocation candidates from texts;

    • representing and using collocational information, for example in translation, both human and computer-aided or automatic.

    These topics span a range of activities of (computational) linguists and (computational) lexicographers and terminologists: definition, description and lexicographic presentation of collocations, as well as their (semi-)automatic acquisition and use in human and computational applications of lexical knowledge sources.

    We have chosen to comment on these topics in the following order:

    • definitional and descriptive problems, as treated in linguistic work on lexical combinatorics, will be discussed, along with syntactic, semantic and pragmatic properties of collocations, in Section 2. This allows us to better capture the phenomenon we deal with, from different points of view.

    • on this basis, we will deal with the lexicographic and terminographic treatment of collocations, including aspects of the presentation of descriptive results in dictionaries, in Section 3.

    • the acquisition of collocationally relevant information from textual corpora, as well as the use of collocation knowledge in translation dictionaries, will be the topic of Section 4.

    We will illustrate some of the statements made in this paper with examples from dictionaries. The aim of this paper is not to support one given approach or to argue for a given method or tool for the acquisition or description of collocations: the examples have been chosen for their illustrative character, and an attempt has been made to cover several approaches.

    2. Properties of combinatory phenomena - the case of collocations

    2.1 Data and a first interpretation

    The intuition about collocations is that they are combinations of two lexemes, not necessarily textually adjacent ones. To these two lexemes correspond two concepts. In certain collocations, we can find a regular semantic interrelationship between the two components which is close to a determination relation (collocations are "polar" in Hausmann's terms).

    An essential property of collocations seems to be their perception by native speakers of a language as frequent, recurrent, conventionalized building blocks of the lexicon: "déjà-vu", as Hausmann says. The combination of exactly the two items appearing in the collocation is lexically determined; it is often not predictable; but native speakers are quite good at identifying non-collocational combinations in other people's texts, and they feel that non-collocational texts are not fluent, not elegant or just not the usual way one would express a given idea.

    Collocations occur in both general language and sublanguage. The table in Figure 1 contains a few examples from English, French and German. The sublanguage examples may be felt to be different in nature from those given for general language: we will come back to this later (see Section 2.3.3).

    language | general language | sublanguage
    English | pay attention; want sth. badly; merited praise; closely related | stop the conveyor; overlaying rock; expensive in labour
    French | opérer un choix; une déception amère; éperdument amoureux | créer un fichier; lution gredue; ressources renouvelables
    German | eine Vereinbarung treffen; starker Raucher; tief beeindruckt; (jmdn) hart treffen | eine Forderung abtreten; Abwasser einleiten; anstehende Kohle; Dateien abgleichen

    Figure 1. A few examples of general and sublanguage collocations

    A number of criteria have been discussed in the literature to distinguish collocations from free combinations on the one hand and from idioms on the other, or, rather, to arrange examples of certain types somewhere on the scale between these two extremes. These criteria involve the syntactic, semantic and pragmatic description of lexemes.

    2.2 Syntactic properties

    2.2.1 Combinatory phenomena vs. phrase structure

    Most combinatory phenomena follow the rules of syntax; no particular syntactic rules are necessary to describe combinatory phenomena. But not all of them make up constituents.

    Selectional phenomena can be observed both within constituents and within the sentence: the examples given above (grow, eat) concern the interaction between a subject noun phrase and the main verbal predicate of the sentence. Similarly, we observe selection phenomena between verbs and their subcategorized complements, e.g. objects, prepositional objects, etc., but also with adjuncts, or within other constituents than VPs, for example in adjective phrases (noun + attributive adjective).
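The subject-verb selection described for EN grow can be pictured as a lookup keyed on the semantic class of the subject noun. The following Python sketch is only an illustration under stated assumptions: the class labels (`plant`, `human`), the tiny noun lexicon and the function name `translate_grow` are hypothetical and do not reproduce any dictionary format discussed in this paper.

```python
# Hypothetical toy lexicon: semantic class of a few English subject nouns.
NOUN_CLASS = {"tree": "plant", "rose": "plant", "child": "human", "girl": "human"}

# French equivalents of EN "to grow", keyed by the class of the subject noun
# (pousser for plants, grandir for human beings, as stated in the text).
GROW_EQUIVALENTS = {"plant": "pousser", "human": "grandir"}


def translate_grow(subject_noun):
    """Return the preferred French equivalent(s) of 'grow' for a subject noun.

    If the class of the noun is unknown, the selectional preference cannot
    be applied, so both candidates are returned (in alphabetical order).
    """
    noun_class = NOUN_CLASS.get(subject_noun)
    if noun_class in GROW_EQUIVALENTS:
        return [GROW_EQUIVALENTS[noun_class]]
    return sorted(GROW_EQUIVALENTS.values())
```

A real dictionary would of course need a much richer classification than two classes; the point of the sketch is only that the choice of the verb depends on information contributed by the subject noun phrase.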


    Collocations can be classified, at least for languages like English, the Germanic, Romance and Slavic languages, according to the category of their elements, into noun-verb, noun-adjective, noun-noun collocations, as well as verb-adverb and adjective-adverb. Noun-verb collocations can be further subclassified according to the grammatical function of the noun phrase contributing the noun part of the collocation: subject-verb-, verb-complement-, verb-adjunct-collocations.⁴ Following Hausmann (1979), Hausmann (1985) and Hausmann (1989), we have classified a few examples in the illustration in Figure 2.

    type | English | German | French
    NOUN + adjective | confirmed bachelor | eingefleischter Junggeselle | célibataire endurci
    NOUN + verb (Subj) | his anger falls | Zorn verraucht | la colère s'apaise
    NOUN + verb (Obj) | to withdraw money | Geld abheben | retirer de l'argent
    VERB + adverb | it is raining heavily | es regnet in Strömen | il pleut à verse
    ADJ + adverb | seriously injured | schwer verletzt | grièvement blessé
    VERB + adverb | to fail miserably | kläglich versagen |
    NOUN + noun | a gust of anger | Wutanfall | une bouffée de colère

    Figure 2. Types of collocations in terms of the category of their components, following Hausmann (1989)

    This notion of collocation does not assume that all collocations make up phrases: n+adj-collocations may do so, if the adjective is used attributively, as in EN heavy rain, EN unquenchable thirst, DE starker Raucher, FR regrets amers, FR remords tardifs, etc. However, we still want to consider the combination of EN unquenchable and thirst as collocational when the adjective is used predicatively (His thirst ... was ... unquenchable.). This implies, among other things, that computational tools which would just look for combinations of adjacent lexemes⁵ would not retrieve all combinations which fall under the syntactic definition given above.
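The limitation of adjacency-based tools can be illustrated with a small Python sketch. It is a simplified illustration, not a description of any existing extraction tool: the function names `adjacent_pairs` and `window_pairs`, the window size and the use of raw word forms instead of lexemes are all assumptions made here.

```python
from itertools import combinations


def adjacent_pairs(tokens):
    """Unordered pairs of directly adjacent tokens."""
    return {frozenset(pair) for pair in zip(tokens, tokens[1:])}


def window_pairs(tokens, window=4):
    """Unordered pairs of tokens that are at most `window` positions apart."""
    pairs = set()
    for i, j in combinations(range(len(tokens)), 2):
        if j - i <= window:
            pairs.add(frozenset((tokens[i], tokens[j])))
    return pairs


# The collocation we are looking for, independent of word order.
target = frozenset(("unquenchable", "thirst"))

attributive = "an unquenchable thirst".split()
predicative = "his thirst was unquenchable".split()
```

With these definitions, `target` is found by `adjacent_pairs(attributive)` but not by `adjacent_pairs(predicative)`, whereas `window_pairs(predicative)` does retrieve it. A window is only a rough stand-in for syntactic analysis: it also returns many pairs that are not syntactically related at all.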

    The noun which participates in an n+v collocation can also be located in an adjunct (cf. DE es regnet in Strömen, etc.); such cases are difficult to treat in a strictly valency-based model or in a formal account which makes use of subcategorization information only. To our knowledge, not much work has so far been done on "(lexically) typical adjuncts".

    As observed, combinatory phenomena are often orthogonal to phrase structural or valency-based grammatical rules (in the widest sense). This property is problematic, for example, for lexical choice in natural language generation; in early approaches, the order in which lexemes were selected in a sentence to be generated was determined by relationships between syntactic heads and modifiers, or nodes of a valency representation and their dependents (rule: "lexical heads first"). This works out for verb-adverb- or adjective-adverb-collocations and for some noun-adjective-collocations as well, but not for noun-verb-collocations (e.g. in the verb-object case: the object noun must be determined first, only then can a collocationally adequate verb be selected). Researchers in natural language generation were the first to discuss problems of collocation: some of the work on lexical choice is aimed at bringing collocational and syntactic constraints together and controlling their interaction in an adequate way (cf. e.g. Nirenburg et al. 1988, etc.).
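The ordering problem for verb-object collocations can be sketched in a few lines of Python. This is a toy illustration, not a reconstruction of any of the generation systems referred to: the lexicon `SUPPORT_VERBS`, the fallback `DEFAULT_VERB` and the two function names are hypothetical, and only the English examples (pay attention, deliver a speech, take a bath) come from this paper.

```python
# Hypothetical collocation lexicon: base noun -> collocationally adequate verb.
SUPPORT_VERBS = {"attention": "pay", "speech": "deliver", "bath": "take"}

# Hypothetical generic verb used when no collocational information applies.
DEFAULT_VERB = "make"


def heads_first(object_noun):
    """Choose the verbal head before the object noun is known.

    Without the noun, the collocation lexicon cannot be consulted, so only
    the generic verb is available.
    """
    verb = DEFAULT_VERB
    return (verb, object_noun)


def base_first(object_noun):
    """Fix the object noun (the base) first, then select the verb for it."""
    verb = SUPPORT_VERBS.get(object_noun, DEFAULT_VERB)
    return (verb, object_noun)
```

Here `heads_first("attention")` yields the inadequate pair `("make", "attention")`, while `base_first("attention")` yields `("pay", "attention")`. The sketch leaves out everything else a generator has to do (determiners, inflection, syntactic constraints); it only shows why the noun has to be available before the verb is chosen.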

    An additional problem of the interaction between syntactic and collocational description is the recursive nature of collocational properties: the components of a collocation can again be collocational themselves: next to the German collocation Gültigkeit haben (n+v), we have allgemeine Gültigkeit haben, with allgemeine Gültigkeit, a collocation (n+a), as a component. These cases have sometimes been analyzed as different from collocations, but there is no reason for such treatment. However, a formal account, e.g. for machine translation, would have to be able to account for such cases.
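One way to picture the recursive case is a data structure in which the base of a collocation may itself be a collocation. The Python sketch below is an assumption made for illustration: the class name `Collocation`, its fields and the decision to treat the embedded n+a collocation as the base of the n+v collocation are not taken from any formal account cited here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Collocation:
    base: Union[str, "Collocation"]  # a lexeme, or an embedded collocation
    collocate: str                   # the lexeme selected by the base
    pattern: str                     # e.g. "n+v" or "n+a"

    def lexemes(self):
        """Flatten the (possibly nested) collocation into a list of lexemes."""
        if isinstance(self.base, Collocation):
            inner = self.base.lexemes()
        else:
            inner = [self.base]
        return inner + [self.collocate]


# DE allgemeine Gültigkeit (n+a) embedded in allgemeine Gültigkeit haben (n+v).
inner = Collocation(base="Gültigkeit", collocate="allgemein", pattern="n+a")
outer = Collocation(base=inner, collocate="haben", pattern="n+v")
```

Calling `outer.lexemes()` returns all three lexemes, while `outer.base` still gives access to the embedded collocation as a unit, which is what a formal account would need in order to treat both levels.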

    2.2.2 Problems of the syntactic description of collocations - the case of support verb constructions

    Syntacticians have observed some irregularities in the syntactic behaviour of collocations, in particular of support verb constructions ("Funktionsverbgefüge", "constructions à verbe support"): examples are FR avoir peur, avoir faim, prendre un bain, poser une question, opérer un choix, EN be in a habit, take a bath, pay attention, deliver a speech, DE Angst haben, ein Bad nehmen, eine Frage stellen, zur Anwendung kommen.

    Many of the syntactic operations possible with verb phrases are not or only in part possible with support verb constructions; such operations (often used as tests) include passivization, pronominalization, the possibility of taking the nominal part up with an anaphoric pronoun, the possibility of modifying the noun (e.g. with adjectives, genitives, relative clauses, etc.), the choice between different kinds of determiners, etc.

    The most "frozen" (or, as Cruse (1986:41) says, "bound") collocations are close to typical examples of idioms, insofar as no modifications are possible.

    Other support verb constructions participate in some, but not all of the processes mentioned above: DE eine Frage stellen can be modified or pronominalized, whereas DE zur Ausführung gelangen does not allow pronominalization or modification: Hans hat eine kluge Frage gestellt, Josef hat sie beantwortet; *das Programm gelangt zu einer vollständigen Ausführung; das Programm gelangt zur Ausführung: *sie muß korrekt sein. Apparently, pronominalization or pronominal anaphoric reference and the possibility of modification of the predicative noun are somehow related.

    Similar data to those for German have been observed for Danish (cf. e.g. Dyhr 1980), Dutch (cf. Hinderdael 1980) and French (cf. Gross 1986, Gross/Vivès 1986). In many articles about support verb constructions, just some such facts are described ("anecdotally"), and we are still not aware of a more comprehensive treatment or a formal account. It seems that the two types of "lexicalized" and "non-lexicalized" support verb constructions observed by Helbig (1984) roughly correspond to cases where the predicative noun is still available as a referent ("non-lexicalized" case) as opposed to referentially blocked cases ("lexicalized"): ongoing work by Kuhn (1994) shows that the tests used to distinguish these two types are all based on referential (un-)availability.
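The relation between the syntactic tests and the two types of support verb constructions can be made concrete with a small Python sketch. It is an illustrative assumption, not Helbig's or Kuhn's procedure: the class `SupportVerbConstruction`, the restriction to two tests and the rule that both must succeed for the noun to count as referentially available are simplifications introduced here. The two data points are the German examples discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportVerbConstruction:
    form: str
    noun_modifiable: bool        # e.g. "eine kluge Frage stellen"
    noun_pronominalizable: bool  # e.g. "... Josef hat sie beantwortet"

    @property
    def noun_referentially_available(self):
        # Simplifying assumption: both tests must succeed.
        return self.noun_modifiable and self.noun_pronominalizable

    @property
    def helbig_type(self):
        """Map referential availability onto the two types named in the text."""
        if self.noun_referentially_available:
            return "non-lexicalized"
        return "lexicalized"


FRAGE_STELLEN = SupportVerbConstruction("eine Frage stellen", True, True)
ZUR_AUSFUEHRUNG = SupportVerbConstruction("zur Ausführung gelangen", False, False)
```

Under these assumptions, `FRAGE_STELLEN.helbig_type` is `"non-lexicalized"` and `ZUR_AUSFUEHRUNG.helbig_type` is `"lexicalized"`. Constructions that pass only some of the tests would need a finer-grained treatment than this two-way split.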

    Similarly, the syntactic (e.g. valency) behaviour of lexical combinations (including both collocations and idioms) has not been described in very much detail so far in dictionaries. We only know of projects for foreign language learners' idiom dictionaries which aim at coming up with a detailed syntactic description.

    Noun compounding is often not looked at from the point of view of lexical combinatorics; but a collocational view is most relevant, e.g. for contrastive work on Romance vs. Germanic languages. Mel'chuk's examples of expressions for groups of animals (EN flock of seagulls, pack of dogs, school of fish, cf. Fontenelle (1994b) for more examples), but also technical terms like those described and analyzed by Seelbach (1994) (IT acque di rifiuto - DE Abwässer, IT stazione di depurazione - DE Kläranlage, IT perdita di sostanze liquide - DE Flüssigkeitsverlust) are cases in point. Knowledge about collocationally adequate combinations of nouns in compounds or noun phrases is most important for translation. Soler/Marti (1994) discuss this problem in detail, giving examples from Spanish-English translation.

    2.3 Semantic properties

    2.3.1 Combinatory phenomena and compositionality

    Syntactic properties do not seem to have much discriminatory power, as far as collocations and idioms, their borderline and the borderline with "normal" constructions are concerned. For tests and criteria of classification, we thus have to rely on (lexical) semantics.

    A few general, broad distinctions seem to be commonly accepted: the meaning of idioms is not derivable from the meaning of the lexemes, word forms which make up the idiom: idioms are non-compositional. On the other hand, what has been called "free combination" by Hausmann and others, i.e. the "normal case", is fully compositional: the meaning of EN to buy a book is derivable by the usual processes from the meanings of EN buy and EN book. Collocations are an intermediate case between the two: the meaning of EN buy somebody's argument is not fully compositionally derivable from the meanings of argument and buy. However, the meaning of argument is present in (and used in the meaning description of) buy sb's argument; it is only buy which does not have, in this collocation, the meaning it has in buy a book.

    This partial compositionality of collocations has led Hausmann to describe collocations as polar combinations, consisting of a base (the item which has its full lexical meaning, in our example above: argument) and a collocate⁶ (with modified or reduced meaning: buy).
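The asymmetry between base and collocate can be sketched as a lookup in which only the collocate's meaning depends on its partner. The following Python fragment is a toy illustration under stated assumptions: the glosses (in particular "accept" for buy in buy sb's argument), the table names and the function `interpret` are introduced here and are not part of Hausmann's description.

```python
# Hypothetical glosses for the "full lexical meaning" of each lexeme.
LITERAL = {
    "buy": "acquire by paying",
    "book": "book",
    "argument": "argument",
}

# Hypothetical collocate senses, keyed by (collocate, base): the base selects
# a modified meaning of the collocate.
COLLOCATE_SENSE = {("buy", "argument"): "accept"}


def interpret(collocate, base):
    """Return (collocate meaning, base meaning) for a two-lexeme combination.

    The base always keeps its literal meaning; the collocate keeps it only
    if the base does not select a modified sense.
    """
    base_meaning = LITERAL[base]
    collocate_meaning = COLLOCATE_SENSE.get((collocate, base), LITERAL[collocate])
    return (collocate_meaning, base_meaning)
```

For the free combination, `interpret("buy", "book")` combines the two literal meanings; for the collocation, `interpret("buy", "argument")` keeps the meaning of the base but replaces that of the collocate. A full idiom would need a single entry for the whole expression and is deliberately left out of the sketch.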

    Mel'chuk, in a talk about collocations at the 1990 conference of Euralex,⁷ has most clearly summarized the differences in compositionality between free, collocational and idiomatic combinations, and we schematize these in Figure 3, following Mel'chuk's presentation.

    [Figure 3: schema contrasting three types of combinations of two lexemes A and B: regular (compositional), full idioms (non-compositional) and collocations (partially compositional)]

    Figure 3. Types of lexical combinations in terms of compositionality (following Mel'chuk)

    When constructing semantic representations, we can apply the usual procedures for compositional cases to free combinations; we can assign a single semantic representation to an idiom as a whole. Dobrovol'skij's examples DE den Kopf hängen lassen or etw. in die Wege leiten could be described as denoting "resignation" or "startup of an activity" respectively, and we could, depending on the granularity of description we aim at, describe DE jemand läßt den Kopf hängen in a similar way as jemand ist deprimiert or jemand ist resigniert, or as well jemand leitet etwas in die Wege similarly to jemand beginnt etwas, jemand leitet etwas ein.

    A more difficult problem is the following: collocations like DE zur Anwendung gelangen ("to be applied") can, as it seems, be represented by the same device as the idioms above: at a certain level of specificity, we can consider such collocations as quasi-synonymous with verbs (DE anwenden, in this case) and use the same representation for DE angewendet werden and zur Anwendung gelangen, only with an aspectual difference.

    But what about the cases where the predicative noun is referentially available, as in the example discussed above, in Section 2.2.2: DE Hans hat eine Frage gestellt. Max hat sie beantwortet. In this case, we need a representation which would preserve an antecedent for the anaphoric pronoun. If the semantic representation is just the same as that of a two-place verbal predicate ("ask": DE fragen, in the case of eine Frage stellen), no hook to serve as an antecedent for the anaphoric pronoun is available. In the same way, no reasonable semantic representation of the modifier in DE er hat eine kluge Frage gestellt would be possible then.
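The missing "hook" can be made concrete by comparing two toy representations of DE eine Frage stellen. The Python sketch below is an illustration under stated assumptions and not the representation of any framework cited in this paper: the classes `Referent` and `Predication`, the functions `compact`, `decomposed` and `find_antecedent`, and the use of grammatical gender as the only agreement check are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Referent:
    noun: str
    gender: str                                   # "f", "m" or "n"
    modifiers: list = field(default_factory=list)


@dataclass
class Predication:
    predicate: str
    agent: str
    referents: list = field(default_factory=list)  # referents introduced


def compact(agent):
    """'eine Frage stellen' collapsed into the verbal predicate 'ask'."""
    return Predication("ask", agent)


def decomposed(agent, modifiers=()):
    """Representation that keeps the predicative noun Frage as a referent."""
    question = Referent("Frage", "f", list(modifiers))
    return Predication("ask", agent, [question])


def find_antecedent(predication, gender):
    """Return the first introduced referent agreeing with the pronoun."""
    for referent in predication.referents:
        if referent.gender == gender:
            return referent
    return None
```

For the pronoun sie (feminine), `find_antecedent(compact("Hans"), "f")` returns `None`, whereas `find_antecedent(decomposed("Hans", ["klug"]), "f")` returns the referent for Frage, together with a place to attach the modifier klug. The price is that the decomposed representation is no longer identical to that of the simple verb fragen.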

    If

    we

    treat

    the

    referentially

    available

    cases

    separately,

    doweneeddifferent

    representations

    for

    DE

    Verkauf

    in

    zum

    Verkauf

    stehen

    (no

    referent)

    and

    einen

    Verkauf

    ttigen

    (referent

    available)?

This problem comes up when one tries to give a slightly more formal account of support verb constructions, e.g. in Head Driven Phrase Structure Grammar, HPSG (see e.g. work of Erbach/Krenn 1993), or in any other framework usable in NLP. It also comes up in translation: Thurmair (1990) discusses cases like the translation of EN to launch (a product) by DE (ein Produkt) auf den Markt bringen; if we use a compact semantic representation for the collocation, i.e. one which would be similar or identical with that of the verb, we would have trouble translating back from DE (ein Produkt) auf den überfüllten Markt bringen into English. Other, more detailed semantic representations seem necessary.

We have to ask ourselves, then, however, how far we should go in decomposing the meaning of collocations, derivatives and of one-word lexemes.

2.3.2 Towards a semantic classification of collocations? Mel'chuk's lexical functions

The Meaning <=> Text-Model (MTM), or: Meaning-Text Theory, developed by Igor A. Mel'chuk and his collaborators includes, among many other things, a semantic classification of lexical combination phenomena. The approach is much broader than just a description of the semantic classes into which collocations can be subdivided: it is a whole theory of language, conceived as a model of how meanings can be realized in language. It is impossible to give a full and adequate characterization of MTM in the framework of the present article. It is sufficient, here, to recall a few of its most important aspects:

- MTM equally supports analysis and generation of text, but its primary goal is an account of generation, i.e. the problem of how meanings get realized in texts (hence the name 'Meaning => Text-Theory'). Consequently, the description of paraphrasing, of quasi-synonymy, of the shading of meaning depending on communicative and text-structural phenomena, etc. is important to MTM researchers;
- MTM is a modular and stratified approach. It distinguishes several strata, roughly corresponding to the traditional levels of description and representation (semantic, deep and surface syntactic, morphological and phonological);
- MTM syntax is dependency-based; descriptions of verbs in the dictionary include an inventory of the relevant actants and of their syntactic realization (e.g. as phrase structural constructs); this is a basis for the definition of a syntax-semantics interface, and it also makes it possible to link the description of collocations to this interface.⁸

MTM describes collocations by means of 'lexical functions'. These can be seen as relations between one or more words or word combinations on the one hand and a partial semantic description on the other. The partial semantic description consists of a keyword and an abstract semantic operator applied to this keyword; the different kinds of operators are the different types of collocations.⁹ The number of lexical functions is limited to around 60; they can be combined (see e.g. Ramos/Tutin 1992, work by Mel'chuk, etc.). Out of these 60 lexical functions, about a dozen play an important role in describing collocations of Indo-European languages.¹⁰ The table in Figure 4 contains a few examples of lexical functions, along with the name of the LFs and our very rough description of the meaning expressed by each of the operators.

Meaning | Lexical function | Examples (French, English)
Intensifier | MAGN | bruit infernal, interdire absolument
Quantity selectors | MULT, SING | un essaim d'abeilles; un grain de riz, a cake of soap
Evaluator | VER | sharp knife, merited praise
(Semantically almost) empty stylistic figures | EPIT, GENER, FIGUR | océan immense; un sentiment de joie; un rideau de fumée
Points in a process | GERM, CULM | seed of hope; paroxysme de joie
(Semantically almost) empty support verbs | OPER1, OPERj | porter plainte, pousser un cri, mener une lutte; sth. forms an offer, sth. constitutes an offer

Figure 4. Examples of lexical functions
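The view of a lexical function as an operator applied to a keyword can be made concrete with a small lookup structure. The following is only a minimal sketch: the names `LF_TABLE` and `apply_lf` are hypothetical, the data are the MAGN and support-verb examples of Figure 4, and the assignment of the three French support verbs to OPER1 is our reading of the table, not a claim about how MTM implementations store lexical functions.

```python
from typing import Dict, List, Tuple

# (lexical function, keyword) -> values of the function for that keyword.
# Entries follow the examples in Figure 4; the OPER1 rows are an assumed reading.
LF_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("MAGN", "bruit"): ["infernal"],
    ("MAGN", "interdire"): ["absolument"],
    ("OPER1", "plainte"): ["porter"],
    ("OPER1", "cri"): ["pousser"],
    ("OPER1", "lutte"): ["mener"],
}


def apply_lf(lf: str, keyword: str) -> List[str]:
    """Return the stored values of lexical function `lf` for `keyword`.

    An empty list means that no value is recorded, not that none exists.
    """
    return LF_TABLE.get((lf, keyword), [])


magn_bruit = apply_lf("MAGN", "bruit")
oper_cri = apply_lf("OPER1", "cri")
```

Keying the table on the pair (function, keyword) mirrors the description above: the collocate is retrieved from the keyword and the abstract operator together, never from the operator alone.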

The Meaning <=> Text-Model is not only a framework for the description and semantic classification of collocations: Mel'chuk and his collaborators have also worked out proposals for very detailed dictionaries, the Explanatory and Combinatory Dictionaries (ECD); these proposals¹¹ have been most stimulating for both lexicography (cf. the dictionary of Cohen (1986), the three volumes of French ECDs, as well as ECD fragments of Russian and studies towards ECDs of English (Steele 1988), and German) and Natural Language Processing (cf. work by Nirenburg et al. 1988, Heylen/Maxwell 1994, Heid/Raab 1989). Still, some deficits have been identified, such as the fact that the level of granularity of the semantic description of lexical functions may not be fully sufficient for semantics-based NLP (cf. Heylen/Maxwell 1994), or the lack of a grammar for combining lexical functions with one another; this latter gap has been filled by Ramos/Tutin (1992).

If one compares the MTM approach to collocations with the work of Hausmann and other lexicographers, as Cop (1990) and Heid (1992) have done, quite some overlap is found, despite differences in terminology. The table in Figure 5 comparatively summarizes the relevant terminology used in Mel'chuk's and Hausmann's work.

Compared | Who? | Components/properties
Terminology | H. | Base; Collocate
Terminology | M. | Keyword; Value of LF
Semantic properties | H. | autonomous; dependent, non-autonomous
Semantic properties | M. | compositionally describable; not fully compositionally
Implication for treatment of collocations | H. | collocations must be learned separately
Implication for treatment of collocations | M. | collocations must be stored explicitly in the ECD

Figure 5. Comparing terminology of MTM and Hausmann

Collocation-related research topics in MTM include the actual integration of a collocational component into implementations, as well as work on the relationship between semantics and collocation (see Section 2.3.3 below).

2.3.3 Correlating semantic classes and collocational behaviour

It has been stated that collocations and collocational lexical choice are completely lexically determined (cf. Mel'chuk/Polguère 1987) and thus need to be memorized, by foreign language learners (cf. Hausmann 1984) or in dictionaries, be it for human use or for NLP.

On the other hand, some research has been going on, over the past few years, about correlations between semantic classifications, lexical fields, etc. and collocational behaviour. Heid/Raab (1989) have observed that the French nouns denoting personal attitudes which are described in the first volume of the ECD (cf. Mel'chuk et al. 1984) select similar collocates, for certain lexical functions: for a dozen semantically related nouns,¹² a parallel behaviour in collocate selection for the lexical functions OPER1, CAUSOPER, INCEP FUNC, INCEP OPER, FIN OPER, ... was observed.

In the field of lexical acquisition, attempts have been made to constitute lexical semantic classes (or domain classes?) by considering collocational behaviour: the assumption is that bases having the same collocates belong to the same field. Pustejovsky et al. (1993) have used this assumption in terminology-related corpus exploration. Much more material is now analyzed in studies by Meyer/Mackintosh (1994) and in particular by Mel'chuk/Wanner (1994). While the first is on sublanguage collocations, the second deals with general language, coming back to the field of emotion nouns, subdivided, for that purpose, into (in part overlapping) subsets, according to inherent properties of emotions, as they are described in psychology: positive vs. negative emotions, moderate vs. intense, temporary vs. permanent, etc. For each such subset, the collocate selection behaviour of a few prototypical German base nouns and selected lexical functions is analyzed.
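The assumption that bases with the same collocates belong to the same field can be illustrated by a simple grouping procedure. This is a sketch under stated assumptions, not the method of any of the studies cited: the overlap measure (Jaccard), the threshold, the greedy grouping strategy and the function names are our choices, and the input uses placeholder labels (`noun_A`, `v1`, ...) rather than real corpus data.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of collocates two bases have in common (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_bases(collocates: Dict[str, Set[str]],
                threshold: float = 0.5) -> List[List[str]]:
    """Greedily put a base into the first group all of whose members
    overlap with it at or above `threshold`; otherwise open a new group."""
    groups: List[List[str]] = []
    for base in sorted(collocates):
        for group in groups:
            if all(jaccard(collocates[base], collocates[m]) >= threshold
                   for m in group):
                group.append(base)
                break
        else:
            groups.append([base])
    return groups


# Placeholder input: hypothetical bases with hypothetical collocate sets.
toy = {
    "noun_A": {"v1", "v2", "v3"},
    "noun_B": {"v1", "v2", "v4"},
    "noun_C": {"v7", "v8"},
}
fields = group_bases(toy)
```

With real material, the threshold would have to be tuned, and the result would depend on the order in which bases are visited.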

The results are of two types: on the one hand, indeed, a number of collocations appear with most or all of the elements of the field or of a given subset; on the other hand, a non-negligible number of exceptions is noted as well. Mel'chuk/Wanner (1994) take this result as a starting-point for a proposal for the reorganization of the ECD entries for emotion nouns. The proposal is to introduce a common 'public' entry for the whole class of nouns, which would stipulate the values of certain lexical functions, either for all of the class members, or depending on the presence of one or more of the subclass-defining criteria. The results do not immediately lead to a hierarchy; the domain model used is not hierarchical either. What comes out are rather implications between the presence of certain semantic properties and the collocation behaviour.
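The idea of a common 'public' entry can be sketched as a lookup with three sources of values: exceptions stored with the individual noun, values valid for the whole class, and values conditioned on a subclass-defining property. The class and field names below are hypothetical, the precedence rule (the individual entry wins) is an assumption of this sketch, and all nouns, properties and collocates are placeholders; nothing here reproduces the actual entries proposed by Mel'chuk/Wanner (1994).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PublicEntry:
    # Lexical function -> values shared by every member of the class.
    for_all: Dict[str, List[str]] = field(default_factory=dict)
    # (semantic property, lexical function) -> values for nouns with that property.
    by_property: Dict[Tuple[str, str], List[str]] = field(default_factory=dict)


@dataclass
class NounEntry:
    lemma: str
    properties: Set[str]
    # Exceptions: values stated explicitly in the noun's own entry.
    own: Dict[str, List[str]] = field(default_factory=dict)


def lf_values(noun: NounEntry, public: PublicEntry, lf: str) -> List[str]:
    """Resolve the values of `lf` for `noun`; the individual entry overrides
    the public one (an assumption made for this sketch)."""
    if lf in noun.own:
        return noun.own[lf]
    values = list(public.for_all.get(lf, []))
    for prop in sorted(noun.properties):
        values += public.by_property.get((prop, lf), [])
    return values


# Placeholder data only.
public = PublicEntry(
    for_all={"Oper1": ["verb_a"]},
    by_property={("intense", "Magn"): ["adj_x"]},
)
noun_1 = NounEntry("noun_1", {"intense"})
noun_2 = NounEntry("noun_2", {"moderate"}, own={"Oper1": ["verb_b"]})
```

Note that the structure is flat: values follow from the presence of properties, as implications, and no hierarchy of classes is needed.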

Meyer/Mackintosh (1994) observe that, for the sublanguage of technical documentation of CD-ROM devices, a few collocational generalizations are possible, which can be modeled in an inheritance hierarchy. Maybe in part the differences between Mel'chuk/Wanner's and Meyer/Mackintosh's results have to do with the fact that terminological domains, especially when denoting concrete objects, can more easily be modeled themselves in taxonomies than domains of abstract notions, as used in general language.

The result is interesting in the light of Martin's notion of 'conceptual collocation': Martin (1992) observes a correlation between the semantic and conceptual description of items of a (technical) domain and the collocational behaviour. He observes a subtype of n+adj- and n+n-collocations which just denotes (specialized) subtypes of the objects denoted by the base noun. Similarly, he points out that n+v-collocations denote what one can typically do with (or to) the object denoted by the base noun.¹³

We have made a few experiments on this problem ourselves, using Cohen's description of collocations of the sublanguage of the stock market as a starting point (Cohen 1986).¹⁴ We have looked up the entries for nouns which share certain collocational properties. One type of question we asked was which subsets of nouns share one or more collocates expressing the INCREASE or DECREASE of the process denoted by the base noun, both with subject- and object-taking verbs. One such group consists of [...]: all these nouns share the collocates [...]


The result of this exploration shows, among other things, the following: some collocate verbs are 'passe-partout', like


in such small groups of nouns is significant and most likely can be related to properties relevant for the semantic or conceptual description of the group of nouns.

Terminologists and lexicographers might usefully explore in more detail the relationship between conceptual or semantic description and collocational behaviour. As Martin (1992) states, results of a collocational analysis furnish input for definition construction and, vice versa, definitions (in terms of frames, for example) can be used as background for collocational expectation patterns. With the availability of corpus processing tools, such analyses become less expensive.

Cohen did not explicitly group the nouns treated in her dictionary, although this would be possible, as our experiments show and as the work of Mel'chuk/Wanner (1994) and Meyer/Mackintosh (1994) suggests. Such a structuring would be helpful for pedagogical purposes. Knowles/Roe (1994) deal with the pedagogical use of collocational material extracted from texts of specialized language; some of the tools described by Grefenstette (1994) are helpful for technically doing the job. Tools, methods, applications and descriptive work come together at this point: affaire à suivre.

2.4 Pragmatic properties

The pragmatic description of collocations involves the notion of collocations as conventionalized expressions. General language collocations are the normal way of expressing a given meaning (cf. sich die Zähne putzen/*bürsten vs. se brosser/*nettoyer les dents). Hausmann calls collocations 'semi-finished products of language' ('Halbfertigprodukte der Rede'). This is why collocationally correct texts are perceived as 'fluent', whereas texts with wrong collocates, or with compositional expressions where collocational alternatives would exist, are perceived as unnatural; this property of collocations in turn motivates much of the pedagogical interest they attract.

In addition, individual collocations can pertain to diasystematic language varieties, the same way as one-word lexemes can (cf. Swiss German einen Entscheid fällen for German eine Entscheidung treffen; East German eine Bestellung auslösen vs. eine Bestellung aufgeben).

3. Lexicographic treatment of combinatory phenomena - access to collocations

We have so far mentioned a few problems of the description of collocations. In addition, the properties of collocations lead to a number of particular problems concerning the presentation of collocational descriptions in dictionaries. Here, we concentrate on the organization of lexical entries and the access to collocational information in dictionaries.

Although, from a semantic point of view, it would probably be a good solution to have individual lexical entries for collocations and idioms (and to make them accessible as a whole), this is not practical within semasiological dictionaries. This is, however, what happens in onomasiological dictionaries, such as the Longman Language Activator (LLA) or the dictionary of idioms planned by Dobrovol'skij (1994).

3.1 The organization of collocation and idiom dictionaries

Lexicographers have much discussed the access to idioms and collocations in monolingual and bilingual dictionaries, in particular the question of where to alphabetize multiword expressions: this problem must be solved in different ways, depending on the distinctions between monolingual and bilingual dictionaries and between encoding (text production) and decoding (text understanding) use of the dictionary. Production dictionaries will favour the access to collocational information via the base, whereas in a decoding dictionary we cannot be sure that the reader of a text is able to figure out whether or not a word form belongs to a collocation, and access via both bases and collocates, or via the collocation as a whole, would be ideal.¹⁵ This is easier to realize online; an experiment of this type has been made in a lexical and terminological database designed by Heid/Freibott (1991) to hold single word items as well as collocations.
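The three access paths mentioned above (via the base, via the collocate, via the collocation as a whole) are easy to offer at the same time once the data are held electronically. The following is a minimal sketch only: the class and method names are hypothetical, it is not a description of the database of Heid/Freibott (1991), and the sample collocations of respect are taken from the Lacroix entry reproduced in Figure 8.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Collocation(NamedTuple):
    base: str        # e.g. the noun
    collocate: str   # e.g. the verb selected by the noun
    form: str        # the collocation as a whole


class CollocationIndex:
    """Stores each collocation once and indexes it three ways."""

    def __init__(self) -> None:
        self.by_base: Dict[str, List[Collocation]] = defaultdict(list)
        self.by_collocate: Dict[str, List[Collocation]] = defaultdict(list)
        self.by_form: Dict[str, Collocation] = {}

    def add(self, colloc: Collocation) -> None:
        self.by_base[colloc.base].append(colloc)
        self.by_collocate[colloc.collocate].append(colloc)
        self.by_form[colloc.form] = colloc

    def lookup(self, key: str) -> List[Collocation]:
        """Accept a whole collocation, a base or a collocate as search key."""
        if key in self.by_form:
            return [self.by_form[key]]
        return self.by_base.get(key, []) + self.by_collocate.get(key, [])


index = CollocationIndex()
for verb, form in [("éprouver", "éprouver du respect"),
                   ("inspirer", "inspirer le respect"),
                   ("manquer", "manquer de respect")]:
    index.add(Collocation("respect", verb, form))
```

A user producing text would query by the base (`index.lookup("respect")`), whereas a reader who is unsure whether a verb belongs to a collocation can query by that verb or by the whole expression.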

The following problems of access to combinatory information in dictionaries have been discussed in the literature.

For idiomatic expressions, the problem is particularly hard, since usually none of the word forms which make up the idiom is a clear candidate, on semantic grounds, to serve as an entry word.

For collocations, Hausmann (1988) has suggested sorting them under the bases. This is what happens consistently in Ilgenfritz et al. (1989) (cf. the entry for respect in Figure 9). This sorting procedure is in line with the tradition of stylistic dictionaries, such as Lacroix (1956) and others. An example of an entry from Lacroix (1956) is reproduced in Figure 8. It lists the verbal and adjectival collocates of the entry word, sorting them in part according to their subcategorization properties. The BBI combinatory dictionary of English (Benson et al. 1986) also organizes its macrostructure by the bases treated, listing the collocates in the body of the entry.

Respect. Éprouver, ressentir, montrer, marquer, témoigner, manifester, devoir, porter, professer, affecter, feindre du respect. Inspirer, provoquer, commander, forcer le respect. Manquer de respect. Adresser ses respects. Être entouré d'un certain respect. Rappeler au. QUAL.: profond, filial, sincère, craintif, général, unanime, universel.

Figure 8. The entry s.v. respect in the collocation dictionary by Lacroix (1956)


respect m Respekt, Achtung
avoir, ressentir du ~ envers, pour, à l'égard de qn j-m Achtung entgegenbringen: Nous ressentons du ~ envers Monsieur votre père. / devoir le ~ à qn j-m Respekt schulden: Nous devons le ~ à nos professeurs. / forcer le ~ de qn j-n Achtung abnötigen: Son comportement a forcé mon ~. / imposer, inspirer, commander le ~ Achtung einflößen: Cette personne, bien qu'elle soit très petite, inspire le ~. / manquer de ~ (envers qn) es an der notwendigen Achtung fehlen lassen (gegenüber j-m): Je trouve qu'il manque de ~ envers ses parents. / témoigner, montrer du ~ à, envers, pour, à l'égard de qn j-m Respekt erweisen: Les enfants d'aujourd'hui ne témoignent plus tellement de ~ aux personnes âgées.

Figure 9. The entry s.v. respect in Ilgenfritz et al. (1989)

A quite detailed syntactic account of collocations similar to our proposals in Figure 2 is given in Lainé (1993): this dictionary (specialized vocabulary of CAD/CAM French/English) distinguishes subject-verb-, verb-object-, and noun-adjective-collocations, as well as collocational noun phrases involving PPs (compound nouns). Below, we reproduce an example of an entry. It consists of two columns, one of which contains the syntactic classification used in the dictionary, the other the relevant lexical combinations.

ordonnancement | scheduling
~ V. | ~ connaître les ordres lancés
V. ~ | choisir ~, définir ~, essayer ~, [règles] gouverner ~
~ Adj. | ~ assisté par ordinateur, ~ dynamique, ~ informatisé, ~ multiconvergent, ~ optimal
~ (Prép)(Art) N | ~ à buts multiples, ~ par dates croissantes, ~ par valeurs croissantes des marges libres
N (Prép)(Art) ~ | l'art de l'~, coefficient d'~, fonction ~, méthode d'~ (de production), rebouclage sur l'~, technique


(glossed) examples, as well as subentries. Similarly, Cobuild has collocations in its definienda, as well as in its examples.

Among bilingual dictionaries, the Collins/Robert English/French dictionaries and the Collins/Klett English/German ones are a remarkable exception: they have particular devices to denote n+v-collocations, distinguishing even whether the noun is the subject or a complement of the verb. These dictionaries have been collocationally explored and described in detail by Fontenelle (1992a) etc.: on top of their well-structured representation of collocations, they are a remarkably rich source for this type of lexical information.

Another particular device for the treatment of collocations has been used in the Van Dale bilingual dictionaries. They follow the idea of a categorial description of the component parts of collocations and indicate, for example, verbal collocates of a noun in a special part of the entry, using a numeric code to point to the category of the combination partner.¹⁶ A sample entry from the FR->NL dictionary is reproduced in Figure 11.

respect 0.1 eerbied => (hoog)achting, ontzag, respect 0.2 eerbiediging => naleving 0.3 betuigingen van hoogachting
2.1 ~ humain vrees voor wat men ervan denken, zeggen zal
2.3 mes respects à votre femme de groeten aan uw vrouw; mes respects
3.1 avoir du ~ pour qn. achting, respect voor iem. hebben; commander, imposer, inspirer le ~ ontzag inboezemen, respect afdwingen; manquer de ~ à, envers qn. zich tegenover iem. onbehoorlijk, niet correct gedragen; manquer de ~ à une femme zich vrijpostig gedragen tegenover een vrouw; montrer, témoigner du ~ à, envers, pour qn. iem. achting betonen, betuigen; garder, tenir qn. en ~ iem. in bedwang houden, iem. onder schot houden
3.3 présenter ses respects à qn. iem. de groeten doen
4.1 ~ de soi zelfrespect
6.1 sauf le ~ que je vous dois, sauf votre ~ met uw verlof, met uw welnemen, met alle respect.

Figure 11. The entry s.v. respect in the Van Dale FR/NL dictionary

3.3 The ECDs: access via semantic criteria

The above dictionaries, specialized and general, monolingual and bilingual, use syntactic criteria for the organization of collocational information. The only dictionaries we are aware of that base their organization on semantic criteria as well are the explanatory and combinatory dictionaries which have been published by Mel'chuk and his research group, such as Mel'chuk et al. (1984), etc. The access to collocations is via the base entry and the lexical functions applicable to the base entry (see Section 2.3.2 and note 9). A small part of the entry s.v. respect is given in Figure 12.¹⁷


Oper1 | avoir, éprouver [ART ~] | [Toute la population a un profond respect pour cet artiste émérite]
continuellement Oper1 | vivre [dans le ~] | [Cette famille vit dans le respect de ses ancêtres]
ContOper1 | garder [ART ~] | [Malgré les propos diffamatoires des journalistes envers ce député, ses proches collaborateurs ont gardé un profond respect pour lui]
FinOper1 | perdre [ART ~ / tout ~] | [Les dirigeants ont perdu tout respect pour ces artistes]
Caus(3)Oper1 | inciter N [à ART ~] | [Les parents les incitent au respect des valeurs morales; L'honnêteté de Louise incite Paul et Jean au respect de cette femme]
nonOper1 | ignorer tout ~ | [Jean ignore tout respect pour ses parents]
Oper2 | jouir [de ART ~], avoir [le ~] | [Pierre jouit du respect de ses subordonnés]
ContOper2 | conserver le ~ | [Malgré les propos diffamatoires des journalistes envers ce député, ce dernier a conservé le respect de ses proches collaborateurs]
FinFunc0 | disparaître | [Le respect du public pour ce ministre a disparu]
CausFunc0 | se mériter [ART ~] | [Par son travail consciencieux, il se mérita le respect de ses collègues]

Figure 12. A fragment of the collocation part of the entry s.v. respect in the ECD

    An application of the ECD description technique is found in Cohen's dictionary (Cohen 1986) of collocations of the sublanguage of economics (stock market and economic conjuncture) (note 18). Instead of using lexical functions, she uses paraphrases of a relevant subset of them; given that many of the items serving as entry words in Cohen's dictionary denote processes, the dictionary indicates phases of these processes, such as start, increase, decrease and end. As an example, we reproduce in Figure 13 a part of the entry for FR emprunt.


    emprunt
    Columns: nouns | (subj. of) verbs | (obj. of) verbs | adjectives
    START: nouns: émission, lancement | (obj. of) verbs: émettre, lancer
    INCREASE: nouns: accroissement, augmentation | (subj. of) verbs: s'accroître, augmenter, monter | (obj. of) verbs: accroître, augmenter | adjectives: considérable, élevé, gros
    UNDETERMINED:
    DECREASE: nouns: baisse, diminution, réduction | (subj. of) verbs: baisser, diminuer | (obj. of) verbs: réduire, restreindre | adjectives: petit
    END: (obj. of) verbs: clore, liquider, rembourser, restituer

    Figure 13. The entry s.v. emprunt in the dictionary by Cohen (1986)

    The two-dimensional presentation of the material in Cohen (1986) supports different access paths: the user starts with a base lemma and can then either look up collocations by the phase of the process denoted by the noun, selecting the adequate grammatical realization afterwards, or, alternatively, use semantic and grammatical properties jointly.
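
    The two access paths can be pictured as lookups in a small two-level table. The following is a minimal Python sketch under stated assumptions: the nested-dictionary layout, the function names and the realization labels (noun, verb_subj, verb_obj, adjective) are hypothetical and are not Cohen's format; the sample data are a fragment of Figure 13, and the assignment of items to cells follows one reading of that figure.

```python
from typing import Dict, List

# Hypothetical encoding: phase -> grammatical realization -> collocates.
# The labels "noun", "verb_subj", "verb_obj", "adjective" are illustrative only.
Entry = Dict[str, Dict[str, List[str]]]

EMPRUNT: Entry = {
    "START": {
        "noun": ["émission", "lancement"],
        "verb_obj": ["émettre", "lancer"],
    },
    "INCREASE": {
        "noun": ["accroissement", "augmentation"],
        "verb_subj": ["s'accroître", "augmenter", "monter"],
        "verb_obj": ["accroître", "augmenter"],
        "adjective": ["considérable", "élevé", "gros"],
    },
    "END": {
        "verb_obj": ["clore", "liquider", "rembourser", "restituer"],
    },
}


def by_phase(entry: Entry, phase: str) -> Dict[str, List[str]]:
    """First access path: pick a phase, then inspect its realizations."""
    return entry.get(phase, {})


def lookup(entry: Entry, phase: str, realization: str) -> List[str]:
    """Second access path: use phase and grammatical realization jointly."""
    return entry.get(phase, {}).get(realization, [])


if __name__ == "__main__":
    print(sorted(by_phase(EMPRUNT, "INCREASE")))
    print(lookup(EMPRUNT, "START", "verb_obj"))
```

    Both functions return an empty result for a phase or realization that the entry does not record, which mirrors an empty cell in the printed table.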

    3.4 Summary, new proposals

    The table in Figure 14 summarizes, by way of example, the types of information we can find in dictionaries and the ways in which this information can be accessed.

    Information given in dictionaries | Example cited | Access via ... | Example cited
    a collocation is used: it is attested | any | definitions, examples, etc. | any
    collocation related with a given reading of the base (explicitly) | Van Dale bilinguals | base + reading (number) | Van Dale bilinguals
    category of base and collocate | Van Dale, [Ilgenfritz e.a. 1989], [Laine 1993] | base + reading + cat. code | Van Dale
    for N-V collocations: gramm. function of N | Robert/Collins, [Cohen 1986] | base + r., + position (markup) | Robert/Collins
    semantic classification of collocations | ECDs, [Cohen 1986] | base + r., lexical f. | ECDs, [Cohen 1986]

    Figure 14. Main features of collocation treatment and access in dictionaries


    New proposals, or simply new practical solutions, come up in dictionaries which consistently follow an onomasiological view. This is discussed, with a view to the plans for a Russian/English idiom dictionary, in the paper by Dobrovol'skij (1994). The author uses a set of local (or: partial) conceptual hierarchies, inspired by prototype theory, to organize the backbone of the dictionary, and he then links the idioms described in the articles to the nodes of these hierarchy trees. A similar approach is followed in the Longman Language Activator (LLA), an onomasiological dictionary for language production. The LLA has collocations, idioms and normal single word lexemes as entries, all related by a semantic superstructure which forms small hierarchies for about 1,000 topics. Collocations and idioms are treated here on a par with other lexemes. Access is by the meaning of the multiword item as a whole.

    Other proposals for dictionary structure are made in Oubine's (1994) bilingual Russian-English dictionary of lexical intensifiers (cf. Mel'chuk's lexical function MAGN). Given that the aim is a bilingual dictionary, and that full equivalence cannot be stated for all collocations, the author opts for keeping the two languages separate, pointing from bases of one language to their equivalents in the other language; the base entries contain alphabetical lists of intensifiers, each with examples and, optionally, usage notes. Access to collocations is normally via the bases; a reverse part can be accessed by the intensifiers themselves, leading to an index of bases modifiable by a given intensifier. Another Russian/English collocational dictionary is described in Benson (1994).
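
    The relation between base entries and the reverse part can be illustrated with a small inverted index. The sketch below is an assumption-laden illustration in Python, not a description of Oubine's actual data format: the function name, the dictionary layout and the English sample collocations (heavy rain, torrential rain, heavy smoker, thunderous applause) are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical base entries: base -> alphabetical list of intensifiers.
# The data are invented examples, not taken from the dictionary discussed.
BASES: Dict[str, List[str]] = {
    "applause": ["thunderous"],
    "rain": ["heavy", "torrential"],
    "smoker": ["heavy"],
}


def build_reverse_index(base_entries: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each intensifier to the alphabetically sorted bases it can modify."""
    reverse = defaultdict(list)
    # Iterating over sorted bases keeps every index list in alphabetical order.
    for base in sorted(base_entries):
        for intensifier in base_entries[base]:
            reverse[intensifier].append(base)
    return dict(reverse)


if __name__ == "__main__":
    index = build_reverse_index(BASES)
    print(index["heavy"])
```

    Deriving the reverse part from the base entries, rather than maintaining it by hand, keeps the two access paths consistent with each other.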

    Collocations seem to require a multidimensional description (syntactic, semantic, pragmatic, relation with the domain model, etc.). Representing this information and making it accessible, also in different 'ad hoc' combinations, requires a flexible dictionary structure, which is best achieved with computational tools. Defining the structure of a computational collocation dictionary in a way that goes beyond the simple encoding in Heid/Freibott (1991) is an interesting task (note 19).

    4. Acquisition and application of collocational information

    4.1 Acquiring collocational information from text

    Much of the linguistic and lexicographic discussion about collocations has long been based on a few examples, mostly made up by linguists for the purpose of exemplification. Researchers thus felt that there is a need for lists of collocations collected from textual material. Others, like Fontenelle (1992a, 1992b), have developed and used computational tools to identify collocation candidates in dictionaries and to extract collocation lists from machine readable dictionaries.


    On the other hand, practical lexicographers themselves need tools for corpus analysis which give access to collocation candidates: collocational description in dictionaries is an area where improvements are still possible and necessary, and the availability of collocation lists extracted from texts is also a great advantage for the semantic description of lexical items.

    Some of the tools for extracting collocations from texts are based on statistical methods. A few statistical measures of similarity have been used to identify how often words appear together and how similar the contexts are in which these words appear. The measures most frequently used are mutual information, t-score and z-score. Other similarity measures have also been applied in tools for corpus exploration (note 20).

    Within the framework of this article, we cannot discuss the different statistical tools in full detail; we will thus restrict ourselves to an informal account of the workings of the simplest statistical tools and to a discussion of the choices which lexicographers and computational linguists have to make when applying statistical tools to retrieve collocation candidates from text. On that basis, we can identify a few tasks for both research and tool development: essentially, the impression is that combining statistical measures with linguistic information (e.g. from pre- and post-analysis steps) is a successful practical way forward.

    4.1.1 Simple statistical measures

    The most prominent statistical measures used, implementations of which are available to many lexicographers, are the mutual information index and the t-score test (note 21).

    4.1.1.1 Mutual information

    The mutual information index (MI, for short) is used to measure the association between two words. For a given corpus, we can count how often a given word occurs in that corpus (frequency). We can compute such statistics for all word forms in a corpus (and use, for example, the information on very frequent or very rare words to decide whether they should go into a dictionary built for that corpus). By dividing the frequency of a word form by the number of word forms in the corpus, we get the lexical probability of the word form (how often, in relation to the overall number of word forms in the corpus, does it appear?).
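
    The computation of lexical probabilities just described can be sketched in a few lines of Python. This is a minimal illustration, assuming that the corpus is already available as a list of word forms; the function name and the toy corpus are invented for the example.

```python
from collections import Counter
from typing import Dict, List


def lexical_probabilities(tokens: List[str]) -> Dict[str, float]:
    """Relative frequency of each word form: frequency / corpus size."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)  # frequency of every word form
    return {form: freq / total for form, freq in counts.items()}


if __name__ == "__main__":
    # Toy corpus of six word forms (invented example).
    corpus = "the cat sat on the mat".split()
    probs = lexical_probabilities(corpus)
    print(probs["the"], probs["cat"])
```

    In the toy corpus, "the" occurs twice among six word forms, so its lexical probability is 2/6; the probabilities of all word forms sum to 1.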

    When measuring mutual information, we observe not only the probability of single word forms, but also the probability of combinations of two words ("bigrams"). This leads to three probability values which can be compared:


    - the probability of the first word form, w1: P(w1);
    - the probability of the second word form, w2: P(w2);
    - the probability of a pair (w1, w2) built up of the two word forms: P(w1, w2).

    These three values are compared: the probability that w1 and w2 cooccur (e.g. next to each other) is divided by the product of the individual probabilities of w1 and w2 (note 22). When computing MI, the "window" (or "span") within which the word forms have to appear in order to count as cooccurring can be defined by the user. We may look just at adjacent word forms, or at word forms which are at a distance of up to 3, 4, 5, ... word forms from each other.
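
    The comparison of the three probability values, together with a user-defined window, can be sketched as follows. This is a minimal Python illustration under several assumptions that the text above does not fix: the ratio is reported as a base-2 logarithm (a common convention for MI), pairs are ordered (w2 must follow w1), the pair probability is estimated as the pair count divided by the corpus size regardless of window width, and sentence boundaries are ignored. Function names and the toy corpus are invented.

```python
import math
from collections import Counter
from typing import List, Optional


def pair_counts(tokens: List[str], window: int = 1) -> Counter:
    """Count ordered pairs (w1, w2) where w2 occurs up to `window` positions after w1."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + 1 + window]:
            counts[(w1, w2)] += 1
    return counts


def mutual_information(tokens: List[str], w1: str, w2: str,
                       window: int = 1) -> Optional[float]:
    """log2( P(w1, w2) / (P(w1) * P(w2)) ); None if the pair never cooccurs."""
    n = len(tokens)
    joint = pair_counts(tokens, window)[(w1, w2)]
    if joint == 0:
        return None
    word_counts = Counter(tokens)
    p1 = word_counts[w1] / n   # lexical probability of the first word form
    p2 = word_counts[w2] / n   # lexical probability of the second word form
    p12 = joint / n            # assumed estimate of the pair probability
    return math.log2(p12 / (p1 * p2))


if __name__ == "__main__":
    corpus = "strong tea and strong coffee but weak tea".split()
    print(mutual_information(corpus, "strong", "tea"))
    print(mutual_information(corpus, "weak", "tea"))
```

    In this toy corpus, the pair (weak, tea) receives a higher value than (strong, tea), simply because "weak" occurs only once and that single occurrence is followed by "tea".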

    It is quite evident what the comparison will tell us: the MI value is high when most of the occurrences of a given item are in fact cooccurrences with the second item selected. If we compare MI for a set of pairs of, say, noun and adjective, keeping the noun constant, the adjectives which most typically cooccur with our noun will have the highest MI index. "Typically" means: if the adjective is used next to a noun at all, it is very likely that it is the noun for which the MI is high. The famous célibataire endurci should be easy to detect in texts this way.

    MI is dependent on the frequency of word forms. If we calculate MI for a list of adjectives collocating with a noun, however, we do not see in the MI values whether the adjectives are frequent or not.

    With MI, we have to face the "sparse data problem": given that MI is a similarity (or typicality) measure, it will yield high typicality values in cases where an item is rare but cooccurs (by chance or by rule) with another one in each of its rare occurrences in the text. Rare items may thus be ranked much higher than one would intuitively like them to be.

    4.1.1.2 T-score

    This problem of high ranking of rare forms can be remedied by use of the t-test. The t-test operates on pairs of words. It finds those additional words which are more likely to cooccur with one of the two words from the pair than with the other. The t-test yields positive and negative values. The highest and the lowest values are significant: they indicate strong association with one or the other word. T-score is also not reliable for low frequencies. It tends to indicate the frequent words that cooccur with the target words, and it makes it possible to separate near synonyms by showing their frequent combination partners. Church et al. (1991) have applied it to discriminate EN strong and powerful by finding nouns which frequently cooccur with one of these adjectives.
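
    The contrastive use of the t-test for two near synonyms can be illustrated with a short Python sketch. It assumes one commonly used simplification of the statistic, t = (f1 - f2) / sqrt(f1 + f2), where f1 and f2 are the frequencies with which a word directly follows the first and the second target word; the exact statistic used in the work cited above may differ (e.g. through smoothing). Function names and the toy corpus are invented for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def following_counts(tokens: List[str], target: str) -> Counter:
    """Count the word forms that directly follow `target`."""
    counts = Counter()
    for current, nxt in zip(tokens, tokens[1:]):
        if current == target:
            counts[nxt] += 1
    return counts


def contrast_t_scores(tokens: List[str], w1: str, w2: str) -> Dict[str, float]:
    """Simplified t-score per following word: positive favours w1, negative favours w2."""
    c1 = following_counts(tokens, w1)
    c2 = following_counts(tokens, w2)
    scores = {}
    for word in set(c1) | set(c2):
        f1, f2 = c1[word], c2[word]
        # f1 + f2 > 0 for every word in the union, so the division is safe.
        scores[word] = (f1 - f2) / math.sqrt(f1 + f2)
    return scores


if __name__ == "__main__":
    corpus = ("strong tea strong tea strong support "
              "powerful computer powerful computer powerful support").split()
    for word, t in sorted(contrast_t_scores(corpus, "strong", "powerful").items()):
        print(word, round(t, 3))
```

    In the toy corpus, "tea" receives a positive score (it follows only "strong"), "computer" a negative one (it follows only "powerful"), and "support", which follows both equally often, scores zero.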


    4.1.2 Choices in applying statistical measures for identifying collocations

    The statistical measures described above, as well as modifications thereof, are often used to identify collocation candidates in texts; such use depends of course on a number of assumptions and choices.

    - The distance of items compared. MI and t-score can be calculated on immediately adjacent items or on items occurring within a certain "window" or "span" (cf. the terminology of Sinclair 1991, Clear 1994).

    - The impact of text structure. The measures can be calculated on bigrams within or across sentence boundaries. Usually, we would assume that limiting ourselves to occurrences within one sentence leads to more relevant results than ignoring sentence boundaries (note 23).

    - The impact of lemmatization and categorial information. The discussion above, in Section 2.2, led to the assumption that we can describe collocations in terms of category combinations (n+v, +n, +adj, adj+adj, v+adv), in part even in terms of partial syntactic structures, such as "verb+object", "verb+subject", "noun+attributive adjective", etc. Most of the work done so far using MI and t-score was performed on English material. The impact of word form variation there is not as important as with inflecting languages, like German or the Romance languages. For these, it seems useful to have an option, in statistical programs, to calculate the measures for lemmas rather than word forms. Moreover, it can be useful to carry out the statistical computation only on, say, adjectives appearing next to a noun, or on verbs and their nominal objects, etc. (note 24), i.e. to restrict search and statistical computation according to syntactic environments.
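
    The last choice, counting over lemmas and only within a given category combination, can be sketched as follows. This is a minimal Python illustration which assumes that the corpus has already been lemmatized and part-of-speech tagged by some external tool; the triple format (word form, lemma, tag), the tag names ADJ, NOUN, CONJ and VERB, the function name and the German sample are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical token format: (word form, lemma, part-of-speech tag).
Token = Tuple[str, str, str]


def adjacent_lemma_pairs(tagged: List[Token], first_tag: str,
                         second_tag: str) -> Counter:
    """Count adjacent lemma pairs whose tags match (first_tag, second_tag)."""
    counts = Counter()
    for (_, lemma1, tag1), (_, lemma2, tag2) in zip(tagged, tagged[1:]):
        if tag1 == first_tag and tag2 == second_tag:
            # Counting lemmas merges inflected variants such as starker/starken.
            counts[(lemma1, lemma2)] += 1
    return counts


if __name__ == "__main__":
    # Hand-annotated toy sample (invented): two inflected forms of "stark".
    sample: List[Token] = [
        ("starker", "stark", "ADJ"), ("Kaffee", "Kaffee", "NOUN"),
        ("und", "und", "CONJ"),
        ("starken", "stark", "ADJ"), ("Kaffee", "Kaffee", "NOUN"),
        ("trinken", "trinken", "VERB"),
    ]
    print(adjacent_lemma_pairs(sample, "ADJ", "NOUN"))
```

    The resulting counts could then be fed into MI or t-score computations in place of raw word form counts; on word forms alone, "starker Kaffee" and "starken Kaffee" would be counted as two different pairs.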

    There is a range of choices in the application of the statistical tools, and the axis along which these choices can be arranged basically has to do with the amount of linguistic information which is kept track of, either by pre- or by postprocessing:

    The statistical measures may be applied to material selected according to certain linguistic criteria (e.g. by use of concordances), or relevant material is selected according to linguistic criteria from the set of data extracted by statistical processing.

    Proposals for tool building with a view to collocation extraction should thus be staged according to the amount of information available along with the corpus text:

    - raw text, possibly with sentence boundaries,
    - text with part-of-speech annotations,
    - lemmatized and morphosyntactically annotated text,

    -