

Lecture Notes in Artificial Intelligence 10596

Subseries of Lecture Notes in Computer Science

LNAI Series Editors

Randy Goebel, University of Alberta, Edmonton, Canada

Yuzuru Tanaka, Hokkaido University, Sapporo, Japan

Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor

Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany


More information about this series at http://www.springer.com/series/1244


Ruslan Mitkov (Ed.)

Computational and Corpus-Based Phraseology

Second International Conference, Europhras 2017
London, UK, November 13–14, 2017
Proceedings



Editor
Ruslan Mitkov
University of Wolverhampton
Wolverhampton
UK

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-319-69804-5
ISBN 978-3-319-69805-2 (eBook)
https://doi.org/10.1007/978-3-319-69805-2

Library of Congress Control Number: 2017957565

LNCS Sublibrary: SL7 – Artificial Intelligence

© Springer International Publishing AG 2017

The chapter “Frequency Consolidation Among Word N-Grams” is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland



Preface

Computational and Corpus-Based Phraseology: Recent Advances and Interdisciplinary Approaches

As the late and inspiring John Sinclair (1991, 2007) observed, knowledge of vocabulary and grammar is not sufficient for someone to express himself/herself idiomatically or naturally in a specific language. One has to have the knowledge and skill to produce effective and naturally phrased utterances, which are often based on phraseological units (the idiom principle). This is in contrast to the traditional assumption or open choice principle that lies at the heart of generative approaches to language. As Pawley and Syder (1983) stated more than three decades ago, the traditional approach cannot account for nativelike selection (idiomaticity) or fluency.

Language is indeed phraseological, and phraseology is the discipline that studies phraseological units (PUs) or their related concepts, referred to by scholars (and regarded as largely synonymous) as multiword units, multiword expressions (MWEs), fixed expressions, set expressions, phraseological units, formulaic language, phrasemes, idiomatic expressions, idioms, collocations, and/or polylexical expressions. PUs, or MWEs, are ubiquitous and pervasive in language. They are a fundamental linguistic concept that is central to a wide range of natural language processing and applied linguistics applications, including, but not limited to, phraseology, terminology, translation, language learning, teaching and assessment, and lexicography. Jackendoff (1977) observes that the number of MWEs in a speaker’s lexicon is of the same order of magnitude as the number of single words. Biber et al. (1999) argue that they constitute up to 45% of spoken English and up to 21% of academic prose in English. Sag et al. (2002) state that they are overwhelmingly present in terminology and that 41% of the entries in WordNet 1.7 are reported to be MWEs.

PUs play a crucial role not only in the computational treatment of natural languages. Terms are often MWEs (and not single words), which makes them highly relevant to terminology. Translation and interpreting are two other fields where phraseology plays an important role, as finding correct translation equivalents of PUs is a pivotal step in the translation process. Given their pervasive nature, PUs are absolutely central to the work carried out by lexicographers, who analyse and describe both single words and PUs. Last but not least, PUs are vital not only for language learning, teaching, and assessment, but also for more theoretical linguistic areas such as pragmatics, cognitive linguistics, and construction grammars. All the aforementioned areas are today aided by (and often driven by) corpora, which makes PUs particularly relevant for corpus linguists. Finally, PUs provide an excellent basis for inter- and multidisciplinary studies, fostering fruitful collaborations between researchers across different disciplines, which unfortunately remain largely unexplored for the time being.

This volume features a selection of papers written by the invited speakers as well as regular papers presented at the international conference “Computational and Corpus-Based Phraseology: Recent Advances and Interdisciplinary Approaches” (Europhras 2017). The conference, which is organised jointly by the European Association of Phraseology (Europhras) and the Research Institute in Information and Language Processing of the University of Wolverhampton, and sponsored by Europhras, the Sketch Engine, ELRA and the University of Wolverhampton, provides the perfect opportunity for researchers to present their work, fostering interaction and collaboration between scholars working in disciplines as diverse as natural language processing, translation, terminology, lexicography, language learning, teaching and assessment, and cognitive science, to name only a few. I organised the volume thematically into the following sections, which demonstrate the breadth of the topics represented at Europhras 2017: (1) Keynote and Invited Papers, (2) Phraseology in Translation and Contrastive Studies, (3) Lexicography and Terminography, (4) Exploitation of Corpora in Phraseological Studies, (5) Development of Corpora for Phraseological Studies, (6) Phraseology and Language Learning, (7) Cognitive and Cultural Aspects of Phraseology, (8) Theoretical and Descriptive Approaches to Phraseology, and (9) Computational Approaches to Phraseology. In fact, the variety of topics at Europhras 2017 is even more remarkable if we take into account other conference presentations that are not included in this volume – in addition to the regular papers, the conference also featured short papers and posters, which are published separately as e-proceedings with ISBN and DOI numbers assigned to every contribution.

Every submission to the conference was evaluated by three reviewers – i.e., members of the Programme Committee consisting of 46 scholars from 23 different countries, or 12 additional reviewers from eight countries, who were recommended by the Programme Committee. The conference contributions were authored by a total of 91 scholars from 24 different countries. These figures attest to the truly international dimension of Europhras 2017.

I would like to thank everyone who made this truly interdisciplinary and international event possible. I would like to start by thanking all colleagues who submitted papers to Europhras 2017 and travelled to London to attend the event. I am grateful to all members of the Programme Committee and the additional reviewers for carefully examining all submissions and providing substantial feedback on all papers, helping the authors of accepted papers to improve and polish the final versions of their papers. A special thanks goes to the invited speakers – both the keynote speakers of the main conference (Ken Church, Gloria Corpas, Dmitrij Dobrovol’skij, Patrick Hanks, Miloš Jakubíček) and the invited speakers of the two accompanying workshops (Carlos Ramisch and Jean-Pierre Colson). Words of gratitude go to our sponsors – Europhras, the Sketch Engine, ELRA, and the University of Wolverhampton.

Last but not least, I would like to acknowledge the members of the Organising Committee, who worked very hard during the last 12 months and whose dedication and efforts made the organisation of this event possible. I would like to highlight (in alphabetical order) the following colleagues for competently carrying out numerous organisational tasks and being ready to step in and support the organisation of the conference whenever needed. My big thank you goes out to Amanda Bloore, Martina Cotella, Arianna Fabbri, April Harper, Sara Moze, Nikolai Nikolov, Ivelina Nikolova, Rocío Sánchez González, Andrea Silvestre Baquero, Shiva Taslimipoor, and Victoria Yaneva.

November 2017 Ruslan Mitkov



Organisation

Europhras 2017 was jointly organised by the European Association for Phraseology EUROPHRAS, the University of Wolverhampton (Research Institute of Information and Language Processing), and the Association for Computational Linguistics, Bulgaria.

Programme Committee

Julio Bernal, Caro and Cuervo Institute, Colombia
Douglas Biber, Northern Arizona University, USA
Nicoletta Calzolari, Institute for Computational Linguistics, Italy
María Luisa Carrió-Pastor, Polytechnic University of Valencia, Spain
Sheila Castilho, Dublin City University, Ireland
Kenneth Church, IBM Research, USA
Jean-Pierre Colson, Université catholique de Louvain, Belgium
Gloria Corpas, University of Malaga, Spain
František Čermák, Charles University in Prague, Czech Republic
Anna Čermáková, Charles University, Czech Republic
Dmitrij Dobrovol’skij, Russian Academy of Sciences, Russian Language Institute, Russia
Jesse Egbert, Northern Arizona University, USA
Thierry Fontenelle, Translation Centre for the Bodies of the European Union, Luxembourg
Kleanthes K. Grohmann, University of Cyprus, Cyprus
Patrick Hanks, University of Wolverhampton, UK
Ulrich Heid, University of Hildesheim, Germany
Miloš Jakubíček, Lexical Computing and Masaryk University, Czech Republic
Kyo Kageura, University of Tokyo, Japan
Valia Kordoni, Humboldt University of Berlin, Germany
Simon Krek, University of Ljubljana, Slovenia
Pedro Mogorrón Huerta, University of Alicante, Spain
Johanna Monti, Naples Eastern University, Italy
Sara Moze, University of Wolverhampton, UK
Preslav Nakov, Qatar Computing Research Institute, HBKU, Qatar
Michael Oakes, University of Wolverhampton, UK
Marija Omazić, University of Osijek, Croatia
Petya Osenova, Sofia University, Bulgaria
Magali Paquot, Université catholique de Louvain, Belgium
Giovanni Parodi Sweis, Pontifical Catholic University of Valparaíso, Chile
Alain Polguère, University of Lorraine, France
Carlos Ramisch, Marseille Laboratory of Fundamental Computer Science, France
Ute Römer, Georgia State University, USA
Agata Savary, François Rabelais University, France
Barbara Schlücker, The University of Bonn, Germany
Violeta Seretan, University of Geneva, Switzerland
Kathrin Steyer, Institute of German Language, Germany
Yukio Tono, Tokyo University of Foreign Studies, Japan
Cornelia Tschichold, Swansea University, UK
Benjamin Tsou, City University of Hong Kong, SAR China
Agnès Tutin, University of Grenoble, France
Aline Villavicencio, Federal University of Rio Grande do Sul, Brazil
Eveline Wandl-Vogt, Austrian Academy of Sciences, Austria
Tom Wasow, Stanford University, USA
Eric Wehrli, University of Geneva, Switzerland
Stefanie Wulff, University of Florida, USA
Michael Zock, Marseille Laboratory of Fundamental Computer Science, France

Additional Reviewers

Verginica Barbu Mititelu, Romanian Academy, Research Institute for AI, Romania
Archna Bhatia, Language Technologies Institute, CMU, USA
Ismail El Maarouf, Adarga Limited, Oxford University Press, UK
Voula Giouli, Institute for Language and Speech Processing, Athena RIC, Greece
Václava Kettnerová, Charles University, Czech Republic
Rogelio Nazar, Pontifical Catholic University of Valparaíso, Chile
Irene Renau, Pontifical Catholic University of Valparaíso, Chile
Ioannis Saridakis, University of Athens, Greece
Inguna Skadina, University of Latvia, Latvia
Shiva Taslimipoor, University of Wolverhampton, UK
Veronika Vincze, Hungarian Academy of Sciences, Hungary
Victoria Yaneva, University of Wolverhampton, UK

Keynote Speakers Main Conference

Kenneth Church, Johns Hopkins University, USA
Gloria Corpas, University of Malaga, Spain
Dmitrij Dobrovol’skij, Russian Academy of Sciences, Russian Language Institute, Russia
Patrick Hanks, University of Wolverhampton, UK
Miloš Jakubíček, Lexical Computing and Masaryk University, Czech Republic


Invited Speakers of Europhras 2017 Workshops

Jean-Pierre Colson, Université catholique de Louvain, Belgium
Carlos Ramisch, Marseille Laboratory of Fundamental Computer Science, France

Organising Committee

Amanda Bloore, University of Wolverhampton, UK
Martina Cotella, University of Genoa, Italy
Arianna Fabbri, University of Genoa, Italy
April Harper, University of Wolverhampton, UK
Sara Moze, University of Wolverhampton, UK
Rocío Sánchez González, University of Malaga, Spain
Andrea Silvestre Baquero, Polytechnic University of Valencia, Spain
Shiva Taslimipoor, University of Wolverhampton, UK
Victoria Yaneva, University of Wolverhampton, UK

Conference Chair

Ruslan Mitkov, University of Wolverhampton, UK




Sketch Engine

University of Wolverhampton




Contents

Keynote and Invited Talks

Corpus Methods in a Digitized World
Kenneth Ward Church

The IdiomSearch Experiment: Extracting Phraseology from a Probabilistic Network of Constructions
Jean-Pierre Colson

Collocational Constructions in Translated Spanish: What Corpora Reveal
Gloria Corpas Pastor

Constructions in Parallel Corpora: A Quantitative Approach
Dmitrij Dobrovol’skij and Ludmila Pöppel

Mechanisms of Meaning
Patrick Hanks

Putting the Horses Before the Cart: Identifying Multiword Expressions Before Translation
Carlos Ramisch

Phraseology in Translation and Contrastive Studies

A Web of Analogies: Depictive and Reaction Object Constructions in Modern English and French Fiction
Susanne Dyka, Iva Novakova, and Dirk Siepmann

Brazilian Recipes in Portuguese and English: The Role of Phraseology for Translation
Rozane Rodrigues Rebechi and Márcia Moura da Silva

Pragmatic Parameters for Contrastive Analyses of the Equivalence of Eventive Specialized Phraseological Units
Óscar Javier Salamanca Martínez and Mercedes Suárez de la Torre

Phraseological Units and Subtitling in Television Series: A Case Study The Big Bang Theory
Esther Sedano Ruiz


Lexicography and Terminography

A Semantic Approach to the Inclusion of Complex Nominals in English Terminographic Resources
Melania Cabezas-García and Pamela Faber

Eye of a Needle in a Haystack: Multiword Expressions in Czech: Typology and Lexicon
Milena Hnátková, Tomáš Jelínek, Marie Kopřivová, Vladimír Petkevič, Alexandr Rosen, Hana Skoumalová, and Pavel Vondřička

Predicate-Argument Analysis to Build a Phraseology Module and to Increase Conceptual Relation Expressiveness
Arianne Reimerink and Pilar León-Araúz

Exploitation of Corpora in Phraseological Studies

Phrasal Settings in Which the Definite and Indefinite Articles Appear to Be Interchangeable in English: An Exploratory Study
Stephen James Coffey

Deverbal Nouns in Czech Light Verb Constructions
Václava Kettnerová, Veronika Kolářová, and Anna Vernerová

Contribution Towards a Corpus-Based Phraseology Minimum
Marie Kopřivová

Estimating Lexical Availability of European Portuguese Proverbs
Sónia Reis and Jorge Baptista

Development of Corpora for Phraseological Studies

Verbal Multiword Expressions in Slovene
Polona Gantar, Simon Krek, and Taja Kuzman

Using Parallel Corpora to Study the Translation of Legal System-Bound Terms: The Case of Names of English and Spanish Courts
Francisco J. Vigier and María del Mar Sánchez

Phraseology and Language Learning

Towards Better Representation of Phraseological Meaning in Dictionaries for Learners
Elena Berthemet

Designing a Learner’s Dictionary with Phraseological Disambiguators
P.V. DiMuccio-Failla and Laura Giacomini

Individual Differences in L2 Processing of Multi-word Phrases: Effects of Working Memory and Personality
Elma Kerz and Daniel Wiechmann

Cognitive and Cultural Aspects of Phraseology

Verbal Phraseology: An Analysis of Cognitive Verbs in Linguistics, Engineering and Medicine Academic Papers
María Luisa Carrió-Pastor

Cultural Models and Motivation of Idioms with the Component ‘Heart’ in Croatian
Zvonimir Novoselec

Variation of Adjectival Slots in kao (‘as’) Similes in Croatian: A Cognitive Linguistic Account
Jelena Parizoska and Ivana Filipović Petrović

Cognitive Processing of Multiword Expressions in Native and Non-native Speakers of English: Evidence from Gaze Data
Victoria Yaneva, Shiva Taslimipoor, Omid Rohanian, and Le An Ha

Theoretical and Descriptive Approaches to Phraseology

Verb-Object Compounds and Idioms in Chinese
Adams Bodomo, So-sum Yu, and Dewei Che

Korean Morphological Collocations: Theoretical and Descriptive Implications
Mi Hyun Kim and Alain Polguère

Computational Approaches to Phraseology

Towards Comprehensive Computational Representations of Arabic Multiword Expressions
Ayman Alghamdi and Eric Atwell

Frequency Consolidation Among Word N-Grams: A Practical Procedure
Andreas Buerki

Combining Dependency Parsing and a Lexical Network Based on Lexical Functions for the Identification of Collocations
Alexsandro Fonseca, Fatiha Sadat, and François Lareau

Author Index
