Lecture Notes in Artificial Intelligence 10596
Subseries of Lecture Notes in Computer Science
LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany
LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Ruslan Mitkov (Ed.)
Computational and Corpus-Based Phraseology
Second International Conference, Europhras 2017
London, UK, November 13–14, 2017
Proceedings
Editor
Ruslan Mitkov
University of Wolverhampton
Wolverhampton, UK
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-319-69804-5 ISBN 978-3-319-69805-2 (eBook)
https://doi.org/10.1007/978-3-319-69805-2
Library of Congress Control Number: 2017957565
LNCS Sublibrary: SL7 – Artificial Intelligence
© Springer International Publishing AG 2017
The chapter “Frequency Consolidation Among Word N-Grams” is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Computational and Corpus-Based Phraseology: Recent Advances and Interdisciplinary Approaches
As the late and inspiring John Sinclair (1991, 2007) observed, knowledge of vocabulary and grammar is not sufficient for someone to express himself/herself idiomatically or naturally in a specific language. One has to have the knowledge and skill to produce effective and naturally phrased utterances, which are often based on phraseological units (the idiom principle). This is in contrast to the traditional assumption or open choice principle that lies at the heart of generative approaches to language. As Pawley and Syder (1983) stated more than three decades ago, the traditional approach cannot account for nativelike selection (idiomaticity) or fluency.
Language is indeed phraseological, and phraseology is the discipline that studies phraseological units (PUs) or their related concepts, referred to (and regarded as largely synonymous) by scholars as multiword units, multiword expressions (MWEs), fixed expressions, set expressions, phraseological units, formulaic language, phrasemes, idiomatic expressions, idioms, collocations, and/or polylexical expressions. PUs or MWEs are ubiquitous and pervasive in language. They are a fundamental linguistic concept that is central to a wide range of natural language processing and applied linguistics applications, including, but not limited to, phraseology, terminology, translation, language learning, teaching and assessment, and lexicography. Jackendoff (1977) observes that the number of MWEs in a speaker’s lexicon is of the same order of magnitude as the number of single words. Biber et al. (1999) argue that they constitute up to 45% of spoken English and up to 21% of academic prose in English. Sag et al. (2002) state that they are overwhelmingly present in terminology and that 41% of the entries in WordNet 1.7 are reported to be MWEs.
PUs play a crucial role not only in the computational treatment of natural languages. Terms are often MWEs (and not single words), which makes them highly relevant to terminology. Translation and interpreting are two other fields where phraseology plays an important role, as finding correct translation equivalents of PUs is a pivotal step in the translation process. Given their pervasive nature, PUs are absolutely central to the work carried out by lexicographers, who analyse and describe both single words and PUs. Last but not least, PUs are vital not only for language learning, teaching, and assessment, but also for more theoretical linguistic areas such as pragmatics, cognitive linguistics, and construction grammars. All the aforementioned areas are today aided by (and often driven by) corpora, which makes PUs particularly relevant for corpus linguists. Finally, PUs provide an excellent basis for inter- and multidisciplinary studies, fostering fruitful collaborations between researchers across different disciplines, which are, for the time being, unfortunately still largely unexplored.
This volume features a selection of papers written by the invited speakers as well as regular papers presented at the international conference “Computational and Corpus-Based Phraseology: Recent Advances and Interdisciplinary Approaches” (Europhras 2017). The conference, which is organised jointly by the European Association of Phraseology (Europhras) and the Research Institute in Information and Language Processing of the University of Wolverhampton, and sponsored by Europhras, the Sketch Engine, ELRA and the University of Wolverhampton, provides the perfect opportunity for researchers to present their work, fostering interaction and collaboration between scholars working in disciplines as diverse as natural language processing, translation, terminology, lexicography, language learning, teaching and assessment, and cognitive science, to name only a few. I organised the volume thematically into the following sections, which demonstrate the breadth of the topics represented at Europhras 2017: (1) Keynote and Invited Papers, (2) Phraseology in Translation and Contrastive Studies, (3) Lexicography and Terminography, (4) Exploitation of Corpora in Phraseological Studies, (5) Development of Corpora for Phraseological Studies, (6) Phraseology and Language Learning, (7) Cognitive and Cultural Aspects of Phraseology, (8) Theoretical and Descriptive Approaches to Phraseology, and (9) Computational Approaches to Phraseology. In fact, the variety of topics at Europhras 2017 is even more remarkable if we take into account other conference presentations that are not included in this volume: in addition to the regular papers, the conference also featured short papers and posters, which are published separately as e-proceedings with ISBN and DOI numbers assigned to every contribution.
Every submission to the conference was evaluated by three reviewers – i.e., members of the Programme Committee consisting of 46 scholars from 23 different countries, or 12 additional reviewers from eight countries, who were recommended by the Programme Committee. The conference contributions were authored by a total of 91 scholars from 24 different countries. These figures attest to the truly international dimension of Europhras 2017.
I would like to thank everyone who made this truly interdisciplinary and international event possible. I would like to start by thanking all colleagues who submitted papers to Europhras 2017 and travelled to London to attend the event. I am grateful to all members of the Programme Committee and the additional reviewers for carefully examining all submissions and providing substantial feedback on all papers, helping the authors of accepted papers to improve and polish the final versions of their papers. Special thanks go to the invited speakers – both the keynote speakers of the main conference (Ken Church, Gloria Corpas, Dmitrij Dobrovol’skij, Patrick Hanks, Miloš Jakubíček) and the invited speakers of the two accompanying workshops (Carlos Ramisch and Jean-Pierre Colson). Words of gratitude go to our sponsors – Europhras, the Sketch Engine, ELRA, and the University of Wolverhampton.
Last but not least, I would like to acknowledge the members of the Organising Committee, who worked very hard during the last 12 months and whose dedication and efforts made the organisation of this event possible. I would like to mention (in alphabetical order) the following colleagues, who competently carried out numerous organisational tasks and were ready to step in and support the organisation of the conference whenever needed. My big thank you goes out to Amanda Bloore, Martina Cotella, Arianna Fabbri, April Harper, Sara Moze, Nikolai Nikolov, Ivelina Nikolova, Rocío Sánchez González, Andrea Silvestre Baquero, Shiva Taslimipoor, and Victoria Yaneva.
November 2017 Ruslan Mitkov
Organisation
Europhras 2017 was jointly organised by the European Association for Phraseology (EUROPHRAS), the University of Wolverhampton (Research Institute of Information and Language Processing), and the Association for Computational Linguistics, Bulgaria.
Programme Committee
Julio Bernal, Caro and Cuervo Institute, Colombia
Douglas Biber, Northern Arizona University, USA
Nicoletta Calzolari, Institute for Computational Linguistics, Italy
María Luisa Carrió-Pastor, Polytechnic University of Valencia, Spain
Sheila Castilho, Dublin City University, Ireland
Kenneth Church, IBM Research, USA
Jean-Pierre Colson, Université catholique de Louvain, Belgium
Gloria Corpas, University of Malaga, Spain
František Čermák, Charles University in Prague, Czech Republic
Anna Čermáková, Charles University, Czech Republic
Dmitrij Dobrovol’skij, Russian Academy of Sciences, Russian Language Institute, Russia
Jesse Egbert, Northern Arizona University, USA
Thierry Fontenelle, Translation Centre for the Bodies of the European Union, Luxembourg
Kleanthes K. Grohmann, University of Cyprus, Cyprus
Patrick Hanks, University of Wolverhampton, UK
Ulrich Heid, University of Hildesheim, Germany
Miloš Jakubíček, Lexical Computing and Masaryk University, Czech Republic
Kyo Kageura, University of Tokyo, Japan
Valia Kordoni, Humboldt University of Berlin, Germany
Simon Krek, University of Ljubljana, Slovenia
Pedro Mogorrón Huerta, University of Alicante, Spain
Johanna Monti, Naples Eastern University, Italy
Sara Moze, University of Wolverhampton, UK
Preslav Nakov, Qatar Computing Research Institute, HBKU, Qatar
Michael Oakes, University of Wolverhampton, UK
Marija Omazić, University of Osijek, Croatia
Petya Osenova, Sofia University, Bulgaria
Magali Paquot, Université catholique de Louvain, Belgium
Giovanni Parodi Sweis, Pontifical Catholic University of Valparaíso, Chile
Alain Polguère, University of Lorraine, France
Carlos Ramisch, Marseille Laboratory of Fundamental Computer Science, France
Ute Römer, Georgia State University, USA
Agata Savary, François Rabelais University, France
Barbara Schlücker, The University of Bonn, Germany
Violeta Seretan, University of Geneva, Switzerland
Kathrin Steyer, Institute of German Language, Germany
Yukio Tono, Tokyo University of Foreign Studies, Japan
Cornelia Tschichold, Swansea University, UK
Benjamin Tsou, City University of Hong Kong, SAR China
Agnès Tutin, University of Grenoble, France
Aline Villavicencio, Federal University of Rio Grande do Sul, Brazil
Eveline Wandl-Vogt, Austrian Academy of Sciences, Austria
Tom Wasow, Stanford University, USA
Eric Wehrli, University of Geneva, Switzerland
Stefanie Wulff, University of Florida, USA
Michael Zock, Marseille Laboratory of Fundamental Computer Science, France
Additional Reviewers
Verginica Barbu Mititelu, Romanian Academy, Research Institute for AI, Romania
Archna Bhatia, Language Technologies Institute, CMU, USA
Ismail El Maarouf, Adarga Limited, Oxford University Press, UK
Voula Giouli, Institute for Language and Speech Processing, Athena RIC, Greece
Václava Kettnerová, Charles University, Czech Republic
Rogelio Nazar, Pontifical Catholic University of Valparaíso, Chile
Irene Renau, Pontifical Catholic University of Valparaíso, Chile
Ioannis Saridakis, University of Athens, Greece
Inguna Skadina, University of Latvia, Latvia
Shiva Taslimipoor, University of Wolverhampton, UK
Veronika Vincze, Hungarian Academy of Sciences, Hungary
Victoria Yaneva, University of Wolverhampton, UK
Keynote Speakers Main Conference
Kenneth Church, Johns Hopkins University, USA
Gloria Corpas, University of Malaga, Spain
Dmitrij Dobrovol’skij, Russian Academy of Sciences, Russian Language Institute, Russia
Patrick Hanks, University of Wolverhampton, UK
Miloš Jakubíček, Lexical Computing and Masaryk University, Czech Republic
Invited Speakers of Europhras 2017 Workshops
Jean-Pierre Colson, Université catholique de Louvain, Belgium
Carlos Ramisch, Marseille Laboratory of Fundamental Computer Science, France
Organising Committee
Amanda Bloore, University of Wolverhampton, UK
Martina Cotella, University of Genoa, Italy
Arianna Fabbri, University of Genoa, Italy
April Harper, University of Wolverhampton, UK
Sara Moze, University of Wolverhampton, UK
Rocío Sánchez González, University of Malaga, Spain
Andrea Silvestre Baquero, Polytechnic University of Valencia, Spain
Shiva Taslimipoor, University of Wolverhampton, UK
Victoria Yaneva, University of Wolverhampton, UK
Conference Chair
Ruslan Mitkov, University of Wolverhampton, UK
Sponsors
EUROPHRAS
Sketch Engine
University of Wolverhampton
ELRA
Contents
Keynote and Invited Talks
Corpus Methods in a Digitized World
Kenneth Ward Church

The IdiomSearch Experiment: Extracting Phraseology from a Probabilistic Network of Constructions
Jean-Pierre Colson

Collocational Constructions in Translated Spanish: What Corpora Reveal
Gloria Corpas Pastor

Constructions in Parallel Corpora: A Quantitative Approach
Dmitrij Dobrovol’skij and Ludmila Pöppel

Mechanisms of Meaning
Patrick Hanks

Putting the Horses Before the Cart: Identifying Multiword Expressions Before Translation
Carlos Ramisch
Phraseology in Translation and Contrastive Studies
A Web of Analogies: Depictive and Reaction Object Constructions in Modern English and French Fiction
Susanne Dyka, Iva Novakova, and Dirk Siepmann

Brazilian Recipes in Portuguese and English: The Role of Phraseology for Translation
Rozane Rodrigues Rebechi and Márcia Moura da Silva

Pragmatic Parameters for Contrastive Analyses of the Equivalence of Eventive Specialized Phraseological Units
Óscar Javier Salamanca Martínez and Mercedes Suárez de la Torre

Phraseological Units and Subtitling in Television Series: A Case Study The Big Bang Theory
Esther Sedano Ruiz
Lexicography and Terminography
A Semantic Approach to the Inclusion of Complex Nominals in English Terminographic Resources
Melania Cabezas-García and Pamela Faber

Eye of a Needle in a Haystack: Multiword Expressions in Czech: Typology and Lexicon
Milena Hnátková, Tomáš Jelínek, Marie Kopřivová, Vladimír Petkevič, Alexandr Rosen, Hana Skoumalová, and Pavel Vondřička

Predicate-Argument Analysis to Build a Phraseology Module and to Increase Conceptual Relation Expressiveness
Arianne Reimerink and Pilar León-Araúz
Exploitation of Corpora in Phraseological Studies
Phrasal Settings in Which the Definite and Indefinite Articles Appear to Be Interchangeable in English: An Exploratory Study
Stephen James Coffey

Deverbal Nouns in Czech Light Verb Constructions
Václava Kettnerová, Veronika Kolářová, and Anna Vernerová

Contribution Towards a Corpus-Based Phraseology Minimum
Marie Kopřivová

Estimating Lexical Availability of European Portuguese Proverbs
Sónia Reis and Jorge Baptista
Development of Corpora for Phraseological Studies
Verbal Multiword Expressions in Slovene
Polona Gantar, Simon Krek, and Taja Kuzman

Using Parallel Corpora to Study the Translation of Legal System-Bound Terms: The Case of Names of English and Spanish Courts
Francisco J. Vigier and María del Mar Sánchez
Phraseology and Language Learning
Towards Better Representation of Phraseological Meaning in Dictionaries for Learners
Elena Berthemet

Designing a Learner’s Dictionary with Phraseological Disambiguators
P.V. DiMuccio-Failla and Laura Giacomini

Individual Differences in L2 Processing of Multi-word Phrases: Effects of Working Memory and Personality
Elma Kerz and Daniel Wiechmann
Cognitive and Cultural Aspects of Phraseology
Verbal Phraseology: An Analysis of Cognitive Verbs in Linguistics, Engineering and Medicine Academic Papers
María Luisa Carrió-Pastor

Cultural Models and Motivation of Idioms with the Component ‘Heart’ in Croatian
Zvonimir Novoselec

Variation of Adjectival Slots in kao (‘as’) Similes in Croatian: A Cognitive Linguistic Account
Jelena Parizoska and Ivana Filipović Petrović

Cognitive Processing of Multiword Expressions in Native and Non-native Speakers of English: Evidence from Gaze Data
Victoria Yaneva, Shiva Taslimipoor, Omid Rohanian, and Le An Ha
Theoretical and Descriptive Approaches to Phraseology
Verb-Object Compounds and Idioms in Chinese
Adams Bodomo, So-sum Yu, and Dewei Che

Korean Morphological Collocations: Theoretical and Descriptive Implications
Mi Hyun Kim and Alain Polguère
Computational Approaches to Phraseology
Towards Comprehensive Computational Representations of Arabic Multiword Expressions
Ayman Alghamdi and Eric Atwell

Frequency Consolidation Among Word N-Grams: A Practical Procedure
Andreas Buerki

Combining Dependency Parsing and a Lexical Network Based on Lexical Functions for the Identification of Collocations
Alexsandro Fonseca, Fatiha Sadat, and François Lareau
Author Index