Poljski Rad - Toposlaw

Embed Size (px)

Citation preview

  • 8/10/2019 Poljski Rad - Toposlaw

    1/14

    Intelligent Information Systems 9999ISBN 666-666-666, pages 114

    Constructing an Electronic Dictionary of PolishUrban Proper Names

    AbstractWe present an ongoing project of constructing an electronic dictionary of Polishurban proper names. It allows us to describe variants of a proper name and linkthem with a city object. Moreover, we classify proper names according to theirformer, common or spoken usage. We present tools which we use for the creationof the dictionary, and an analysis of Warsaw toponyms gathered for description.

    Keywords: Warsaw toponyms, electronic dictionary, Polish, inection and vari-ability of proper names

    1 Introduction

    In the paper, we present an ongoing project1 of creating a computer dictionary of Polish urban proper names. Our goal is to create a dictionary that can be usedfor the recognition and generation of proper names in written texts as well as indialogues concerning a large agglomeration.

    The idea of developing such a dictionary is one of the outcomes of the LUNAEU 6th Framework Programme in which Institute of Computer Science PAS isinvolved. The main goal of this project is to create a robust and effective spokenlanguage understanding module, which can be used in developing automatic tele-com services. For Polish, the city transportation domain was chosen, and toolsfor the automatic recognition of the meaning of utterances concerning this domainare under development. One of the very important milestones of the project wasthe creation of an annotated corpus of dialogues concerning public transportation(Mykowiecka et al. , 2007). The corpus was collected at the Warsaw TransportAuthority Information Center where operators provide information on tram andbus connections, schedules, routes, fares, reductions etc. During the dialoguesannotation the problem of lemmatization (assigning a canonical form) of Polishproper names arose. This problem is not straightforward because of rich inectionand relaxed word order in Polish. Moreover, rather little work on this subject hasbeen undertaken from the computational point of view, see Piskorski et al. (2007),Piskorski and Sydow (2007). In Marciniak et al. (2008), the problem of lemmatiza-tion of urban proper names is described, and a practical but not universal solutionis presented.

    The long-term aim of LUNA project is improving dialogue systems. The rstapproach to a dialogue system concerning Warsaw transportation, focused mainly

    1 The project is partially nanced by the MNiSW decision number 567/6. PR UE/2008/7.

  • 8/10/2019 Poljski Rad - Toposlaw

    2/14

    2

    on speech recognition, is described in Marasek et al. (2009). The thoroughly de-signed dictionary of urban proper names is essential for developing such a dialoguesystem. The dictionary should enable:

    Recognition of different grammatical forms and variants of the same propername. For example, the full official name of the street ulica Bitwy Warsza-wskiej 1920 r. is abbreviated in practice into ulica Bitwy Warszawskiej . Forspeech recognition and generation the year should be represented by wordstysic dziewiset dwudziestego roku nineteen twenty. Our dictionary oughtto include all possibilities.

    Representation of former and common names of the citys objects and con-nection of different names with the same object, e.g., for the street ulica ks.Jerzego Popieuszki the previous name was ulica Stoeczna .

    Lemmatization of a proper name assigning its base form. Names are lemma-tized in different ways, for example, for a street name in locative case ulicy LocMarszakowskiej Loc the lemma is ulica Nom Marszakowska Nom , while for thestreet ulicy Loc Calineczki Gen it is ulica Nom Calineczki Gen .

    Generation of a desired inected form for a given base form function oppositeto lemmatization.

    If several objects have the same name, we should be able to distinguish them.For example, a street ulica Sportowa exists in many towns of the Warsawagglomeration. We want to represent all of them, so we have to introduceseveral objects with the same name assigned to them.

    To account for all these phenomena we have devised the following structurefor the dictionary. Each lexical entry describes one name with all its grammaticalforms and variants. Lexical entries are linked with city objects, which representthe actual or historical objects referenced by the names. Several names (former,common, etc.) may be linked with a single city object, and many objects with asingle name.

    The organization of the paper is as follows. Section 2 provides the descriptionof linguistic and pragmatic features especially concerning variants of the samename. For realization of the project we created our own tool Toposaw which co-operates with Morfeusz , a morphological analyser and generator for Polish words,and Multiex , a cross-language morpho-syntactic generator of multi-word units(section 3). Finally, in section 4 we present the data collected within the project.

    2 Pragmatic and Stylistic Phenomena

    In this section we present pragmatic and stylistic phenomena which were takeninto account while designing the dictionary. Some linguistic observations such asthe description of abbreviations, acronyms and initialisms, numerals and namevariations based on Warsaw toponyms have already been presented by Savaryet al. (2009). Here, we focus on these features of proper names that are reectedin the lexicon as name variants and their stylistic characteristics.The description of a multi-word lexical unit in the dictionary consists of:

  • 8/10/2019 Poljski Rad - Toposlaw

    3/14

    Polish Urban Proper Names Dictionary 3

    an entry name, an inection description of name components, a graph representation of all possible name variants, pragmatic characteristics of the name variants (official, neutral, neutral

    spoken), stylistic characteristics of the name (former, common, and marked), a link between the name and a concept described by a hierarchy (e.g. area,

    building, etc.).

    We limit the range of the data in our lexicon to proper names of the transporta-tion system and public places in Warsaw. Thus, we consider the following typesof places: Warsaw administrative units, traffic routes, stopping places, parks andgardens, cemeteries, public institutions and facilities, palaces, monuments, com-mercial centres, and business establishments, see examples (1-3).

    (1) Nowy Dwr Mazowiecki (town in Warsaw agglomeration)(2) Pomnik Stefana Starzyskiego (monument)(3) Urzd Miasta Stoecznego Warszawy (office)

    Signicant part of toponyms contain other names, e.g. person names, seeexample (2). To describe a formal structure of names, one should take into accountthose embedded names, too. In the dictionary we have decided to include namesof persons as separate lexical units, see examples (4-6).

    (4) Stefan Starzyski(5) Jan Rodowicz Anoda(6) Krl Jan I Olbracht

    2.1 Proper Names Variants

    In the dictionary we represent several variants of names. We mark three of themwhich are pragmatically important for potential applications.

    the official name variant represents a variant used in official lists of citynames,

    a neutral name variant is a shortened variant used in written texts, a neutral spoken name variant is a shortened variant used in spoken language.

    Usually, proper name dictionaries (Grzenia, 2003; Rzetelska-Feleszko, 2005;Cielikowa, 2008) and official lists of public places contain names in their full formthat is not commonly used, see examples (7)-(10). We have decided to mark themas official names in our lexicon. We describe names in an extensive way bygathering together all their possible pragmatic variants. Among them, we chooseone variant, short enough for efficient communication and object identication,which is marked as neutral and neutral spoken.

  • 8/10/2019 Poljski Rad - Toposlaw

    4/14

  • 8/10/2019 Poljski Rad - Toposlaw

    5/14

    Polish Urban Proper Names Dictionary 5

    (15) Aleja Solidarnoci (avenue),former name: Aleja Karola wierczewskiego

    (16) Aleja Jana Rodowicza Anody (avenue),a former part of the street: ulica Jana Rosoa

    Warsaw citizens sometimes invent their own popular names for places or build-ings. If these names are commonly known and recognized by city residents werepresent them in the dictionary as different names of the same object. We distin-guish two types of such proper names: common and marked names. Commonnames usually bring some associations of place shape, colour or physical feature, asin example (17). Sometimes, residents give an ironic name to the place or object,see example (18), which expresses their disapproval or funny association connectedto it.

    (17) official name: Pomnik Bohaterw Warszawy 1939-45 (Monument of War-saw Heroes),common name: Warszawska Nike (Warsaw Nike)

    (18) official name: Pomnik Jzefa Pisudskiego (Monument of Jzef Pisudski),marked name: Dziadek Parkingowy (Parking Grandpa)

    2.3 Hierarchy of Concepts

    We decided to include some semantic information in the dictionary. For this pur-pose we dened a very simple hierarchy of concepts. All city objects are connectedto exactly one concept represented by leafs of a hierarchy presented below. Thehierarchy is not crucial for conducting dialogues in the city transportation callcenter. Thus, we made it possible to introduce a hierarchy of concepts into ourdictionary, but we have not developed a complex one.

    PLACE AREA:

    ADMINISTRATIVE AREA: concept representing administrative divi-

    sion of an agglomeration. It represents: towns e.g. Nowy Dwr Ma-zowiecki , city districts and parts of districts. It is possible that one areaincludes others such as: city district Mokotw is divided into Grny Mokotw and Dolny Mokotw and the last one contains Czerniakw all these names are connected to this area concept.

    PUBLIC AREA: areas that can be visited by citizens like: parks, ceme-teries, nature reserves.

    CLOSED AREA: areas that are not accessible to all citizens like: mil-itary training areas, bus depots.

    COMMUNICATION POINT STOP: stops of all means of transport, including different types of buses

    (municipal, private, long-distance), trams, metro.

    RAILWAY STATION: e.g.: Dworzec Centralny AIRPORT: e.g.: Port Lotniczy im. Fryderyka Chopina w Warszawie

  • 8/10/2019 Poljski Rad - Toposlaw

    6/14

    6

    ROAD: STREET: includes avenues, roads and highways. It can also refer to a

    named route like Wisostrada which consists of about a dozen streets. SQUARE: includes also roundabouts. BRIDGE, VIADUCT, TUNNEL.

    FACILITY: buildings, their parts and groups of buildings such as: hospi-tals, universities, theaters, museums, shopping centers, stadiums, industrialplants, etc. For example, this concept refers to Paac Kultury i Nauki anexhibition centre and office building, in which a theater Teatr Dramatyczny is located and both names are connected to the facility concept.

    HYDRONYM: it applies to all bodies of water like rivers, brooks, lakes,ponds, e.g.: Rzeka Wisa, Jeziorko Czerniakowskie

    MONUMENT: Pomnik Adama Mickiewicza PERSONAGE: refers to people like Adam Mickiewicz , ctitious characters a

    literary hero Micha Woodyjowski , and religious characters e.g.: Jan Chrzciciel (John the Baptist).

    3 ToolsThe dictionary is built using a dedicated graphical interface Toposaw . The for-malism used in the description of compounds is implemented in Multiex , whichin turn utilises Morfeusz for inecting components of names.

    3.1 Morfeusz

    Morfeusz SGJP is a morphological analyser and generator for single words basedon the data of the Grammatical Dictionary of Polish (Saloni et al. , 2007). Theinterface and the tagset of Morfeusz SGJP is compatible with the previous versioncalled Morfeusz SIaT (Woliski, 2006), but features a much improved dictionary(ca. 245,000 lexemes, ca. 4,000,000 different textual words).

    Morfeusz SGJP has two modules. The rst one, given any textual word, pro-vides all possible interpretations of this word as a form of a Polish lexeme. Thesecond module generates all possible forms of a lexeme when given the lemma andthe part of speech.

    The IPI PAN tagset used by Morfeusz is based on a set of morphological,morphosyntactic and syntactic criteria (cf. Przepirkowski and Woliski, 2003;Woliski, 2003). It operates with more detailed grammatical classes than tradi-tional parts of speech (POS). Some of these classes, however, correspond directlyto the traditional POS, e.g., noun, adjective, adverb, preposition, conjunction.Grammatical categories assumed in the tagset include well established ones suchas number, case, person, degree, aspect, gender, as well as more restricted cate-gories rst introduced in the work of Jan Tokarski and Zygmunt Saloni (Tokarski,2002).

    The following example presents the analysis of the phrase Jana Rodowicza Anody :

  • 8/10/2019 Poljski Rad - Toposlaw

    7/14

    Polish Urban Proper Names Dictionary 7

    (19) 1 Jana Jan subst:sg:gen.acc:m12 sp3 Rodowicza Rodowicz subst:sg:gen.acc:m14 sp5 interp6 Anody anoda subst:sg:gen:f|subst:pl:nom.acc.voc:f 7 interp

    The phrase is segmented as required by Multiex , in particular, segments 2 and4 are spaces. Lemmas Jan (a rst name) and Rodowicz (a last name) are capi-talised, anoda anode, being a common noun, is put in lowercase. The tags in thisexample consist of the following components: subst noun, sg singular, pl plural, nom nominative, gen genitive, acc accusative, voc vocative,m1 masculine personal, f feminine, interp punctuation, sp blank. Avertical bar denotes alternative tags, a dot denotes alternative values of a givengrammatical category.

    3.2 Multiex

    Multiex (Savary (2005a), Savary (2005b)) is a cross-language morpho-syntacticgenerator of multi-word units (MWUs). It allows us to exhaustively and preciselydescribe the inection paradigm and variation of a MWU via a two-tier approach.First, an underlying morphological module such as Morfeusz allows us to tokenizethe MWU lemma, to annotate its components, and to generate inected formsof simple words on demand. Then, each inected multi-word form is seen asa particular combination of the inected forms of its components. For instance,Fig. 1 shows, in its upper part, the segmentation of the person name from example(5) into seven tokens, four of which are spaces and quotes. Each token which canbe inected, is annotated by a Morfeusz tag (cf section 3.1). The inection of thewhole compound is described by a graph. Each path in the graph describes oneor more inected forms and variants.

    Morphological descriptions appearing inside the boxes refer to single consti-

    tuents. For instance, in the graph in Fig. 1 each non empty box refers to oneof the seven previously annotated constituents ($1 through $7). If a constituentnumber appears in a box with no extra information then the token should be leftunchanged. For instance the boxes containing $2 , $4 , $5 , and $7 indicatethat each space and quote should be recopied as such. If however a constituentnumber is followed by a set of category-value equations then the constituent shouldbe inected into the desired form. For instance, the box $1 : Case = $c indicatesthat the rst constituent (Jan ) should be inected for case, while keeping its othercategories unchanged (m1 and sg ). The unication variable $c , which is commonto components $1, $3 and $6, allows us to express their case agreement.

    Morphological descriptions appearing under the boxes describe the inectionalfeatures of the whole compound. Here, Gen = $3.Gen ; N b = $3.N b ; Case = $csays that the gender and the number of the whole compound is inherited from thethird constituent, as it appears in the compound lemma (for this entry m1 andsg ), while the case is determined by the current value of variable $c .

  • 8/10/2019 Poljski Rad - Toposlaw

    8/14

    8

    The full exploration of this graph results in the generation of all inectedforms and variants of the compound, in particular all the variants in example (20)inected for all cases. Note that numbering of the constituents allows the graphto represent their omissions, insertions and order change.

    (20) Jan Rodowicz Anoda, Jan Rodowicz Anoda,

    Jan Anoda Rodowicz, Jan Anoda RodowiczJan Rodowicz, Rodowicz Anoda, Anoda Rodowicz, Rodowicz

    Jan Rodowicz Anoda $1 $2 $3 $4 $5 $6 $7

    lemma: Jan class: subst homonym: 0 Nb: sg Case : nom Gen: m1

    lemma: Rodowicz class: subst homonym: 0 Nb: sg Case : nom Gen: m1

    lemma: anoda class: subst homonym: 0 Nb: sg Case : nom Gen: f

    Figure 1 : Lemma annotation and inection graph for the patronym Jan Rodowicz An-oda containing elliptical variants and inversions

    Embedded compounding, frequent in urban toponyms, as discussed in sec-tion 2, can be easily expressed in Multiex . Given the description in Fig. 1 thename of the avenue dedicated to this person can be annotated as a three-componentcompound in which the third component is a compound itself, as shown in Fig. 2.The inection graph in this case is rather trivial3 . It says that the rst constituent,Aleja , inects for case only, and that it can be omitted. The third constituent cantake any variant of its genitive form. Thus, all sequences in example (21) can begenerated. The whole compound inherits its number and gender from the rstconstituent, as it appears in the lemma, and its case from the rst constituentas it appears in the particular inected form. Note that omitting the head nounAleja does not provoke a head shifting: the third constituent remains always ingenitive.

    3 For the sake of simplicity of this presentation the two graphs discussed here are shown in theirsimplied versions containing no pragmatic features official, neutral and neutral spoken.

  • 8/10/2019 Poljski Rad - Toposlaw

    9/14

    Polish Urban Proper Names Dictionary 9

    (21) Aleja Jana Rodowicza Anody, Jana Rodowicza Anody,Aleja Jana Rodowicza Anody, Jana Rodowicza Anody,Aleja Jana Anody Rodowicza, Jana Anody RodowiczaAleja Jana Rodowicza, Jana Rodowicza, Aleja Rodowicza, Rodowicza, . . .

    Aleja Jana Rodowicza Anody $1 $2 $3

    lemma: aleja class: subst homonym: 0 Nb: sg Case : nom Gen: f

    lemma: Jan Rodowicz Anoda class: subst homonym: 0 Nb: sg Case : gen Gen: m1

    Figure 2 : Embedded compounding in Aleja Jana Rodowicza Anody

    Due to the possibility of embedding descriptions, the graph in Fig. 2 can be ap-plied to most street names having a genitive government structure Noun NounGen(cf section 4), independently of the complexity of their genitive complement. Inparticular, examples (11), (14) and (15) are described with the same graph.

    Savary et al. (2009) gives a more detailed discussion on using Multiex withMorfeusz and their application to the present study of Polish urban toponyms.

    3.3 A Dictionary Editor Toposaw

    To facilitate the creation of the dictionary a Java application named Toposaw

    was developed. The list of names to be described is loaded to the applicationsdatabase. The task of the operator is to describe inection and provide somefeatures of the object referenced by each name. To describe inection one needsto construct the labelled lemma as needed by Multiex and then to assign aninection graph to the name.

    As explained in section 3.2 the lemma of a compound has to be labelled withmorphological features of the words constituting the compound. For that purposenames get analysed by Morfeusz . If any of the words has multiple interpretations,the lexicographer has to select the one appropriate for the lemma of the compound.Obviously, the form need not be the lemma of the word in question. For example,Aleja Jana Rodowicza Anody is a compound lemma where Aleja is in nominative ,but the three other tokens represent genitive forms of their respective lexemes.

    The operator can also mark a fragment of the name as a sub-compound to bedescribed separately. As noted in section 2 we use this mechanism for compoundnames of persons, which tend to occur in several urban toponyms.

  • 8/10/2019 Poljski Rad - Toposlaw

    10/14

    10

    Table 1 : Types of city objects represented by proper names

    Areas 380administrative areas, e.g. districts, city areas 220public areas, e.g. parks, national parks, cemeteries 142closed areas, e.g. depots, military training grounds 18Roads 4999streets, boulevards, avenues 4857squares, roundabouts 133bridges, tunnels 9Communication points 1901bus/tram stops, metro stations 1788railway stations 112airports 3Buildings, e.g. theatres, cinemas, churches 473Hydronyms, rivers, lakes, canals 37Monuments 110TOTAL 7898

    The tool keeps a pool of inection graphs, which can be assigned to names.

    A Unitex-like editor is used to create and edit the graphs (Paumier, 2006). Newgraphs can be created from scratch or based on any existing one. Since inectionalbehaviour is often shared by large groups of names, the tool has a mechanism forselecting a group of names and assigning the same graph to all of them simulta-neously.

    The pragmatic labels mentioned in section 2.1 (official, neutral, neutralspoken) are attached to particular paths within a graph, which means they applyonly to particular variants of the name.

    Each name can be linked to a city object referenced by it. As stated in section2.1, some city objects have several names. Such names (as opposed to variantsof one name) are described separately and only then linked to one city object.The links can be labelled with the type of the name: common, marked, orformer. City objects are also categorised using the hierarchy described in section2.3.

    4 Description of the Collected Data

    The collected proper names come from the city hall, offices and websites, webpages of Zarzd Drg Miejskich (the creator of the City Information System) anddirectly from Biuro Geodezji i Katastru (a geodesic office).

    At present the database of city proper names includes 7898 records. Thetypology of these names and their current frequency (according to the classicationpresented in section 2.3) is shown in Tab. 1.

    The collected urban proper names have various linear and syntactic features.The names consist of 1 to 25 components when counted according to the rulesof Multiex (not only words but also all dots, spaces and punctuation marks arecounted as a component). Compare two examples: Belweder (the palace name)

  • 8/10/2019 Poljski Rad - Toposlaw

    11/14

    Polish Urban Proper Names Dictionary 11

    and Biblioteka Publiczna im. Juliana Ursyna Niemcewicza w Dzielnicy Ursynw m.st. Warszawy (the library name). The data contain not only words of POSclasses which inect, namely nouns, adjectives and numerals but also those whichdo not change their forms, such as prepositions and conjunctions. Each of thementioned POS classes can be found, e.g. in the set of cultural institutions names:

    (22) KinoN KulturaN (cinema),(23) FilharmoniaN NarodowaADJ (concert hall),(24) MuzeumN XNUM PawilonuN CytadeliN WarszawskiejADJ (museum),(25) TeatrN NaPREP WoliN (theater),(26) MuzeumN AzjiN iCONJ PacykuN (museum).

    In particular names also contain acronyms, abbreviations and digits which rep-resent numbers. The signicant number of those can be found in square names:Skwer im. Grupy AK Granat , Skwer 1. Dywizji Grenadierw Francja 1940 .

    According to the data, compound proper names consist of nominal phrasesbased on grammatical agreement or government. We assume here that an agree-ment between components of a phrase occurs if all subordinate complements inectin the same way as their head, see example (27). If only the head of a phrase un-dergoes inection but the grammatical form of complements depends on it, thenthe relation between the head and the subordinate components of a name is de-ned as a government. Example (28) illustrates a government relation betweenthe head Aleja and the subordinate clause Jana Rodowicza Anody . Governedphrases can be in genitive , dative , instrumental and locative .

    Some multi-word proper names have specic inection patterns rarely observedin common nominal phrases. If a name contains another proper name whose form isin nominative case, as in example (29), the inner-name remains uniected whereasthe head of this phrase takes the forms of appropriate cases. The next example (30)shows that the name previously frozen does inect if it is used independently(without the head Kino ).

    (27) Jan Rodowicz Anoda (nom.), Jana Rodowicza Anody (gen.), Janem Rodowiczem Anod (inst.), etc.

    (28) Aleja Jana Rodowicza Anody (nom.), Alei Jana Rodowicza Anody(gen.), Alej Jana Rodowicza Anody (inst.), etc.

    (29) Kino Femina (nom.), Kina Femina (gen.), Kinem Femina (inst.), etc(30) Femina (nom.), Feminy (gen.), Femin (inst.), etc.

    Table 2 shows name distribution of the concept ROAD (streets, avenues,.. . )from the collected database. As components we count not only words but alsospaces, punctuation marks, e.g. quotation marks, Arabic and Roman numerals,e.g. 1920, IX, acronyms, e.g. ZUS. The table shows that structures based onagreement are the most numerous in this group of proper names (2661), nonethe-less they are mostly formed only by three components (2649). Phrases based ongovernment are a smaller set (2354), they are formed by 3 to 14 components. Thevariety of name subordinate structures will be reected in visual graphs describing

  • 8/10/2019 Poljski Rad - Toposlaw

    12/14

  • 8/10/2019 Poljski Rad - Toposlaw

    13/14

    Polish Urban Proper Names Dictionary 13

    resource can be used for natural language processing applications such as infor-mation extraction, dialogue systems, or geographical information systems withnatural language access.

    References

    Aleksandra Cielikowa , editor (2008), May sownik odmiany nazw wasnych , Rytm,Warszawa.

    Jan Grzenia (2003), Sownik nazw wasnych , Wydawnictwo naukowe PWN.Kwiryna Handke (1998), Sownik nazewnictwa Warszawy , Slawistyczny Orodek

    Wydawniczy, Warszawa.Krzysztof Marasek , ukasz Brocki , Danel Korinek , Krzysztof Szklanny , and

    Ryszard Gubrynowicz (2009), User Centered Design for a Voice Portal, Lecture Notes in Articial Intelligence , 5070.

    Magorzta Marciniak , Joanna Rabiega-Winiewska , and Agnieszka Mykowiecka(2008), Proper Names in Dialogs from the Warsaw Transportation Call Center, inIntelligent Information Systems XVI , EXIT.

    Agnieszka Mykowiecka , Krzysztof Marasek , Magorzata Marciniak , RyszardGubrynowicz

    , and Joanna Rabiega-Winiewska

    (2007), Annotation of Polish spo-ken dialogs in LUNA project, in Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of 3rd Language & Technology Confer-ence. October 5-7, 2007, Pozna, Poland .

    Sbastien Paumier (2006), Unitex 1.2 User Manual, http://www-igm.univ-mlv.fr/ uni-tex/UnitexManual.pdf.

    Jakub Piskorski and Marcin Sydow (2007), Usability of String Distance Metrics forName Matching Tasks in Polish, in Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of 3rd Language & Technology Confer-ence. October 5-7, 2007, Pozna, Poland .

    Jakub Piskorski , Marcin Sydow , and Anna Kup (2007), Lemmatization of PolishPerson Names, in ACL 2007. Proceedings of the Workshop on Balto-Slavonic Natu-ral Language Processing 2007 Special Theme: Information Extraction and Enabling Technologies .

    Adam Przepirkowski and Marcin Woliski (2003), A Flexemic Tagset for Polish,in Proceedings of the Workshop on Morphological Processing of Slavic Languages,EACL 2003 , pp. 3340.

    Ewa Rzetelska-Feleszko , editor (2005), Polskie nazwy wasne , Instytut Jzyka Pol-skiego Polskiej Akademii Nauk, Krakw.

    Zygmunt Saloni , Wodzimierz Gruszczyski , Marcin Woliski , and Robert Woosz(2007), Sownik gramatyczny jzyka polskiego , Wiedza Powszechna, Warszawa.

    Agata Savary (2005a), A formalism for the computational morphology of multi-wordunits, Archives of Control Sciences , 15(3):437449.

    Agata Savary (2005b), MULTIFLEX. Users Manual and Technical Documentation.Version 1.0, Technical Report 285, LI-Franois Rabelais University of Tours, France.

    Agata Savary , Joanna Rabiega-Winiewska , and Marcin Woliski (2009), Inectionof Polish Multi-Word Proper Names with Morfeusz and Multiex, Lecture Notes in Articial Intelligence , 5070.

  • 8/10/2019 Poljski Rad - Toposlaw

    14/14

    14

    Jan Tokarski (2002), Schematyczny indeks a tergo polskich form wyrazowych , ed. Zyg-munt Saloni, Wydawnictwo Naukowe PWN, Warszawa, 2 edition.

    Marcin Woliski (2003), System znacznikw morfosyntaktycznych w korpusie IPI PAN,Polonica , XXIIXXIII:3955.

    Marcin Woliski (2006), Morfeusz a Practical Tool for the Morphological Analy-sis of Polish, in Mieczysaw Kopotek , Sawomir Wierzcho , and Krzysztof Tro-janowski , editors, Intelligent Information Processing and Web Mining, IIS:IIPWM06 Proceedings , pp. 503512, Springer.