An Arabic Transformation Based Approach to Automatic Paraphrasing of Syntactic Sentences
Azeddine Rhazi
Qadi Ayyad University, Marrakesh, Morocco
The aim of this paper is to exploit the existing Lexicon-Grammar (LG) tables, as well as to assess their relative
importance vis-à-vis the concept of transformation and automatic paraphrasing. These operations include multiple
processes at the lexical, morpho-syntactic, and semantic levels. Our proposal is to model highly productive
phenomena of the Arabic language, such as pronominalization and passivization, applied to both Arabic verb
classes and Multiword Expressions (MWEs), in order to formalize the relation between structures and their
semantic properties and thus to represent the symmetry and pairs between sentences that share a predicate that links
the noun and a support verb. Furthermore, the automatic process of paraphrasing involves both the distributional
and transformative features of each class of verbs or other structures such as Arabic MWEs. This research in
progress outlines how to build Lexicon-Grammar tables for Arabic syntactic sentences by using automatic
paraphrasing in a large transformational grammar on the one hand, and how to implement them in both NooJ electronic
dictionaries and local grammars on the other hand.
Keywords: Lexicon-Grammar, transformation, automatic paraphrasing, Arabic, nominalization, passivization, NooJ
Introduction
Measuring the semantic relationship and similarity between sentence pairs is crucial for many
Natural Language Processing (NLP) tasks such as Sentiment Analysis, Natural Language Inference, Lexical
Synonymy, Ontology, Question Answering, and Paraphrase Identification, but it has proven to be very
challenging. Recently, with the surge of models based on transformational theory, semantic symmetry,
as a composition of the meaning of a sentence, has turned out to be effective in NLP. Moreover,
transformational and distributional procedures have become ubiquitous for semantic similarity, as they reflect
the paraphrasing of word forms, phrases, and syntactic forms or sentences using different levels and transition
tools. The Lexicon-Grammar (LG) table has played an increasing role in NLP and in its various
models since Harris’s theory (1968), Chomsky’s Transformational-Generative Grammar (TGG) model (1957), which
accounts for the transformations of phrase structures, and Gross’s syntactic method (1975), in which
co-occurring sentence features are required to agree in some morpho-syntactic properties, even when they are
not immediately adjacent to one another. The assumption behind the above-mentioned models is that there must be
evidence for labelling a structure as an instance of transformation, that is, a change of linguistic structure that can be enriched
with adverbs, redundant expressions, adjectives, and superfluous word forms (Boujelben & Benhamadou, 2012),
and computed via various linguistic phenomena such as pronominalization, passivization, extraction, and
negation.
Ali Boulaalam, Ph.D., Assistant Professor, Computational Linguistics, Moulay Ismail University, Meknes, Morocco. Azeddine Rhazi, Ph.D., Associate Professor, Linguistic Engineering, Qadi Ayyad University, Marrakesh, Morocco.
DAVID PUBLISHING
In the same context, many methods have been proposed for transforming and paraphrasing sentences in
different languages (Boujelben & Benhamadou, 2012; Sagot & Tolone, 2009; Silberztein, 2003; 2016; 2018),
including Arabic; these largely use the LG to classify the lexicon or to build structured information
(Harris, 1968) represented by the syntactic-semantic characteristics of each lexical unit.
Our work has two objectives:
(1) Firstly, we propose a mixed approach which combines constituent and dependency grammars; at the
same time, we adopt a hybrid method compatible with transformational theory. We start by
addressing the lexical-semantic side of the problem, employing both simple words and multiword
expressions, each of which imposes selectional restrictions on its subjects and complements (Gross, 1981). Then, we
train an architecture composed of an embedded subcategorization frame followed by a complex semantic
classification dedicated to the syntactic parser.
(2) Secondly, for some experiments (in Section 5), we have used a pre-trained and trained model for
Arabic language data, mainly the pronominalization and passivization phenomena, relying on
the NooJ platform.
The rest of this paper is organized as follows. Section 2 explains the automatic paraphrasing used for
sentence transformation and semantic modeling. A brief survey of related works is given in Section 3. Section 4
details the proposed methodology in addition to the framework of the study. The experimental settings and their
results are discussed in Sections 5 and 6, respectively. We conclude the paper in Section 7.
Automatic Paraphrasing and Sentence Transformation
A distinction is made between transformation and paraphrasing. Paraphrasing is used to refer to
existing sentences, that is, the rendering of a structure in another structure. On the other hand, transformation
refers to the change of a written sentence into another derived structure. In addition, no correlation exists
between lexical synonymy and paraphrasing, which means that A ≡ (A, B, C...) produces various senses
using a set of syntactic components that can be complex (Gross, 1981). However, it should be pointed out that,
in Arabic, the term transformation is used to refer to both unary and binary transformations because both
render the original sense (Silberztein, 2018). Automatic paraphrasing is a system that takes one structure as
input and produces all the sentences which share the same lexical material with the original structure
(Silberztein, 2016). The aim of this study is to enhance a paraphrasing tool, mainly in its morpho-syntactic and
semantic relations, for both simple words and multiword expressions.
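As a minimal sketch of this input/output behaviour (not the NooJ implementation), the Python fragment below takes one symbolic structure and returns every variant licensed by a set of named transformations. The role labels V, N0, and N1 follow the notation used in this paper; the function names, the `[passive]` and `PRO(...)` markers, and the English gloss are hypothetical placeholders rather than real Arabic data, and both transformations are deliberately simplified.

```python
from typing import Callable, Dict, List

# A structure maps argument roles to lexical material (V = verb, N0 = subject, N1 = object).
Structure = Dict[str, str]

def active(s: Structure) -> str:
    # Base order in the paper's notation: V + N0 + N1.
    return f"{s['V']} {s['N0']} {s['N1']}"

def passive(s: Structure) -> str:
    # Simplified passivization (assumption): the verb is marked passive, N1 is kept, N0 is omitted.
    return f"{s['V']}[passive] {s['N1']}"

def pronominalized(s: Structure) -> str:
    # Simplified pronominalization (assumption): N1 is replaced by a pronoun placeholder.
    return f"{s['V']} {s['N0']} PRO({s['N1']})"

# Hypothetical registry of transformations, keyed by name.
TRANSFORMATIONS: Dict[str, Callable[[Structure], str]] = {
    "active": active,
    "passive": passive,
    "pronominalized": pronominalized,
}

def paraphrases(s: Structure, allowed: List[str]) -> List[str]:
    """Return one variant of s per allowed transformation, in the order given."""
    return [TRANSFORMATIONS[name](s) for name in allowed if name in TRANSFORMATIONS]

if __name__ == "__main__":
    sentence = {"V": "hurt", "N0": "X", "N1": "Y"}  # English gloss, for illustration only
    for variant in paraphrases(sentence, ["active", "passive", "pronominalized"]):
        print(variant)
```

Every variant is built from the same lexical material as the input; which transformations are allowed for a given verb would, in the approach described here, be read from its Lexicon-Grammar entry.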
Related Work
Transformational and distributional methods are NLP tasks that have been defined as a relation of
equivalence within the set of propositions (Harris, 1968). They have since been approached using various linguistic and
computational methods, mainly based on LG and dictionaries which use simple sentences (subject-verb-objects)
as dictionary entries (Gross, 1975). In addition, a transformation links two sentences of the same meaning, P1
and P2; the lists of the morphemes which compose each of them must therefore be very close (Gross, 1981). The
resulting robust method, which uses Lexicon-Grammar tables, relies on structured data as well as some
prior logical-semantic knowledge of these data for extracting semantic properties and features.
In recent years, new methods based on Transformational Grammar have been implemented in many systems; they focus
on relationships between sentences that share the same lexical material, for example: how to link a sentence to
its negative or passive forms, or how to combine several sentences to build a complex sentence (Silberztein,
2016). Each table corresponds to a class which groups together the lexical units of a given grammatical
category that share and accept common properties (Gross, 1975), based on two fundamental concepts,
distribution and transformation, using predicate and argument logic. Special semantic properties of
paraphrasing are generated by adopting the linguistic tools developed within the framework of dependency and
constituent grammars. In this sense, even with reference to Gross (1975), one of the pioneers of the syntactic
method based on LG tables, a new surge of studies has pointed to the potential of LG information (data); each table
groups together the elements (classes) sharing certain defining properties, which generally fall under
subcategorization (Sagot & Tolone, 2009). In the state of the art, Lexicon-Grammar tables are sources of
syntactic lexical information for different languages.
These approaches, however, relied on building resources in a pre-processing step, so as to deal with the
different structures and sentences, as well as their possible properties. In contrast to constituent grammar, on which
generative grammars have based their hypotheses, dependency grammar has followed approaches
that generalize more readily across natural language processing methods.
As far as the construction of LG tables and of a paraphrasing method for Arabic within a
large transformational grammar is concerned, for simple words, verbs, and idiomatic expressions (Elhannach, 1988), it
should be noted that Arabic sentences fall syntactically into five basic classes of structures
(Rhazi & Boulaalam, 2018):
1. V + N0;
4. V + N0 + (N1 + Pr + N2);
5. V + N0 + (N1 + N2).
Each basic structure generates several sub-structures according to their semantico-syntactic features linked
by the arguments required by the predicate.
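The structural formulas listed above can be held as plain slot sequences. The sketch below, in Python, encodes only the formulas reproduced in this text (classes 1, 4, and 5) and reports which class a sequence of slot labels matches. Treating the parentheses as simple grouping, and the names `BASIC_STRUCTURES` and `matching_class`, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Slot sequences for the classes listed in the text.
# Assumption: parentheses in the formulas are treated as grouping only.
BASIC_STRUCTURES: Dict[int, Tuple[str, ...]] = {
    1: ("V", "N0"),
    4: ("V", "N0", "N1", "Pr", "N2"),
    5: ("V", "N0", "N1", "N2"),
}

def matching_class(slots: List[str]) -> Optional[int]:
    """Return the class number whose slot sequence equals `slots`, or None if none matches."""
    for class_id, pattern in BASIC_STRUCTURES.items():
        if tuple(slots) == pattern:
            return class_id
    return None
```

A sub-structure generated from a basic class could then be stored alongside its parent class number, so that the arguments required by the predicate remain traceable.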
Methodology
In order to build the layer hierarchy, we applied a mixed approach in line with the idea mentioned
above, adopting a combined method that associates Constituent
Grammar with Dependency Grammar (Silberztein, 2016).
The proposed model is an enhancement of the Arabic Syntactic Parser (ASP), which generates sentences in
different embedded syntactic forms by applying the transformation and paraphrasing procedures
(Silberztein, 2018). It comprises three components:
1. Morphological and Syntactic analysis;
2. Lexicon-Grammar Transformations and Paraphrasing;
3. Dictionaries and grammars for implementing linguistic environment.

Figure 1. The proposed approach.
Experimental Setup
This section describes the pre-training and training details and the datasets of the experiments.
To answer specific questions concerning sentence transformation, we model highly productive
phenomena of the Arabic language, such as pronominalization (verb-arguments) and passivization, by
applying the formal operators plus (+) and minus (-) (at the intersection of row and column). The goal is to formalize
the relation between structures in accordance with their semantic properties, and thus to represent the symmetry between
sentences that share a predicate (Gross, 1975; 1981) (linking predicates to their arguments, i.e., the predicative schema)
that links the noun and a support verb, which can also be an adjective or an auxiliary verb. Furthermore, the
automatic process of paraphrasing involves both the distributional and transformative features of each class of
verbs or other structures, taking into account their degree of distributional semantic homogeneity (Sagot &
Tolone, 2009).
This classical transformational point of view defines semantic features according to certain
morpho-syntactic characteristics as well as the semantic value of each lexical item, which determines whether these
features (mnemonic identifiers) are valid (Sagot & Tolone, 2009): the symbol + (resp. -) indicates that
the corresponding property is valid (resp. not valid) for the lexical entry.
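The row-and-column reading of the + and - symbols can be illustrated with a small lookup table. The following Python sketch is not drawn from the paper's actual tables: the entry names (`verb_a`, `verb_b`), the property labels, and the cell values are all hypothetical, and the helper names `accepts` and `entries_accepting` are introduced only for this example.

```python
from typing import Dict, List

# Hypothetical Lexicon-Grammar table: one row per lexical entry, one column per property.
# Entry names, property labels, and cell values are illustrative, not taken from the paper.
LG_TABLE: Dict[str, Dict[str, str]] = {
    "verb_a": {"N0 V": "+", "N0 V N1": "+", "passive": "+"},
    "verb_b": {"N0 V": "+", "N0 V N1": "-", "passive": "-"},
}

def accepts(entry: str, prop: str) -> bool:
    """True when the cell at the intersection of row `entry` and column `prop` is '+'."""
    return LG_TABLE.get(entry, {}).get(prop) == "+"

def entries_accepting(prop: str) -> List[str]:
    """List, in alphabetical order, the entries whose cell for `prop` is '+'."""
    return sorted(e for e in LG_TABLE if accepts(e, prop))
```

With such a representation, a transformation like passivization would only be attempted for entries where `accepts(entry, "passive")` holds; unknown entries or properties are simply treated as not valid.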
As far as the project of constructing LG tables for Arabic simple verbs,
predicative nouns, adjectives, and idiomatic expressions is concerned (see Tables 1, 2), the evaluation was conducted in two
experiments that have previously been used to evaluate the Arabic syntactic analyzer. We used the NooJ
dictionary rules and the NooJ local grammar to evaluate the utility of each result.
Experiment 1
For this experiment, we used Arabic qualitative verb classes as a dataset designed to test the
ability of the Lexicon-Grammar. The training data contain a pair of annotated verb classes (Laporte, 2020) arranged in
tables, with columns and corresponding rows, where the relatedness data represent semantic features and reflect
distributional context. Results are shown with several standard evaluation datasets, for example:
NoV: possible head of an intransitive construction with initial subject noun phrase;
(+): positive features;
/aθlaga/: ( = to cause)/NoN-V N1 (+ positive trait);
(-): negative features;
/asifa/: ( = to cause)/NoN-V N1;
/ataba/: ( = to cause)/NoN-V N1;
Some verbs can be inserted automatically as resources without modifying the distributional constraints between
the verb and its arguments (Laporte, 2020); for instance: “ ” (he/she/it causes the pain to X).
Table 1 A Sample of Distributional and Transformational Characteristics for the Verb /a:ða:/ (to hurt) as Masdar + Pr + Phrase
Table 2
/a:ða:/ (to hurt )...
Recently, the Lexicon-Grammar of the Arabic language has become a large lexical, syntactic, and semantic
database (Elhannach, 1988). Its visual readability, degree of formalization, degree of validity,
and quantity of information content (Laporte, 2020) facilitate, in particular, the building of language processing lexicons. However,
some lexical items have not yet been encoded, such as “” (ahzana) and “” (athlaja), as
shown in Figure 2:

References
Boujelben, I., & Benhamadou, A. (2012). Transformational analysis of Arabic sentences: Application to automatically extracted biomedical symptoms. In C. Vuckovic et al., Automatic processing of various levels of linguistic phenomena from the NooJ International Conference (pp. 182-194). CSP, UK.
Chomsky, N. (1957). Syntactic structures. ’s-Gravenhage: Mouton & Co.
Elhannach, M. (1988). Syntaxe des verbes psychologique de l’Arabe (Thèse de doctorat d’état (LADL), Université de Paris 7, 1988).
Gavriilidou, Z., Papadoppoulou, E., & Chadjipapa, E. (2012). Processing Greek frozen expressions with NooJ. In C. Vuckovic et al., The NooJ 2011 International Conference (pp. 63-74). CSP, UK.
Gross, M. (1975). Méthodes en syntaxe: Régime des constructions complétives. Hermann, Paris: Actualités Scientifiques et Industrielles.
Gross, M. (1981). Les bases empiriques de la notion de prédicat sémantique. Langages, 15(63), 7-52.
Harris, Z. S. (1968). Mathematical structures of language. New York: Wiley.
Laporte, E. (2020, February 26). Is the Lexicon-Grammar exploitable for language processing? Retrieved from http://hal.archives-ouvertes.fr/hal-00858302
Mesfar, S. (2008). Analyse morpho-syntaxique automatique et reconnaissance des entités nommées en arabe standard (Ph.D. thesis, Franche-Comte University, 2008).
Mota, C., Baptista, J., & Barreiro, A. (2018). The Lexicon-Grammar of predicate nouns with ser de in Port4NooJ. In Formalizing natural languages with NooJ 2018, and its natural languages processing applications. Communications in computer and information science, Vol. 987 (pp. 124-137). New York: Springer.
Rhazi, A., & Boulaalam, A. (2018). Corpus based extraction and translation of Arabic MWEs. In NooJ proceedings 2017 (pp. 143-155). New York: Springer.
Sagot, B., & Tolone, E. (2009). Intégrer les tables du lexique-grammaire à un analyseur syntaxique robuste à grande échelle. In Actes de la Conference TALN 2009 (pp. 1-11). Senlis, France.
Silberztein, M. (2003). The NooJ manual. Retrieved from http://www.nooj4nlp.net/pages/references.html
Silberztein, M. (2016). Formalizing natural languages: The NooJ approach. London: Wiley-ISTE.
Silberztein, M. (2018). Unary transformations for French transitive sentences. In Formalizing natural languages with NooJ 2018, and its natural languages processing applications. Communications in computer and information science, Vol. 987 (pp. 138-151). New York: Springer.
