Sino-US English Teaching, June 2021, Vol. 18, No. 6, 137-146
doi:10.17265/1539-8072/2021.06.001
DAVID PUBLISHING
An Arabic Transformation Based Approach to Automatic Paraphrasing of
Syntactic Sentences
Ali Boulaalam
Moulay Ismail University, Meknes, Morocco
Azeddine Rhazi
Qadi Ayyad University, Marrakesh, Morocco
Ali Boulaalam, Ph.D., Assistant Professor, Computational
Linguistics, Moulay Ismail University, Meknes, Morocco. Azeddine
Rhazi, Ph.D., Associate Professor, Linguistic Engineering, Qadi
Ayyad University, Marrakesh, Morocco.
The aim of this paper is to exploit the existing Lexicon-Grammar
(LG) tables, as well as to assess their relative
importance vis-à-vis the concept of transformation and automatic
paraphrasing. These operations include multiple
processes at the lexical, morpho-syntactic, and semantic levels.
Our proposal is to model highly productive
phenomena of the Arabic language, such as pronominalization and
passivization, as they apply to both Arabic verb
classes and Multiword Expressions (MWEs), in order to formalize the
relation between structures and their
semantic properties and thus to represent the symmetry between
paired sentences that share a predicate linking
the noun and a support verb. Furthermore, the automatic process of
paraphrasing involves both the distributional
and transformative features of each class of verbs or other
structures such as Arabic MWEs. This research in
progress outlines how to build Lexicon-Grammar tables for Arabic
syntactic sentences by using automatic
paraphrasing in a large transformational grammar on the one hand,
and how to implement them in both NooJ electronic
dictionaries and local grammars on the other hand.
Keywords: Lexicon-Grammar, transformation, automatic paraphrasing,
Arabic, nominalization, passivization, NooJ
Introduction
Measuring the semantic relationship and similarity between
sentence pairs is crucial for many
Natural Language Processing (NLP) tasks such as Sentiment Analysis,
Natural Language Inference, Lexical
Synonymy, Ontology, Question Answering, and Paraphrase
Identification, but it has proven very
challenging. Recently, with the surge of models based on
transformational theory, semantic symmetry,
as a composition of the meaning of a sentence, has proven
effective in NLP. Moreover,
transformational and distributional procedures have become
ubiquitous for semantic similarity, as they reflect
the paraphrasing of word forms, phrases, and syntactic forms or
sentences using different levels and transition
tools. The Lexicon-Grammar (LG) table has played an
increasing role in NLP and in its various
models since Harris’s theory (1968), Chomsky’s
Transformational-Generative Grammar (TGG) model (1957), which
accounts for the transformations of phrase structures, and
Gross’s syntactic method (1975), in which
co-occurring sentence features are required to agree in some
morpho-syntactic properties, even when they are
not immediately adjacent to one another. The assumption behind the
above-mentioned models is that evidence is required to label
something as an instance of transformation and of
changing linguistic structures, which can be enriched
with adverbs, redundant expressions, adjectives, and superfluous
word forms (Boujelben & Benhamadou, 2012),
and computed via various linguistic phenomena such as
pronominalization, passivization, extraction, and
negation.
In the same context, many methods have been proposed for
transforming and paraphrasing sentences in
different languages (Boujelben & Benhamadou, 2012; Sagot &
Tolone, 2009; Silberztein, 2003; 2016; 2018),
including Arabic; these have largely used the LG to classify the
lexicon or to build structured information
(Harris, 1968) represented by the syntactic-semantic
characteristics of each lexical unit.
Our work has two objectives:
(1) Firstly, we propose a mixed approach that combines constituent
and dependency grammars; at the
same time, we adopt a hybrid method compatible with
transformational theory. We start by
addressing the lexical semantic properties of the problem by
employing both simple words and multiword
expressions, which impose selectional restrictions on their
subjects and complements (Gross, 1981). Then, we
train an architecture that embeds a subcategorization
frame, followed by a complex semantic
classification dedicated to the syntactic parser.
(2) Secondly, for some experiments (in Section 5), we have used a
pre-trained and trained model for
Arabic language data, mainly the pronominalization and
passivization phenomena, relying on
the NooJ platform.
The rest of this paper is organized as follows. Section 2 explains
the automatic paraphrasing used for
sentence transformation and semantic modeling. A brief survey of
related work is given in Section 3. Section 4
details the proposed methodology and the framework of
the study. The experimental settings and
results are discussed in Sections 5 and 6, respectively. We
conclude the paper in Section 7.
Automatic Paraphrasing and Sentence Transformation
A distinction is made between transformation and paraphrasing.
Paraphrasing is used to refer to
existing sentences, that is, the rendering of a structure in
another structure. On the other hand, transformation
refers to the change of a written sentence into another derived
structure. In addition, no correlation exists
between lexical synonymy and paraphrasing, which means that A ≡ (A,
B, C...) produces various senses
using a set of syntactic components that can be complex (Gross,
1981). However, it should be pointed out that,
in Arabic, the term transformation is used to refer to both unary
and binary transformations because both
render the original sense (Silberztein, 2018). Automatic
paraphrasing is a system that takes one structure as
input and produces all the sentences that share the same
lexical material as the original structure
(Silberztein, 2016). The aim of this study is to enhance a
paraphrasing tool, mainly with respect to morpho-syntactic and
semantic relations, for both simple words and multiword
expressions.
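As a rough illustration of this definition of automatic paraphrasing
(one structure in, all sentences sharing its lexical material out),
the following minimal Python sketch may help. It is not the NooJ
implementation used in this work: the Clause type, the three toy
rules, and the English glosses are assumptions made only for the
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical input structure: N0 (subject), verb forms, N1 (object)."""
    n0: str
    verb: str        # finite active form
    participle: str  # passive participle
    n1: str


def active(c: Clause) -> str:
    # Base structure: N0 V N1.
    return f"{c.n0} {c.verb} {c.n1}"


def passive(c: Clause) -> str:
    # Passivization: N1 is promoted to subject, N0 becomes the agent.
    return f"{c.n1} is {c.participle} by {c.n0}"


def pronominalized(c: Clause) -> str:
    # Pronominalization of N1 (toy rule: the pronoun is always "it").
    return f"{c.n0} {c.verb} it"


# Each rule maps the same lexical material to one surface sentence.
TRANSFORMATIONS = {
    "active": active,
    "passive": passive,
    "pronominalized": pronominalized,
}


def paraphrases(c: Clause) -> dict:
    """Apply every registered transformation to one input structure."""
    return {name: rule(c) for name, rule in TRANSFORMATIONS.items()}


if __name__ == "__main__":
    clause = Clause(n0="the noise", verb="hurts", participle="hurt", n1="the ear")
    for name, sentence in paraphrases(clause).items():
        print(f"{name}: {sentence}")
```

Keeping the rules in a dictionary means that a further transformation
(negation, extraction) could be registered without touching the
generator itself.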
Related Work
Transformational and distributional methods are NLP tasks that have
been defined as a relation of
equivalence within the set of propositions (Harris, 1968); they have
since been approached using various linguistic and
computational methods, mainly based on LG and dictionaries that use
simple sentences (subject-verb-objects)
as dictionary entries (Gross, 1975). In addition, a
transformation links two sentences of the same meaning, P1
and P2; the lists of the morphemes which compose each of them
must then be very close (Gross, 1981). The
robust method thus created, using Lexicon-Grammar tables, relies on
structured data as well as some prior
logical semantic knowledge of this data for semantic properties and
feature extraction.
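The closeness condition on the morpheme lists of P1 and P2 can be
made concrete with a small sketch. The overlap measure used here
(Jaccard similarity over morpheme sets) is our assumption for
illustration, not a measure given by Gross (1981), and the
segmentation of the two English glosses is hypothetical.

```python
def morpheme_overlap(p1, p2):
    """Jaccard similarity between the morpheme sets of two sentences.

    Returns a value in [0, 1]; 1.0 means both sentences are built
    from exactly the same set of morphemes.
    """
    a, b = set(p1), set(p2)
    if not a and not b:
        return 1.0  # two empty sentences are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical segmentation of an active sentence and its passive.
P1 = ["the", "noise", "hurt", "-s", "the", "ear"]
P2 = ["the", "ear", "be", "-s", "hurt", "-en", "by", "the", "noise"]

if __name__ == "__main__":
    print(morpheme_overlap(P1, P2))
```

In this toy pair every morpheme of P1 reappears in P2, and the only
additions in P2 are grammatical morphemes introduced by the passive,
which is the kind of closeness the condition describes.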
In recent years, new methods based on Transformational Grammar have
been implemented in many systems and have focused
on relationships between sentences that share the same lexical
material, for example, how to link a sentence to
its negative or passive forms, or how to combine several sentences
to build a complex sentence (Silberztein,
2016). Each table corresponds to a class that groups together the
lexical units of a given grammatical
category that share and accept common properties (Gross,
1975), based on two fundamental concepts,
distribution and transformation, using predicate and argument
logic. Special semantic properties of
paraphrasing are generated by adopting the linguistic tools
developed within the framework of dependency and
constituent grammars. In this sense, following Gross
(1975), one of the pioneers of the syntactic
method based on LG tables, a new surge of studies has suggested the
potential of LG information (data); each table
groups together the elements (classes) sharing certain defining
properties, which generally fall under
subcategorization (Sagot & Tolone, 2009). In the state of the
art, Lexicon-Grammar tables are sources of
syntactic lexical information for different languages.
These approaches, however, relied on building resources in a
pre-processing step, so as to deal with the
different structures and sentences, as well as their possible
properties. In contrast to the constituent grammar on which
generative grammars have based their hypotheses, dependency
grammar has followed approaches that
generalize more readily across various natural language processing
methods.
As far as the Arabic language is concerned, with regard to the
construction of LG tables and the paraphrasing method in a
large transformational grammar for simple words, verbs, and
idiomatic expressions (Elhannach, 1988), it
should be noted that Arabic sentences fall syntactically
into five basic classes of structures
(Rhazi & Boulaalam, 2018):
1. V + N0;
4. V + N0 + (N1 + Pr + N2);
5. V + N0 + (N1 + N2).
Each basic structure generates several sub-structures according to
its semantico-syntactic features, which are linked
to the arguments required by the predicate.
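The structure notation above can be read mechanically. The following
Python sketch is an illustration only: it covers the three structures
that appear in the list above, and the FRAMES dictionary and the two
helper functions are hypothetical names, not part of the authors'
implementation.

```python
import re

# Only the structures that appear in the list above, keyed by number.
FRAMES = {
    1: "V + N0",
    4: "V + N0 + (N1 + Pr + N2)",
    5: "V + N0 + (N1 + N2)",
}


def slots(frame: str) -> list:
    """Return the slot symbols of a frame in order, e.g. ['V', 'N0']."""
    return re.findall(r"[A-Za-z]+\d*", frame)


def arguments(frame: str) -> list:
    """Return only the nominal arguments (N0, N1, ...) of a frame."""
    return [s for s in slots(frame) if re.fullmatch(r"N\d+", s)]


if __name__ == "__main__":
    for number, frame in FRAMES.items():
        print(number, slots(frame), "arity:", len(arguments(frame)))
```

Counting the N slots gives the number of arguments the predicate
requires, which is the information each sub-structure refines.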
Methodology
In order to build the layer hierarchy, we have applied a mixed
approach in line with the idea mentioned
above, adopting a combined method
that associates Constituent
Grammar with Dependency Grammar (Silberztein, 2016).
The proposed model is an enhancement to the Arabic Syntactic Parser
(ASP), which renders sentences in
different embedded syntactic forms, and to the way it uses the
transformation and paraphrasing procedures
(Silberztein, 2018).
1. Morphological and Syntactic analysis;
2. Lexicon-Grammar Transformations and Paraphrasing;
3. Dictionaries and grammars for implementing linguistic
environment;
Figure 1. The proposed approach.
Experimental Setup
This section describes the pre-training and training details and
the datasets of the experiments.
To answer specific questions concerning sentence
transformation, we modeled highly productive
phenomena of the Arabic language, such as pronominalization
(verb-arguments) and passivization, by
applying the formal operators plus (+) and minus (-) (at the
intersection of row and column), in order to formalize
the relation between structures in accordance with their semantic
properties and to represent the symmetry between
sentences that share a predicate (Gross, 1975; 1981) (linking
a predicate to its arguments/predicative schema)
that links the noun and a support verb, which can also be an
adjective or an auxiliary verb. Furthermore, the
automatic process of paraphrasing involves both the distributional
and transformative features of each class of
verbs or other structures, taking into account their degree of
distributional semantic homogeneity (Sagot &
Tolone, 2009).
This classical transformational point of view defines semantic
features according to some
morpho-syntactic characteristics as well as the semantic value of
each lexical item, which determines whether these
features (mnemonic identifiers) are valid (Sagot & Tolone, 2009):
the symbol + (valid) or, respectively, - (not valid) indicates
the corresponding information of the lexical entry.
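The + and - coding at the intersection of row and column can be
sketched as a small table in Python. The feature names and every
cell value below are illustrative assumptions chosen for the example;
they are not taken from the actual Lexicon-Grammar tables of this
study, and the helper names are hypothetical.

```python
# Column headers (mnemonic identifiers); names are assumptions.
FEATURES = ["N0 V N1", "passive", "pronominalization"]

# Rows are lexical entries; True stands for "+", False for "-".
# The values are invented for illustration only.
TABLE = {
    "asifa": {"N0 V N1": True, "passive": False, "pronominalization": True},
    "ataba": {"N0 V N1": True, "passive": True, "pronominalization": True},
}


def cell(table: dict, entry: str, feature: str) -> str:
    """Return '+' or '-' for one row/column intersection."""
    return "+" if table[entry][feature] else "-"


def accepts(table: dict, feature: str) -> list:
    """List the entries whose row is marked '+' for a feature."""
    return [entry for entry, row in table.items() if row[feature]]


def render(table: dict, features: list) -> str:
    """Render the table as tab-separated text, one entry per line."""
    lines = ["entry\t" + "\t".join(features)]
    for entry in table:
        marks = "\t".join(cell(table, entry, f) for f in features)
        lines.append(entry + "\t" + marks)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(TABLE, FEATURES))
    print("passive:", accepts(TABLE, "passive"))
```

Selecting the entries marked + in a column, as accepts does, yields
the set of predicates to which the corresponding transformation may
be applied.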
As far as the Arabic language is concerned, within the
LG table project for simple verbs,
predicative nouns, adjectives, and idiomatic expressions (see Tables
1, 2), the evaluation was conducted in two
experiments that have previously been used to evaluate the Arabic
syntactic analyzer. We used the NooJ
dictionary rules and the NooJ local grammar to evaluate the utility
of each result.
Experiment 1
For this experiment, we used Arabic qualitative verb classes as
a dataset designed to test the
capability of the Lexicon-Grammar. The training data contain a pair
of annotated verb classes (Laporte, 2020) arranged in
tables of columns with corresponding rows, where the relatedness data
represent semantic features and reflect
distributional context. Results are shown with several standard
evaluation datasets, for example:
NoV: possible head of an intransitive construction with initial
subject noun phrase;
(+): positive features;
/aθlaga/: ( = to cause)/NoN-V N1 (+ positive trait);
(-): negative features;
/asifa/: ( = to cause)/NoN-V N1;
/ataba/: ( = to cause)/NoN-V N1;
Some verbs can be inserted automatically as resources without
modifying the distributional constraints between
the verb and its arguments (Laporte, 2020); for instance: “ ”
(he/she/it causes the pain to X).
Table 1
A Sample of Distributional and Transformational Characteristics for
the Verb /a:ða:/ (to hurt) as Masdar + Pr + Phrase
Table 2
/a:ða:/ (to hurt)...
Recently, the Lexicon-Grammar of the Arabic language has become a
large lexical, syntactic, and semantic
database (Elhannach, 1988). The visual readability of the format,
the degree of formalization, the degree of validity,
and the quantity of information content (Laporte, 2020) facilitate,
in particular, the building of language processing lexicons. For
example, some lexical items have not yet been encoded, such as
(ahzana) and (athlaja), as
shown in Figure 2.
References
Boujelben, I., & Benhamadou, A. (2012). Transformational analysis of
Arabic sentences: Application to automatically extracted biomedical
symptoms. In C. Vuckovic et al., Automatic processing of various
levels of linguistic phenomena from the NooJ International
Conference (pp. 182-194). CSP, UK.
Chomsky, N. (1957). Syntactic structures. Paris: Mouton & Co.,
's-Gravenhage.
Elhannach, M. (1988). Syntaxe des verbes psychologique de l’Arabe
(Thèse de doctorat d’état (LADL), Université de Paris 7, 1988).
Gavriilidou, Z., Papadoppoulou, E., & Chadjipapa, E. (2012).
Processing Greek frozen expressions with NooJ. In C. Vuckovic et
al., The NooJ 2011 International Conference (pp. 63-74). CSP, UK.
Gross, M. (1975). Méthodes en syntaxe: Régime des constructions
complétives. Hermann, Paris: Actualités Scientifiques et
Industrielles.
Gross, M. (1981). Les bases empiriques de la notion de prédicat
sémantique. Langages, 15(63), 7-52.
Harris, Z. S. (1968). Mathematical structures of language. New York:
Wiley.
Laporte, E. (2020, February 26). Is the Lexicon-Grammar exploitable
for language processing? Retrieved from
http://hal.archives-ouvertes.fr/hal-00858302
Mesfar, S. (2008). Analyse morpho-syntaxique automatique et
reconnaissance des entités nommées en arabe standard (Ph.D. thesis,
Franche-Comte University, 2008).
Mota, C., Baptista, J., & Barreiro, A. (2018). The Lexicon-Grammar of
predicate nouns with ser de in Port4NooJ. In Formalizing natural
languages with NooJ 2018, and its natural languages processing
applications. Communications in computer and information science,
Vol. 987 (pp. 124-137). New York: Springer.
Rhazi, A., & Boulaalam, A. (2018). Corpus based extraction and
translation of Arabic MWEs. In NooJ proceedings 2017 (pp. 143-155).
New York: Springer.
Sagot, B., & Tolone, E. (2009). Intégrer les tables du
lexique-grammaire à un analyseur syntaxique robuste à grande
échelle. In Actes de la Conference TALN 2009 (pp. 1-11). Senlis,
France.
Silberztein, M. (2003). The NooJ manual. Retrieved from
http://www.nooj4nlp.net/pages/references.html
Silberztein, M. (2016). Formalizing natural languages: The NooJ
approach. London: Wiley-ISTE.
Silberztein, M. (2018). Unary transformations for French transitive
sentences. In Formalizing natural languages with NooJ 2018, and its
natural languages processing applications. Communications in
computer and information science, Vol. 987 (pp. 138-151). New York:
Springer.