Computational Investigation of Palestinian Arabic Dialects

Ezra Daya, Rafi Talmon, Shuly Wintner


Background

The fieldwork study covers Arabic dialects spoken in 250 localities:

– Northern and central parts of Israel.
– Localities in the West Bank.
– Southern Lebanese communities in Galilee.
– Palestinian refugees of 1948 living in existing Arab localities.


Background cont.

Colloquial Arabic features:

– A non-official spoken language, usually not written.
– Differs from place to place.
– The similarity (or distance) between the Arabic dialects can be measured.
– Considered by its speakers to be less prestigious than official Arabic.


Background cont.

Work performed by special teams:

– Collecting and processing fieldwork material such as recorded interviews and linguistic questionnaires.
– Transcribing the material that constitutes the basis of our work.
– Defining an accurate description of the language varieties of Palestinian colloquial Arabic, their characteristics, and their geographical distribution.


Transcribed Text Sample


Objectives

Publication of the vast collected material using computational linguistic techniques in order to:

– Create lexicons and glossaries for Arabic dialects automatically.
– Create a linguistic atlas that graphically shows the similarities among the dialects.
– Better understand morphological and phonemic dialectology features.


Linguistic Atlas


The challenge – Rich Morphology

Semitic languages such as Arabic have a rich morphology and contain highly inflected forms. Example:

axdat is the 3rd person, singular, feminine, past form of the verb axad. It is obtained by attaching the suffix ‘at’ to the base axad and reducing (dropping) the second vowel ‘a’ of the base.
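The axad/axdat example can be sketched in a few lines of Python. This is only an illustration of the single rule described on this slide (attach the suffix and drop the last vowel of the base). The function name and the vowel inventory are assumptions, and the rule is not claimed to hold for every verb.

```python
# Hypothetical helper illustrating the slide's example only.
VOWELS = set("aiu")  # assumed short-vowel inventory of the transcription


def third_fem_sg_past(base: str) -> str:
    """Drop the last vowel of the base, then attach the suffix 'at'."""
    for i in range(len(base) - 1, -1, -1):
        if base[i] in VOWELS:
            # Remove the vowel at position i before suffixing.
            return base[:i] + base[i + 1:] + "at"
    # No vowel found: fall back to plain suffixation.
    return base + "at"


print(third_fem_sg_past("axad"))
```

Even this one form needs a stem change in addition to suffixation, which is why plain concatenation is not enough for analysis.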


Rich Morphology cont.

Arabic has a complex system of morphology based on triconsonantal roots, as is common in Semitic languages.

For example, there are 10 verb patterns, each of which can be inflected in 3 numbers, 2 genders, 3 persons, and several tenses and aspects, and can be suffixed by several pronominal forms.
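Root-and-pattern morphology can be pictured as filling the consonant slots of a pattern with the consonants of a root. The sketch below is a minimal illustration. The slot notation (digits 1 to 3 for the root consonants) and the sample root and patterns are assumptions chosen for illustration, not the project's actual representation.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Interdigitate a triconsonantal root with a pattern.

    Digits 1-3 in the pattern stand for the first, second and third
    root consonant; every other character is copied as-is.
    """
    if len(root) != 3:
        raise ValueError("expected a triconsonantal root")
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in pattern
    )


# Illustrative root k-t-b with two hypothetical pattern strings.
print(apply_pattern("ktb", "1a2a3"))   # plain pattern
print(apply_pattern("ktb", "1a22a3"))  # pattern doubling the middle consonant
```

Repeating a digit lets a pattern double a root consonant, which a simple prefix or suffix rule cannot express.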


Traditional Approach

Linguists are assigned to perform grammatical analysis of the transcribed texts and to manually create the lexicon, glossaries and linguistic atlas.

Disadvantages:

– Lack of sophistication.
– Time consuming.
– Expensive human resources.


Innovative Approach

Devise an automated analysis of these transcribed texts, in order to obtain:

– Automated creation of a glossary that organizes all the lexical items by grammatical features, e.g. root, pattern, etc.
– Isolation of the phonetic and morphological features and characteristics of specific dialects in the surveyed area.
– Measurement of dialect similarity.

Automated processing provides accuracy and efficiency.


Linguistic Technologies

For this research we intend to exploit existing computational linguistics technology for the investigation of Palestinian Arabic dialects by using:

– Finite-state technology.
– Machine learning techniques.
– Computational dialectology.


Finite State Technology

We employ the Xerox finite-state tools and techniques, which are:

– Useful and efficient programs that process text in natural languages.
– Concentrated on morphological analysis and generation.
– A means of access to finite-state operations and a regular expression compiler.
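To give a feel for finite-state morphological analysis, here is a tiny hand-written transducer in plain Python. It does not use the Xerox tools or their rule syntax. The states, the transition table and the tag strings (such as `+Past+3+Sg+Fem`) are all hypothetical, and the analysis of axad as a 3rd person masculine singular past form is an assumption added for contrast.

```python
# (state, input symbol) -> (next state, output string); all values hypothetical.
TRANSITIONS = {
    (0, "a"): (1, "a"),
    (1, "x"): (2, "x"),
    # Path for the surface form "axdat": restore the reduced vowel.
    (2, "d"): (3, "ad"),
    (3, "a"): (4, ""),
    (4, "t"): (5, "+Past+3+Sg+Fem"),
    # Path for the surface form "axad".
    (2, "a"): (6, "a"),
    (6, "d"): (7, "d+Past+3+Sg+Masc"),
}
FINAL_STATES = {5, 7}


def analyze(word):
    """Map a surface form to a lexical form, or None if not accepted."""
    state, output = 0, []
    for symbol in word:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return None  # no transition: the word is rejected
        state, piece = step
        output.append(piece)
    return "".join(output) if state in FINAL_STATES else None


print(analyze("axdat"))
print(analyze("axad"))
```

A real system would compile such machines from rules and a lexicon rather than list transitions by hand. The same machine, read in the other direction, would serve for generation.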


Machine Learning

Machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. Two learning frameworks are distinguished according to the amount of supervision used:

– Supervised learning: the learning algorithm is presented with pairs of strings of symbols, e.g. inflected and uninflected forms.
– Unsupervised learning: the algorithm is presented merely with a single set of words, and must work out what the morphological relationships are.
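The supervised setting can be illustrated with a deliberately simple learner that reads (inflected, uninflected) pairs and records suffix rewrite rules. This is a toy sketch, not the learning method used in the project. The function names are hypothetical and the training pairs are simplified illustrative transliterations, not data from the fieldwork corpus.

```python
from collections import Counter


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_suffix_rules(pairs):
    """Count (inflected suffix -> base suffix) rewrites seen in training."""
    rules = Counter()
    for inflected, base in pairs:
        k = common_prefix_len(inflected, base)
        rules[(inflected[k:], base[k:])] += 1
    return rules


def uninflect(word: str, rules) -> str:
    """Apply the longest (then most frequent) matching suffix rule."""
    candidates = [
        (len(src), count, src, dst)
        for (src, dst), count in rules.items()
        if src and word.endswith(src)
    ]
    if not candidates:
        return word  # no rule applies: leave the word unchanged
    _, _, src, dst = max(candidates)
    return word[: len(word) - len(src)] + dst


# Illustrative toy training pairs: (inflected, uninflected).
training = [("katabt", "katab"), ("axadt", "axad")]
rules = learn_suffix_rules(training)
print(uninflect("darast", rules))
```

An unsupervised learner would receive only the word list, without the pairing, and would have to discover that such forms are related.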


Computational Dialectology

Use distance measures to compute the distance between two given dialects and to define geographical dialect boundaries.

– Example: edit distance.
– The distance can be made sensitive to phonological similarities, so that substituting a similar sound costs less than substituting a dissimilar one. Example: $\mathrm{dist}(d, t) < \mathrm{dist}(d, f)$
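An edit distance with phonologically sensitive substitution costs can be sketched as follows. This is the standard dynamic-programming edit distance with a pluggable substitution cost. The cost table, including the value 0.5 for the pair d/t, is an assumption for illustration and not the project's actual weighting.

```python
def edit_distance(a, b, sub_cost=None):
    """Edit distance with unit insert/delete cost and custom substitution cost."""
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1.0,               # delete x
                cur[j - 1] + 1.0,            # insert y
                prev[j - 1] + sub_cost(x, y) # substitute x by y (or match)
            ))
        prev = cur
    return prev[-1]


# Hypothetical table: phonologically close pairs get a reduced cost.
SIMILAR = {frozenset("dt"): 0.5}


def phonological_cost(x, y):
    if x == y:
        return 0.0
    return SIMILAR.get(frozenset((x, y)), 1.0)


print(edit_distance("dal", "tal", phonological_cost))
print(edit_distance("dal", "fal", phonological_cost))
```

Averaging such word-level distances over a shared word list for two localities would give one possible dialect distance. That aggregation step is a suggestion here, not something stated on the slide.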


Previous Related Work

Morphological Tagging of the Qur’an:

The system facilitates a variety of queries on the Qur’anic text that refer to the words and their linguistic attributes, and it provides full morphological tagging of its words.

The core of the system is a set of finite-state based rules which describe the morpho-phonological and morpho-syntactic phenomena of the Qur’anic language. The system is currently being used for teaching and research purposes.