Arabic Morphology Template Grammar-based

Hassanin M. Al-Barhamtoshy, Khalid O. Thabit and Basil. Ba-Aziz
KAU, Faculty of Computing and Information Technology, Jeddah

Abstract

This research presents a multi natural language processing model to be used in machine translation and language processing systems. We describe problems of analysis, taking into consideration lexical and syntactic ambiguity. Different types of linguistic and non-linguistic knowledge are necessary to resolve these problems of ambiguity, and in this research we examine in more detail how to represent this knowledge.

In addition, the research describes a system for generating natural-language sentences from syntactic and lexical structures, taking an internal (or interlingual) representation as our point of view. The model will be developed as part of an Arabic–English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. Consequently, the contributions of this work include building a dictionary to be used in automatic translation.


1. Introduction

To build a good natural language processing (NLP) component for translation models, the following subsections describe the different sub-models of NLP.

1.1. Dictionary

Dictionaries are the largest components of machine translation (MT, or automatic translation) systems in terms of the amount of information they hold. If they are more than simple word lists, they may well be the most expensive components to construct [1-3]. Consequently, a user can make some additions to the system dictionaries to make a system useful.

To get an idea of the size of dictionary that may be needed: for commercial purposes, a lexicon with 20 000 entries is often considered the minimum. Existing dictionaries are much larger: the Oxford English Dictionary contains about 250 000 entries without being exhaustive even of general usage. In fact, no dictionary can ever be complete [1, 2].

1.2. Word Types

It is useful to distinguish between the inherent properties of a word and the properties that follow from its place in the sentence, that is, from its grammatical environment. Each word has a type with respect to its morphological analysis. These types include grammatical properties, such as the indication of gender in some languages (the Arabic or the French part of a bilingual dictionary entry) and the indication of number on nouns. Typically, the citation form of a noun is its singular form [1-5].

1.3. Dictionaries and Morphology

Morphology concerns the internal structure of words and how words can be formed. In Arabic it is usual to distinguish three different word formation processes [1,4,7]:

1 Inflectional processes, by means of which a word is derived from another word form, acquiring certain grammatical features but maintaining the same part of speech or category (e.g. walk, walks);

2 Derivational processes, in which a word of a different category is derived from another word or word stem by the application of some process (e.g. grammar → grammatical, grammatical → grammaticality);

3 Compounding, in which independent words come together in some way to form a new sentence unit (in Arabic, شكرناهم).

In Arabic, inflectional and derivational processes involve prefixes (as in فنشكر) and suffixes (as in شكرناهم), and what is called pronoun inflection, or subwords. In other languages, a range of devices such as changes in the vowel patterns of words, doubling or reduplication of syllables, etc., are also found. Clearly, these prefixes and suffixes (collectively known as affixes) cannot "stand alone" as words. Compounding is quite different in that the parts can each occur as individual words.

1.4. Ambiguity

Most natural language processing would be simpler if every word had only one meaning. However, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous [4,5].

1.5. Semantics

Semantics is concerned with the meaning of words and how they combine to form sentence meanings [5]. It is useful to distinguish lexical semantics from structural semantics: the former has to do with the meanings of words, the latter with the meanings of phrases, including sentences [6].

There are many ways of thinking about and representing word meanings, but one that has proved useful in the field of machine translation involves associating words with semantic features which correspond to their sense components. For example, the words man, woman, boy, and girl might be represented as [1, 5, 6]:

man = (+HUMAN, +MASCULINE, +ADULT)

woman = (+HUMAN, -MASCULINE, +ADULT)

boy = (+HUMAN, +MASCULINE, -ADULT)

girl = (+HUMAN, -MASCULINE, -ADULT)
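The feature bundles above map naturally onto a small lookup structure. The following is a minimal sketch, not part of the proposed system: the names `LEXICON`, `differing_features` and `words_matching` are illustrative assumptions, and `True`/`False` stand for the `+`/`-` signs.

```python
# Feature bundles from the example above; True stands for "+", False for "-".
# LEXICON and both helper functions are hypothetical names for illustration.
LEXICON = {
    "man":   {"HUMAN": True, "MASCULINE": True,  "ADULT": True},
    "woman": {"HUMAN": True, "MASCULINE": False, "ADULT": True},
    "boy":   {"HUMAN": True, "MASCULINE": True,  "ADULT": False},
    "girl":  {"HUMAN": True, "MASCULINE": False, "ADULT": False},
}


def differing_features(word_a, word_b):
    """Return the sorted names of the features on which two words disagree."""
    a, b = LEXICON[word_a], LEXICON[word_b]
    return sorted(name for name in a if a[name] != b[name])


def words_matching(**constraints):
    """Return the sorted words whose features satisfy every constraint."""
    return sorted(
        word
        for word, features in LEXICON.items()
        if all(features[name] == value for name, value in constraints.items())
    )


print(differing_features("man", "girl"))
print(words_matching(ADULT=False))
```

A representation of this kind lets a dictionary entry be selected or rejected by feature, which is one way lexical ambiguity can be narrowed during translation.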

An Arabic translation dictionary must be built on professional linguists' translations. Figures 1 and 2 give case-study examples of English to Arabic and French to English translation [6].

Fig. (1): English to Arabic simple translator

Fig. (2): French to English simple translator

2. Building Arabic Dictionary

In information retrieval systems such as cross-language information retrieval (CLIR), queries in one language retrieve relevant documents in other languages. Machine-readable dictionaries (MRD) and machine translation (MT) are important resources for query translation in CLIR [8]. Aljlayl and Frieder investigate MT and MRD for Arabic-English CLIR. The translation ambiguity associated with these resources is the key problem. They present three methods of query translation using a bilingual dictionary for Arabic-English CLIR [8].

Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. The work in [9] presents a simple statistical technique to train an English-to-Arabic transliteration model from pairs of names.

Arabic requires good stemming for effective information retrieval because it is highly inflected and derivational, yet no standard approach to stemming has emerged [10-13]. Several light stemmers based on heuristics, and a statistical stemmer based on co-occurrence, were developed for Arabic retrieval. The retrieval effectiveness of these stemmers is compared with that of a morphological analyzer on the TREC-2001 data [10].

The inflectional structure of a word affects the retrieval accuracy of information retrieval systems for Latin-based languages. Different stemming algorithms for Arabic information retrieval systems are presented in [11-18]. The effectiveness of surface-based retrieval is also investigated; this approach degrades retrieval precision, since Arabic is a highly inflected language. Therefore, a root-based retrieval model is proposed [11], and a statistically significant improvement over the surface-based approach is observed.

Arabic inflectional morphology requires infixation, prefixation and suffixation, giving rise to a large space of morphological variation [12]. In this project an approach is described for reducing the complexity of Arabic morphology generation using grammar-based rules. By decoupling the problem of stem changes from that of prefixes and suffixes, a significant reduction is gained in the number of rules required, as much as a factor of three for certain verb types [18].

Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. A native language hypothesis is proposed, stating that comparisons would be more effective in the original language of the story [21].

Due to the high number of inflectional variations of Arabic words, empirical results suggest that stemming is essential for Arabic information retrieval. Light stemming in particular has led to improvements in information retrieval; however, current light stemming algorithms do not extract the correct stem of irregular (so-called broken) plurals, which constitute about 10% of Arabic texts and about 41% of plurals [22].

There have been advances in CLIR in recent years. One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of OOV terms. Previous work has either relied on manual intervention or has been only partially successful in solving this problem. The method in [23] extends earlier work in this area by augmenting it with statistical analysis and corpus-based translation.

In another paper, a system that recognizes place names in natural language text is described; it produces geographic maps and animations showing the geographical coverage of texts about a certain subject as it changes over time. As the system is built to analyze texts in many different languages, it restricts the usage of linguistic analysis tools to the minimum. Instead, it relies on a gazetteer (geo dictionary) containing place names in different languages and uses heuristics for disambiguation purposes [24].

A methodology for implementing natural language morphology in the functional language Haskell is introduced in [25]. The main idea, as stated in [25], is simple: instead of working with untyped regular expressions, which are the state of the art of morphology in computational linguistics, finite functions and algebraic data types are used. The definitions of these data types and functions are the language-dependent part of the morphology.

For CLIR based on bilingual translation dictionaries, good performance depends upon lexical coverage in the dictionary. This is especially true for languages possessing few inter-language cognates, such as Japanese and English. The article [26] describes a method for automatically creating and validating candidate Japanese transliterated terms of English words, using a phonetic English dictionary and a set of probabilistic mapping rules.

As participants in the TIDES Surprise Language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi [27].

A novel two-step fuzzy translation technique is presented for cross-lingual spelling variants. In the first stage, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second stage, the intermediate forms obtained in the first stage are translated into a target language using fuzzy matching [28].

While many investigations have explored the use of query expansion techniques to combat errors induced by translation, no study had yet examined the effectiveness of these techniques across resources of varying quality. The work in [29] presents results using parallel corpora and bilingual wordlists that have been deliberately degraded prior to query translation.

A cross-lingual question-answering (CLQA) system for Hindi and English is developed in [30]. It accepts questions in English, finds candidate answers in Hindi newspapers, and translates the answer candidates into English along with the context surrounding each answer. The system was developed as part of the surprise language exercise (SLE) within the TIDES program [30].

3. Proposed Model System Structure

The proposed model applies the following steps:

Step 1: The Arabic words are looked up in an Arabic electronic dictionary, and the morphological component, which contains specific rules that deal with the regularities of inflection, is then employed. The appropriate category (for example, noun, verb, or special character) is assigned.

Step 2: Some rules of an Arabic grammar are used to try to parse the entire word. An advanced parser might work out that a word is in fact a measure modifier. However, it is quite possible that the parser parses the entire word only to find its components (extracting its implicit pronouns from its affixes). This is because the difference between the Arabic and some possible English translations is not great.

Step 3: The engine now applies source-to-target (Arabic-to-English) transformation rules. The first step here is to find translations of the Arabic words in an Arabic-to-English dictionary.

We can now summarize some of the distinctive design features of this engine:

- Input sentences are automatically parsed only as far as is necessary for successful operation, using various morphological and lexical (structure-based) rules and phrasal transformation rules. The transformer engine is often content to find out just a few incomplete pieces of information about the structure of some of the phrases in a sentence, and where the main verb might be.

- Morphological rules are employed first, covering all the possible derivation rules for all the words in the sentence. In practice, the transformer model takes some of the analyzed features and then translates them into the target features. Thus, in the Arabic to English transformer system, we assume that the grammar covers only some features of Arabic.

- Syntactic rules take these analyzed features, in addition to the extracted features, and find the syntactic form of the sentence (its surface representation).

- Lexical rules are applied to find out whether such a representation has a meaning or not.

- The use of limited grammars and incomplete parsing means that transformer systems do not generally construct complex representations of input sentences; in many cases, not even the simplest surface constituent structure tree.

- Most of the engine's translational competence lies in the rules which transform bits of the input sentence into bits of the output sentence, including the bilingual dictionary rules. In a sense, a transformer system has some knowledge of the comparative grammar of the two languages: of what makes the one structurally different from the other.

The proposed model is based on a bilingual dictionary. Therefore, we will try to create a new dictionary based on the philosophy of the WordNet dictionary [31]. Consequently, the design and implementation of the model are illustrated and carried out using a bilingual Arabic/English dictionary. A relational database may be employed to store the syntactic and lexical indicators and the conceptual relations.

3.1. Model Activity Diagram

As described in the literature, an activity diagram shows the flow of control, using rounded rectangles for activities. Figure 3 shows the flow of control of the Find Root activity for a verb. All transitions between activities are represented by arrows. Horizontal bars are used to represent activities performed in parallel.

The model is based on Arabic template dictionaries (Arabic verb types, roots and template patterns). Consequently, each rule is illustrated according to the relational database dictionary.

Fig. 3 : Find Arabic Root Activity Diagram

3.2. Generate Non Diacritic Arabic Word

This function generates a non-diacritic Arabic word from two inputs: the Arabic root and the template for the word. The following example illustrates it.

Parameter      Value
Arabic Root    شرب
Template       ي 1 2 3 ون

The template uses the vowel and letter mask described later in Section 4, as in the following table:

Symbol               Represents
1                    First letter of the root
2                    Second letter of the root
3                    Third letter of the root
Any Arabic letter    The same Arabic letter

Arabic letters: أبتثجحخدذرزسشصضطظعغفقكلمنهوي

Extended Arabic letters: إأآؤءئ

The output will be:

Output                                 Value
Generated non-diacritic Arabic word    يشربون
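The substitution in this example can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the system's implementation: the function name `generate_word` is hypothetical, only triliteral roots are handled, and spaces in the template are treated as separators.

```python
def generate_word(root, template):
    """Fill a template with the letters of a triliteral root.

    The digits 1, 2 and 3 stand for the first, second and third root
    letters; every other character is copied unchanged. Spaces in the
    template are treated as separators and dropped.
    """
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    return "".join(slots.get(symbol, symbol) for symbol in template if symbol != " ")


# The template is built by concatenation only to keep the digits visually
# separate from the right-to-left Arabic letters in the source code.
template = "ي" + "123" + "ون"
print(generate_word("شرب", template))
```

With the root شرب and the template above, the function reproduces the word shown in the output table.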

3.3. Generate Diacritic Arabic Word

This function generates a diacritic Arabic word from two inputs: the Arabic root and the template for the word. The following example illustrates it.

Parameter      Value
Arabic Root    شرب
Template       ي Q 1 Q 2 Q 3 X

The template uses the vowel and letter mask described later in Section 4.6, as in the following table:

Symbol               Represents
1                    First letter of the root
2                    Second letter of the root
3                    Third letter of the root
Q                    Fatha (ـَ)
X                    Sukun (ـْ)
Any Arabic letter    The same Arabic letter

Arabic letters: أبتثجحخدذرزسشصضطظعغفقكلمنهوي

Extended Arabic letters: إأآؤءئ

The output will be:

Output                             Value
Generated diacritic Arabic word    يَشَرَبْ
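The same substitution idea extends to vowel symbols. The sketch below is an assumption-laden illustration: `generate_diacritized` is a hypothetical name, the template is passed as a list of symbols, and only the two vowel symbols listed in the table (Q and X) are mapped, using the Unicode code points for fatha and sukun.

```python
FATHA = "\u064E"  # Arabic fatha, written Q in the template
SUKUN = "\u0652"  # Arabic sukun, written X in the template


def generate_diacritized(root, template_symbols):
    """Fill a template of root slots (1, 2, 3) and vowel symbols (Q, X).

    Any symbol that is not a slot or a vowel symbol is copied unchanged.
    """
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    slots = {"1": root[0], "2": root[1], "3": root[2], "Q": FATHA, "X": SUKUN}
    return "".join(slots.get(symbol, symbol) for symbol in template_symbols)


# Template from the example above, listed symbol by symbol.
template = ["ي", "Q", "1", "Q", "2", "Q", "3", "X"]
word = generate_diacritized("شرب", template)
print(word)
```

Stripping the two vowel marks from the result gives back the bare letters, which is why the diacritic and non-diacritic generators can share one template table.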

3.3. Extract the Root of Arabic word

Extracting the root of an Arabic word uses a somewhat more complex algorithm that relies on multiple functions and multiple masks. First, the function finds the templates that match the input word. It then removes all the non-required characters and keeps the original verb characters. The output is the set of all matched roots.
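The matching step can be sketched as the reverse of template filling. This is a simplified illustration, not the paper's algorithm: `match_template`, `extract_roots` and the `TEMPLATES` list are hypothetical, templates use the digits 1 to 3 for root slots, and the candidate roots are not validated against a root dictionary as a full system would require.

```python
def match_template(word, template):
    """Return the root if `word` fits `template`, otherwise None.

    Digits 1, 2 and 3 mark root-letter slots; every other template
    character must equal the word's letter at the same position.
    """
    if len(word) != len(template):
        return None
    root = {}
    for letter, symbol in zip(word, template):
        if symbol in ("1", "2", "3"):
            root[symbol] = letter   # keep the original verb character
        elif symbol != letter:
            return None             # affix letter does not match
    if set(root) != {"1", "2", "3"}:
        return None
    return root["1"] + root["2"] + root["3"]


def extract_roots(word, templates):
    """Return all distinct candidate roots from the matching templates."""
    roots = []
    for template in templates:
        root = match_template(word, template)
        if root is not None and root not in roots:
            roots.append(root)
    return roots


# Hypothetical template list; only the first entry comes from the text.
TEMPLATES = ["ي" + "123" + "ون", "123", "ت" + "123" + "ون"]
print(extract_roots("يشربون", TEMPLATES))
```

Returning a list rather than a single root reflects the statement above that the output is all matched roots; an ambiguous word may fit more than one template.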

3.4. Generate all possible derivative patterns

Generating all possible derivative patterns uses several functions. First, we find the type of the input root verb and the kinds of templates that apply to this verb, for example:

verb    TypeID    Present    RealRoot
وفى     29        2          وَفَى
شرب     1         1          شَرَبَ
شرب     1         5          شَرِبَ

As the table shows, the verb وفى has verb type 29 and Present type 2, and it therefore matches only one template table.

4. Arabic Template Rules of the Proposed Model

The three affix operations (prefixes, infixes and suffixes) can be used to extract the roots from Arabic words using derivation templates.

Conversely, derived Arabic words can be generated from Arabic roots by applying the three affix templates.

The input Arabic word is combined with a second input (the affix template, called the mask) to find the Arabic root, as shown in Figure 4.

Fig. 4: Arabic Template Mask

4.1. Unsetting Rules

A rule may use the AND operator, the OR operator, or the XOR operator. To unset characters, use an unsetting mask with the same character length as the input word. The rules for extracting the root can be summarized as follows:

1. To unset a character in the input Arabic word, use 0 for the corresponding character in the mask.

2. To leave a character in the input Arabic word unchanged, use 1 for the corresponding character in the mask.

3. Use the AND/OR/XOR operators to extract the Arabic root and additional indicators.
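The unsetting rules can be sketched as a character-wise AND between the word and a 0/1 mask. The code below is an illustration under assumptions: `split_by_mask` is a hypothetical name, and the way unset letters are classified (before the first kept letter as prefix, after the last as suffix, in between as infix) is one plausible reading of the "additional indicators". The mask 011100 for يشكرون is the one given for Figure 5.

```python
def split_by_mask(word, mask):
    """Apply an unsetting mask: '1' keeps a letter, '0' unsets it.

    Returns the kept letters as the root, plus the unset letters grouped
    by position relative to the kept span (prefix, infix, suffix).
    """
    if len(word) != len(mask) or set(mask) - {"0", "1"}:
        raise ValueError("mask must be 0/1 and as long as the word")
    if "1" not in mask:
        raise ValueError("mask must keep at least one letter")
    first, last = mask.index("1"), mask.rindex("1")
    root = "".join(ch for ch, bit in zip(word, mask) if bit == "1")
    infix = "".join(
        ch
        for ch, bit in zip(word[first:last + 1], mask[first:last + 1])
        if bit == "0"
    )
    return {
        "root": root,
        "prefix": word[:first],
        "infix": infix,
        "suffix": word[last + 1:],
    }


print(split_by_mask("يشكرون", "011100"))
print(split_by_mask("ضارب", "1011"))
```

Keeping the unset letters instead of discarding them means the same pass yields both the root and the affixes that later steps can validate.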

To understand how these rules work, refer to Figure 5 as an example.

Fig. (5): Example of Arabic Template Mask for (يشكرون)

[Figure 5 content: the AND operator combines the input word with the input mask template 011100; the outputs are the root, the prefix, the suffix, and the infix (none in this example).]

4.2. Setting Rules

This rule is employed to find further derivations of a word, either after the first rule (the unsetting rules) or on its own. A setting mask is used in the same manner, except that the OR operator is used instead of AND. The setting rules can be summarized in the following steps:

1. To set a character in the input word, use 1 for the corresponding character in the mask.

2. To leave a character in the input word unchanged, use 0 for the corresponding character in the mask.
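One possible reading of the setting rules is sketched below. It is an interpretation, not the paper's definition: the mask is taken to describe the positions of the derived word, a 0 copies the next root letter unchanged, and a 1 sets that position from a supplied string of affix letters. The names `apply_setting_mask` and `ALTERNATIVES` are hypothetical.

```python
def apply_setting_mask(root, mask, affixes):
    """Build a derived word from a root, a 0/1 mask and affix letters.

    '0' leaves the next root letter unchanged; '1' sets the position to
    the next affix letter (the OR-style "setting" of a character).
    """
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only 0 and 1")
    if mask.count("0") != len(root) or mask.count("1") != len(affixes):
        raise ValueError("mask does not fit the root and affix letters")
    root_letters, affix_letters = iter(root), iter(affixes)
    return "".join(
        next(root_letters) if bit == "0" else next(affix_letters)
        for bit in mask
    )


# A hypothetical stream of (mask, affix letters) alternatives for one root.
ALTERNATIVES = [("000", ""), ("100011", "يون"), ("0100", "ا")]
for mask, affixes in ALTERNATIVES:
    print(apply_setting_mask("شكر", mask, affixes))
```

Iterating over a list of alternatives mirrors the idea that the mask carries a stream of alternatives, one per derivation to be produced.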

To simplify those rules, refer to the characteristic of the OR operator as shown in Figure 6, and assume that the input Arabic word is (شكر). The mask should have a stream of alternatives to find out all the possible derivations of the word (شكر).

Fig.(6): Example of Arabic Template Mask for (شكر) to find out its Arabic derivations

[Figure 6 content: the OR operator applied to the input word with mask 000; prefix, suffix, and infix masks are added by the masking rules to produce the output derivations.]

5. Results and Discussion

Arabic verbs are triliteral, quadriliteral, or pentaliteral. Every Arabic verb has its own derivatives, and these derivatives depend on its type. The triliteral verbs fall into about 30 types containing 5321 verbs, which can produce 20000 templates. If the affix rules are applied to these templates (4 prefixes and 30 suffixes), the total number of Arabic words derived from verbs is 28,140,005 derivations.

5.1 Testing

Artificial Arabic words were used to test the proposed model. These words include all their various possible derived verbs, nouns, adjectives, adverbs, etc., and various combinations of affixes (prefixes, suffixes, infixes, and connected pronouns). The testing sample included 50 roots and their derivations: 60% (30 roots) were derived from sound verbs and 40% (20 roots) from weak verbs. The results of this experiment are presented in Table (1).

Table (1): Results of Testing the Proposed Model

Verb type      Total number of hits    Correct ratio    Error ratio
Sound Verbs    30                      100 %            0 %
Weak Verbs     20                      98 %             2 %
Total          50                      99 %             1 %

The testing is used to find out:

(1) Roots of entire Arabic words (Figures 7-a, 7-b and 7-c).

(2) Morphological analysis of the entire Arabic words with the associated analyzed properties (Figures 8-a and 8-b).

(3) Possible diacritization of the entire Arabic words (Figure 9).


Fig. (7-a): Example to find Root of the Arabic word (فسيكفيكهما)


Fig. (7-b): Example to find Root of the Arabic word (المهزوم)

Fig. (7-c): Example to find Root of the Arabic word (قِ)

Fig. (8-a): Example to find Properties of the Arabic word (فسيكفيكهم)


Fig. (8-b): Example to find Properties of the Arabic word (ضارب)

Fig. (9): Example to find Properties of the Arabic word (ضارب)

5.2 Complexity

To assess the complexity of the proposed model, we turn our attention to how morphological analysis is conducted. The running time cost is determined by the three components of the following algorithm:

Step 1: Checking the existence of the entire Arabic word and the order of the root using the proposed Arabic dictionary.

Step 2: Validating the prefixes and suffixes of the entire Arabic word using the proposed template Arabic grammar.

Step 3: Validating the infixes of the entire Arabic word, if needed.

For the first step, the comparison is carried out character by character, so the number of comparisons is:

T1 = n

where n is the length of the entire Arabic word (n = 3 for triliteral, 4 for quadriliteral, or 5 for pentaliteral).

In the second step, if the entire Arabic word exists in a proper sequence, its prefixes and suffixes are validated by checking them against a list of stored prefixes and suffixes. The number of comparisons is:

T2 = Log Nps

where Nps is the number of prefixes and suffixes.

The validation of word infixes depends on two factors [32]: the size of the difference between the positions of the root letters in the entire word, and the list of infix letters to be checked. Accordingly, the number of comparisons is:

T3 = D + M

where D is the number of comparisons for checking the difference, and M is the number of character comparisons needed to match an infix against the set of infixes.

Consequently, the overall running time of the proposed model is the sum of the three components above:

T = T1 + T2 + T3

= n + (Log Nps) + (D + M)
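The cost formula can be evaluated directly. The sketch below is illustrative: `comparison_count` is a hypothetical helper, the logarithm is assumed to be base 2 (as it would be for a binary search over a sorted affix list; the text does not state the base), and the sample values are invented for the example.

```python
import math


def comparison_count(n, n_ps, d, m):
    """Evaluate T = n + log(Nps) + (D + M) for one word.

    n    : length of the word (T1)
    n_ps : number of stored prefixes and suffixes (T2 = log2 of this;
           base 2 is an assumption, not stated in the text)
    d, m : difference-check and infix-match comparisons (T3 = d + m)
    """
    if n_ps < 1:
        raise ValueError("n_ps must be at least 1")
    t1 = n
    t2 = math.log2(n_ps)
    t3 = d + m
    return t1 + t2 + t3


# Invented sample values: a triliteral word, 32 stored affixes,
# 2 difference checks and 4 infix character comparisons.
print(comparison_count(n=3, n_ps=32, d=2, m=4))
```

Because T2 grows only logarithmically, the total is dominated by the word length and the infix checks for any realistic affix list.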

References

[1] Doug Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys, and Louisa Sadler, Machine Translation: An Introductory Guide, 1995.

[2] W.J. Hutchins and H.L. Somers, An Introduction to Machine Translation, Academic Press, London, 1992.

[3] http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/

[4] Hassanin M. Al-Barhamtoshy, Understanding of Arabic Text, Ph. D. dissertation, Al-Azhar University, 1992.

[5] Ronnie Cann, Formal Semantics, Cambridge University Press, Cambridge, 1993.

[6] http://www.worldlingo.com/products_services/worldlingo_translator.html

[7] Ashraf I Madkour and Hassanin M. Al-Barhamtoshy, Arabic Morphological Analyzer, Al-Azhar Engineering International Conference, AEIC 1993, Cairo-Egypt.

[8] Mohammed Aljlayl, Ophir Frieder, Corpus Linguistics: Effective arabic-english cross-language information retrieval via machine-readable dictionaries and machine translation, Proceedings of the tenth international conference on Information and knowledge management, October 2001 .

[9] Nasreen AbdulJaleel, Leah S. Larkey, Information retrieval session 3: cross language retrieval: Statistical transliteration for english-arabic cross language information retrieval, Proceedings of the twelfth international conference on Information and knowledge management, November 2003.

[10] Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell, Arabic Information Retrieval: Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002.

[11] Mohammed Aljlayl, Ophir Frieder, Information retrieval 1: On arabic search: improving the retrieval effectiveness via a light stemming approach, Proceedings of the eleventh international conference on Information and knowledge management, November 2002.

[12] M. A. Madkour, A. Al-samahy and Hassanin M. Al-Barhamtoshy, "An Arabic Morphological Analyzer", Al Azhar Engineering International Conference, AEIC 1991, Cairo, December 1991.

[13] N. H. Hegazi and A. A. Elsharkawi, "An Approach to a Computerized Lexical Analyzer for Natural Arabic Text", Proceedings of the Arabic Language Conference, Kuwait, 1985.

[14] M. Geith and T. El-Sadany, "Arabic Morphological Analyzer on a Personal Computer", Proceedings of the First KSU Symposium on Computer Arabization, 1987.

[15] S. S. Al-Fadaghi and F. S. Al-Anzi, "A new algorithm to generate Root-pattern Forms", Proceedings of the 11th National Computer Conference, KFUPM, p. 391, 1989.

[16] Y. Hilal, "Morphological Analysis of Arabic Morphology", Computer Processing of the Arabic Language, Workshop Papers, vol. I, April, Kuwait, 1985.

[17] Botrous Thalouth and Abdullah Al-Dannan, "A Comprehensive Arabic Morphological Analyzer/Generator", IBM Kuwait Scientific Center, Feb. 1987.

[18] Imad A. Al-Sughaiyer and Ibrahim A Al-Kharashi, "Arabic Morphological Analysis Techniques: A Comprehensive Survey", CERI internal report, KACST 2003.

[19] Project for Building the Morphological Base of Arabic Vocabulary Using the Linguistic Corpus (in Arabic), Computer and Electronics Research Institute, King Abdulaziz City for Science and Technology, 3/4/1424 AH.

[20] Violetta Cavalli-Sforza, Abdelhadi Soudi, Teruko Mitamura, Arabic morphology generation using a concatenative strategy, Proceedings of the first conference on North American chapter of the Association for Computational Linguistics, April 2000.

[21] Leah S. Larkey, Fangfang Feng, Margaret Connell, Victor Lavrenko, Machine learning for IR: Language-specific models in multilingual topic tracking, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[22] Abduelbaset Goweder, Massimo Poesio, Anne De Roeck, Posters: Broken plural detection for arabic information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[23] Ying Zhang, Phil Vines, Cross-language information retrieval: Using the web for automated translation extraction in cross-language information retrieval, Proceedings of the 27th annual international conference on Research and development in information retrieval, July 2004.

[24] Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Tom De Groeve, Information access and retrieval (IAR): Geographical information recognition and visualization in texts written in various languages, Proceedings of the 2004 ACM symposium on Applied computing, March 2004.

[25] Markus Forsberg, Aarne Ranta, Functional morphology, ACM SIGPLAN Notices , Proceedings of the ninth ACM SIGPLAN international conference on Functional programming, Volume 39 Issue 9, September 2004 .

[26] Yan Qu, Gregory Grefenstette, David A. Evans, Cross-lingual information retrieval: Automatic transliteration for Japanese-to-English text retrieval, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003.

[27] Leah S. Larkey, Margaret E. Connell, Nasreen Abduljaleel, Hindi CLIR in thirty days, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 2, June 2003.

[28] Ari Pirkola, Jarmo Toivonen, Heikki Keskustalo, Kari Visala, Kalervo Järvelin, Cross-lingual information retrieval: Fuzzy translation of cross-lingual spelling variants, Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003.

[29] Paul McNamee, James Mayfield, Cross-language Information Retrieval: Comparing cross-language query expansion techniques by degrading translation resources, Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 2002.

[30] Satoshi Sekine, Ralph Grishman, Hindi-english cross-lingual question-answering system, ACM Transactions on Asian Language Information Processing (TALIP), Volume 2 Issue 3, September 2003.

[31] William J. Black and Sabri El-Kateb, A Prototype English-Arabic Dictionary based on Word Net, UMIST, Department of Computation, Manchester, M60 1QD, UK, Piek Vossen (Eds): GWC 2004, Proceedings, pp. 67-74.

[32] Suleiman H. Mustafa (2003), A Morphology-driven string matching approach to Arabic text searching, the Journal of Systems and Software 67 (2003) 77-87.

Hassanin M. Al-Barhamtoshy is a professor of computer science in the Department of Information Technology at King Abdulaziz University (Jeddah, Saudi Arabia). He earned his Ph.D. in computers and systems engineering from the University of Al-Azhar (Egypt) in 1992. He was granted several academic awards and scholarships. After graduation, he worked at Al-Azhar University and chaired many external projects for four years. In 1996 he went on leave from Al-Azhar for six years, during which he worked in the Department of Computer Science at King AbdulAziz University, Faculty of Science. He is at present a full professor at KAU, Faculty of Computing and Information Technology. He has published several papers in a number of research areas in computer science and computer engineering, including natural language processing (especially Arabization of computers), database and information retrieval systems, software engineering and artificial intelligence.