Ramzy
ATKS: Arabic Toolkit Service
Outline
• Introduction
• ATKS Modules
• Link ATKS with C#
• Example
• Comparison
Introduction
• The Arabic Toolkit Service (ATKS) offers a set of APIs for basic processing of written Arabic.
• The Toolkit is designed to help Arabic developers by providing high-quality Arabic NLP APIs.
• ATKS provides a rich set of APIs as SOAP web services, covering the basic language-processing operations through the following components.
ATKS Modules
• Colloquial Converter (works)
• Diacritizer (works)
• Named Entity Recognition (works)
• Parser (works)
• POS Tagging (works)
• Sarf (works)
• Speller (does not work)
• Transliterator (does not work)
Modules (Cont.)
• Colloquial Converter
• The Colloquial Converter translates Egyptian colloquial text into the equivalent Modern Standard Arabic (MSA) text, along with rich mapping information.
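To make "mapping information" concrete, here is a minimal Python sketch assuming the simplest possible approach: word-by-word dictionary lookup. The lexicon `EGY_TO_MSA`, the `Mapping` class, and `convert` are hypothetical illustrations, not the ATKS API, and the real Colloquial Converter is not claimed to work this way.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: Egyptian colloquial word -> MSA equivalent.
EGY_TO_MSA = {
    "عايز": "أريد",    # "I want"
    "فين": "أين",      # "where"
    "ازاي": "كيف",     # "how"
    "دلوقتي": "الآن",  # "now"
}


@dataclass
class Mapping:
    """One piece of mapping information: which source token became which output token."""
    index: int
    source: str
    target: str


def convert(text: str) -> tuple[str, list[Mapping]]:
    """Convert colloquial text word by word and record a mapping for every token."""
    mappings = []
    for i, token in enumerate(text.split()):
        # Words missing from the lexicon are passed through unchanged.
        mappings.append(Mapping(i, token, EGY_TO_MSA.get(token, token)))
    msa_text = " ".join(m.target for m in mappings)
    return msa_text, mappings


msa, info = convert("انت فين دلوقتي")  # "Where are you now?"
print(msa)
for m in info:
    print(m.index, m.source, "->", m.target)
```

Each `Mapping` ties an output word back to its source token, which is what a caller needs to align the MSA output with the original colloquial text.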
• Diacritizer
• The automatic Diacritizer component performs vowel restoration on input Arabic text.
• The main objective of the Diacritizer is to insert the missing vowels (diacritics) of the stem as well as the missing case-ending vowel.
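Arabic diacritics are Unicode combining marks stored after the letter they modify, so a diacritizer only inserts marks and never changes the base letters. The sketch below, which is not ATKS code, uses that fact as a sanity check on a diacritizer's output and reads off the case-ending mark on the last letter. The helper names are hypothetical, and the range U+064B to U+0652 covers the eight common marks (tanween, fatha, damma, kasra, shadda, sukun) but not rarer ones such as the superscript alef.

```python
import re

# Combining marks U+064B..U+0652: fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun.
DIACRITICS = re.compile("[\u064B-\u0652]")
TRAILING_DIACRITICS = re.compile("[\u064B-\u0652]+$")


def strip_diacritics(text: str) -> str:
    """Remove vowel marks, leaving only the base letters."""
    return DIACRITICS.sub("", text)


def is_consistent(original: str, diacritized: str) -> bool:
    """A diacritizer should only add marks: stripping them must give back the input letters."""
    return strip_diacritics(diacritized) == strip_diacritics(original)


def case_ending(word: str) -> str:
    """Return the marks attached to the word's last letter (its case-ending vowel, if any)."""
    match = TRAILING_DIACRITICS.search(word)
    return match.group(0) if match else ""


# "al-waladu" (the boy, nominative): fatha on waw and lam, damma on the final dal.
diacritized = "ال" + "و\u064E" + "ل\u064E" + "د\u064F"
print(is_consistent("الولد", diacritized))
print(hex(ord(case_ending(diacritized))))
```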
• Named Entity Recognition
• The Named Entity Recognizer (NER) detects and classifies named entities in Arabic text.
• It classifies them into three categories: persons, locations, and organizations.
• It also provides the character index at which each named entity is located in the original text.
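Because the NER reports a character index for each entity, a client can check that index against the original text. This Python sketch assumes a hypothetical result shape (`NamedEntity` with `text`, `category`, `start`); the real ATKS response format may differ, and `locate` merely stands in for the recognizer to produce sample data.

```python
from dataclasses import dataclass


@dataclass
class NamedEntity:
    """Hypothetical shape of one NER result."""
    text: str
    category: str  # "Person", "Location", or "Organization"
    start: int     # character index of the entity in the original text

    @property
    def end(self) -> int:
        return self.start + len(self.text)


def check_offsets(original: str, entities: list[NamedEntity]) -> bool:
    """True if every entity occurs in the original text exactly at its reported index."""
    return all(original[e.start:e.end] == e.text for e in entities)


def locate(original: str, text: str, category: str) -> NamedEntity:
    """Stand-in for the recognizer: record the first occurrence of `text` (illustration only)."""
    start = original.find(text)
    if start < 0:
        raise ValueError(f"{text!r} not found in input")
    return NamedEntity(text, category, start)


sentence = "زار محمد القاهرة"  # "Mohamed visited Cairo"
entities = [locate(sentence, "محمد", "Person"), locate(sentence, "القاهرة", "Location")]
for e in entities:
    print(e.category, e.start, sentence[e.start:e.end])
print(check_offsets(sentence, entities))
```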
• Parser
• The Parser determines the grammatical structure of Arabic sentences, such as which groups of words combine to form phrases and which words are the subject or the object of a verb.
• The Parser relies heavily on the Arabic POS Tagger to identify the correct part of speech for each token in an input Arabic sentence, and the Arabic Named-Entity Recognizer to identify named entities in the input sentence after it has been corrected using the Arabic Auto-Corrector.
• POS Tagging
• The Part of Speech (POS) Tagger is responsible for identifying the correct part of speech for each token of any given Arabic sentence.
• The POS Tagger relies heavily on the Morphological Analyzer to extract the relevant morpho-syntactic features for the input words.
• The POS Tagger also relies on the Auto-Corrector to correct input text.
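The Parser and POS Tagger descriptions imply an order in which the components must run. The sketch below encodes only the dependencies stated in these slides and derives a valid order with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The component labels are plain strings, not ATKS identifiers, and the Named Entity Recognizer is assumed to have no prerequisites because none are listed.

```python
from graphlib import TopologicalSorter

# component -> set of components it relies on (as described in the slides)
DEPENDS_ON = {
    "Parser": {"POS Tagger", "Named Entity Recognizer", "Auto-Corrector"},
    "POS Tagger": {"Morphological Analyzer (Sarf)", "Auto-Corrector"},
    "Named Entity Recognizer": set(),          # assumption: no listed prerequisites
    "Morphological Analyzer (Sarf)": set(),
    "Auto-Corrector": set(),
}


def processing_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every component runs after the components it needs."""
    return list(TopologicalSorter(deps).static_order())


order = processing_order(DEPENDS_ON)
print(" -> ".join(order))
```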
• Sarf
• Sarf provides automatic morphological analysis of Arabic words.
• It provides all possible morphological analyses for any given input Arabic word.
• Each analysis consists of the diacritized word and the morphological breakdown of the analysis in terms of prefixes, stem, and suffixes. The stem is further decomposed into its root and morphological pattern.
• Moreover, each analysis carries the part of speech and a set of morpho-syntactic features such as gender, number, transitivity, verb voice, and verb mood.
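The structure of a Sarf analysis, including the root-and-pattern decomposition of the stem, can be sketched as follows. `Analysis` is a hypothetical container whose fields simply mirror the list above, not the ATKS response schema. `apply_pattern` illustrates the classical convention in which the letters ف, ع, ل in a pattern mark the first, second, and third root consonants; it handles triliteral roots only and is not claimed to match how ATKS represents patterns.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Hypothetical container for one Sarf analysis (fields mirror the slide)."""
    diacritized: str
    prefixes: list[str]
    stem: str
    suffixes: list[str]
    root: str
    pattern: str   # template using ف, ع, ل as placeholders for the root letters
    pos: str
    features: dict[str, str] = field(default_factory=dict)

    def surface(self) -> str:
        """Undiacritized word rebuilt from prefixes + stem + suffixes."""
        return "".join(self.prefixes) + self.stem + "".join(self.suffixes)


def apply_pattern(root: str, pattern: str) -> str:
    """Insert a triliteral root into a pattern: ف, ع, ل mark the 1st, 2nd, 3rd root letters."""
    if len(root) != 3:
        raise ValueError("only triliteral roots are handled in this sketch")
    slots = {"ف": root[0], "ع": root[1], "ل": root[2]}
    return "".join(slots.get(ch, ch) for ch in pattern)


# "wa-l-kitabu" ("and the book"): prefixes wa- and al-, stem kitab.
analysis = Analysis(
    diacritized="و\u064Eال\u0652ك\u0650ت\u064Eاب\u064F",
    prefixes=["و", "ال"],
    stem="كتاب",
    suffixes=[],
    root="كتب",
    pattern="فعال",
    pos="noun",
    features={"gender": "masculine", "number": "singular"},
)
print(analysis.surface(), apply_pattern(analysis.root, analysis.pattern) == analysis.stem)
```

For example, the root كتب yields كتاب with the pattern فعال, مكتوب with مفعول, and كاتب with فاعل.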
• Speller
• The Speller detects and corrects misspelled words in Arabic text and is designed for Modern Standard Arabic.
• The Speller APIs also enable auto-correction of common Arabic mistakes, that is, frequent orthographic errors.
• The main objective of the Speller is to enhance the quality of written Arabic text, hence improving the accuracy of the various Arabic text-processing components.
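One simple way to auto-correct frequent orthographic mistakes is to generate spellings that differ only in commonly confused letters and keep those found in a lexicon. The Python sketch below does exactly that, assuming three hypothetical confusion sets (hamza forms of alef, taa marbuta vs. haa, alef maqsura vs. yaa) and a caller-supplied lexicon; it illustrates the general technique and is not the ATKS Speller's algorithm.

```python
from itertools import product

# Hypothetical confusion sets for frequent orthographic mistakes.
CONFUSION_SETS = ["اأإآ", "ةه", "ىي"]


def variants(word: str) -> set[str]:
    """All spellings reachable by swapping letters within the same confusion set."""
    options = []
    for ch in word:
        # A letter outside every set has only itself as an option.
        options.append(next((g for g in CONFUSION_SETS if ch in g), ch))
    return {"".join(chars) for chars in product(*options)}


def auto_correct(word: str, lexicon: set[str]) -> str:
    """Return the word if known, else the first lexicon variant, else the word unchanged."""
    if word in lexicon:
        return word
    candidates = sorted(v for v in variants(word) if v in lexicon)
    return candidates[0] if candidates else word


lexicon = {"مدرسة", "إلى", "أحمد"}
for w in ["مدرسه", "الى", "احمد", "كتاب"]:
    print(w, "->", auto_correct(w, lexicon))
```

The number of variants grows exponentially with the number of confusable letters in a word, so a production speller would rank candidates (for example, by word frequency or context) rather than enumerate them all.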
• Transliterator
• Transliteration is the conversion of text from one script to another while preserving the same pronunciation.
• The Transliterator provides translation of named entities, such as person and city names, from English to Arabic and vice versa, as well as conversion of text from Romanized Arabic to native Arabic script.
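Conversion from Romanized Arabic can be sketched as greedy longest-match replacement over a mapping table. The table below is a hypothetical, heavily simplified Arabizi convention (digits such as 3 and 7 for ع and ح, doubled vowels for long vowels, short vowels dropped); it omits many letters and is not the ATKS Transliterator's method.

```python
# Hypothetical, heavily simplified Arabizi table (many letters omitted).
ARABIZI = {
    "sh": "ش", "kh": "خ", "th": "ث", "gh": "غ",
    "aa": "ا", "ii": "ي", "ee": "ي", "uu": "و", "oo": "و",
    "2": "ء", "3": "ع", "7": "ح",
    "b": "ب", "t": "ت", "g": "ج", "j": "ج", "d": "د", "r": "ر", "z": "ز",
    "s": "س", "f": "ف", "q": "ق", "k": "ك", "l": "ل", "m": "م",
    "n": "ن", "h": "ه", "w": "و", "y": "ي",
}
SHORT_VOWELS = set("aeiou")  # usually not written in Arabic script
MAX_KEY = max(len(k) for k in ARABIZI)


def to_arabic(text: str) -> str:
    """Greedy longest-match conversion; single short vowels are dropped."""
    text = text.lower()
    out, i = [], 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):
            chunk = text[i:i + size]
            if chunk in ARABIZI:
                out.append(ARABIZI[chunk])
                i += len(chunk)
                break
        else:
            if text[i] not in SHORT_VOWELS:
                out.append(text[i])  # keep spaces, punctuation, unknown characters
            i += 1
    return "".join(out)


for word in ["kitaab", "salaam", "3aalam", "shukran"]:
    print(word, "->", to_arabic(word))
```

With this table, "kitaab" becomes كتاب and "salaam" becomes سلام, but "shukran" becomes شكرن rather than the correct شكرا, because the final "-an" is written as an alef carrying tanween. That ambiguity is why practical transliterators rely on lexicons and statistical models rather than fixed tables.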
Link ATKS with C#
Steps
• Sign up on this website: http://research.microsoft.com/en-US/projects/atks/default.aspx
• If you already have a Microsoft account, you only need to sign in and then complete the registration details for using the ATKS tool.
• After registration, you will receive a verification email to activate your account.
• After 1-2 days, your application is reviewed and you receive another email containing an AppID, which you will use in your application later.
• Next, open Visual Studio and create a new C# Windows or console application.
• In Solution Explorer (on the right side of Visual Studio), right-click References and choose the second option, Add Service Reference.
• In the pop-up window that appears, paste your module's link, give the reference a name, and press OK.
• In your project code, include the namespace of your service so that you can use its public classes.
• Then write your own code and debug it. Success: you get a result!
• Now try the other modules.
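Add Service Reference generates C# proxy classes that hide the SOAP messages exchanged with the service. To show what travels over the wire, this Python sketch builds a SOAP 1.1 envelope with the standard library and prepares (without sending) the HTTP POST. The namespace, operation name (`Diacritize`), parameter names (`appId`, `text`), and URLs are placeholders, not the real ATKS contract; the actual values must be taken from the service's WSDL.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(service_ns: str, operation: str, params: dict[str, str]) -> bytes:
    """Build a SOAP 1.1 envelope whose body calls `operation` with the given parameters."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def make_request(url: str, soap_action: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the SOAP payload."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": f'"{soap_action}"'},
        method="POST",
    )


# All names below are hypothetical placeholders.
NS = "http://example.org/atks"
payload = build_soap_request(NS, "Diacritize", {"appId": "YOUR-APP-ID", "text": "الولد"})
request = make_request("http://example.org/atks/service.svc", f"{NS}/Diacritize", payload)
print(payload.decode("utf-8"))
```

Calling `urllib.request.urlopen(request)` would send the message; with the C# proxy, the equivalent is a single method call on the generated client class.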
Example
Comparison
Aspect | ATKS | MADAMIRA | NLTK
Simplicity | Simple | Hard (because of its standard dealing) | Average
Programming language | C# | Java | Python
Accessibility | Web service only | Stand-alone version; client-server version | Downloadable Python module
Arabic support | Guaranteed | Guaranteed | Limited
Adjustability | Not available (you can only send feedback) | Not available | Available (you can add your own grammar)