14
Centre for Development of Advanced Computing Pune University Campus, Ganesh Khind, Pune-411007 Tel. : 00-91-20-5694000-2 E-mail : [email protected] Website : http://.parc.cdacindia.com Resource Centre For Indian Language Technology Solutions – Urdu, Sindhi, Kashmiri C-DAC, Pune Achievements Achievements Achievements Achievements Achievements

Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

Centre for Development of Advanced ComputingPune University Campus Ganesh Khind Pune-411007

Tel 00-91-20-5694000-2 E-mail mdkcdacindiacomWebsite httpparccdacindiacom

Resource Centre ForIndian Language Technology Solutions ndash Urdu Sindhi Kashmiri

C-DAC Pune

AchievementsAchievementsAchievementsAchievementsAchievements

RCILTS-Urdu Sindhi amp KashmiriCentre for Development of Advanced

Computing Gist Pune

Introduction

ldquoDigital unite amp Knowledge for all the vision of TDILrdquo

MCampIT wishes to make a difference to the qualityof life of people of India through its TDIL missionand is investing significant amount of resourcestowards fulfilling this wish One such effort was theestablishment of ldquoResource centersrdquo for designdevelopment amp deployment of tools amp technologiesfor Indian languages

ldquoDissolving the language barrier the vision ofGISTrdquo

C-DAC has pioneered the Graphics and Intelligencebased Script Technology (GIST) which facilitatesthe use of Indian languages in IT GIST is gearedfor the Internet enabled world where all activitiesare gradually going online In its endeavor to stayabreast with technologies worldwide GIST has beenadopting the latest concepts to be able to stay tunedwith the changing IT scenario C-DAC GISTthrough its continuous RampD efforts has to its creditmany innovative products amp continues to be leadersin the language technology field

MCampIT through its TDIL Programme hasawarded C-DAC GIST with the project ldquoResourcecentrerdquo for development of tools amp technologies forPerso-Aarbic languages

Overall Goal of the Resource Centre

To empower people of India through the use ofInformation Technology solutions in Indianlanguages

Project Purpose

To improve the quality of life of people of Indiaby enabling them to use information technologyin Indian languages

To develop new products and services forprocessing information in Indian languages

To conduct research in computer processing ofIndian languages

Broad areas of work under the Perso-ArabicResource center

GIST workshop for usage of GIST tools fororganizations concerned with development ampdeployment of Indian language processing systems

Creation of a network of persons and organizationsworking on IT applications in Perso-Arabic( PA)languages namely Urdu Sindhi amp Kashmiri

Development of Language tools amp applications inPA scripts

Conduction of workshop to discuss onstandardization of PA scripts data amp fontrepresentations

Deployment of the developed technology productsamp services through a country-wide network

Services and Knowledge Bases

bull Creation of a web site for language tools solutionsand knowledge available in PA language

bull Imparting training for the usage of PA

bull language applications on IT

bull Establishments of Translation amp Subtitlingservices in PA languages

Products

bull High quality PA fonts

bull Dictionary Spell Checker tools in PA languages

bull Word processor package on PA languages

bull Transliteration tool between Hindi amp Urdu

bull Prototype of Pocket translator with Urdu support

bull Product for Subtitling Character GeneratorTeleprompter DVD in PA scripts

Activities Undertaken Under The Project

GIST workshop was conducted at C-DAC Punefrom 17th to 22nd July 2000 for usage of GISTseries of Indian Language Technologies tools andproducts for Resource Centers concerned withdevelopment and deployment of Indian Languagesolutions A team of 23 members from 8 Resourcecentres have attended

168

1 ER amp DC Trivendrum

2 Jawaharlal Nehru University New Delhi

3 Indian Institute Of Technology Kanpur

4 Orrisa Computer Application CenterBhubneswar

5 Thapar Institute Of Engg And TechnologyPunjab

6 Anna University Chennai

7 M S University Baroda

8 Indian Statistical Institute Calcutta

Covering over five days and total of 12 interactivesessions the subjects covered were

1 Building dictionary amp Designing spell checker

2 Keyboard driver

3 GIST software development kit

4 Multilingual web tools

5 Script code Font storage amp Design

During the workshop training kits were distributedwhich contained work manuals on following topics

1 Designing Indian Language Spell Checkers

2 Myths about Standards for Indian Languages

3 Enabling Indian Languages on Web

4 ISCII- Foundations amp Future

5 GIST Software Development Kit

6 Using GIST SDK in VB

7 Internet First Steps Upgrading an ExistingActiveX Control

8 Inscript Keyboard layout

9 Typing in Phonetic Keyboard

Two CDs on GIST sw and brochures on variouslanguage products amp fonts type face catalogue weredistributed

1 GIST based SW Developer CD

2 GIST Products amp Channel Information

A visit was organized to various departments of C-DAC GIST Laboratory National MultimediaResource Centre and National PARAM SuperComputing facility

Evolving Storage Font amp Inputting Standards

The main task before C-DAC was to evolvestandards for storage fonts amp inputting for thePerso-Arabic languages Development of USCII(Urdu standard for Information Interchange) wascarried out well before awarding of the project GISTTerminals designed amp developed by C-DAC arebased on these standards The support for the GISTterminals was limited to only the Naskhimplementation of the fonts

Accordingly with lots of efforts and valuable inputsfrom experts all over C-DAC drafted PASCIIstandards for storage keyboard standards amp fontstandards These proposed standards are publishedin the TDIL magazine ndash Vishwa Bharat forcomments suggestions from experts all over India

1 The PASCII standard - The Perso-ArabicStandard for Information Interchange

There are some peculiarities that can be seen inPerso-Arabic scripts

These scripts join letters with each other andtherefore letters have different forms as per theirposition in a ligature (the positions are beginningmiddle ending)

Some shapes do not have middle shape ie they donot join at both the ends For example alif vaodaal etc Every letter also has a standalone form

11 Characteristics of Proposed Standard

Its an 8-bit standard Supports letters for UrduArabic Sindhi Kashmiri

Defines PA alphabets in the upper ASCII leavinglower ASCII free for English Bilingual supportpossible

Defines numerals other than ASCII numbers (48to 57)(This may help supporting both ArabicNumerals 0-9 and language specific numerals)

Maintains the order of alphabets for perso-arabiclanguages Khate-kasheed is given lower value than

169

alphabets This will sort the words correctly eventhough they have khate-kasheed in between

Alphabets for different languages are placed in theirascending order Letters like ldquobheyrdquo are not providedfor URDU but kept for languages like SINDHIThe URDU may make use of ldquoberdquo and ldquochoTi-hairdquofor that

Minimal erabs are provided Tanveen for exampledo-zabar can be formed with the help of twoconsecutive zabar

Superscripts

Place for superscripts like khaRa-alif is provided

Place for superscripts for ARABIC is provided

Place for superscripts like ldquore-zerdquo ldquoainrdquo etc isprovided

Numerals are placed after erabs and super scripts(This is provided only to support display forlanguage specific numerals and standard numeralsie the ASCII numerals are available)

Only a few control characters are defined

12 Standardization of Urdu Keyboard

Overview

There is no standard keyboard available for URDUlanguage There are vendor specific keyboardsavailable which all differ in their layouts There arelot of keyboards designed by different companieswho provide Perso-Arabic support in theirapplications Unfortunately every company has itsown Keyboard layout that entirely differs from theotherrsquos Keyboard Layouts For example Microsofthas its own Keyboard layouts for Perso-Arabiclanguages

Drawbacks of Available Keyboards (in view ofURDU language)

No standard keyboard available

All vendor specific keyboards differ in their layout

Most of the keyboards have been designed forArabic

Although Arabic and Urdu have similar script theyare both different languages and hence a Keyboard

designed for Arabic may not be that useful for Urdulanguage CDAC has taken this care of languages todesign the keyboard for URDU SINDHI ampKASHMIRI and tried to come up with an optimumsolution

(Details of these standards can be found in TDILVishwabharat magazine)

13 Standardisation of Fonts

Characterstics of Perso-Arabic languages

Urdu has traditionally been written in the Nastaleeqscript Although the script employs the basic lettersof the language the rendering of these letters in aword is extremely complex The reason for thiscomplexity is that Urdu text has traditionally beencomposed through calligraphy a medium whoseprecepts are based on the aesthetic sense of thecalligrapher rather than on any formula So great isthe variation in calligraphy that many times it isdifficult to recognize the letters in a constituentword This is because in their calligraphed formthe individual letters partially or completely fuse intoeach other thereby losing their identity A degree offusion is purposely introduced to make the resultingfused glyph visually appealing

Another characteristic of the Urdu is the existenceof diacritics Diacritics although sparingly used helpin the proper pronunciation of the constituent wordThe diacritics appear above or below a character todefine a vowel or emphasize a particular sound Theyare essential for removal of ambiguities naturallanguage processing and speech synthesis

A Nastaliq style font ndash designed to be used forprinting and display The font is based on traditionalNastaliq style written in India

Arabic Font ndash A high quality Arabic Font is designedto compliment religious text in Urdu

Considerations for Digital Font Design

Following is a group-wise list of characters thatwere taken into consideration while designingfonts for the Perso-Arabic languages

1 Alphabet

2 Numerals

170

3 Special Characters

4 Diacritics

5 Religious and linguistic Symbols

6 Control characters

The 16-bit Naskh amp Nastaliq Font

The Fonts developed by CDAC are 16-bit and arealso defined in the User Area of the Unicode rangeThe ASCII range is not used and can be used fordifferent purposes (can be used to English supportfor example)

Includes

bull all the basic shapes

bull all the starting shapes and variations

bull all the middle shapes and variations

bull all the ending shapes and variations

bull levels for erabs (short vowels)

bull complete ligatures

bull Beginning ligatures

bull Middle ligatures

bull Ending ligatures

bull missing character glyph

Glyph Standards decided for FONTDevelopment

Urdu amp Kashmiri- Naskh amp Nastaliq script

Sindhi - Naskh script

bull Basic letters (Vowels Consonants)

bull Full shape letters

bull Beginning

bull Medial

bull Final

bull Ligatures

bull Graphic components

bull Numerals Signs and Symbols etc

2 Script Core Technology amp Rule Engine

Because of the complex nature of the Nastaliq scriptwhich is written right to left and top to bottomdiagonally it was required to make certain databasesand use them along with the rule engine to makedisplay possible Following are some tools which weredeveloped to create these databases

21 Glyph Property Editor

In Urdu and all other Perso Arabic languages eachcharacter has many different shapes dependingupon the position of that character 1) the standalone 2) the starting position 3) the middle position4) the ending position etc So according to theposition of the character and the letters with it itsjoining that particular shape is to be displayed Thisis an application that allows to store the data andthe compositional information for various shapesin a font file for proper alignment of characters fordisplay ( Urdu requires a larger number of glyphsthan other scripts) This takes care of the horizontaland vertical movement for Nashtaliq script

22 Rule Engine

A module that containing rules for each shape Therules indicate how and which shape will join withother shape or shapes based on the succeeding andpreceding shapes that occur in Nastaliq textcomposition This facilitates more accurate and fasterdisplay of the shape composition in the Nashtaliqtext composition

23 Testing version of Urdu Editor

A simple text editor is ready to test the Rule Engineand check the various rules formed for displayingthe glyphs This editor allows to write the text in leftto right and handles all keyboard mappings fromASCII to USCII and Urdu text is displayed

24 Font Glyph Editor

In Nastaleeq script (for Urdu) the shapes join fromright to left and within a ligature they also shiftvertically This forms a ligature So in a ligatureshapes join right to left and top to bottom Thevertical positioning information cannot be put inthe font itself Hence it was decided to make a utilitywhich will let one set this information in separate file

171

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 2: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

RCILTS-Urdu Sindhi amp KashmiriCentre for Development of Advanced

Computing Gist Pune

Introduction

ldquoDigital unite amp Knowledge for all the vision of TDILrdquo

MCampIT wishes to make a difference to the qualityof life of people of India through its TDIL missionand is investing significant amount of resourcestowards fulfilling this wish One such effort was theestablishment of ldquoResource centersrdquo for designdevelopment amp deployment of tools amp technologiesfor Indian languages

ldquoDissolving the language barrier the vision ofGISTrdquo

C-DAC has pioneered the Graphics and Intelligencebased Script Technology (GIST) which facilitatesthe use of Indian languages in IT GIST is gearedfor the Internet enabled world where all activitiesare gradually going online In its endeavor to stayabreast with technologies worldwide GIST has beenadopting the latest concepts to be able to stay tunedwith the changing IT scenario C-DAC GISTthrough its continuous RampD efforts has to its creditmany innovative products amp continues to be leadersin the language technology field

MCampIT through its TDIL Programme hasawarded C-DAC GIST with the project ldquoResourcecentrerdquo for development of tools amp technologies forPerso-Aarbic languages

Overall Goal of the Resource Centre

To empower people of India through the use ofInformation Technology solutions in Indianlanguages

Project Purpose

To improve the quality of life of people of Indiaby enabling them to use information technologyin Indian languages

To develop new products and services forprocessing information in Indian languages

To conduct research in computer processing ofIndian languages

Broad areas of work under the Perso-ArabicResource center

GIST workshop for usage of GIST tools fororganizations concerned with development ampdeployment of Indian language processing systems

Creation of a network of persons and organizationsworking on IT applications in Perso-Arabic( PA)languages namely Urdu Sindhi amp Kashmiri

Development of Language tools amp applications inPA scripts

Conduction of workshop to discuss onstandardization of PA scripts data amp fontrepresentations

Deployment of the developed technology productsamp services through a country-wide network

Services and Knowledge Bases

bull Creation of a web site for language tools solutionsand knowledge available in PA language

bull Imparting training for the usage of PA

bull language applications on IT

bull Establishments of Translation amp Subtitlingservices in PA languages

Products

bull High quality PA fonts

bull Dictionary Spell Checker tools in PA languages

bull Word processor package on PA languages

bull Transliteration tool between Hindi amp Urdu

bull Prototype of Pocket translator with Urdu support

bull Product for Subtitling Character GeneratorTeleprompter DVD in PA scripts

Activities Undertaken Under The Project

GIST workshop was conducted at C-DAC Punefrom 17th to 22nd July 2000 for usage of GISTseries of Indian Language Technologies tools andproducts for Resource Centers concerned withdevelopment and deployment of Indian Languagesolutions A team of 23 members from 8 Resourcecentres have attended

168

1 ER amp DC Trivendrum

2 Jawaharlal Nehru University New Delhi

3 Indian Institute Of Technology Kanpur

4 Orrisa Computer Application CenterBhubneswar

5 Thapar Institute Of Engg And TechnologyPunjab

6 Anna University Chennai

7 M S University Baroda

8 Indian Statistical Institute Calcutta

Covering over five days and total of 12 interactivesessions the subjects covered were

1 Building dictionary amp Designing spell checker

2 Keyboard driver

3 GIST software development kit

4 Multilingual web tools

5 Script code Font storage amp Design

During the workshop training kits were distributedwhich contained work manuals on following topics

1 Designing Indian Language Spell Checkers

2 Myths about Standards for Indian Languages

3 Enabling Indian Languages on Web

4 ISCII- Foundations amp Future

5 GIST Software Development Kit

6 Using GIST SDK in VB

7 Internet First Steps Upgrading an ExistingActiveX Control

8 Inscript Keyboard layout

9 Typing in Phonetic Keyboard

Two CDs on GIST sw and brochures on variouslanguage products amp fonts type face catalogue weredistributed

1 GIST based SW Developer CD

2 GIST Products amp Channel Information

A visit was organized to various departments of C-DAC GIST Laboratory National MultimediaResource Centre and National PARAM SuperComputing facility

Evolving Storage Font amp Inputting Standards

The main task before C-DAC was to evolvestandards for storage fonts amp inputting for thePerso-Arabic languages Development of USCII(Urdu standard for Information Interchange) wascarried out well before awarding of the project GISTTerminals designed amp developed by C-DAC arebased on these standards The support for the GISTterminals was limited to only the Naskhimplementation of the fonts

Accordingly with lots of efforts and valuable inputsfrom experts all over C-DAC drafted PASCIIstandards for storage keyboard standards amp fontstandards These proposed standards are publishedin the TDIL magazine ndash Vishwa Bharat forcomments suggestions from experts all over India

1 The PASCII standard - The Perso-ArabicStandard for Information Interchange

There are some peculiarities that can be seen inPerso-Arabic scripts

These scripts join letters with each other andtherefore letters have different forms as per theirposition in a ligature (the positions are beginningmiddle ending)

Some shapes do not have middle shape ie they donot join at both the ends For example alif vaodaal etc Every letter also has a standalone form

11 Characteristics of Proposed Standard

Its an 8-bit standard Supports letters for UrduArabic Sindhi Kashmiri

Defines PA alphabets in the upper ASCII leavinglower ASCII free for English Bilingual supportpossible

Defines numerals other than ASCII numbers (48to 57)(This may help supporting both ArabicNumerals 0-9 and language specific numerals)

Maintains the order of alphabets for perso-arabiclanguages Khate-kasheed is given lower value than

169

alphabets This will sort the words correctly eventhough they have khate-kasheed in between

Alphabets for different languages are placed in theirascending order Letters like ldquobheyrdquo are not providedfor URDU but kept for languages like SINDHIThe URDU may make use of ldquoberdquo and ldquochoTi-hairdquofor that

Minimal erabs are provided Tanveen for exampledo-zabar can be formed with the help of twoconsecutive zabar

Superscripts

Place for superscripts like khaRa-alif is provided

Place for superscripts for ARABIC is provided

Place for superscripts like ldquore-zerdquo ldquoainrdquo etc isprovided

Numerals are placed after erabs and super scripts(This is provided only to support display forlanguage specific numerals and standard numeralsie the ASCII numerals are available)

Only a few control characters are defined

12 Standardization of Urdu Keyboard

Overview

There is no standard keyboard available for URDUlanguage There are vendor specific keyboardsavailable which all differ in their layouts There arelot of keyboards designed by different companieswho provide Perso-Arabic support in theirapplications Unfortunately every company has itsown Keyboard layout that entirely differs from theotherrsquos Keyboard Layouts For example Microsofthas its own Keyboard layouts for Perso-Arabiclanguages

Drawbacks of Available Keyboards (in view ofURDU language)

No standard keyboard available

All vendor specific keyboards differ in their layout

Most of the keyboards have been designed forArabic

Although Arabic and Urdu have similar script theyare both different languages and hence a Keyboard

designed for Arabic may not be that useful for Urdulanguage CDAC has taken this care of languages todesign the keyboard for URDU SINDHI ampKASHMIRI and tried to come up with an optimumsolution

(Details of these standards can be found in TDILVishwabharat magazine)

13 Standardisation of Fonts

Characterstics of Perso-Arabic languages

Urdu has traditionally been written in the Nastaleeqscript Although the script employs the basic lettersof the language the rendering of these letters in aword is extremely complex The reason for thiscomplexity is that Urdu text has traditionally beencomposed through calligraphy a medium whoseprecepts are based on the aesthetic sense of thecalligrapher rather than on any formula So great isthe variation in calligraphy that many times it isdifficult to recognize the letters in a constituentword This is because in their calligraphed formthe individual letters partially or completely fuse intoeach other thereby losing their identity A degree offusion is purposely introduced to make the resultingfused glyph visually appealing

Another characteristic of the Urdu is the existenceof diacritics Diacritics although sparingly used helpin the proper pronunciation of the constituent wordThe diacritics appear above or below a character todefine a vowel or emphasize a particular sound Theyare essential for removal of ambiguities naturallanguage processing and speech synthesis

A Nastaliq style font ndash designed to be used forprinting and display The font is based on traditionalNastaliq style written in India

Arabic Font ndash A high quality Arabic Font is designedto compliment religious text in Urdu

Considerations for Digital Font Design

Following is a group-wise list of characters thatwere taken into consideration while designingfonts for the Perso-Arabic languages

1 Alphabet

2 Numerals

170

3 Special Characters

4 Diacritics

5 Religious and linguistic Symbols

6 Control characters

The 16-bit Naskh amp Nastaliq Font

The Fonts developed by CDAC are 16-bit and arealso defined in the User Area of the Unicode rangeThe ASCII range is not used and can be used fordifferent purposes (can be used to English supportfor example)

Includes

bull all the basic shapes

bull all the starting shapes and variations

bull all the middle shapes and variations

bull all the ending shapes and variations

bull levels for erabs (short vowels)

bull complete ligatures

bull Beginning ligatures

bull Middle ligatures

bull Ending ligatures

bull missing character glyph

Glyph Standards decided for FONTDevelopment

Urdu amp Kashmiri- Naskh amp Nastaliq script

Sindhi - Naskh script

bull Basic letters (Vowels Consonants)

bull Full shape letters

bull Beginning

bull Medial

bull Final

bull Ligatures

bull Graphic components

bull Numerals Signs and Symbols etc

2 Script Core Technology amp Rule Engine

Because of the complex nature of the Nastaliq scriptwhich is written right to left and top to bottomdiagonally it was required to make certain databasesand use them along with the rule engine to makedisplay possible Following are some tools which weredeveloped to create these databases

21 Glyph Property Editor

In Urdu and all other Perso Arabic languages eachcharacter has many different shapes dependingupon the position of that character 1) the standalone 2) the starting position 3) the middle position4) the ending position etc So according to theposition of the character and the letters with it itsjoining that particular shape is to be displayed Thisis an application that allows to store the data andthe compositional information for various shapesin a font file for proper alignment of characters fordisplay ( Urdu requires a larger number of glyphsthan other scripts) This takes care of the horizontaland vertical movement for Nashtaliq script

22 Rule Engine

A module that containing rules for each shape Therules indicate how and which shape will join withother shape or shapes based on the succeeding andpreceding shapes that occur in Nastaliq textcomposition This facilitates more accurate and fasterdisplay of the shape composition in the Nashtaliqtext composition

23 Testing version of Urdu Editor

A simple text editor is ready to test the Rule Engineand check the various rules formed for displayingthe glyphs This editor allows to write the text in leftto right and handles all keyboard mappings fromASCII to USCII and Urdu text is displayed

24 Font Glyph Editor

In Nastaleeq script (for Urdu) the shapes join fromright to left and within a ligature they also shiftvertically This forms a ligature So in a ligatureshapes join right to left and top to bottom Thevertical positioning information cannot be put inthe font itself Hence it was decided to make a utilitywhich will let one set this information in separate file

171

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 3: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

1 ER amp DC Trivendrum

2 Jawaharlal Nehru University New Delhi

3 Indian Institute Of Technology Kanpur

4 Orrisa Computer Application CenterBhubneswar

5 Thapar Institute Of Engg And TechnologyPunjab

6 Anna University Chennai

7 M S University Baroda

8 Indian Statistical Institute Calcutta

Covering over five days and total of 12 interactivesessions the subjects covered were

1 Building dictionary amp Designing spell checker

2 Keyboard driver

3 GIST software development kit

4 Multilingual web tools

5 Script code Font storage amp Design

During the workshop training kits were distributedwhich contained work manuals on following topics

1 Designing Indian Language Spell Checkers

2 Myths about Standards for Indian Languages

3 Enabling Indian Languages on Web

4 ISCII- Foundations amp Future

5 GIST Software Development Kit

6 Using GIST SDK in VB

7 Internet First Steps Upgrading an ExistingActiveX Control

8 Inscript Keyboard layout

9 Typing in Phonetic Keyboard

Two CDs on GIST sw and brochures on variouslanguage products amp fonts type face catalogue weredistributed

1 GIST based SW Developer CD

2 GIST Products amp Channel Information

A visit was organized to various departments of C-DAC GIST Laboratory National MultimediaResource Centre and National PARAM SuperComputing facility

Evolving Storage Font amp Inputting Standards

The main task before C-DAC was to evolvestandards for storage fonts amp inputting for thePerso-Arabic languages Development of USCII(Urdu standard for Information Interchange) wascarried out well before awarding of the project GISTTerminals designed amp developed by C-DAC arebased on these standards The support for the GISTterminals was limited to only the Naskhimplementation of the fonts

Accordingly with lots of efforts and valuable inputsfrom experts all over C-DAC drafted PASCIIstandards for storage keyboard standards amp fontstandards These proposed standards are publishedin the TDIL magazine ndash Vishwa Bharat forcomments suggestions from experts all over India

1 The PASCII standard - The Perso-ArabicStandard for Information Interchange

There are some peculiarities that can be seen inPerso-Arabic scripts

These scripts join letters with each other andtherefore letters have different forms as per theirposition in a ligature (the positions are beginningmiddle ending)

Some shapes do not have middle shape ie they donot join at both the ends For example alif vaodaal etc Every letter also has a standalone form

11 Characteristics of Proposed Standard

Its an 8-bit standard Supports letters for UrduArabic Sindhi Kashmiri

Defines PA alphabets in the upper ASCII leavinglower ASCII free for English Bilingual supportpossible

Defines numerals other than ASCII numbers (48to 57)(This may help supporting both ArabicNumerals 0-9 and language specific numerals)

Maintains the order of alphabets for perso-arabiclanguages Khate-kasheed is given lower value than

169

alphabets This will sort the words correctly eventhough they have khate-kasheed in between

Alphabets for different languages are placed in theirascending order Letters like ldquobheyrdquo are not providedfor URDU but kept for languages like SINDHIThe URDU may make use of ldquoberdquo and ldquochoTi-hairdquofor that

Minimal erabs are provided Tanveen for exampledo-zabar can be formed with the help of twoconsecutive zabar

Superscripts

Place for superscripts like khaRa-alif is provided

Place for superscripts for ARABIC is provided

Place for superscripts like ldquore-zerdquo ldquoainrdquo etc isprovided

Numerals are placed after erabs and super scripts(This is provided only to support display forlanguage specific numerals and standard numeralsie the ASCII numerals are available)

Only a few control characters are defined

12 Standardization of Urdu Keyboard

Overview

There is no standard keyboard available for URDUlanguage There are vendor specific keyboardsavailable which all differ in their layouts There arelot of keyboards designed by different companieswho provide Perso-Arabic support in theirapplications Unfortunately every company has itsown Keyboard layout that entirely differs from theotherrsquos Keyboard Layouts For example Microsofthas its own Keyboard layouts for Perso-Arabiclanguages

Drawbacks of Available Keyboards (in view ofURDU language)

No standard keyboard available

All vendor specific keyboards differ in their layout

Most of the keyboards have been designed forArabic

Although Arabic and Urdu have similar script theyare both different languages and hence a Keyboard

designed for Arabic may not be that useful for Urdulanguage CDAC has taken this care of languages todesign the keyboard for URDU SINDHI ampKASHMIRI and tried to come up with an optimumsolution

(Details of these standards can be found in TDILVishwabharat magazine)

13 Standardisation of Fonts

Characterstics of Perso-Arabic languages

Urdu has traditionally been written in the Nastaleeqscript Although the script employs the basic lettersof the language the rendering of these letters in aword is extremely complex The reason for thiscomplexity is that Urdu text has traditionally beencomposed through calligraphy a medium whoseprecepts are based on the aesthetic sense of thecalligrapher rather than on any formula So great isthe variation in calligraphy that many times it isdifficult to recognize the letters in a constituentword This is because in their calligraphed formthe individual letters partially or completely fuse intoeach other thereby losing their identity A degree offusion is purposely introduced to make the resultingfused glyph visually appealing

Another characteristic of the Urdu is the existenceof diacritics Diacritics although sparingly used helpin the proper pronunciation of the constituent wordThe diacritics appear above or below a character todefine a vowel or emphasize a particular sound Theyare essential for removal of ambiguities naturallanguage processing and speech synthesis

A Nastaliq style font ndash designed to be used forprinting and display The font is based on traditionalNastaliq style written in India

Arabic Font ndash A high quality Arabic Font is designedto compliment religious text in Urdu

Considerations for Digital Font Design

Following is a group-wise list of characters thatwere taken into consideration while designingfonts for the Perso-Arabic languages

1 Alphabet

2 Numerals

170

3 Special Characters

4 Diacritics

5 Religious and linguistic Symbols

6 Control characters

The 16-bit Naskh amp Nastaliq Font

The Fonts developed by CDAC are 16-bit and arealso defined in the User Area of the Unicode rangeThe ASCII range is not used and can be used fordifferent purposes (can be used to English supportfor example)

Includes

bull all the basic shapes

bull all the starting shapes and variations

bull all the middle shapes and variations

bull all the ending shapes and variations

bull levels for erabs (short vowels)

bull complete ligatures

bull Beginning ligatures

bull Middle ligatures

bull Ending ligatures

bull missing character glyph

Glyph Standards decided for FONTDevelopment

Urdu amp Kashmiri- Naskh amp Nastaliq script

Sindhi - Naskh script

bull Basic letters (Vowels Consonants)

bull Full shape letters

bull Beginning

bull Medial

bull Final

bull Ligatures

bull Graphic components

bull Numerals Signs and Symbols etc

2 Script Core Technology amp Rule Engine

Because of the complex nature of the Nastaliq scriptwhich is written right to left and top to bottomdiagonally it was required to make certain databasesand use them along with the rule engine to makedisplay possible Following are some tools which weredeveloped to create these databases

21 Glyph Property Editor

In Urdu and all other Perso Arabic languages eachcharacter has many different shapes dependingupon the position of that character 1) the standalone 2) the starting position 3) the middle position4) the ending position etc So according to theposition of the character and the letters with it itsjoining that particular shape is to be displayed Thisis an application that allows to store the data andthe compositional information for various shapesin a font file for proper alignment of characters fordisplay ( Urdu requires a larger number of glyphsthan other scripts) This takes care of the horizontaland vertical movement for Nashtaliq script

22 Rule Engine

A module that containing rules for each shape Therules indicate how and which shape will join withother shape or shapes based on the succeeding andpreceding shapes that occur in Nastaliq textcomposition This facilitates more accurate and fasterdisplay of the shape composition in the Nashtaliqtext composition

23 Testing version of Urdu Editor

A simple text editor is ready to test the Rule Engineand check the various rules formed for displayingthe glyphs This editor allows to write the text in leftto right and handles all keyboard mappings fromASCII to USCII and Urdu text is displayed

24 Font Glyph Editor

In Nastaleeq script (for Urdu) the shapes join fromright to left and within a ligature they also shiftvertically This forms a ligature So in a ligatureshapes join right to left and top to bottom Thevertical positioning information cannot be put inthe font itself Hence it was decided to make a utilitywhich will let one set this information in separate file

171

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 4: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

alphabets This will sort the words correctly eventhough they have khate-kasheed in between

Alphabets for different languages are placed in theirascending order Letters like ldquobheyrdquo are not providedfor URDU but kept for languages like SINDHIThe URDU may make use of ldquoberdquo and ldquochoTi-hairdquofor that

Minimal erabs are provided Tanveen for exampledo-zabar can be formed with the help of twoconsecutive zabar

Superscripts

Place for superscripts like khaRa-alif is provided

Place for superscripts for ARABIC is provided

Place for superscripts like ldquore-zerdquo ldquoainrdquo etc isprovided

Numerals are placed after erabs and super scripts(This is provided only to support display forlanguage specific numerals and standard numeralsie the ASCII numerals are available)

Only a few control characters are defined

12 Standardization of Urdu Keyboard

Overview

There is no standard keyboard available for URDUlanguage There are vendor specific keyboardsavailable which all differ in their layouts There arelot of keyboards designed by different companieswho provide Perso-Arabic support in theirapplications Unfortunately every company has itsown Keyboard layout that entirely differs from theotherrsquos Keyboard Layouts For example Microsofthas its own Keyboard layouts for Perso-Arabiclanguages

Drawbacks of Available Keyboards (in view ofURDU language)

No standard keyboard available

All vendor specific keyboards differ in their layout

Most of the keyboards have been designed forArabic

Although Arabic and Urdu have similar script theyare both different languages and hence a Keyboard

designed for Arabic may not be that useful for Urdulanguage CDAC has taken this care of languages todesign the keyboard for URDU SINDHI ampKASHMIRI and tried to come up with an optimumsolution

(Details of these standards can be found in TDILVishwabharat magazine)

13 Standardisation of Fonts

Characterstics of Perso-Arabic languages

Urdu has traditionally been written in the Nastaleeqscript Although the script employs the basic lettersof the language the rendering of these letters in aword is extremely complex The reason for thiscomplexity is that Urdu text has traditionally beencomposed through calligraphy a medium whoseprecepts are based on the aesthetic sense of thecalligrapher rather than on any formula So great isthe variation in calligraphy that many times it isdifficult to recognize the letters in a constituentword This is because in their calligraphed formthe individual letters partially or completely fuse intoeach other thereby losing their identity A degree offusion is purposely introduced to make the resultingfused glyph visually appealing

Another characteristic of the Urdu is the existenceof diacritics Diacritics although sparingly used helpin the proper pronunciation of the constituent wordThe diacritics appear above or below a character todefine a vowel or emphasize a particular sound Theyare essential for removal of ambiguities naturallanguage processing and speech synthesis

A Nastaliq style font ndash designed to be used forprinting and display The font is based on traditionalNastaliq style written in India

Arabic Font ndash A high quality Arabic Font is designedto compliment religious text in Urdu

Considerations for Digital Font Design

Following is a group-wise list of characters thatwere taken into consideration while designingfonts for the Perso-Arabic languages

1 Alphabet

2 Numerals

170

3 Special Characters

4 Diacritics

5 Religious and linguistic Symbols

6 Control characters

The 16-bit Naskh amp Nastaliq Font

The Fonts developed by CDAC are 16-bit and arealso defined in the User Area of the Unicode rangeThe ASCII range is not used and can be used fordifferent purposes (can be used to English supportfor example)

Includes

bull all the basic shapes

bull all the starting shapes and variations

bull all the middle shapes and variations

bull all the ending shapes and variations

bull levels for erabs (short vowels)

bull complete ligatures

bull Beginning ligatures

bull Middle ligatures

bull Ending ligatures

bull missing character glyph

Glyph Standards decided for FONTDevelopment

Urdu amp Kashmiri- Naskh amp Nastaliq script

Sindhi - Naskh script

bull Basic letters (Vowels Consonants)

bull Full shape letters

bull Beginning

bull Medial

bull Final

bull Ligatures

bull Graphic components

bull Numerals Signs and Symbols etc

2 Script Core Technology amp Rule Engine

Because of the complex nature of the Nastaliq scriptwhich is written right to left and top to bottomdiagonally it was required to make certain databasesand use them along with the rule engine to makedisplay possible Following are some tools which weredeveloped to create these databases

21 Glyph Property Editor

In Urdu and all other Perso Arabic languages eachcharacter has many different shapes dependingupon the position of that character 1) the standalone 2) the starting position 3) the middle position4) the ending position etc So according to theposition of the character and the letters with it itsjoining that particular shape is to be displayed Thisis an application that allows to store the data andthe compositional information for various shapesin a font file for proper alignment of characters fordisplay ( Urdu requires a larger number of glyphsthan other scripts) This takes care of the horizontaland vertical movement for Nashtaliq script

22 Rule Engine

A module that containing rules for each shape Therules indicate how and which shape will join withother shape or shapes based on the succeeding andpreceding shapes that occur in Nastaliq textcomposition This facilitates more accurate and fasterdisplay of the shape composition in the Nashtaliqtext composition

23 Testing version of Urdu Editor

A simple text editor is ready to test the Rule Engineand check the various rules formed for displayingthe glyphs This editor allows to write the text in leftto right and handles all keyboard mappings fromASCII to USCII and Urdu text is displayed

24 Font Glyph Editor

In Nastaleeq script (for Urdu) the shapes join fromright to left and within a ligature they also shiftvertically This forms a ligature So in a ligatureshapes join right to left and top to bottom Thevertical positioning information cannot be put inthe font itself Hence it was decided to make a utilitywhich will let one set this information in separate file

171

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 5: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

3 Special Characters

4 Diacritics

5 Religious and linguistic Symbols

6 Control characters

The 16-bit Naskh amp Nastaliq Font

The Fonts developed by CDAC are 16-bit and arealso defined in the User Area of the Unicode rangeThe ASCII range is not used and can be used fordifferent purposes (can be used to English supportfor example)

Includes

bull all the basic shapes

bull all the starting shapes and variations

bull all the middle shapes and variations

bull all the ending shapes and variations

bull levels for erabs (short vowels)

bull complete ligatures

bull Beginning ligatures

bull Middle ligatures

bull Ending ligatures

bull missing character glyph

Glyph Standards decided for FONTDevelopment

Urdu amp Kashmiri- Naskh amp Nastaliq script

Sindhi - Naskh script

bull Basic letters (Vowels Consonants)

bull Full shape letters

bull Beginning

bull Medial

bull Final

bull Ligatures

bull Graphic components

bull Numerals Signs and Symbols etc

2 Script Core Technology amp Rule Engine

Because of the complex nature of the Nastaliq scriptwhich is written right to left and top to bottomdiagonally it was required to make certain databasesand use them along with the rule engine to makedisplay possible Following are some tools which weredeveloped to create these databases

21 Glyph Property Editor

In Urdu and all other Perso Arabic languages eachcharacter has many different shapes dependingupon the position of that character 1) the standalone 2) the starting position 3) the middle position4) the ending position etc So according to theposition of the character and the letters with it itsjoining that particular shape is to be displayed Thisis an application that allows to store the data andthe compositional information for various shapesin a font file for proper alignment of characters fordisplay ( Urdu requires a larger number of glyphsthan other scripts) This takes care of the horizontaland vertical movement for Nashtaliq script

22 Rule Engine

A module that containing rules for each shape Therules indicate how and which shape will join withother shape or shapes based on the succeeding andpreceding shapes that occur in Nastaliq textcomposition This facilitates more accurate and fasterdisplay of the shape composition in the Nashtaliqtext composition

23 Testing version of Urdu Editor

A simple text editor is ready to test the Rule Engineand check the various rules formed for displayingthe glyphs This editor allows to write the text in leftto right and handles all keyboard mappings fromASCII to USCII and Urdu text is displayed

24 Font Glyph Editor

In Nastaleeq script (for Urdu) the shapes join fromright to left and within a ligature they also shiftvertically This forms a ligature So in a ligatureshapes join right to left and top to bottom Thevertical positioning information cannot be put inthe font itself Hence it was decided to make a utilitywhich will let one set this information in separate file

171

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 6: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

Example of Nastaleeq joints (ldquosehanrdquo =gt suad baRi-he noon)

Purpose

To select a glyph from the given font and characterand define on it following information

1 Starting point (SP) the point where a glyph endsThis is generally on baseline or zero

2 Ending point (EP) the point where a glyph joinswith another glyph on its starting point (SP ofanother glyph) This point is less than or equal tothe glyph width

3 Dot points (UP and DP) dot points where a dotor diacritic mark will be positioned on the glyph

25 The Font Editor Utility

The Font Editor utility lets us place the dotinformation for each glyph The file thus generatedis used for retrieving dot information at run timeWersquoll call it a dot file

The following figure depicts a glyph with dot andcaret information As shown in the figure the utilityhelps one define following for a glyph

1 Font Name

2 Unicode value of the glyph

3 Glyph Index in dot file (this is the unique id ofthe glyph)

4 Start and End points

5 Dot positions

6 Caret position

Fonts following are samples of fonts developed

(Ghalib)

(Ghalib Bold)

(Imaad)

(Kamran)

(Maghribi)

(Makhdoum)

(Murasilat)

(Riyadh)

172

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 7: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

(Rubai)

(Safi)

(Saleem)

(Shibli)

(Shuja)

(Abid)

3 DevanagariGurmukhi to Urdu TransliterationldquoUTRANS rdquo

A computer based transliteration allows a user toconvert a given text (say in storage format - ISCII)of language lsquoArsquo to text in another language lsquoBrsquo (insome storage format say ASCII) on the basis of thephonetic rules that govern languages A and B

A software (automatic) tool providingtransliteration from one language to another isan effective tool which reduces the amount ofefforts required to create large databases of namesin multiple languages

Once a database in one language is created thetool can be used to generate the same data forother languages This results in considerablesavings in time and money

It is a useful tool for applications which requireto convert database of names addresses phonenumbers or electoral rolls from one language toanother

This can also be useful for converting entiredatabases from one language to another withoutre-entering the data This utility can convert datain English from a database directly to ISCII

This will facilitate automated transliteration acrossdatabases

A software tool for transliterating text fromDevanagari amp Gurumukhi (ISCII) to Urdu hasbeen developed This tool is also available as aseparate module and also been implemented inthe Text Editor Test Application which isdeveloped

31 Following is a sample of Hindi textTransliterated into Urdu (UTRANS)

Using this automated tool ndash UTRANS we havetransliterated the book ldquoMeri Ekyavan Kavitayenrdquowritten by our Hon Prime Minister Shri AtalBihari Vajpayee

173

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 8: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

32 Following is a sample of Punjabi textTransliterated into Urdu (UTRANS)

4 Dictionary Development Tool

Natural language processing tools such as lookupdictionaries thesaurus spellchecker grammarcheckers part of speech taggers machine aidedtranslations require language specific dictionariesSeveral other domains such as speech processingexpert systems require backend languagedictionaries to be able to process and generate validoutputs Each of these applications requiredictionaries to be created to cater for the specificrequirements

The Generic Dictionary Development Tool willallow users to create dictionaries specific toapplication domains As part of this developmentthe format of a generic dictionary shouldstandardized The tool will provide facilities forobtaining views of dictionaries that are subsets ofthe generic dictionary

This tool though lot of refinement is required isin the form of a standalone desktop utility whichallows the user to create dictionaries containingwords for Perso-Arabic languages The user isprovided with an interface to define inflection androot word rules add suffix prefix grammar tagsdomain for a given word entry

In order to build spellchecker amp domain specificdictionaries following dictionary creation work wasunder taken and completed

1 Official Hindi dictionary - Hindi 50000 wordsequivalent 50000

2 Urdu to English urdu words 17000 grammertag 17000

3 Hindi Urdu Shabdh Kosh Hindi words 10000Equivalent 10000

4 Dictionary Urdu words 10000 grammer tag10000

5 Dictionary of Urdu amp Hindi names 3200Equivalent 3200

6 Technical dictionary Hindi words 1500 equivalent 1500

7 Muslim names Dictionary Hindi 1500equivalent 1500

9 Hindi - English - Urdu Hindi words 8000equivalent 8000

Following is the typical GUI of the dictionarydevelopment tool for URDU

174

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 9: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

5 Nashir - Wordprocessorndash Publishing Made Easy

The Nashir is designed to be easy to use and createdocuments in Perso-Arabic languages and at thesame time powerful enough to layout completenewspapers and magazines in Urdu SindhiKashmiri Arabic and Farsi

Each document of the Nashir consists of a numberof pages On each page of the document you canplace items like text blocks graphics etc

The Nashir is kind of an Word processor amp alsobest suited for publishing segment Spellcheckersupport is added

Apart from the base dictionary for spellcheckervarious domain specific dictionaries addition isplanned

Salient Features of Nashir

1 Support Nastaliq True Type fonts (presently 2fonts)

2 Supports Naskh fonts (presently 12 fonts)

3 Fonts for Sindhi and Kashmiri added

4 Supports C-DAC amp Phonetic Keyboards

5 User defined keyboard support available

6 Drawing objects provided

7 OLE Automation supported

Other Features

Bilingual Support

Supports Urdu Sindhi amp Kashmiri along withEnglish

Transliteration

Transliteration engine (uTRANS) has beenimplemented in the Nashir One can insert an ldquoacirdquo(ISCII file) file into Nashir and see the transliteratedversion in Urdu script (Naskh or Nastaliq) Rulebased transliteration developed for Hindi amp Punjabi

Save As HTML

The user can save the document as HTML pageand thus Naskh as well Nastaliq scripts can be viewedon the Internet

Table

A Table control is provided to feed tabularinformation in the document

Keyboards

Phonetic keyboard supported A Floating keyboard will be supported soon

Tool tip

Tool tips are provided for the user

Master Page

A master page is provided with each document toprovide the user with the header footer andsimilar settings Whatever is designed on Masterpage appears on the pages

Kerning

Kerning feature is provided to manually adjust atext Both Horizontal and Vertical kerning ispossible

Variable Font Sizes amp Colors

Use variable font sizes and color

Others

The Nashir has a number of other features likeHorizontal and Vertical rulers in the GUIdynamic font settings for the Urdu and Englishfonts Indents and paragraph settings pagesettings etc

Nashir Screen Shots

175

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 10: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

Following screen shot shows Text wrapping aroundobject feature of Nashir

Spellchecker Support in Nashir

The full version of Nashir is now equipped with thespell checker facility

The Spellchecker program works as a word-levelerror detection and correction (single or multiplecorrection) tool Following consideration were givenwhile designing the spellchecker module

bull Percentage of invalid words pass through it

bull Average suggestion rate

bull Intended suggestion is given

bull Domain the spellchecker is serving

bull Dictionary structure

Error DetectionCorrection Approaches

1 Non-word error detection

2 Isolated word error detection

3 Context based word correction

Non-Word Error Detection

Typographic errors (typing mistake)

Cognitive errors (invalid entry by user)

Phonetic error

Real word error

1 Detection Block invalid words to pass through it

2 Correction Suggest alternate valid words that canpass through it

Error Detection Techniques

Lookup Dictionaries

bull Structure Hash Tries etc

bull Dictionary size

bull Inflection euphony assimilation (MorphologicalAnalyzer)

bull Domain

Bi-gram Tri-gram approach (based on conditionalprobability approach)

A two dimensional matrix having conditionalprobability state of letters occurring next to eachother Bi-grams are searched and if a bi-gram is notin the dictionary then the words are termedincorrect

176

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 11: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

6 The Urdu SDK Controls

The GIST Urdu SDK is a software developmenttool for Urdu Language on MS Windows

This is a set of software components that enablessoftware developers to facilitate the use of Urduscripts with MS Windows applications Unlike othersolutions the significant feature of GIST Urdu SDKis that it enables the MS Windows application toprocess PASCII data directly

The GIST SDK uses ActiveX Technology fromMicrosoft and provides a seamless transparent andself contained Indian language layer for data entrystorage retrieval and printing in Indian scripts foryour MS Windows applications

These controls are bilingual and also support Englishlanguage The package contains editing control thatsupport the complex scripts like Nastaliq Followingis a listing of controls that are part of the Urdu SDKcontrols package version 10 which are developedby C-DAC GIST

bull UEdit

bull UListbox

bull UCombobox

bull UStatic

bull UButton

bull UCheckbox

bull URadiobutton

With the above set of controls one can create allsorts of applications that support Urdu language

The UrduEdit OCX is multi-line text editor controlthat supports Perso-Arabic Languages This version(10) of the control supports URDU as the primarylanguage The control supports Naskh as the defaultscript but can also be used to depict Nastaliq scriptThe control has a number of features to providewith most of the functionalities needed for a Textediting control

Features

1 Read amp Write in Naskh Nastaliq script

2 Bilingual text support ie English and Naskh

3 Copy paste text glyphs into external applications

4 Use for database applications

5 Transliterate (import) Hindi (ISCII) documentsinto Urdu

6 Use as simple Editor with fixed font and size

7 Use Rich Edit text facilities (eg multiple colorsfont sizes multiple fonts)

8 A number of Naskh fonts provided along withthe control

Sample Urdu Active-X control in a form

Applications using GIST URDU SDK

Following are some examples of set of applicationswhere the UrduEdit OCX can be useful

DatabaseApplication

The UrduEdit Control can be connected to any typeof database and store text as data You can store (orload) text from a database from (or into) the controlTypical applications that can be created are mailmerge report generators and all types of businessapplications where documents are created frominformation stored in a database

Desktop Applications

With UrduEdit Control you can easily createapplications that need text fields for Perso-Arabiclanguages The control can be used as single line ormultiline control You can also use Rich Edit featureof the control to display text in different fonts colorsand sizes

Web Applications

Create user controls with Visual Basic use UrduEditControl in web pages

177

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 12: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

Sample Database application form using UrduActive-X Control

7 Pocket Translator

Multilingual Pocket Translator was developedkeeping into mind the foreigners traveling toIndia Even it is useful for persons traveling withinIndia (Inter State) Similar products availableworldwide but not for Indian Languages

Features

1 English to Urdu amp vice a versa translation

2 Fixed messages for 5 different categories suchas Social Travel Emergency shop amp restaurant both for Urdu amp English 500 messagesin each catagory

3 Messages can be browsed in Urdu amp English

4 Speech output for Fixed messages only

5 Single Toggle key for selection of various categories

6 Arrow keys for scrolling through the messagesdictionary

7 Script toggle key for changing the script

8 4 lines for English message display

9 2 lines of Urdu message display

10 Talk key for speech output

11 English - Urdu dictionary

12 Displays nearest matching word in English ifthe typed word is not available in Dictionary

13 Translation of the same to Urdu by pressingScript key

Hardware Details

1 Developed on Motorola 68EC000 (PQFP package) processor running at 10 MHz

2 2 MB of EPROM for program fonts fixedmessages dictionary amp speech op

3 16 KB of SRAM for Temporary variables

4 64 keys Matrix QWERTY type keyboard

5 100 x 32 Dot matrix LCD display panel

6 4-bit ADPCM decoder for speech output

Block Diagram of Pocket Translator

178

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 13: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

8 Multiprompter

An ideal system for news reading anddocumentary production Support for Naskh ampNastaliq scripts provided

Read news or your dialogues without tears Youlook into the camera and get to se the text rollinga a speed that suits your reading pace And thattoo in India Arabic or European languages ITcan have multiple mechanical attachments withoptical glass for mounting on camera tripods On-line skipping of stories

Salient Features

Provide support to control the srcolling speed amin to 9 min max and 0 for pause

1 Support spellchecker

2 Support Indian or European languages

3 Upgradeable to News Romm AutomationSystem

4 Works under Windows environment

5 Available in two models Desktop amp portablebased on Laptop

Following is the screen shot of the GUI (GraphicalUser Interface) of the Teleprompter applicationrunning on the windows platform Used forcreation of the news and documentaries whichare then prompted on a special equipment havingmonitor mirrored glass amp camera

9 LIPS Advanced Creation Station amp LIPS forDVD

LIPS is a revolutionary multilingual systemmeant for Electronic as well as fused subtitlingIt consistes of highly productive software and costeffective hardware to create subtitles in Indian aswell as foreign languagess along with the timecodes

LIPS for DVD allows you to create DVDs withmultilingual subtitles in various Indian as well sforeign languages It converts the LIPS subtitlingfiles into a DVD compatible format This systemsupports Diakin and Sonic formats Generates theheader indicating the background and foregroundcolors edge color the in-time and out-time aswell as the subtitling image file

Saient Features

1 Supports all Indian languages inclusive ofURDU Sindhi amp Kashmiri

2 Naskh amp Nastaliq support provided

3 Support TIF and BMP formats

4 Support different fonts and font sizes

5 Edge color for subtitles can be specified

6 Allows positioning of subtitles on the video

7 Provision for increment and decrement of inand out time

8 Compatible with PAL as well NTSC formats

Subtitling

179

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180

Page 14: Achievements - tdil.meity.gov.in urdu.pdfRCILTS-Urdu, Sindhi & Kashmiri Centre for Development of Advanced Computing, Gist, Pune Introduction “Digital unite & Knowledge for all the

10 Perso-Arabic Web Site Development

Under the Perso-Arabic resource centre one ofthe major deliverables was to create the web sitein PA scripts amp especially in the Nataliq scriptFollowing was the method adopted for design ampdevelopment of the website named httpparccdacindiacom

Designed a special activeX control (UEditWeb)for web that could display Nastaliq

This control supports HTML like tags to setvarious attributes

Converted USCII data into UEditWeb format

UEditWeb data was stored in files which were putin Database

Server side scripts were written to generate Urducontent for display

Ten books will be added to make the site contentrich

Following screen shots shows the main page ampthe page containing information related to theresource centre website

Courtesy Shri M D KulkarniChief Investigator - Resource Centre amp

the Resource Centre Team GISTCentre for Development of Advanced Computing

(C-DAC) Pune University CampusGaneshkhind Pune 411 007

Tel 00-91-20-56940000102 5694092Fax 91-20-5694059

E-mail mdkcdacindiacom

180