Proposal for a Telugu Script Root Zone Label Generation Ruleset (LGR)

LGR Version: 3.0
Date: 2018-08-08
Document version: 2.6
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information/Overview/Abstract

This document lays down the Label Generation Rule Set for the Telugu script. Three main components of the Telugu Script LGR, viz. Code point repertoire, Variants and Whole Label Evaluation Rules, have been described in detail here. All these components have been incorporated in a machine-readable format in the accompanying XML file: "Proposal-LGR-Telu-20180808.xml".

In addition, a list of test labels has been provided in the following file, which covers the repertoire, variant code points and the whole label evaluation rules, providing examples for valid and invalid labels: "telugu-test-labels-20180808.txt".

2. Script for which the LGR is proposed

ISO 15924 Code: Telu
ISO 15924 Key N°: 340
ISO 15924 English Name: Telugu
Latin transliteration of native script name: telɯgɯ
Native name of the script: తెలుగు
Maximal Starting Repertoire [MSR] version: 3
The Unicode Standard, Version: 6.3
Telugu Unicode Range: 0C00–0C7F

3. Background of the Script and Principal Languages Using It

The Telugu language uses the Telugu script, which is written in the form of sequences of orthographic syllables. Each orthographic syllable is formed of one or more Telugu characters placed from left to right and top to bottom. Telugu is one of the 22 scheduled languages of India. The Telugu script is immediately related to Kannada and closely related to the Sinhala script.


3.1 The Evolution of the Script

The origins of the Telugu script can be traced to the Brahmi alphabet of ancient India, often known as Asokan Brahmi. Historically the script is derived from the Southern Brahmi or Bhattiprolu Brahmi, alternatively known as the Telugu Brahmi alphabet of the 3rd century BCE. Later, by the 5th century during the Chalukyan period, it developed into a common alphabet used for Telugu and Kannada. The Telugu-Kannada common alphabet split into two separate alphabets during the 12th and 13th centuries AD, to be called the Telugu and Kannada scripts. In addition to the common origin, a longer period of shared political and cultural confederation of the Telugu and Kannada speaking regions has ultimately resulted in the considerable proportion of shared identical character signs between the two scripts (34 out of 63 characters, see Table 10).

The earliest known inscriptions containing Telugu words appear on the bilingual coins of Satavahanas that date back to the 2nd century AD [104]. The first inscription entirely in Telugu was made in 575 AD and was probably made by Renati Cholas, who started writing royal proclamations in Telugu instead of Sanskrit. Telugu developed as a poetical and literary language during the 11th century AD. Until the 20th century Telugu was written in Granthic style, very different from the colloquial language. During the second half of the 20th century, a modern written style emerged based on the modern colloquial language. In 2008 Telugu was designated as a classical language by the Indian government.

Figure 1: Evolution of Telugu script

3.2 Notable Features

The Telugu orthography superficially appears as a series of circles and semi-circles. Most consonants carry a tick mark called Talakattu. The writing system is classified as a bugida type that employs alpha-syllabaries. The alphabet consists of vowels, consonants and modifiers. Each of these vowels and consonants has one or more secondary allographs. The secondary allographs always appear as dependent symbols on the first character of a syllable. Each syllable is formed of a single stand-alone vowel or one or more consonants. Each of these consonants may occur with an inherent vowel or modified by a secondary vowel. A consonant cluster may be formed with a single stand-alone character followed by one or more secondary forms of consonants. The order of composition of syllabaries does not match the reading order. There are rules to learn to read orthographic sequences into phonetic sequences, whether simple or complex syllables.

3.3 The Telugu (తెలుగు) Language

The Telugu language is a Dravidian language spoken by about 75 million (ca. 2001) people, mainly in the southern Indian states of Andhra Pradesh and Telangana, where it is the official language. It is also spoken in such neighboring states as Karnataka, Tamil Nadu, Orissa, Maharashtra and Chhattisgarh, and is one of the 22 scheduled languages of India. There are also quite a few Telugu speakers in Canada, the USA, South Africa, Malaysia, Mauritius, Myanmar, Sri Lanka and Réunion.

3.4 Languages that Use the Telugu Script

The script is also used for ten other languages, viz. Gondi, Koya, Konda, Kuvi, Kolavar or Kolami, Yerukala, Banjara or Lambadi, Savara or Sora, Adivasi Odiya and also Sanskrit. In the Telugu speaking region, the tradition of writing Sanskrit in the Telugu script has remained a common practice. During the last few decades, a considerable number of publications in the form of textbooks, dictionaries and other reading material have been produced in the Telugu script in Gondi, Koya, Konda, Kuvi, Kolami, Yerukala, Banjara, Savara and Adivasi Odiya.

No.  Name of the language (ISO 639 code)  Language family  Status                   EGIDS Scale
1    Telugu (tel)                         Dravidian        Scheduled and Classical  2
2    Gondi (gon)                          Dravidian        Modern Tribal            5
3    Koya (kff)                           Dravidian        Modern Tribal            5
4    Konda (knd)                          Dravidian        Modern Tribal            6b
5    Kuvi (kxv)                           Dravidian        Modern Tribal            5
6    Kolavar or Kolami (kfb)              Dravidian        Modern Tribal            5
7    Yerukala (yeu)                       Dravidian        Modern Tribal            6
8    Banjara or Lambadi (lmn)             Indo-Aryan       Modern Tribal            5
9    Savara or Sora (srb)                 Austro-Asiatic   Modern Tribal            5
10   Adivasi Odiya (ort)                  Indo-Aryan       Modern Tribal            5
11   Sanskrit (san)                       Indo-Aryan       Scheduled and Classical  4

Table 1: Main languages considered under Telugu LGR

3.5 The Structure of Written Telugu

The Telugu script as it is used for the Telugu language consists of a total of 72 characters [102] comprising 40 consonants, 16 characters representing vowels that can stand alone and 16 dependent signs, each corresponding to one of the sixteen vowels excepting /a/ అ; no explicit dependent symbol exists for that sound, instead it is inherent with the consonants in the absence of a dependent sign. Besides these, there are six additional dependent symbols, of which five always occur with the vowels, as extensions. The sixth, the halant sign ◌్ (U+0C4D), occurs with consonants. The following subsections give further details.

3.5.1 The vowels and vowel modifiers

There are fourteen vowel characters, viz. అ [a], ఆ [ā], ఇ [i], ఈ [ī], ఉ [u], ఊ [ū], ఋ [r], ఌ [l], ఎ [e], ఏ [ē], ఐ [ai], ఒ [o], ఓ [ō], ఔ [au], in the common inventory [103] for all the languages using the Telugu script [111] specified above, and two (ౠ [r], ౡ [ḹ]) to write Sanskrit loan words. For these vowels, there are fifteen corresponding marks, except for అ [a] (which is inherent). These are listed in Table 2 below.

There are six modifiers for vowels: ◌ఁ [~], ◌ం [ṃ], ◌ః [ḥ], ◌ఀ [~] (a special symbol not common in standard Telugu writings), ఽ [:.] (the avagraha sign, commonly used to indicate doubling the vowel length, which follows only long vowels), and ◌్ [H] (the halant sign, which, when appended to a consonant, deducts the inherent vowel /a/ from it). The halant sign has a similar characteristic to that of a secondary vowel sign in that both of them delete the inherent vowel [a] when added to consonants.

R1. Inherent vowel deletion rule: An inherent vowel of a consonant gets deleted either before a matra sign or before the halant sign.

C[ca] + M[◌ా, ◌ి …] | H[◌్] -> C[c◌ా, ◌ి] | H[◌్]
C[ca] + M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D] -> C[c] M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D]

C = Consonant, ca = a consonant with an inherent ‘a’, M = Secondary vowel;
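Rule R1 can be illustrated with a small Python sketch. The matra code points are the ones listed in R1; the consonant range U+0C15..U+0C39 and the function names are assumptions made for this illustration and are not part of the LGR.

```python
# Dependent vowel signs (matras) as listed in rule R1.
MATRAS = (
    set(range(0x0C3E, 0x0C45))    # U+0C3E..U+0C44
    | {0x0C62, 0x0C63}
    | set(range(0x0C46, 0x0C49))  # U+0C46..U+0C48
    | set(range(0x0C4A, 0x0C4D))  # U+0C4A..U+0C4C
)
HALANT = 0x0C4D


def is_consonant(cp):
    # Assumption: treat the whole span U+0C15..U+0C39 as consonants.
    return 0x0C15 <= cp <= 0x0C39


def keeps_inherent_vowel(text, index):
    """Return True if the consonant at `index` still carries its inherent /a/."""
    if not is_consonant(ord(text[index])):
        raise ValueError("character at index is not a consonant")
    if index + 1 < len(text):
        following = ord(text[index + 1])
        # R1: a following matra or halant deletes the inherent vowel.
        if following in MATRAS or following == HALANT:
            return False
    return True
```

For example, a bare క keeps its inherent vowel, while క followed by ◌ి or by the halant does not.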

No.  Independent vowel (primary allograph)  Dependent vowel (secondary allograph)
     with code point                        with code point
1.   అ  U+0C05                              No explicit sign recognized or encoded
2.   ఆ  U+0C06                              ◌ా  U+0C3E
3.   ఇ  U+0C07                              ◌ి  U+0C3F
4.   ఈ  U+0C08                              ◌ీ  U+0C40
5.   ఉ  U+0C09                              ◌ు  U+0C41
6.   ఊ  U+0C0A                              ◌ూ  U+0C42
7.   ఋ  U+0C0B                              ◌ృ  U+0C43
8.   ౠ  U+0C60                              ◌ౄ  U+0C44
9.   ఌ  U+0C0C                              ◌ౢ  U+0C62
10.  ౡ  U+0C61                              ◌ౣ  U+0C63
11.  ఎ  U+0C0E                              ◌ె  U+0C46
12.  ఏ  U+0C0F                              ◌ే  U+0C47
13.  ఐ  U+0C10                              ◌ై  U+0C48
14.  ఒ  U+0C12                              ◌ొ  U+0C4A
15.  ఓ  U+0C13                              ◌ో  U+0C4B
16.  ఔ  U+0C14                              ◌ౌ  U+0C4C

Table 2: Vowels and the corresponding dependent signs

No.  Modifier sign  Code point  Common name
1.   ◌ఀ             U+0C00      Candrabindu
2.   ◌ఁ             U+0C01      Ardhānusvāra or Arasunna
3.   ◌ం             U+0C02      Pūrṇanusvāra or Sunna
4.   ◌ః             U+0C03      Visarga
5.   ఽ              U+0C3D      Avagraha
6.   ◌్             U+0C4D      Halant

Table 3: Vowel modifiers and the consonantal modifiers

3.5.2 The Anusvāra or sunna (◌ం - U+0C02)

The Anusvāra or sunna represents a homorganic nasal before the corresponding consonant and serves as a substitute to transcribe word-final /mu/. Essentially it substitutes a cluster of a Nasal Consonant + Halant before a consonant. Writing alternatively with a nasal consonant + Halant + Consonant is rare and often occurs while transcribing Sanskrit words. Otherwise the writing practice with nasal consonant + Halant + Consonant of the latter type is virtually absent in Telugu.

No.  Homorganic nasal = Archiphoneme /M/  Homorganic nasal + Halant  Gloss
1.   లంక /laMka/                          లఙ్క /laŋka/               ‘island’
2.   కంచె /kaMce/                         కఞ్చె [kaɲce]              ‘fence’
3.   పంట /paMTa/                          పణ్ట /paṇTa/               ‘harvest’
4.   కంత /kaMta/                          కన్త /kanta/               ‘hole’
5.   కంప /kaMpa/                          కమ్ప /kampa/               ‘thorny bush’
6.   కంస /kaMsa/                          కమ్స /kansa/               ‘king Kansa’
7.   సింహ /siMha/                         సిమ్హ /simha/              ‘lion’

Table 4: Homorganic nasal and Homorganic nasal + Halant

3.5.3 Nasalization: Candrabindu (◌ఀ U+0C00) or arasunna (◌ఁ U+0C01)

Candrabindu, which denotes nasalization of the preceding vowel, is used in the Prakrit texts transcribed in the Telugu script, and the arasunna is used in old Telugu, as in తెలుఁగు /telũgu/ ‘telugu’. Present-day Telugu users do not use the candrabindu frequently except to bring special emphasis, as in hãã, hũũ, etc.

3.5.4 The Consonants

The Telugu consonants have an implicit vowel /a/ included in them. As per the traditional classification they are categorized according to their phonetic properties. There are 5 varga groups (classes) and one non-varga group. Each varga corresponds to a particular set of stops characterized by a particular place of articulation. Each varga contains four oral stops and one nasal stop, ordered by the complexity of their manner from left to right as [-vd, -asp, -nas], [-vd, +asp, -nas], [+vd, -asp, -nas], [+vd, +asp, -nas], [+vd, -asp, +nas] (where vd = voiced, asp = aspirated, nas = nasal). Each feature set defines the character by the varga. The vargas, from top to bottom, are each defined by an additional place feature of articulation. The non-varga set is again divided into two subsets, each characterized by the absence or presence of sonority, i.e. [+/-son]. The obstruents characterized by [–son] are fricatives, viz. శ [ś], ష [ṣ], స [s], హ [h], while the remaining carry the feature of sonority, i.e. [+son].

No.  Place of      -asp -vd -nas  ISO  +asp -vd -nas  ISO  -asp +vd -nas  ISO  +asp +vd -nas  ISO  -asp +vd +nas  ISO
     Articulation
1.   Velar         క              k    ఖ              kh   గ              g    ఘ              gh   ఙ              ṅ
2.   Palatal       చ              c    ఛ              ch   జ              j    ఝ              jh   ఞ              ñ
3.   Retroflex     ట              ṭ    ఠ              ṭh   డ              ḍ    ఢ              ḍh   ణ              ṇ
4.   Dental        త              t    థ              th   ద              d    ధ              dh   న              n
5.   Bilabial      ప              p    ఫ              ph   బ              b    భ              bh   మ              m

Table 5: Classification of stop consonants

Sonorants:   య y   ర r   ఱ ṛ   ల l   ళ ḷ   వ v
Fricatives:  శ ś   ష ṣ   స s   హ h

Table 6: Non-stop consonants
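The varga grid of Table 5 follows the code point order of the Telugu block, so the features of a stop can be derived from its position. The following Python sketch illustrates this; the dictionary layout and the function name `varga_features` are hypothetical choices for this illustration.

```python
# Code point of the first stop ([-vd, -asp, -nas]) in each varga row of Table 5.
VARGA_START = {
    "velar": 0x0C15,
    "palatal": 0x0C1A,
    "retroflex": 0x0C1F,
    "dental": 0x0C24,
    "bilabial": 0x0C2A,
}

# (voiced, aspirated, nasal) for the five columns of Table 5, left to right.
COLUMN_FEATURES = [
    (False, False, False),
    (False, True, False),
    (True, False, False),
    (True, True, False),
    (True, False, True),
]


def varga_features(ch):
    """Return place and manner features for a varga stop, or None otherwise."""
    cp = ord(ch)
    for place, start in VARGA_START.items():
        if start <= cp < start + 5:
            voiced, aspirated, nasal = COLUMN_FEATURES[cp - start]
            return {
                "place": place,
                "voiced": voiced,
                "aspirated": aspirated,
                "nasal": nasal,
            }
    return None  # non-varga consonant or not a consonant at all
```

Non-varga consonants such as ర or శ fall outside the five rows and return `None` in this sketch.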

4. The Development Process and Methodology

The Neo-Brahmi Generation Panel involves a number of different scripts with distinct Unicode blocks. Each of these scripts usually will have a separate LGR. However, a common thread runs through the neo-Brahmi scripts in the process of LGR development. A number of guiding principles that are laid out will be used in the development of the scheme. As specified elsewhere, the NBGP adopts the following principles in the selection of code points from the code point repertoire for the Telugu script. The first principle, the inclusion principle, deals with whether the character is regularly used in the language, besides its unambiguous nature. The second important principle, the exclusion principle, deals with the use of the code point repertoire for the root zone and does not allow every character that is tabulated in the Unicode chart. A baseline layer of restriction is set for the Domain Name System by the protocol known as IDNA (Internationalized Domain Names in Applications). IDNA excludes some characters from the Unicode repertoire for the concerned script. An additional layer is added for the root zone, called the Maximal Starting Repertoire (MSR). Telugu does not have many such characters that are restricted. One such character, for example, is the Avagraha "ఽ" (U+0C3D), which is restricted by MSR even though it is allowed by the IDNA protocol.

Similarly, certain punctuation marks that were used in the traditional texts are not assigned any code points and hence need not be included here. Other cases such as symbols and abbreviations are not permitted. In addition to the above, rare and obsolete characters, though recognized in the Unicode chart of Telugu, will not be permitted in the root zone LGR.

4.1 Zero Width Joiner and Zero Width Non-Joiner in Telugu Domain Names

MSR excludes invisible characters like Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D), as they require ad hoc representation in different ways. These are required in certain cases where a typical visual shape of an akshar is desired. There are contrastive usages of written forms derived from the use of Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ). They have special roles in the writing system of Telugu. ZWNJ is used in sequences like Consonant (C) + Halant (U+0C4D) + Consonant, where the second C is prevented from taking the usual dependent allograph (vattu) form after (below) the first consonant, as in the following example:

1. క (U+0C15) + ◌్ (U+0C4D) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్స్వా – without using ZWNJ

   Example: వాక్స్వాతంత్ర్యం

2. క (U+0C15) + ◌్ (U+0C4D) + ZWNJ (U+200C) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్‌స్వా – using the ZWNJ

   Example: వాక్‌స్వాతంత్ర్యం

Both forms of the words, though written with different graphic signs, may mean the same, and they are also the same in their pronunciation. Though the second form was not previously common, its usage is gaining ground due to the influence of English and Hindi. It is frequently used in transcribing many English words into Telugu, such as ‘software’ (సాఫ్ట్‌వేర్, using ZWNJ). The word ‘software’ will become సాఫ్ట్వేర్ if ZWNJ is not used.

4.2 How to Avoid Duplicate Domain Names Involving ZWJ and ZWNJ?

ZWJ and ZWNJ are used mainly to write two distinct displays of the same consonant cluster or sequence, a distinction which does not have any semantic or phonetic significance. When ZWJ and ZWNJ are allowed in domain names for Telugu, they create two distinct forms of the same domain name. To make browsers and the DNS treat them as equal, ZWJ and ZWNJ have to be ignored when comparing two words. The same procedure is usually followed by the spell-checkers of the language. Accepting ZWJ and ZWNJ in domain names creates confusion for a majority of the linguistic community, and joiner characters are prohibited for the Root Zone; hence their use is explicitly prohibited by the NBGP.
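The comparison described above, ignoring ZWJ and ZWNJ, can be sketched in a few lines of Python. The function names are hypothetical; the sketch only illustrates why two joiner variants count as the same word, and is not how the root zone handles joiners, since the LGR simply excludes them.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def strip_joiners(label):
    """Remove ZWJ and ZWNJ so that only the visible code points remain."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


def same_ignoring_joiners(a, b):
    """Compare two labels as a joiner-insensitive spell-checker would."""
    return strip_joiners(a) == strip_joiners(b)


def contains_joiner(label):
    """True if the label uses a joiner, which the Root Zone prohibits."""
    return ZWNJ in label or ZWJ in label
```

With these helpers, the two spellings of క్స్వా from section 4.1 compare as equal, and only the second one is flagged by `contains_joiner`.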

5. The Repertoire

In this section, we present the discussion on the code points that would form the repertoire of code points licensed by the [MSR-3] to be validated and used in the root zone label generation rules. Section 5.1 provides the section of the [MSR-3] applicable to the Telugu script on which the Telugu code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel proposes to be included in the Telugu LGR.

5.1 Telugu section of Maximal Starting Repertoire [MSR] Version 3

Color convention¹:
All characters that are included in the [MSR] are highlighted in Yellow background
PVALID in IDNA2008 but excluded from the [MSR] are highlighted in Pinkish background
Not PVALID in IDNA2008 are in White background

Figure 2: Telugu Code Page from [MSR-3]

¹ This document needs to be printed in color for this to be read correctly.

5.2 Code Points Repertoire

In the following, the Telugu Script Unicode code points have been presented and discussed with reference to the Principles that constrain the label generation rules. It is important to note that the purpose of this document is to state unambiguously the Telugu code points that can be used in the root zone repertoire. The following table lists 63 code points for the Telugu LGR, out of a total number of 67 code points listed in MSR-3, excluding four code points which are obsolete.

No.  Code   Glyph  Character Name                   EGIDS status            Indic Syllabic Category     Reference
     Point
1.   0C02   ◌ం     TELUGU SIGN ANUSVARA             2 Tel, 4 San, 5 Others² ANUSVĀRA                    102, 103
2.   0C03   ◌ః     TELUGU SIGN VISARGA              2 Tel, 4 San, 5 Others  VISARGA                     102, 103
3.   0C05   అ      TELUGU LETTER A                  2 Tel, 5 Others         Vowel                       102, 103
4.   0C06   ఆ      TELUGU LETTER AA                 2 Tel, 5 Others         Vowel                       102, 103
5.   0C07   ఇ      TELUGU LETTER I                  2 Tel, 5 Others         Vowel                       102, 103
6.   0C08   ఈ      TELUGU LETTER II                 2 Tel, 5 Others         Vowel                       102, 103
7.   0C09   ఉ      TELUGU LETTER U                  2 Tel, 5 Others         Vowel                       102, 103
8.   0C0A   ఊ      TELUGU LETTER UU                 2 Tel, 5 Others         Vowel                       102, 103
9.   0C0B   ఋ      TELUGU LETTER VOCALIC R          2 Tel, 5 Others         Vowel                       102, 103
10.  0C0E   ఎ      TELUGU LETTER E                  2 Tel, 5 Others         Vowel                       102, 103
11.  0C0F   ఏ      TELUGU LETTER EE                 2 Tel, 5 Others         Vowel                       102, 103
12.  0C10   ఐ      TELUGU LETTER AI                 2 Tel, 5 Others         Vowel                       102, 103
13.  0C12   ఒ      TELUGU LETTER O                  2 Tel, 5 Others         Vowel                       102, 103
14.  0C13   ఓ      TELUGU LETTER OO                 2 Tel, 5 Others         Vowel                       102, 103
15.  0C14   ఔ      TELUGU LETTER AU                 2 Tel, 5 Others         Vowel                       102, 103
16.  0C15   క      TELUGU LETTER KA                 2 Tel, 5 Others         Consonant                   102, 103
17.  0C16   ఖ      TELUGU LETTER KHA                2 Tel, 5 Others         Consonant                   102, 103
18.  0C17   గ      TELUGU LETTER GA                 2 Tel, 5 Others         Consonant                   102, 103
19.  0C18   ఘ      TELUGU LETTER GHA                2 Tel, 5 Others         Consonant                   102, 103
20.  0C19   ఙ      TELUGU LETTER NGA                2 Tel, 5 Others         Consonant, Nasal-Consonant  102, 103
21.  0C1A   చ      TELUGU LETTER CA                 2 Tel, 5 Others         Consonant                   102, 103
22.  0C1B   ఛ      TELUGU LETTER CHA                2 Tel, 5 Others         Consonant                   102, 103
23.  0C1C   జ      TELUGU LETTER JA                 2 Tel, 5 Others         Consonant                   102, 103
24.  0C1D   ఝ      TELUGU LETTER JHA                2 Tel, 5 Others         Consonant                   102, 103
25.  0C1E   ఞ      TELUGU LETTER NYA                2 Tel, 5 Others         Consonant, Nasal-Consonant  102, 103
26.  0C1F   ట      TELUGU LETTER TTA                2 Tel, 5 Others         Consonant                   102, 103
27.  0C20   ఠ      TELUGU LETTER TTHA               2 Tel, 5 Others         Consonant                   102, 103
28.  0C21   డ      TELUGU LETTER DDA                2 Tel, 5 Others         Consonant                   102, 103
29.  0C22   ఢ      TELUGU LETTER DDHA               2 Tel, 5 Others         Consonant                   102, 103
30.  0C23   ణ      TELUGU LETTER NNA                2 Tel, 5 Others         Consonant, Nasal-Consonant  102, 103
31.  0C24   త      TELUGU LETTER TA                 2 Tel, 5 Others         Consonant                   102, 103
32.  0C25   థ      TELUGU LETTER THA                2 Tel, 5 Others         Consonant                   102, 103
33.  0C26   ద      TELUGU LETTER DA                 2 Tel, 5 Others         Consonant                   102, 103
34.  0C27   ధ      TELUGU LETTER DHA                2 Tel, 5 Others         Consonant                   102, 103
35.  0C28   న      TELUGU LETTER NA                 2 Tel, 5 Others         Consonant, Nasal-Consonant  102, 103
36.  0C2A   ప      TELUGU LETTER PA                 2 Tel, 5 Others         Consonant                   102, 103
37.  0C2B   ఫ      TELUGU LETTER PHA                2 Tel, 5 Others         Consonant                   102, 103
38.  0C2C   బ      TELUGU LETTER BA                 2 Tel, 5 Others         Consonant                   102, 103
39.  0C2D   భ      TELUGU LETTER BHA                2 Tel, 5 Others         Consonant                   102, 103
40.  0C2E   మ      TELUGU LETTER MA                 2 Tel, 5 Others         Consonant, Nasal-Consonant  102, 103
41.  0C2F   య      TELUGU LETTER YA                 2 Tel, 5 Others         Consonant                   102, 103
42.  0C30   ర      TELUGU LETTER RA                 2 Tel, 5 Others         Consonant                   102, 103
43.  0C32   ల      TELUGU LETTER LA                 2 Tel, 5 Others         Consonant                   102, 103
44.  0C33   ళ      TELUGU LETTER LLA                2 Tel, 5 Others         Consonant                   102, 103
45.  0C35   వ      TELUGU LETTER VA                 2 Tel, 5 Others         Consonant                   102, 103
46.  0C36   శ      TELUGU LETTER SHA                2 Tel, 5 Others         Consonant                   102, 103
47.  0C37   ష      TELUGU LETTER SSA                2 Tel, 5 Others         Consonant                   102, 103
48.  0C38   స      TELUGU LETTER SA                 2 Tel, 5 Others         Consonant                   102, 103
49.  0C39   హ      TELUGU LETTER HA                 2 Tel, 5 Others         Consonant                   102, 103
50.  0C3E   ◌ా     TELUGU VOWEL SIGN AA             2 Tel, 5 Others         Matra                       102, 103
51.  0C3F   ◌ి     TELUGU VOWEL SIGN I              2 Tel, 5 Others         Matra                       102, 103
52.  0C40   ◌ీ     TELUGU VOWEL SIGN II             2 Tel, 5 Others         Matra                       102, 103
53.  0C41   ◌ు     TELUGU VOWEL SIGN U              2 Tel, 5 Others         Matra                       102, 103
54.  0C42   ◌ూ     TELUGU VOWEL SIGN UU             2 Tel, 5 Others         Matra                       102, 103
55.  0C43   ◌ృ     TELUGU VOWEL SIGN VOCALIC R      2 Tel, 5 Others         Matra                       102, 103
56.  0C44   ◌ౄ     TELUGU VOWEL SIGN VOCALIC RR     2 Tel, 5 Others         Matra                       102, 103
57.  0C46   ◌ె     TELUGU VOWEL SIGN E              2 Tel, 5 Others         Matra                       102, 103
58.  0C47   ◌ే     TELUGU VOWEL SIGN EE             2 Tel, 5 Others         Matra                       102, 103
59.  0C48   ◌ై     TELUGU VOWEL SIGN AI             2 Tel, 5 Others         Matra                       102, 103
60.  0C4A   ◌ొ     TELUGU VOWEL SIGN O              2 Tel, 5 Others         Matra                       102, 103
61.  0C4B   ◌ో     TELUGU VOWEL SIGN OO             2 Tel, 5 Others         Matra                       102, 103
62.  0C4C   ◌ౌ     TELUGU VOWEL SIGN AU             2 Tel, 5 Others         Matra                       102, 103
63.  0C4D   ◌్     TELUGU SIGN VIRAMA               2 Tel, 5 Others         Matra                       102, 103

² Others are the EGIDS 5 languages, listed in Table 1: Main languages considered under Telugu LGR

Table 7: Included code points
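As an illustration, the repertoire of Table 7 can be written as a set of inclusive ranges and used to screen a candidate label. This is a minimal Python sketch; the names `REPERTOIRE_RANGES` and `outside_repertoire` are hypothetical, and passing this check does not make a label valid, since the variant and WLE rules still apply.

```python
# Inclusive code point ranges read off Table 7 (63 code points in total).
REPERTOIRE_RANGES = [
    (0x0C02, 0x0C03),  # anusvara, visarga
    (0x0C05, 0x0C0B),  # vowels A .. VOCALIC R
    (0x0C0E, 0x0C10),  # vowels E, EE, AI
    (0x0C12, 0x0C28),  # vowels O, OO, AU and consonants KA .. NA
    (0x0C2A, 0x0C30),  # consonants PA .. RA
    (0x0C32, 0x0C33),  # LA, LLA
    (0x0C35, 0x0C39),  # VA .. HA
    (0x0C3E, 0x0C44),  # vowel signs AA .. VOCALIC RR
    (0x0C46, 0x0C48),  # vowel signs E, EE, AI
    (0x0C4A, 0x0C4D),  # vowel signs O, OO, AU and virama
]

REPERTOIRE = {cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)}


def outside_repertoire(label):
    """List the code points of `label` that are not in Table 7."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]
```

An empty result means every code point of the label is in the repertoire; a label containing ఱ (U+0C31), for instance, is reported as using an excluded code point.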

5.3 Code Points Not Included

Referring to the principle in section 4, the code points to be excluded from the repertoire are the following, for the reasons listed.

The following code points are not in widespread use.

• 0C00 ◌ఀ TELUGU LETTER CANDRABINDU
• 0C01 ◌ఁ TELUGU LETTER ARASUNNA
• 0C0C ఌ TELUGU LETTER VOCALIC L
• 0C31 ఱ TELUGU LETTER RRA

Various signs: Allographs of vowel diacritics /a:/ and part of a diacritic specific to the particular consonant /h/.

• 0C55 ◌ౕ TELUGU LENGTH MARK
• 0C56 ◌ౖ TELUGU AI LENGTH MARK

Historic phonetic variants: Phonological variants shall not be permitted. They are not in MSR-3.

• 0C58 ౘ TELUGU LETTER TSA
• 0C59 ౙ TELUGU LETTER DZA

The two additional vowels listed below to transcribe Sanskrit are not permitted. They are not in MSR-3.

• 0C60 ౠ TELUGU LETTER VOCALIC RR
• 0C61 ౡ TELUGU LETTER VOCALIC LL

The following two dependent vowels used to transcribe Sanskrit sounds are not permitted. They are not in MSR-3.

• 0C62 ◌ౢ TELUGU VOWEL SIGN VOCALIC L
• 0C63 ◌ౣ TELUGU VOWEL SIGN VOCALIC LL

Starting from the MSR-3, there are four code points to be excluded.

No.  Code   Glyph  Character Name            EGIDS status              Indic Syllabic  Reference      Note
     Point                                                             Category
1.   0C0C   ఌ      TELUGU LETTER VOCALIC L   2 Telu, 5 Gon, 6b other   Vowel           103, 108, 109  It is not used in modern Telugu
2.   0C31   ఱ      TELUGU LETTER RRA         2 Telu, 5 Gon, 6b other   Consonant       103, 108, 109  It is not used in modern Telugu
3.   0C55   ◌ౕ     TELUGU LENGTH MARK        2 Telu, 5 Gon, 6b other   Matra           103, 108, 109  It is not available on general keyboard.
4.   0C56   ◌ౖ     TELUGU AI LENGTH MARK     2 Telu, 5 Gon, 6b other   Matra           103, 108, 109  It is not used in modern Telugu

Table 8: Excluded code points

6. Variants

Telugu code points representing the basic simple stand-alone characters and some dependent characters may enter into different combinations to form syllables. There are no characters in the Telugu Unicode chart that either in simple form or in combined form are deemed similar by NBGP. However, Telugu has a small number of variants that have identical values but derive from different character combinations. The NBGP categorizes these confusingly similar variants in two groups.

6.1 Type 1: Similarity within the Script

Certain vowels [o, ō] display different shapes in combination with certain consonants, though they have shared sound and code point values. For example:

i. Ca + e + u(:) -> mo(:)

ii. Ca + o(:) -> ko(:)

The variants, which are often confusing and of variable acceptance, arise because identical code points are rendered differently. These cases are interesting in that they present no similarity in their forms but have similar phonetic output. It is not unusual to find such regional variations, and they are regularly used by Telugu users. These may not cause confusion but become annoying to learners. However, ◌ె + ◌ు (U+0C46 + U+0C41) is a matra + matra sequence, which is not allowed by the WLE rules in section 7. Therefore, these are not defined as variant sequences by NBGP.

Class  Character seq. [Ca+e+u]           Rendering  Disposition  Rendering (Co)  Character seq. [Ca+o]
1      క + ◌ె + ◌ు  (0C15+0C46+0C41) *    కెు         Blocked      కొ              క + ◌ొ  (0C15+0C4A)
2      మ + ◌ె + ◌ు  (0C2E+0C46+0C41)      మెు         Blocked      మొ              మ + ◌ొ  (0C2E+0C4A)
       య + ◌ె + ◌ు  (0C2F+0C46+0C41)      యెు         Blocked      యొ              య + ◌ొ  (0C2F+0C4A)
       ఝ + ◌ె + ◌ు  (0C1D+0C46+0C41)      ఝెు         Blocked      ఝొ              ఝ + ◌ొ  (0C1D+0C4A)
       ఘ + ◌ె + ◌ు  (0C18+0C46+0C41)      ఘెు         Blocked      ఘొ              ఘ + ◌ొ  (0C18+0C4A)

* This class includes other consonants like kha, ga, nga, ca, cha, ja, nya, ta, tha, da, dha, na, ta, tha, da, dha, na, pa, pha, ba, bha, ra, la, va, Sa, sha, sa, and ha.

Table 9a: Similarity within the script
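The reason the [Ca+e+u] sequences of Table 9a never become valid labels is the matra + matra restriction. A minimal Python sketch of such a check follows; it assumes the vowel signs of Table 7 as the matra set, and the function name is hypothetical. The actual WLE rules are defined in section 7 and in the accompanying XML file, not here.

```python
# Vowel signs (matras) included in the repertoire, per Table 7.
VOWEL_SIGNS = (
    set(range(0x0C3E, 0x0C45))    # U+0C3E..U+0C44
    | set(range(0x0C46, 0x0C49))  # U+0C46..U+0C48
    | set(range(0x0C4A, 0x0C4D))  # U+0C4A..U+0C4C
)


def has_matra_after_matra(label):
    """True if any vowel sign is immediately followed by another vowel sign."""
    cps = [ord(ch) for ch in label]
    return any(a in VOWEL_SIGNS and b in VOWEL_SIGNS for a, b in zip(cps, cps[1:]))
```

Under this check, క + ◌ె + ◌ు is rejected, while క + ◌ొ passes.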

6.2 Type 1: Variants within Script due to Alternative Spelling

Similar to the above, there is a set of representations in Telugu syllable formations where a homorganic nasal (anusvāra) in a syllable has an alternate spelling which is represented visually differently, as shown below.

No.  Homorganic nasal (anusvāra)  Homorganic nasal consonant  Gloss
     + consonant                  + halant + consonant
1.   లంక /laMka/                  లఙ్క /laŋka/                ‘island’
2.   కంచె /kaMce/                 కఞ్చె [kaɲce]               ‘fence’
3.   పంట /paMTa/                  పణ్ట /paNTa/                ‘harvest’
4.   కంత /kaMta/                  కన్త /kanta/                ‘hole’
5.   కంప /kaMpa/                  కమ్ప /kampa/                ‘thorny bush’
6.   కంస /kaMsa/                  కమ్స /kansa/                ‘king Kansa’
7.   సింహ /siMha/                 సిమ్హ /simha/               ‘lion’

Table 9b: Variants with anusvāra alternating with nasal consonants

Writing alternatively with a nasal consonant + halant + consonant is rare in Telugu and often occurs while transcribing Sanskrit words. Since the variants have exactly the same pronunciation, the rarer representation of nasal consonant + halant + consonant is disallowed in order to avoid the source of confusion.

Nasal Consonants are:
1. U+0C19 TELUGU LETTER NGA (ఙ)
2. U+0C1E TELUGU LETTER NYA (ఞ)
3. U+0C23 TELUGU LETTER NNA (ణ)
4. U+0C28 TELUGU LETTER NA (న)
5. U+0C2E TELUGU LETTER MA (మ)

Similarly and very frequently, the word-final ము [mu] is often represented alternatively by the variant anusvāra ◌ం [M], as in the following:

కలం kalaM          కలము kalamu          ‘pen’
పుస్తకం pustakaM     పుస్తకము pustakamu     ‘book’
ఆముదం a:mudaM      ఆముదము a:mudamu      ‘castor oil’
దేశం deSaM          దేశము deSamu          ‘country’

In such cases, one of the confusable variants must be disallowed. This can be disallowed by the WLE rule: H cannot follow a nasal consonant.
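The rule "H cannot follow a nasal consonant" can be sketched as a simple scan over adjacent code points. The Python below is an illustration only, with a hypothetical function name; the normative form of the rule is the one in the WLE rules and the accompanying XML file.

```python
# The five nasal consonants listed in section 6.2.
NASAL_CONSONANTS = {0x0C19, 0x0C1E, 0x0C23, 0x0C28, 0x0C2E}
HALANT = 0x0C4D


def halant_follows_nasal(label):
    """True if a halant immediately follows a nasal consonant in the label."""
    cps = [ord(ch) for ch in label]
    return any(
        a in NASAL_CONSONANTS and b == HALANT for a, b in zip(cps, cps[1:])
    )
```

Applied to Table 9b, the anusvāra spelling లంక passes, while the rarer spelling లఙ్క is rejected.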

6.3 Type 2: Shared Similarity with the Other Related Scripts

There are many Brahmi-derived scripts, particularly in the Southern part of India, Sri Lanka, and South East Asia. Some of the characters of these scripts display similarity with each other. Such cases, relevant for the Telugu script, are given below.

6.3.1 Type 2: Cross-Script Variants for Telugu and Kannada

A number of characters of the Kannada script are almost similar to characters of the Telugu script, except for the flattened head-stroke in Kannada contrasting with a tick mark on the top of the character in Telugu. Out of the total, there are 34 such cases which are categorized as variant sets, as shown in the following table.

Variant Set  Telugu Code Point  Kannada Code Point
1            ◌ం (0C02)          ◌ಂ (0C82)
2            ◌ః (0C03)          ◌ಃ (0C83)
3            అ (0C05)           ಅ (0C85)
4            ఆ (0C06)           ಆ (0C86)
5            ఇ (0C07)           ಇ (0C87)
6            ఈ (0C08)           ಈ (0C88)
7            ఐ (0C10)           ಐ (0C90)
8            ఒ (0C12)           ಒ (0C92)
9            ఓ (0C13)           ಓ (0C93)
10           ఔ (0C14)           ಔ (0C94)
11           ఖ (0C16)           ಖ (0C96)
12           గ (0C17)           ಗ (0C97)
13           జ (0C1C)           ಜ (0C9C)
14           ఝ (0C1D)           ಝ (0C9D)
15           ఞ (0C1E)           ಞ (0C9E)
16           ట (0C1F)           ಟ (0C9F)
17           ఠ (0C20)           ಠ (0CA0)
18           డ (0C21)           ಡ (0CA1)
19           ఢ (0C22)           ಢ (0CA2)
20           ణ (0C23)           ಣ (0CA3)
21           థ (0C25)           ಥ (0CA5)
22           ద (0C26)           ದ (0CA6)
23           ధ (0C27)           ಧ (0CA7)
24           న (0C28)           ನ (0CA8)
25           బ (0C2C)           ಬ (0CAC)
26           భ (0C2D)           ಭ (0CAD)
27           మ (0C2E)           ಮ (0CAE)
28           య (0C2F)           ಯ (0CAF)
29           ర (0C30)           ರ (0CB0)
30           ల (0C32)           ಲ (0CB2)
31           ళ (0C33)           ಳ (0CB3)
32           ◌ి (0C3F)          ◌ಿ (0CBF)
33           ◌ు (0C41)          ◌ು (0CC1)
34           ◌ృ (0C43)          ◌ೃ (0CC3)

Table 10: Cross-script variant code points for Telugu and Kannada

The Telugu and Kannada variant sets in Table 10 are cross-script variant code points. The details of various akshar combinations and variant disposition can be found in section 6.4. Code points which have been analyzed and found to be similar, but not considered as variants, are listed in Appendix A.
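Every pair in Table 10 differs by the fixed offset 0x80 between the Telugu and Kannada blocks, so the variant mapping can be sketched compactly. The Python below is an illustration with hypothetical names; the set lists only the 34 Telugu code points of Table 10, and the authoritative mapping is the one in the LGR XML file.

```python
# The 34 Telugu code points that have a Kannada cross-script variant (Table 10).
TELUGU_WITH_KANNADA_VARIANT = {
    0x0C02, 0x0C03, 0x0C05, 0x0C06, 0x0C07, 0x0C08, 0x0C10, 0x0C12,
    0x0C13, 0x0C14, 0x0C16, 0x0C17, 0x0C1C, 0x0C1D, 0x0C1E, 0x0C1F,
    0x0C20, 0x0C21, 0x0C22, 0x0C23, 0x0C25, 0x0C26, 0x0C27, 0x0C28,
    0x0C2C, 0x0C2D, 0x0C2E, 0x0C2F, 0x0C30, 0x0C32, 0x0C33, 0x0C3F,
    0x0C41, 0x0C43,
}

KANNADA_OFFSET = 0x80  # each Kannada counterpart in Table 10 is Telugu + 0x80


def kannada_variant(cp):
    """Return the Kannada variant of a Telugu code point, or None if there is none."""
    if cp in TELUGU_WITH_KANNADA_VARIANT:
        return cp + KANNADA_OFFSET
    return None


def kannada_lookalike(label):
    """Return the all-Kannada variant label, or None if any code point lacks a variant."""
    mapped = [kannada_variant(ord(ch)) for ch in label]
    if any(cp is None for cp in mapped):
        return None
    return "".join(chr(cp) for cp in mapped)
```

A label such as గద maps entirely to Kannada code points, whereas any label containing క (U+0C15), which is not in Table 10, has no whole-label Kannada counterpart in this sketch.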

Page 21: Proposal for a Telugu Script Root Zone Label Generation ...The Telugu language is a Dravidian language spoken by about 75 million (ca. 2001) people mainly in the southern Indian states

21

6.3.2Type2:Cross-ScriptVariantsforTeluguandDevanagariVisargaistheonlyidenticalcodepointthatexhibitsshapesimilaritybetweentheTeluguandDevanagariscripts.However,astherearenoothervariantcodepointsbetweenthetwolanguages,itisnotdefinedasavariantcodepoint.

DevanagariCodePoint TeluguCodePoint

◌ः (0903) ◌ః (0C03)Table11:Candidatecross-scriptvariantcodepointforTeluguandDevanagari

6.3.3 Type 2: Cross-Script Variants for Telugu and Gujarati

Visarga is the only identical code point that exhibits shape similarity between the Telugu and Gujarati scripts. However, as there are no other identical code points between the two scripts, it is not defined as a variant code point.

Gujarati Code Point    Telugu Code Point

◌ઃ (0A83) ◌ః (0C03)

Table 12: Candidate cross-script variant code point for Telugu and Gujarati

6.3.4 Type 2: Cross-Script Variants for Telugu and Oriya

The following code points exhibit similarity between the Telugu and Oriya scripts.

Telugu Code Point    Oriya Code Point

ం (0C02) ANUSVĀRA    ଠ (0B20) LETTER TTHA

ః (0C03) SIGN VISARGA    ଃ (0B03) SIGN VISARGA

ర (0C30) LETTER RA    ଠ (0B20) LETTER TTHA

Table 13: Candidate cross-script variant code points for Telugu and Oriya

The first two (U+0C02–U+0B20 and U+0C03–U+0B03) are dependent signs and U+0C30 is a stand-alone character in Telugu. NBGP discussions concluded that there is no need to recognize cross-script variant code points between the Oriya and the Telugu scripts. This is because U+0C30 and U+0B20 are distinguishable and there are not enough other variant code points in each script to form labels that look the same. Therefore, these are not defined as variant code points.

6.3.5 Type 2: Cross-Script Variants for Telugu and Malayalam

The two code points, viz. the anusvāra and the visarga, are the only identical signs between the Telugu and Malayalam scripts. However, as there are not enough other variant code points to form labels, they are not defined as variant code points between the two scripts.

Telugu Code Point    Malayalam Code Point

◌ం (0C02) ം (0D02)

◌ః (0C03) ഃ (0D03)

Table 14: Candidate cross-script variant code points for Telugu and Malayalam

6.3.6 Type 2: Cross-Script Variants for Telugu and Sinhala

The following three pairs of code points are similar between the Telugu and Sinhala scripts. They could be treated as merely similar if the similarity between 0C30 and 0DBB were not sustainable. However, NBGP, in consultation with Sinhala, concluded that 0C30 and 0DBB could cause confusion from the script user's point of view. Therefore, they are proposed as cross-script variants between the two scripts, and the disposition is blocked. This analysis follows the NBGP Cross-script Variant Inclusion Policy available in Appendix C.

Telugu Code Point    Sinhala Code Point

◌ం (0C02) ං (0D82)

◌ః (0C03) ඃ (0D83)

ర (0C30) ර (0DBB)

Table 15: Cross-script variant code points for Telugu and Sinhala

6.4 Cross-Script Variants of Various Akshar Combinations

6.4.1 Conjunct Consonant Combinations

Cross-script variants of various akshar combinations (consonant-consonant-dependent characters) common between the Telugu and Kannada scripts include the following:

Variant Set    Telugu Code Point    Kannada Code Point

1 ◌ం (0C02) ◌ಂ (0C82)

2 ◌ః (0C03) ◌ಃ (0C83)

3 ఖ (0C16) ಖ (0C96)

4 గ (0C17) ಗ (0C97)

5 జ (0C1C) ಜ (0C9C)

6 ఝ (0C1D) ಝ (0C9D)

7 ఞ (0C1E) ಞ (0C9E)

8 ట (0C1F) ಟ (0C9F)

9 ఠ (0C20) ಠ (0CA0)

10 డ (0C21) ಡ (0CA1)

11 ఢ (0C22) ಢ (0CA2)

12 ణ (0C23) ಣ (0CA3)

13 థ (0C25) ಥ (0CA5)

14 ద (0C26) ದ (0CA6)

15 ధ (0C27) ಧ (0CA7)

16 న (0C28) ನ (0CA8)

17 బ (0C2C) ಬ (0CAC)

18 భ (0C2D) ಭ (0CAD)

19 మ (0C2E) ಮ (0CAE)

20 య (0C2F) ಯ (0CAF)

21 ర (0C30) ರ (0CB0)

22 ల (0C32) ಲ (0CB2)

23 ళ (0C33) ಳ (0CB3)

24 ◌ి (0C3F) ◌ಿ (0CBF)

25 ◌ు (0C41) ◌ು (0CC1)

26 ◌ృ (0C43) ◌ೃ (0CC3)

Table 16: Cross-script variants between Telugu and Kannada for conjunct consonant combination analysis

Table 16 includes 26 distinct Telugu code points that occur in the formation of conjunct consonant combinations in Telugu and Kannada. Excluding the standalone vowels from the total common akshar combinations of cross-script variants, there is a set of 21 consonants (C), three vowel matras (M) and two vowel modifiers that enter into the formation of the following combinations:

Sl. No.    Akshar combinations    Number

1. CM = 21*3 = 63
2. CB = 21*1 = 21
3. CX = 21*1 = 21
4. CHCM = 21*21*3 = 1323
5. CHCB = 21*21*1 = 441
6. CHCX = 21*21*1 = 441
7. CHCMB = 21*21*3*1 = 1323
8. CHCMX = 21*21*3*1 = 1323
9. All combinations = 4956

Table 17: Total number of akshar combinations

A total of 4956 conjunct consonant combinations modified by matras and vowel modifiers are identical and can give rise to variant labels between the Telugu and Kannada scripts. These combinations are covered by the variant code points in Section 6, Table 10 and Table 15.
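The totals in Table 17 follow directly from the sizes of the classes involved. The following is a minimal Python sketch of that arithmetic, assuming the class sizes stated above (21 consonants, three matras, and one code point each for B, X and H); the names `SIZES`, `PATTERNS` and `count` are illustrative only.

```python
# Class sizes taken from the text: 21 consonants (C), 3 matras (M),
# one anusvara (B), one visarga (X). H (halant) is a single code point.
SIZES = {"C": 21, "M": 3, "B": 1, "X": 1, "H": 1}

# The eight akshar patterns listed in Table 17.
PATTERNS = ["CM", "CB", "CX", "CHCM", "CHCB", "CHCX", "CHCMB", "CHCMX"]


def count(pattern: str) -> int:
    """Number of code point sequences that match a pattern such as 'CHCM'."""
    total = 1
    for symbol in pattern:
        total *= SIZES[symbol]
    return total


counts = {pattern: count(pattern) for pattern in PATTERNS}
for pattern, n in counts.items():
    print(f"{pattern:6} {n}")
print("All combinations:", sum(counts.values()))
```

Each pattern is a product of independent choices, so the grand total is simply the sum of the eight products.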

6.4.2 Other Combinations

NBGP created the possible combinations of Telugu code points and cross-checked them against other Neo-Brahmi scripts for candidate variants. The possible combinations are:

1. CHCMB, CHCMX
2. CHCM, CHCB, CHCX
3. VB, VX, V
4. CHC, CM, CB, CX, C

Where,

C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant / Virama

NBGP concludes that, besides those identical code points defined as variants in Section 6, Table 10 and Table 15, there are no other variant code points between Telugu combinations and other scripts' code points or code point combinations.

6.5 Variant disposition

As the variants mentioned in Section 6, Table 10 and Table 15 can result in whole-label variants, they may be considered for "blocked" disposition. There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
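The first-come, first-served behaviour of the "blocked" disposition can be illustrated with a short Python sketch. It is not the LGR implementation: `variant_label` and the `Registry` class are hypothetical names, and the code point list is partial, covering only the 26 code points of Table 16 and the standalone vowels in rows 6 to 10 of Table 10. It relies on the observation that, in every listed pair, the Kannada code point is the Telugu one plus 0x80.

```python
# Telugu code points with a Kannada variant: the 26 code points of Table 16
# plus the standalone vowels in rows 6-10 of Table 10. Rows 1-5 of Table 10
# are omitted from this sketch, so the list is illustrative, not complete.
TELUGU = [
    0x0C02, 0x0C03,
    0x0C08, 0x0C10, 0x0C12, 0x0C13, 0x0C14,
    0x0C16, 0x0C17, 0x0C1C, 0x0C1D, 0x0C1E, 0x0C1F, 0x0C20, 0x0C21,
    0x0C22, 0x0C23, 0x0C25, 0x0C26, 0x0C27, 0x0C28, 0x0C2C, 0x0C2D,
    0x0C2E, 0x0C2F, 0x0C30, 0x0C32, 0x0C33,
    0x0C3F, 0x0C41, 0x0C43,
]
# In every listed pair the Kannada code point is the Telugu one plus 0x80.
TO_KANNADA = {chr(cp): chr(cp + 0x80) for cp in TELUGU}
TO_TELUGU = {kn: te for te, kn in TO_KANNADA.items()}


def variant_label(label):
    """Return the cross-script variant of a label, or None if it has none."""
    for table in (TO_KANNADA, TO_TELUGU):
        if label and all(ch in table for ch in label):
            return "".join(table[ch] for ch in label)
    return None


class Registry:
    """Hypothetical first-come, first-served registry with blocked variants."""

    def __init__(self):
        self.allocated = set()
        self.blocked = set()

    def apply(self, label):
        """Allocate a label unless it is already taken or blocked."""
        if label in self.allocated or label in self.blocked:
            return False
        self.allocated.add(label)
        variant = variant_label(label)
        if variant is not None:
            self.blocked.add(variant)
        return True


registry = Registry()
print(registry.apply("\u0C30\u0C2E"))  # Telugu label is allocated first
print(registry.apply("\u0CB0\u0CAE"))  # its Kannada variant is then refused
```

A label yields a whole-label variant only when every one of its code points has a listed counterpart, which is why a single unlisted code point is enough for `variant_label` to return `None`.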

7. Whole Label Evaluation Rules (WLE)

In this section we provide the WLEs that are required by the language. A number of rules have been formulated so that they can be adopted for the LGR specification. Below are the symbols used in the WLE rules for each "Indic Syllabic Category" mentioned in Table 7 (code point repertoire). For the details of syllable formation, see Appendix B.

C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant / Virama
Nasal-C → Nasal Consonant

Rule 1. H must be preceded by C (Ref. Appendix B: syllable formation rule 4)
Rule 2. M must be preceded by C (Ref. Appendix B: syllable formation rule 6)
Rule 3. X must be preceded by V or M or C (Ref. Appendix B: syllable formation rules 3c, 5c and 7c)
Rule 4. B must be preceded by V or M or C (Ref. Appendix B: syllable formation rules 3b, 5b and 7b)
Rule 5. H cannot follow Nasal-C (Ref. Section 6.2 Type 1)
Rule 6. V cannot be preceded by H

For Rule 6, there could be cases involving multi-word domains where V may need to be allowed to follow an H. This is the case where two different words are joined together, the first of which ends with a Halant and the second of which begins with a Vowel. Some sections of linguistic usage require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without the H is considered enough for full representation of the sound intended, as in the following examples:

‘house of knowledge’: da:rHalHulu:mH / da:rHalulu:mH
‘The Qor’an’: KhurHa:nH / Khura:nH
‘in Telangana Rashtra Samiti’: ti:a:rHesHlo / ti:a:rHesHlo
‘Y.S.R.C. party’: vaiesHa:rHsi:pi / vaiesHa:rHsi:pi
‘British India’: bHritiShHiMdiya / britiShiMdiya

Of the two representations, the one in which V is not preceded by H is awkward, while the one in which V is preceded by H is in demand in modern usage.
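Rules 1 to 6 only ever look at a code point and the one immediately before it, so they can be checked in a single pass over the label. The following Python sketch illustrates this; it is not the normative XML ruleset. The consonant and vowel sets follow the 35 consonants and 13 vowels listed in Appendix B, while the matra set and the membership of Nasal-C (taken here to be the five class nasals) are assumptions made for this example. The names `category` and `wle_violations` are hypothetical.

```python
def _chars(*ranges):
    """Build a set of characters from inclusive (first, last) code point ranges."""
    return {chr(cp) for first, last in ranges for cp in range(first, last + 1)}


# 35 consonants and 13 vowels, matching the lists in Appendix B.
CONSONANTS = _chars((0x0C15, 0x0C28), (0x0C2A, 0x0C33), (0x0C35, 0x0C39))
VOWELS = _chars((0x0C05, 0x0C0B), (0x0C0E, 0x0C10), (0x0C12, 0x0C14))
# Assumption: dependent vowel signs taken from the Unicode Telugu block.
MATRAS = _chars((0x0C3E, 0x0C44), (0x0C46, 0x0C48), (0x0C4A, 0x0C4C))
ANUSVARA, VISARGA, HALANT = "\u0C02", "\u0C03", "\u0C4D"
# Assumption: Nasal-C is taken to be the five class nasals (nga, nya, nna, na, ma).
NASALS = {"\u0C19", "\u0C1E", "\u0C23", "\u0C28", "\u0C2E"}


def category(ch):
    """Map a code point to its WLE symbol (C, M, V, B, X or H)."""
    if ch in CONSONANTS:
        return "C"
    if ch in VOWELS:
        return "V"
    if ch in MATRAS:
        return "M"
    if ch == ANUSVARA:
        return "B"
    if ch == VISARGA:
        return "X"
    if ch == HALANT:
        return "H"
    raise ValueError(f"U+{ord(ch):04X} is outside the repertoire of this sketch")


def wle_violations(label, nasals=NASALS):
    """Return the numbers of the WLE rules that the label violates, in order."""
    violations = []
    prev_ch, prev_cat = None, None
    for ch in label:
        cat = category(ch)
        if cat == "H":
            if prev_cat != "C":
                violations.append(1)   # Rule 1: H must be preceded by C
            elif prev_ch in nasals:
                violations.append(5)   # Rule 5: H cannot follow Nasal-C
        elif cat == "M" and prev_cat != "C":
            violations.append(2)       # Rule 2: M must be preceded by C
        elif cat == "X" and prev_cat not in ("V", "M", "C"):
            violations.append(3)       # Rule 3: X must follow V, M or C
        elif cat == "B" and prev_cat not in ("V", "M", "C"):
            violations.append(4)       # Rule 4: B must follow V, M or C
        elif cat == "V" and prev_cat == "H":
            violations.append(6)       # Rule 6: V cannot be preceded by H
        prev_ch, prev_cat = ch, cat
    return violations
```

For example, `wle_violations("\u0C15\u0C4D\u0C05")` (a consonant, a halant and then a standalone vowel) reports Rule 6, whereas the label తెలుగు passes all six rules.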

This is a unique situation necessitated by the lack of a hyphen, space or the Zero Width Non-Joiner character in the permissible set of characters in the Root Zone repertoire. Otherwise, V is never required to be allowed to follow an H. However, permitting this may create two labels (with and without H) that are perceptually dissimilar but phonetically and semantically similar for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

8. Contributors

Gangadhar Panday
Uma Maheshwara Rao, G.
NBGP members

Appendix A: Confusable Code Points Analysis

A-1. Telugu and Kannada

The following table defines Telugu and Kannada code points which are confusable.

No.    Telugu CP    Telugu Glyph    Kannada CP    Kannada Glyph

1 0C35 వ 0CB5 ವ

2 0C36 శ 0CB6 ಶ

3 0C38 స 0CB8 ಸ

Table A-1: Confusable code points of Telugu and Kannada script

The following table lists other code points which have been analyzed and found to be distinguishable.

No.    Telugu CP    Telugu Glyph    Kannada CP    Kannada Glyph    NBGP resolution

1 0C0E ఎ 0C8E ಎ distinguishable

2 0C18 ఘ 0C98 ಘ distinguishable

3 0C19 ఙ 0C99 ಙ distinguishable

4 0C1A చ 0C9A ಚ distinguishable

5 0C1B ఛ 0C9B ಛ distinguishable

6 0C2A ప 0CAA ಪ distinguishable

7 0C2B ఫ 0CAB ಫ distinguishable

8 0C37 ష 0CB7 ಷ distinguishable

9 0C4C ◌ౌ 0CCC ◌ೌ distinguishable

Table A-2: Other NBGP resolutions on Telugu and Kannada script

A-2. Telugu and Malayalam

Besides those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Malayalam.

A-3. Telugu and Sinhala

Besides those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Sinhala.

Appendix B: Syllable formation in the Telugu Script

The Telugu script grammar allows us to state the nature and structure of the graphic syllables in the formation of words. The extended notion of syllable is often used to characterize orthographies of South Asian scripts, especially Brahmi-derived scripts, where words are composed of sequences of one or more orthographic aksharas or syllables. These aksharas are in turn composed of sequences of certain characters from the alphabet. The Telugu alphabet has the following types of characters (encoded in Unicode) that, either on their own or by entering larger combinations, form aksharas as shown here. There are 12 different types of syllables possible in Telugu.

The following variables are involved in the formation of a syllable [$]:

• C = Consonants, which are standalone characters or graphemes with an inherent vowel `a' and can function as syllables;

Stops: క ఖ గ ఘ ఙ చ ఛ జ ఝ ఞ ట ఠ డ ఢ ణ త థ ద ధ న ప ఫ బ భ మ; Fricatives: శ ష స హ; Sonorants: య ర ఱ ల ళ వ

• V = Vowels, which stand alone, are represented by the following graphic signs and may function as syllables;

అ ఆ ఇ ఈ ఉ ఊ ఎ ఏ ఐ ఒ ఓ ఔ ఋ

• M = Matras or the dependent vowel signs, which when they occur with a consonant may function as syllables (characteristically deleting the inherent vowel of the consonant);

Example: కా కి కీ కు కూ కె కే కై కొ కో కౌ; etc.

• H = Halant or virama = ◌్; it may occur with one of the consonants represented by C to form CH syllables;

Example: క్ ఖ్ గ్ ఘ్ ఙ్

• B = Pūrṇānusvāra, the homorganic nasal and an archiphoneme = ◌ం; it may occur with one of C, V, and the combined CM to form CB, CMB, VB, and C([HC]*)B

• X = Visarga or the glottal check = ◌ః; it may occur with one of C, V, and the combined CM to form CX, CMX, VX

The operators used: The following four operators are employed to define the delimitation of the graphic syllables in Telugu.

No.    Symbol    Function

1. | Alternative

2. [] Encloses optional elements

3. * Variable occurrence

4. () The sequence cluster

Table B-1: Symbols and functions

An akshara in Telugu can be defined as any C or V and a combination of M (dependent vowels) and the vowel modifiers, as in the following. The following syllable formation rules derive all possible graphic syllables in Telugu.

1. Syllable formation rule-1, $ = V; Every standalone vowel character can function as a syllable. Example:

అ, ఆ, ఇ, ఈ, ఉ, ఊ, ఎ, ఏ, ఐ, ఒ, ఓ, ఔ, ఋ;

After the exclusion of obsolete vowels, 13 syllables are possible.

2. Syllable formation rule-2, $ = C; Every standalone consonant character can function as a syllable. Example:

క ఖ గ ఘ ఙ, చ ఛ జ ఝ ఞ, ట ఠ డ ఢ ణ, త థ ద ధ న, ప ఫ బ భ మ, య ర ఱ ల ళ వ, శ ష స హ;

35 such syllables are possible.

3. Syllable formation rule-3, $ = VB|X; Example:

3a. $ = V+B: అం ఆం ఇం ఈం ఉం ఊం ఎం ఏం ఐం ఒం ఓం ఔం;
3b. $ = V+X: అః ఆః ఇః ఈః ఉః ఊః ఎః ఏః ఐః ఒః ఓః ఔః;

In combination with V and one of the two B or X, a total of 36 syllables are possible. Syllable combinations with vocalic R are not used.

4. Syllable formation rule-4, $ = CH; A standalone consonant may be followed by the halant marker H to form the corresponding graphic syllables, as shown here. Example:

క్ ఖ్ గ్ ఘ్ ఙ్, చ్ ఛ్ జ్ ఝ్ ఞ్, ట్ ఠ్ డ్ ఢ్ ణ్, త్ థ్ ద్ ధ్ న్, ప్ ఫ్ బ్ భ్ మ్, య్ ర్ ఱ్ ల్ ళ్ వ్, శ్ ష్ స్ హ్

35 such graphic syllables are possible.

5. Syllable formation rule-5, $ = CB|X; Standalone consonants can take one of the three vowel modifiers and form the corresponding syllables as shown below. Example:

5a. $ = CB: కం ఖం గం ఘం ఙం చం ఛం జం ఝం ఞం టం ఠం etc.
5b. $ = CX: కః ఖః గః ఘః ఙః చః ఛః జః ఝః ఞః టః ఠః etc.

2*35 = 70 graphic consonant-modifier syllables are possible.

6. Syllable formation rule-6, $ = CM; A consonant may take a vowel modifier or dependent vowel diacritic to form the corresponding syllables. Example:

కా కి కీ కు కూ కృ కౄ కె కే కై కొ కో కౌ; etc.

A total of 35*13 consonant + vowel diacritic combinations may derive 455 graphic syllables in Telugu.

7. Syllable formation rule-7, $ = CMB|X; A consonant with a dependent vowel, when followed by one of the three modifiers, may derive the following graphic syllables. Example:

7a. కాం కిం కీం కుం కూం కెం కేం కైం కొం కోం కౌం
7b. కాః కిః కీః కుః కూః కెః కేః కైః కొః కోః కౌః

A total of 35*12*2 combinations of a consonant plus a dependent vowel and one of the three modifiers derive 840 possible graphic syllables in Telugu.

8. Syllable formation rule-8, $ = CH[(C)*C]; Any consonant followed by the halant marker may combine with another consonant or consonants to form complex graphic syllables. Example:

2consonantclusters:ÍÎ గÏ ,ÐÑ ,ఙÒ ,చR,ఛÓ,జÔ ,ÕÖ ,ఞ× ,టV ,ఠØ ,డÙ ,ÚÛ ,ణÜ ,etc.

3consonantclusters:రÝÞ,షV ß,సY à,నá ß,ఙÑ ß షâã,త}~,త]ä etc.

4consonantclusters:త]ä~ ;

A total of 35*1*35 = 1225 CHC syllables involving two-consonant clusters are possible. Further, a total of 35*1*35*1*35 = 42,875 CHCHC syllables involving three-consonant clusters are possible. Four-consonant clusters are extremely rare, though theoretically possible, as shown above.

9. Syllable formation rule-9, $ = CH(CH[CH])CM; Any consonant followed by the halant marker and a consonant or consonants may be followed by one of the dependent vowels to form complex graphic syllables involving two to three consonant clusters. Example:

క$N åÎ æçÏ ,èéÑ ,ఙêÒ ,O�R,ఛూÓ,జ­Ô ,ëìÖ ,ఞí× ,ట�V ,ఠîØ ,§�Ù ,ÚూÛ ,ణ�Ü ,etc.

రïÝÞ ,ªV ß,^Y à,ðá ß,ఙ¥Ñ ß ñ âã,!�} ~,తò]äetc.

!�]ä~

A total of 35*1*35*1*12 = 14,700 complex syllables involving two-consonant clusters followed by dependent vowels are possible.

A total of 35*1*35*1*35*12 = 5,14,500 complex syllables involving three-consonant clusters followed by dependent vowels are possible.

The following is a summary of possible syllable types with the glyphs in Telugu:

$ = V([B|X]) | CM([B|X]) | CH(CH[C])M([B|X])

As per our definition, the following 21 subtypes of graphic syllables are possible, which however can be grouped under 8 rules as discussed above.

$ = V | VB | VX | C | CB | CX | CM | CH | CHC | CHCB | CHCX | CHCM | CHCH | CHCHC | CHCHCB | CHCHCX | CHCHCM

Therefore, typologically 8 distinct types of graphic syllables can be derived in the language.
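The syllable shapes described by rules 1 to 9 can be approximated with a regular expression. The Python sketch below is one reading of those rules, not a transcription of the summary formula: a syllable is taken to be either a vowel with an optional B or X, or a consonant cluster C(HC)* that ends in a halant or in an optional matra followed by an optional B or X. The matra range and the names `SYLLABLE` and `syllables` are assumptions made for this example.

```python
import re

# Character class bodies, written as code point ranges in the Telugu block.
C = "\u0C15-\u0C28\u0C2A-\u0C33\u0C35-\u0C39"   # 35 consonants
V = "\u0C05-\u0C0B\u0C0E-\u0C10\u0C12-\u0C14"   # 13 standalone vowels
M = "\u0C3E-\u0C44\u0C46-\u0C48\u0C4A-\u0C4C"   # dependent vowel signs (assumed set)
B, X, H = "\u0C02", "\u0C03", "\u0C4D"          # anusvara, visarga, halant

SYLLABLE = re.compile(
    f"[{V}][{B}{X}]?"                                # V, VB, VX
    f"|[{C}](?:{H}[{C}])*(?:{H}|[{M}]?[{B}{X}]?)"    # C, CH, CM, CB, CHC, CHCM, ...
)


def syllables(word):
    """Split a word into graphic syllables, or raise ValueError if it cannot be split."""
    result, pos = [], 0
    while pos < len(word):
        match = SYLLABLE.match(word, pos)
        if match is None:
            raise ValueError(f"no syllable can start at index {pos}")
        result.append(match.group())
        pos = match.end()
    return result


print(syllables("\u0C24\u0C46\u0C32\u0C41\u0C17\u0C41"))  # the word తెలుగు
```

This segmenter only describes syllable shape. It does not enforce the whole label evaluation rules, so a sequence such as CH followed by V is split into two syllables even though Rule 6 prohibits it in a label.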

Appendix C: NBGP Cross-script Variant Inclusion Policy

If, in any two given scripts, all the potential cross-script variants consist of dependent characters ONLY (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc.), then that entire set can be ignored and no cross-script variants are proposed between those two scripts.

If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g. Consonant, Vowel etc.) cross-script variant character or sequence present, all the potential cross-script variants are considered and proposed between the two scripts.

This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, the shapes of the characters changed depending on various factors. This change in shape was not uniform across all the characters and scripts. Some character shapes changed significantly, whereas others still retain similarity. The cross-script similarity analysis also aims to identify cases where the same character has retained almost the same shape despite being part of different scripts. Such characters are variants of each other in a true sense, rather than merely coincidentally similar in appearance. Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them as blocked variants.
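The two policy statements amount to an all-or-nothing decision per script pair. The following Python sketch illustrates that decision using the candidate pairs from Table 14 (Malayalam) and Table 15 (Sinhala); the `Candidate` type and the function `proposed_variants` are hypothetical names introduced for this example, and the sketch assumes the candidates passed in have already been judged similar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    telugu: int       # Telugu code point
    other: int        # code point in the other script
    dependent: bool   # True for vowel signs, anusvara, visarga, etc.


def proposed_variants(candidates):
    """Apply the inclusion policy: propose every candidate, or none at all."""
    if any(not c.dependent for c in candidates):
        return list(candidates)
    return []


# Table 14: both Telugu-Malayalam candidates are dependent signs.
malayalam = [
    Candidate(0x0C02, 0x0D02, dependent=True),
    Candidate(0x0C03, 0x0D03, dependent=True),
]
# Table 15: the Telugu-Sinhala candidates include the consonant RA.
sinhala = [
    Candidate(0x0C02, 0x0D82, dependent=True),
    Candidate(0x0C03, 0x0D83, dependent=True),
    Candidate(0x0C30, 0x0DBB, dependent=False),
]

print(len(proposed_variants(malayalam)))  # no variants proposed
print(len(proposed_variants(sinhala)))    # all three pairs proposed
```

A single non-dependent pair is enough to bring the dependent signs in with it, which matches the Sinhala outcome in Section 6.3.6 and the Malayalam outcome in Section 6.3.5.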

NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as NBGP does not have any exposure to the actual usage of those characters in those particular scripts, NBGP refrained from including them in the analysis. As NBGP has already considered all the related scripts under the cross-script variant analysis, the similarity of characters belonging to NBGP scripts with those of other scripts not under the NBGP ambit may be merely coincidental and visual in nature.

Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts under the scope of the Root LGR procedure. In practice, this analysis can be carried out only with the Generation Panels that exist while the NBGP is active. This still leaves out of scope those scripts which may not have a Generation Panel established yet. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by the "String Similarity Assessment Panel" of ICANN.