
Proposal for a Telugu Script Root Zone Label Generation Ruleset (LGR)

LGR Version: 3.0
Date: 2018-08-08
Document version: 2.6
Authors: Neo-Brahmi Generation Panel [NBGP]

1. General Information/Overview/Abstract

This document lays down the Label Generation Rule Set for the Telugu script. Three main components of the Telugu Script LGR, viz. Code point repertoire, Variants and Whole Label Evaluation Rules, have been described in detail here. All these components have been incorporated in a machine-readable format in the accompanying XML file: "Proposal-LGR-Telu-20180808.xml".

In addition, a list of test labels has been provided in the following file, which covers the repertoire, variant code points and the whole label evaluation rules, providing examples for valid and invalid labels: "telugu-test-labels-20180808.txt".

2. Script for which the LGR is proposed

ISO 15924 Code: Telu
ISO 15924 Key N°: 340
ISO 15924 English Name: Telugu
Latin transliteration of native script name: telɯgɯ
Native name of the script: తెలుగు
Maximal Starting Repertoire [MSR] version: 3
The Unicode Standard, Version: 6.3
Telugu Unicode Range: 0C00–0C7F

3. Background of the Script and Principal Languages Using It

The Telugu language uses the Telugu script, which is written in the form of sequences of orthographic syllables. Each orthographic syllable is formed of one or more Telugu characters placed from left to right and top to bottom. Telugu is one of the 22 scheduled languages of India. The Telugu script is immediately related to Kannada and closely related to the Sinhala script.

3.1 The Evolution of the Script

The origins of the Telugu script can be traced to the Brahmi alphabet of ancient India, often known as Asokan Brahmi. Historically the script is derived from the Southern Brahmi or Bhattiprolu Brahmi, alternatively known as the Telugu Brahmi alphabet of the 3rd century BCE. Later, by the 5th century during the Chalukyan period, it developed into a common alphabet used for Telugu and Kannada. The Telugu-Kannada common alphabet split into two separate alphabets during the 12th and 13th centuries AD, to be called the Telugu and Kannada scripts. In addition to the common origin, a longer period of shared political and cultural confederation of the Telugu and Kannada speaking regions has ultimately resulted in a considerable proportion of shared identical character signs between the two scripts (34 out of 63 characters, see Table 10).

The earliest known inscriptions containing Telugu words appear on the bilingual coins of the Satavahanas that date back to the 2nd century AD [104]. The first inscription entirely in Telugu was made in 575 AD and was probably made by the Renati Cholas, who started writing royal proclamations in Telugu instead of Sanskrit. Telugu developed as a poetical and literary language during the 11th century AD. Until the 20th century Telugu was written in Granthic style, very different from the colloquial language. During the second half of the 20th century, a modern written style emerged based on the modern colloquial language. In 2008 Telugu was designated as a classical language by the Indian government.

Figure 1: Evolution of Telugu script

3.2 Notable Features

The Telugu orthography superficially appears as a series of circles and semi-circles. Most consonants carry a tick mark called Talakattu. The writing system is classified as a bugida type that employs alpha-syllabaries. The alphabet consists of vowels, consonants and modifiers. Each of these vowels and consonants has one or more secondary allographs. The secondary allographs always appear as dependent symbols on the first character of a syllable. Each syllable is formed of a single standalone vowel or one or more consonants. Each of these consonants may occur with an inherent vowel or modified by a secondary vowel. A consonant cluster may be formed with a single standalone character followed by one or more secondary forms of consonants. The order of composition of syllabaries does not match the reading order. There are rules to learn to read orthographic sequences into phonetic sequences, whether simple or complex syllables.

3.3 The Telugu (తెలుగు) Language

The Telugu language is a Dravidian language spoken by about 75 million (ca. 2001) people, mainly in the southern Indian states of Andhra Pradesh and Telangana, where it is the official language. It is also spoken in such neighboring states as Karnataka, Tamil Nadu, Orissa, Maharashtra and Chhattisgarh, and is one of the 22 scheduled languages of India. There are also quite a few Telugu speakers in Canada, the USA, South Africa, Malaysia, Mauritius, Myanmar, Sri Lanka and Réunion.

3.4 Languages that Use the Telugu Script

The script is also used for ten other languages, viz. Gondi, Koya, Konda, Kuvi, Kolavar or Kolami, Yerukala, Banjara or Lambadi, Savara or Sora, Adivasi Odiya and also Sanskrit. In the Telugu speaking region, the tradition of writing Sanskrit in the Telugu script has remained a common practice. During the last few decades, a considerable number of publications in the form of textbooks, dictionaries and other reading material has been produced in the Telugu script in Gondi, Koya, Konda, Kuvi, Kolami, Yerukala, Banjara, Savara and Adivasi Odiya.

No.  Name of the language (ISO 639 Code)  Language family  Status                   EGIDS Scale
1    Telugu (tel)                         Dravidian        Scheduled and Classical  2
2    Gondi (gon)                          Dravidian        Modern Tribal            5
3    Koya (kff)                           Dravidian        Modern Tribal            5
4    Konda (knd)                          Dravidian        Modern Tribal            6b
5    Kuvi (kxv)                           Dravidian        Modern Tribal            5
6    Kolavar or Kolami (kfb)              Dravidian        Modern Tribal            5
7    Yerukala (yeu)                       Dravidian        Modern Tribal            6
8    Banjara or Lambadi (lmn)             Indo-Aryan       Modern Tribal            5
9    Savara or Sora (srb)                 Austro-Asiatic   Modern Tribal            5
10   Adivasi Odiya (ort)                  Indo-Aryan       Modern Tribal            5
11   Sanskrit (san)                       Indo-Aryan       Scheduled and Classical  4

Table 1: Main languages considered under Telugu LGR

3.5 The Structure of Written Telugu

The Telugu script as it is used for the Telugu language consists of a total of 72 characters [102] comprising 40 consonants, 16 characters representing vowels that can stand alone and 16 dependent signs, each corresponding to one of the sixteen vowels excepting /a/ అ; no explicit dependent symbol exists for that sound, instead it is inherent with the consonants in the absence of a dependent sign. Besides these, there are six additional dependent symbols, of which five always occur with the vowels, as extensions. The sixth, the halant sign ◌్ U+0C4D, occurs with consonants. The following subsections give further details.

3.5.1 The vowels and vowel modifiers

There are fourteen vowel characters, viz. అ [a], ఆ [ā], ఇ [i], ఈ [ī], ఉ [u], ఊ [ū], ఋ [r], ఌ [l], ఎ [e], ఏ [ē], ఐ [ai], ఒ [o], ఓ [ō], ఔ [au], in the common inventory [103] for all the languages using the Telugu script [111] specified above, and two (ౠ [r], ౡ [ḹ]) to write Sanskrit loan words. For these vowels, there are corresponding fifteen marks, except for అ [a] (which is inherent). These are listed in Table 2 below.

There are six modifiers for vowels: ◌ఁ [~], ◌ం [ṃ], ◌ః [ḥ], ◌ఀ [~] (a special symbol not common in standard Telugu writings), ఽ [:.] (the avagraha sign, commonly used to indicate doubling the vowel length and follows only long vowels), and ◌్ [H] (the halant sign, which, when appended to a consonant, removes the inherent vowel /a/ from it). The halant sign has a similar characteristic to that of a secondary vowel sign in that both of them delete the inherent vowel [a] when added to consonants.

R1. Inherent vowel deletion rule: An inherent vowel of a consonant gets deleted either before a matra sign or before the halant sign.

C[ca] + M[◌ా, ◌ి …] | H[◌్] -> C[c◌ా, ◌ి] | H[◌్]

C[ca] + M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D] -> C[c] M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D]

C = Consonant, ca = a consonant with an inherent 'a', M = Secondary vowel
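Rule R1 can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `romanize` and the three-entry lookup tables are hypothetical and cover a handful of characters, not the full repertoire.

```python
# Illustrative sketch of rule R1 (inherent vowel deletion).
# The tables are a tiny hypothetical sample, not the full Telugu repertoire.
CONSONANTS = {"\u0C15": "k", "\u0C2E": "m", "\u0C32": "l"}  # KA, MA, LA
MATRAS = {"\u0C3E": "ā", "\u0C3F": "i", "\u0C41": "u"}      # AA, I, U signs
HALANT = "\u0C4D"


def romanize(text: str) -> str:
    """Romanize consonant-based syllables, applying rule R1."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in CONSONANTS:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt in MATRAS:
            # A matra replaces the inherent /a/.
            out.append(CONSONANTS[ch] + MATRAS[nxt])
            i += 2
        elif nxt == HALANT:
            # The halant deletes the inherent /a/ and adds no vowel.
            out.append(CONSONANTS[ch])
            i += 2
        else:
            # No dependent sign: the inherent /a/ is pronounced.
            out.append(CONSONANTS[ch] + "a")
            i += 1
    return "".join(out)
```

For instance, U+0C15 alone is read with the inherent vowel, while U+0C15 followed by U+0C3F or by U+0C4D loses it.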

No.  Independent vowels (primary allographs with code points)  Dependent vowels (secondary allographs with code points)
1.   అ U+0C05                                                  No explicit sign recognized or encoded
2.   ఆ U+0C06                                                  ◌ా U+0C3E
3.   ఇ U+0C07                                                  ◌ి U+0C3F
4.   ఈ U+0C08                                                  ◌ీ U+0C40
5.   ఉ U+0C09                                                  ◌ు U+0C41
6.   ఊ U+0C0A                                                  ◌ూ U+0C42
7.   ఋ U+0C0B                                                  ◌ృ U+0C43
8.   ౠ U+0C60                                                  ◌ౄ U+0C44
9.   ఌ U+0C0C                                                  ◌ౢ U+0C62
10.  ౡ U+0C61                                                  ◌ౣ U+0C63
11.  ఎ U+0C0E                                                  ◌ె U+0C46
12.  ఏ U+0C0F                                                  ◌ే U+0C47
13.  ఐ U+0C10                                                  ◌ై U+0C48
14.  ఒ U+0C12                                                  ◌ొ U+0C4A
15.  ఓ U+0C13                                                  ◌ో U+0C4B
16.  ఔ U+0C14                                                  ◌ౌ U+0C4C

Table 2: Vowels and the corresponding dependent signs

No.  Modifier signs  Code Points  Common name
1.   ◌ఀ              U+0C00       Candrabindu
2.   ◌ఁ              U+0C01       Ardhānusvāra or Arasunna
3.   ◌ం              U+0C02       Pūrṇanusvāra or Sunna
4.   ◌ః              U+0C03       Visarga
5.   ఽ               U+0C3D       Avagraha
6.   ◌్              U+0C4D       Halant

Table 3: Vowel modifiers and the consonantal modifiers

3.5.2 The Anusvāra or sunna (◌ం - U+0C02)

The Anusvāra or sunna represents a homorganic nasal before the corresponding consonant and also serves as a substitute to transcribe word-final /mu/. Essentially it substitutes a cluster of a Nasal Consonant + Halant before a consonant. Writing alternatively with a nasal consonant + Halant + Consonant is rare and occurs mostly while transcribing Sanskrit words. Otherwise the latter writing practice is virtually absent in Telugu.

No.  Homorganic nasal = Archiphoneme /M/  Homorganic nasal + Halant  Gloss
1.   లంక /laMka/                          లఙ్క /laŋka/               'island'
2.   కంచె /kaMce/                         కఞ్చె [kaɲce]              'fence'
3.   పంట /paMTa/                          పణ్ట /paṇTa/               'harvest'
4.   కంత /kaMta/                          కన్త /kanta/               'hole'
5.   కంప /kaMpa/                          కమ్ప /kampa/               'thorny bush'
6.   కంస /kaMsa/                          కమ్స /kansa/               'king Kansa'
7.   సింహ /siMha/                         సిమ్హ /simha/              'lion'

Table 4: Homorganic nasal and Homorganic nasal + Halant

3.5.3 Nasalization: Candrabindu (◌ఀ U+0C00) or arasunna (◌ఁ U+0C01)

Candrabindu, which denotes nasalization of the preceding vowel, is used in the Prakrit texts transcribed in the Telugu script, and the arasunna is used in old Telugu, as in తెలుఁగు /telũgu/ 'telugu'. Present-day Telugu users do not use the candrabindu frequently except to bring special emphasis, as in hãã, hũũ, etc.

3.5.4 The Consonants

The Telugu consonants have an implicit vowel /a/ included in them. As per the traditional classification they are categorized according to their phonetic properties. There are 5 varga groups (classes) and one non-varga group. Each varga corresponds to a particular set of stops characterized by a particular place of articulation. Each varga contains four oral stops and one nasal stop ordered by the complexity of their manner from left to right as [-vd, -asp, -nas], [-vd, +asp, -nas], [+vd, -asp, -nas], [+vd, +asp, -nas], [+vd, -asp, +nas] (where vd = voiced, asp = aspirated, nas = nasal). Each feature set defines the character by the varga. Each varga from top to bottom is defined by an additional place feature of articulation. The non-varga set is again divided into two subsets, each characterized by absence or presence of sonority, i.e. [+/-son]. The obstruents characterized by [-son] are fricatives, viz. శ [ś], ష [ṣ], స [s], హ [h], while the remaining carry the feature of sonority, i.e. [+son].

No.  Place of Articulation  -asp -vd -nas  ISO  +asp -vd -nas  ISO  -asp +vd -nas  ISO  +asp +vd -nas  ISO  -asp +vd +nas  ISO
1.   Velar                  క              k    ఖ              kh   గ              g    ఘ              gh   ఙ              ṅ
2.   Palatal                చ              c    ఛ              ch   జ              j    ఝ              jh   ఞ              ñ
3.   Retroflex              ట              ṭ    ఠ              ṭh   డ              ḍ    ఢ              ḍh   ణ              ṇ
4.   Dental                 త              t    థ              th   ద              d    ధ              dh   న              n
5.   Bilabial               ప              p    ఫ              ph   బ              b    భ              bh   మ              m

Table 5: Classification of stop consonants

Sonorants:   య y   ర r   ఱ ṛ   ల l   ళ ḷ   వ v
Fricatives:  శ ś   ష ṣ   స s   హ h

Table 6: Non-stop consonants

4. The Development Process and Methodology

The Neo-Brahmi Generation Panel involves a number of different scripts with distinct Unicode blocks. Each of these scripts will usually have a separate LGR. However, a common thread runs through the Neo-Brahmi scripts in the process of LGR development. A number of guiding principles that are laid out will be used in the development of the scheme. As specified elsewhere, the NBGP adopts the following principles in the selection of code points from the code point repertoire for the Telugu script.

The first principle, the inclusion principle, deals with whether the character is regularly used in the language, besides its unambiguous nature. The second important principle, the exclusion principle, deals with the use of the code point repertoire for the root zone and does not allow every character that is tabulated in the Unicode chart. A baseline layer of restriction is set for the Domain Name System by the protocol known as IDNA (Internationalized Domain Names in Applications). IDNA excludes some characters from the Unicode repertoire for the concerned script. An additional layer is added for the root zone, called the Maximal Starting Repertoire (MSR). Telugu does not have many such restricted characters. One such character, for example, is the Avagraha "ఽ" (U+0C3D), which is restricted by the MSR even though it is allowed by the IDNA protocol.

Similarly, certain punctuation marks that were used in the traditional texts are not assigned any code points and hence need not be included here. Other cases such as symbols and abbreviations are not permitted. In addition to the above, rare and obsolete characters, though recognized in the Unicode chart of Telugu, will not be permitted in the root zone LGR.

4.1 Zero Width Joiner and Zero Width Non-Joiner in Telugu Domain Names

MSR excludes invisible characters like Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D), as they require ad hoc representation in different ways. These are required in certain cases where a typical visual shape of an akshar is desired. There are contrastive usages of written forms derived from the use of Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ). They have special roles in the writing system of Telugu. ZWNJ is used in sequences like Consonant (C) + Halant (U+0C4D) + Consonant, where the second C is prevented from taking the usual dependent allograph (vattu) form after (below) the first consonant, as in the following example:

1. క (U+0C15) + ◌్ (U+0C4D) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్స్వా (without using ZWNJ)

   Example: వాక్స్వాతంత్ర్యం

2. క (U+0C15) + ◌్ (U+0C4D) + ZWNJ (U+200C) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్‌స్వా (using the ZWNJ)

   Example: వాక్‌స్వాతంత్ర్యం

Both forms of the word, though written with different graphic signs, mean the same and are also identical in pronunciation. Though the second form was not previously common, its usage is gaining ground due to the influence of English and Hindi. It is frequently used in transcribing many English words into Telugu, such as 'software' (సాఫ్ట్‌వేర్, using ZWNJ). The word 'software' will become సాఫ్ట్వేర్ if ZWNJ is not used.

4.2 How to Avoid Duplicate Domain Names Involving ZWJ and ZWNJ?

ZWJ and ZWNJ are used mainly to write two distinct displays of the same consonant cluster or sequence, a distinction that has no semantic or phonetic significance. When ZWJ and ZWNJ are allowed in domain names for Telugu, they create two distinct forms of the same domain name. To make browsers and the DNS treat them as equal, ZWJ and ZWNJ would have to be ignored when comparing two words. The same procedure is usually followed by the spell-checkers of the language. Accepting ZWJ and ZWNJ in domain names creates confusion for a majority of the linguistic community, and joiner characters are prohibited for the Root Zone; hence their use is explicitly prohibited by the NBGP.
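The comparison described in this section, and the stricter root zone position, can be sketched in Python as follows. This is a minimal illustration rather than any resolver's or registry's actual behavior; the function names are hypothetical.

```python
# Hypothetical helpers illustrating Section 4.2; not part of any DNS software.
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ, as a spell-checker style comparison would."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


def same_ignoring_joiners(a: str, b: str) -> bool:
    """Treat two labels as equal if they differ only in joiner characters."""
    return strip_joiners(a) == strip_joiners(b)


def allowed_in_root_zone(label: str) -> bool:
    """The root zone position: any label containing a joiner is rejected."""
    return ZWJ not in label and ZWNJ not in label
```

Under the NBGP decision only `allowed_in_root_zone` matters for label validation; the comparison helper merely shows why two joiner-differing spellings would otherwise name the same word.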


5. The Repertoire

In this section, we present the discussion on the code points that would form the repertoire of code points licensed by the [MSR-3] to be validated and used in the root zone label generation rules. Section 5.1 provides the section of the [MSR-3] applicable to the Telugu script on which the Telugu code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel proposes to be included in the Telugu LGR.

5.1 Telugu section of Maximal Starting Repertoire [MSR] Version 3

Color convention (footnote 1):
All characters that are included in the [MSR] are highlighted in Yellow background
PVALID in IDNA2008 but excluded from the [MSR] are highlighted in Pinkish background
Not PVALID in IDNA2008 are in White background

Figure 2: Telugu Code Page from [MSR-3]

Footnote 1: This document needs to be printed in color for this to be read correctly.

5.2 Code Points Repertoire

In the following, the Telugu Script Unicode code points have been presented and discussed with reference to the principles that constrain the label generation rules. It is important to note that the purpose of this document is to state unambiguously the Telugu code points that can be used in the root zone repertoire. The following table lists 63 code points for the Telugu LGR, out of a total number of 67 code points listed in MSR-3, excluding four code points which are obsolete.

No.  Unicode Code Point  Glyph  Character Name  EGIDS status  Indic Syllabic Category  Reference
1.   0C02  ◌ం  TELUGU SIGN ANUSVARA  2 Tel, 4 San, 5 Others (footnote 2)  Anusvāra  102, 103
2.   0C03  ◌ః  TELUGU SIGN VISARGA  2 Tel, 4 San, 5 Others  Visarga  102, 103
3.   0C05  అ  TELUGU LETTER A  2 Tel, 5 Others  Vowel  102, 103
4.   0C06  ఆ  TELUGU LETTER AA  2 Tel, 5 Others  Vowel  102, 103
5.   0C07  ఇ  TELUGU LETTER I  2 Tel, 5 Others  Vowel  102, 103
6.   0C08  ఈ  TELUGU LETTER II  2 Tel, 5 Others  Vowel  102, 103
7.   0C09  ఉ  TELUGU LETTER U  2 Tel, 5 Others  Vowel  102, 103
8.   0C0A  ఊ  TELUGU LETTER UU  2 Tel, 5 Others  Vowel  102, 103
9.   0C0B  ఋ  TELUGU LETTER VOCALIC R  2 Tel, 5 Others  Vowel  102, 103
10.  0C0E  ఎ  TELUGU LETTER E  2 Tel, 5 Others  Vowel  102, 103
11.  0C0F  ఏ  TELUGU LETTER EE  2 Tel, 5 Others  Vowel  102, 103
12.  0C10  ఐ  TELUGU LETTER AI  2 Tel, 5 Others  Vowel  102, 103
13.  0C12  ఒ  TELUGU LETTER O  2 Tel, 5 Others  Vowel  102, 103
14.  0C13  ఓ  TELUGU LETTER OO  2 Tel, 5 Others  Vowel  102, 103
15.  0C14  ఔ  TELUGU LETTER AU  2 Tel, 5 Others  Vowel  102, 103
16.  0C15  క  TELUGU LETTER KA  2 Tel, 5 Others  Consonant  102, 103
17.  0C16  ఖ  TELUGU LETTER KHA  2 Tel, 5 Others  Consonant  102, 103
18.  0C17  గ  TELUGU LETTER GA  2 Tel, 5 Others  Consonant  102, 103
19.  0C18  ఘ  TELUGU LETTER GHA  2 Tel, 5 Others  Consonant  102, 103
20.  0C19  ఙ  TELUGU LETTER NGA  2 Tel, 5 Others  Consonant, Nasal-Consonant  102, 103
21.  0C1A  చ  TELUGU LETTER CA  2 Tel, 5 Others  Consonant  102, 103
22.  0C1B  ఛ  TELUGU LETTER CHA  2 Tel, 5 Others  Consonant  102, 103
23.  0C1C  జ  TELUGU LETTER JA  2 Tel, 5 Others  Consonant  102, 103
24.  0C1D  ఝ  TELUGU LETTER JHA  2 Tel, 5 Others  Consonant  102, 103
25.  0C1E  ఞ  TELUGU LETTER NYA  2 Tel, 5 Others  Consonant, Nasal-Consonant  102, 103
26.  0C1F  ట  TELUGU LETTER TTA  2 Tel, 5 Others  Consonant  102, 103
27.  0C20  ఠ  TELUGU LETTER TTHA  2 Tel, 5 Others  Consonant  102, 103
28.  0C21  డ  TELUGU LETTER DDA  2 Tel, 5 Others  Consonant  102, 103
29.  0C22  ఢ  TELUGU LETTER DDHA  2 Tel, 5 Others  Consonant  102, 103
30.  0C23  ణ  TELUGU LETTER NNA  2 Tel, 5 Others  Consonant, Nasal-Consonant  102, 103
31.  0C24  త  TELUGU LETTER TA  2 Tel, 5 Others  Consonant  102, 103
32.  0C25  థ  TELUGU LETTER THA  2 Tel, 5 Others  Consonant  102, 103
33.  0C26  ద  TELUGU LETTER DA  2 Tel, 5 Others  Consonant  102, 103
34.  0C27  ధ  TELUGU LETTER DHA  2 Tel, 5 Others  Consonant  102, 103
35.  0C28  న  TELUGU LETTER NA  2 Tel, 5 Others  Consonant, Nasal-Consonant  102, 103
36.  0C2A  ప  TELUGU LETTER PA  2 Tel, 5 Others  Consonant  102, 103
37.  0C2B  ఫ  TELUGU LETTER PHA  2 Tel, 5 Others  Consonant  102, 103
38.  0C2C  బ  TELUGU LETTER BA  2 Tel, 5 Others  Consonant  102, 103
39.  0C2D  భ  TELUGU LETTER BHA  2 Tel, 5 Others  Consonant  102, 103
40.  0C2E  మ  TELUGU LETTER MA  2 Tel, 5 Others  Consonant, Nasal-Consonant  102, 103
41.  0C2F  య  TELUGU LETTER YA  2 Tel, 5 Others  Consonant  102, 103
42.  0C30  ర  TELUGU LETTER RA  2 Tel, 5 Others  Consonant  102, 103
43.  0C32  ల  TELUGU LETTER LA  2 Tel, 5 Others  Consonant  102, 103
44.  0C33  ళ  TELUGU LETTER LLA  2 Tel, 5 Others  Consonant  102, 103
45.  0C35  వ  TELUGU LETTER VA  2 Tel, 5 Others  Consonant  102, 103
46.  0C36  శ  TELUGU LETTER SHA  2 Tel, 5 Others  Consonant  102, 103
47.  0C37  ష  TELUGU LETTER SSA  2 Tel, 5 Others  Consonant  102, 103
48.  0C38  స  TELUGU LETTER SA  2 Tel, 5 Others  Consonant  102, 103
49.  0C39  హ  TELUGU LETTER HA  2 Tel, 5 Others  Consonant  102, 103
50.  0C3E  ◌ా  TELUGU VOWEL SIGN AA  2 Tel, 5 Others  Matra  102, 103
51.  0C3F  ◌ి  TELUGU VOWEL SIGN I  2 Tel, 5 Others  Matra  102, 103
52.  0C40  ◌ీ  TELUGU VOWEL SIGN II  2 Tel, 5 Others  Matra  102, 103
53.  0C41  ◌ు  TELUGU VOWEL SIGN U  2 Tel, 5 Others  Matra  102, 103
54.  0C42  ◌ూ  TELUGU VOWEL SIGN UU  2 Tel, 5 Others  Matra  102, 103
55.  0C43  ◌ృ  TELUGU VOWEL SIGN VOCALIC R  2 Tel, 5 Others  Matra  102, 103
56.  0C44  ◌ౄ  TELUGU VOWEL SIGN VOCALIC RR  2 Tel, 5 Others  Matra  102, 103
57.  0C46  ◌ె  TELUGU VOWEL SIGN E  2 Tel, 5 Others  Matra  102, 103
58.  0C47  ◌ే  TELUGU VOWEL SIGN EE  2 Tel, 5 Others  Matra  102, 103
59.  0C48  ◌ై  TELUGU VOWEL SIGN AI  2 Tel, 5 Others  Matra  102, 103
60.  0C4A  ◌ొ  TELUGU VOWEL SIGN O  2 Tel, 5 Others  Matra  102, 103
61.  0C4B  ◌ో  TELUGU VOWEL SIGN OO  2 Tel, 5 Others  Matra  102, 103
62.  0C4C  ◌ౌ  TELUGU VOWEL SIGN AU  2 Tel, 5 Others  Matra  102, 103
63.  0C4D  ◌్  TELUGU SIGN VIRAMA  2 Tel, 5 Others  Matra  102, 103

Footnote 2: Others are the EGIDS 5 languages, listed in Table 1: Main languages considered under Telugu LGR.

Table 7: Included code points
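As an aid to checking labels against Table 7, the 63 included code points can be written as contiguous ranges. The Python sketch below is an illustration only; the names `REPERTOIRE_RANGES` and `outside_repertoire` are hypothetical and do not come from the accompanying XML file.

```python
# Table 7 expressed as inclusive code point ranges (hypothetical helper names).
REPERTOIRE_RANGES = [
    (0x0C02, 0x0C03),  # anusvara, visarga
    (0x0C05, 0x0C0B),  # vowels A .. VOCALIC R
    (0x0C0E, 0x0C10),  # vowels E, EE, AI
    (0x0C12, 0x0C28),  # vowels O, OO, AU and consonants KA .. NA
    (0x0C2A, 0x0C30),  # consonants PA .. RA
    (0x0C32, 0x0C33),  # LA, LLA
    (0x0C35, 0x0C39),  # VA .. HA
    (0x0C3E, 0x0C44),  # vowel signs AA .. VOCALIC RR
    (0x0C46, 0x0C48),  # vowel signs E, EE, AI
    (0x0C4A, 0x0C4D),  # vowel signs O, OO, AU and virama
]

REPERTOIRE = frozenset(
    cp for low, high in REPERTOIRE_RANGES for cp in range(low, high + 1)
)


def outside_repertoire(label: str) -> list:
    """Return the code points of a label that are not listed in Table 7."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]
```

A label passes this first check when `outside_repertoire` returns an empty list; repertoire membership alone does not make a label valid, since the whole label evaluation rules still apply.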

5.3 Code Points Not Included

Referring to the principles in Section 4, the code points to be excluded from the repertoire are the following, for the reasons listed.

The following code points are not in widespread use.

• 0C00 ◌ఀ TELUGU LETTER CANDRABINDU
• 0C01 ◌ఁ TELUGU LETTER ARASUNNA
• 0C0C ఌ TELUGU LETTER VOCALIC L
• 0C31 ఱ TELUGU LETTER RRA

Various signs: Allographs of vowel diacritics /a:/ and part of a diacritic specific to particular consonant /h/.

• 0C55 ◌ౕ TELUGU LENGTH MARK
• 0C56 ◌ౖ TELUGU AI LENGTH MARK

Historic phonetic variants: Phonological variants shall not be permitted. They are not in MSR-3.

• 0C58 ౘ TELUGU LETTER TSA
• 0C59 ౙ TELUGU LETTER DZA

The two additional vowels listed below to transcribe Sanskrit are not permitted. They are not in MSR-3.

• 0C60 ౠ TELUGU LETTER VOCALIC RR
• 0C61 ౡ TELUGU LETTER VOCALIC LL

The following two dependent vowels used to transcribe Sanskrit sounds are not permitted. They are not in MSR-3.

• 0C62 ◌ౢ TELUGU VOWEL SIGN VOCALIC L
• 0C63 ◌ౣ TELUGU VOWEL SIGN VOCALIC LL

Starting from the MSR-3, there are four code points to be excluded.


No.  Unicode Code Point  Glyph  Character Name  EGIDS status  Indic Syllabic Category  Reference  Note
1.   0C0C  ఌ  TELUGU LETTER VOCALIC L  2 Telu, 5 Gon, 6b other  Vowel  103, 108, 109  It is not used in modern Telugu
2.   0C31  ఱ  TELUGU LETTER RRA  2 Telu, 5 Gon, 6b other  Consonant  103, 108, 109
3.   0C55  ◌ౕ  TELUGU LENGTH MARK  2 Telu, 5 Gon, 6b other  Matra  103, 108, 109  It is not available on general keyboard.
4.   0C56  ◌ౖ  TELUGU AI LENGTH MARK  2 Telu, 5 Gon, 6b other  Matra  103, 108, 109

Table 8: Excluded code points

6. Variants

Telugu code points representing the basic simple stand-alone characters and some dependent characters may enter into different combinations to form syllables. There are no characters in the Telugu Unicode chart that either in simple form or in combined form are deemed similar by NBGP. However, Telugu has a small number of variants that have identical values but derive from different character combinations. The NBGP categorizes these confusingly similar variants in two groups.

6.1 Type 1: Similarity within the Script

Certain vowels [o, ō] display different shapes in combination with certain consonants, though they have shared sound and code point values. For example:

i. Ca + e + u(:) -> mo(:)
ii. Ca + o(:) -> ko(:)

The variants, which are often confusing and of variable acceptance, arise from differences in how these sequences are rendered.

These cases are interesting in that they present no similarity in their forms but have similar phonetic output. It is not unusual to find such regional variations and they are regularly used by Telugu users. These may not cause confusion but become annoying to learners. However, ◌ె + ◌ు (U+0C46 + U+0C41) is a matra + matra sequence, which is not allowed by the WLE rules in Section 7. Therefore, these are not defined as variant sequences by NBGP.

Class  Character seq. [Ca + e + u]        -> Co <-  Ca + o
1      కెు = క + ◌ె + ◌ు (0C15 + 0C46 + 0C41)  Blocked   కొ = క + ◌ొ (0C15 + 0C4A)
       (This class includes other consonants like kha, ga, nga, ca, cha, ja, nya, ta, tha, da, dha, na, ta, tha, da, dha, na, pa, pha, ba, bha, ra, la, va, Sa, sha, sa, and ha)
2      మెు = మ + ◌ె + ◌ు (0C2E + 0C46 + 0C41)  Blocked   మొ = మ + ◌ొ (0C2E + 0C4A)
       యెు = య + ◌ె + ◌ు (0C2F + 0C46 + 0C41)  Blocked   యొ = య + ◌ొ (0C2F + 0C4A)
       ఝెు = ఝ + ◌ె + ◌ు (0C1D + 0C46 + 0C41)  Blocked   ఝొ = ఝ + ◌ొ (0C1D + 0C4A)
       ఘెు = ఘ + ◌ె + ◌ు (0C18 + 0C46 + 0C41)  Blocked   ఘొ = ఘ + ◌ొ (0C18 + 0C4A)

Table 9a: Similarity within the script

6.2 Type 1: Variants within Script due to Alternative Spelling

Similar to the above, there is a set of representations in Telugu syllable formation where a homorganic nasal (anusvāra) in a syllable has an alternate spelling which is visually different, as shown below.

No.  Homorganic nasal (anusvāra) + consonant  Homorganic nasal consonant + halant + consonant  Gloss
1.   లంక /laMka/                              లఙ్క /laŋka/                                     'island'
2.   కంచె /kaMce/                             కఞ్చె [kaɲce]                                    'fence'
3.   పంట /paMTa/                              పణ్ట /paNTa/                                     'harvest'
4.   కంత /kaMta/                              కన్త /kanta/                                     'hole'
5.   కంప /kaMpa/                              కమ్ప /kampa/                                     'thorny bush'
6.   కంస /kaMsa/                              కమ్స /kansa/                                     'king Kansa'
7.   సింహ /siMha/                             సిమ్హ /simha/                                    'lion'

Table 9b: Variants with anusvāra alternating with nasal consonants

Writing alternatively with a nasal consonant + halant + consonant is rare in Telugu and occurs mostly while transcribing Sanskrit words. Since the variants have exactly the same pronunciation, the rarer representation of nasal consonant + halant + consonant is disallowed in order to avoid the source of confusion.

Nasal Consonants are:
1. U+0C19 TELUGU LETTER NGA (ఙ)
2. U+0C1E TELUGU LETTER NYA (ఞ)
3. U+0C23 TELUGU LETTER NNA (ణ)
4. U+0C28 TELUGU LETTER NA (న)
5. U+0C2E TELUGU LETTER MA (మ)

Similarly, and very frequently, the word-final ము [mu] is represented alternatively by the variant anusvāra ◌ం [M], as in the following:

కలం kalaM          కలము kalamu          'pen'
పుస్తకం pustakaM    పుస్తకము pustakamu    'book'
ఆముదం a:mudaM      ఆముదము a:mudamu      'castor oil'
దేశం deSaM          దేశము deSamu          'country'

In such cases, one of the confusable variants must be disallowed. This can be disallowed by the WLE rule: H cannot follow a nasal consonant.

6.3 Type 2: Shared Similarity with the Other Related Scripts

There are many Brahmi derived scripts, particularly in the Southern part of India, Sri Lanka, and South East Asia. Some of the characters of these scripts display similarity with each other. Such cases, relevant for the Telugu script, are given below.

6.3.1 Type 2: Cross-Script Variants for Telugu and Kannada

A number of characters of the Kannada script are almost similar to characters of the Telugu script, except for the flattened head-stroke in Kannada contrasting with a tick mark on the top of the character in Telugu. Out of the total, there are 34 such cases which are categorized as variant sets, as shown in the following table.

Variant Set  Telugu Code Point  Kannada Code Point

1 ◌ం (0C02) ◌ಂ (0C82)

2 ◌ః (0C03) ◌ಃ (0C83)

3 అ (0C05) ಅ (0C85)

4 ఆ (0C06) ಆ (0C86)

5 ఇ (0C07) ಇ (0C87)

6 ఈ (0C08) ಈ (0C88)

7 ఐ (0C10) ಐ (0C90)

8 ఒ (0C12) ಒ (0C92)

9 ఓ (0C13) ಓ (0C93)

10 ఔ (0C14) ಔ (0C94)

11 ఖ (0C16) ಖ (0C96)

12 గ (0C17) ಗ (0C97)

13 జ (0C1C) ಜ (0C9C)

14 ఝ (0C1D) ಝ (0C9D)

15 ఞ (0C1E) ಞ (0C9E)

16 ట (0C1F) ಟ (0C9F)

17 ఠ (0C20) ಠ (0CA0)

18 డ (0C21) ಡ (0CA1)

19 ఢ (0C22) ಢ (0CA2)

20 ణ (0C23) ಣ (0CA3)

21 థ (0C25) ಥ (0CA5)

22 ద (0C26) ದ (0CA6)

23 ధ (0C27) ಧ (0CA7)

24 న (0C28) ನ (0CA8)

25 బ (0C2C) ಬ (0CAC)

26 భ (0C2D) ಭ (0CAD)

27 మ (0C2E) ಮ (0CAE)

28 య (0C2F) ಯ (0CAF)

29 ర (0C30) ರ (0CB0)

30 ల (0C32) ಲ (0CB2)

31 ళ (0C33) ಳ (0CB3)

32 ◌ి (0C3F) ◌ಿ (0CBF)

33 ◌ు (0C41) ◌ು (0CC1)

34 ◌ృ (0C43) ◌ೃ (0CC3)

Table 10: Cross-script variant code points for Telugu and Kannada

The Telugu and Kannada variant sets in Table 10 are cross-script variant code points. The details of various akshar combinations and variant disposition can be found in Section 6.4. Code points which have been analyzed and found to be similar, but not considered as variants, are listed in Appendix A.
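Every pair in Table 10 differs by the same offset of 0x80 between the Telugu and Kannada blocks, so the table can be captured compactly. The following Python sketch is offered as an illustration; the names `TELUGU_WITH_KANNADA_VARIANT` and `kannada_variant_label` are hypothetical, and the function only reports whether a whole-label Kannada look-alike exists.

```python
# Telugu code points of the 34 variant sets in Table 10 (hypothetical names).
TELUGU_WITH_KANNADA_VARIANT = frozenset([
    0x0C02, 0x0C03, 0x0C05, 0x0C06, 0x0C07, 0x0C08, 0x0C10, 0x0C12,
    0x0C13, 0x0C14, 0x0C16, 0x0C17, 0x0C1C, 0x0C1D, 0x0C1E, 0x0C1F,
    0x0C20, 0x0C21, 0x0C22, 0x0C23, 0x0C25, 0x0C26, 0x0C27, 0x0C28,
    0x0C2C, 0x0C2D, 0x0C2E, 0x0C2F, 0x0C30, 0x0C32, 0x0C33, 0x0C3F,
    0x0C41, 0x0C43,
])

# Each Kannada variant in Table 10 is the Telugu code point plus 0x80.
KANNADA_OFFSET = 0x80


def kannada_variant_label(label: str):
    """Return the Kannada variant label, or None if any code point has none."""
    out = []
    for ch in label:
        cp = ord(ch)
        if cp not in TELUGU_WITH_KANNADA_VARIANT:
            return None
        out.append(chr(cp + KANNADA_OFFSET))
    return "".join(out)
```

A Telugu label has a whole-label Kannada variant only when all of its code points appear in Table 10; a single code point outside the table, such as U+0C15, is enough to rule one out.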


6.3.2 Type 2: Cross-Script Variants for Telugu and Devanagari

Visarga is the only identical code point that exhibits shape similarity between the Telugu and Devanagari scripts. However, as there are no other variant code points between the two scripts, it is not defined as a variant code point.

Devanagari Code Point  Telugu Code Point

◌ः (0903) ◌ః (0C03)

Table 11: Candidate cross-script variant code point for Telugu and Devanagari

6.3.3 Type 2: Cross-Script Variants for Telugu and Gujarati

Visarga is the only identical code point that exhibits shape similarity between the Telugu and Gujarati scripts. However, as there are no other identical code points between the two scripts, it is not defined as a variant code point.

Gujarati Code Point  Telugu Code Point

◌ઃ (0A83) ◌ః (0C03)

Table 12: Candidate cross-script variant code point for Telugu and Gujarati

6.3.4 Type 2: Cross-Script Variants for Telugu and Oriya

The following code points exhibit similarity between the Telugu and Oriya scripts.

Telugu Code Point          Oriya Code Point
ం (0C02) ANUSVĀRA          ଠ (0B20) LETTER TTHA
ః (0C03) SIGN VISARGA      ଃ (0B03) SIGN VISARGA
ర (0C30) LETTER RA         ଠ (0B20) LETTER TTHA

Table 13: Candidate cross-script variant code points for Telugu and Oriya

The first two (U+0C02 – U+0B20 and U+0C03 – U+0B03) are dependent signs and U+0C30 is a stand-alone character in Telugu. NBGP discussions concluded that there is no need to recognize the cross-script variant code points between the Oriya and the Telugu scripts. This is because U+0C30 and U+0B20 are distinguishable and there are not enough other variant code points in each script to form labels that look the same. Therefore, these are not defined as variant code points.

6.3.5 Type 2: Cross-Script Variants for Telugu and Malayalam

The two code points, viz. the anusvāra and the visarga, are the only identical signs between the Telugu and Malayalam scripts. However, as there are not enough other variant code points to form labels, they are not defined as variant code points between the two scripts.

Telugu Code Point  Malayalam Code Point

◌ం (0C02) ം (0D02)

◌ః (0C03) ഃ (0D03)

Table 14: Candidate cross-script variant code points for Telugu and Malayalam

6.3.6 Type 2: Cross-Script Variants for Telugu and Sinhala

The following three pairs of characters, represented by the corresponding code points in Telugu and Sinhala, might be considered as merely similar if the similarity between 0C30 and 0DBB were not sustainable. However, NBGP, in consultation with Sinhala, concludes that 0C30 and 0DBB could cause confusion from the script user point of view. Therefore, they are proposed as cross-script variants between the two scripts and the disposition is blocked. This analysis follows the NBGP Cross-script Variant inclusion policy available in Appendix C.

Telugu Code Point  Sinhala Code Point

◌ం (0C02) ං (0D82)

◌ః (0C03) ඃ (0D83)

ర (0C30) ර (0DBB)

Table 15: Cross-script variant code points for Telugu and Sinhala

6.4 Cross Script Variants of Various Akshar Combinations

6.4.1 Conjunct Consonant Combinations

Cross script variants of various Akshar combinations (consonant-consonant-dependent characters) common between the Telugu and Kannada scripts include the following:

Variant Set  Telugu Code Point  Kannada Code Point

1 ◌ం (0C02) ◌ಂ (0C82)

2 ◌ః (0C03) ◌ಃ (0C83)

3 ఖ (0C16) ಖ (0C96)

4 గ (0C17) ಗ (0C97)

5 జ (0C1C) ಜ (0C9C)

6 ఝ (0C1D) ಝ (0C9D)

7 ఞ (0C1E) ಞ (0C9E)

8 ట (0C1F) ಟ (0C9F)

9 ఠ (0C20) ಠ (0CA0)

10 డ (0C21) ಡ (0CA1)

11 ఢ (0C22) ಢ (0CA2)

12 ణ (0C23) ಣ (0CA3)

13 థ (0C25) ಥ (0CA5)

14 ద (0C26) ದ (0CA6)

15 ధ (0C27) ಧ (0CA7)

16 న (0C28) ನ (0CA8)

17 బ (0C2C) ಬ (0CAC)

18 భ (0C2D) ಭ (0CAD)

19 మ (0C2E) ಮ (0CAE)

20 య (0C2F) ಯ (0CAF)

21 ర (0C30) ರ (0CB0)

22 ల (0C32) ಲ (0CB2)

23 ళ (0C33) ಳ (0CB3)

24 ◌ి (0C3F) ◌ಿ (0CBF)

25 ◌ు (0C41) ◌ು (0CC1)

26 ◌ృ (0C43) ◌ೃ (0CC3)

Table 16: Cross-script variants between Telugu and Kannada for conjunct consonant combination analysis

Table 16 includes 26 distinct Telugu code points that occur in the formation of conjunct consonant combinations in Telugu and Kannada. Excluding the standalone vowels from the total common Akshar combinations of cross script variants, there is a set of 21 consonants (C), three vowel matras (M) and two vowel modifiers that enter into the formation of the following combinations:

Sl. No.  Akshar combinations  Number
1.       CM                   = 21*3 = 63
2.       CB                   = 21*1 = 21
3.       CX                   = 21*1 = 21
4.       CHCM                 = 21*21*3 = 1323
5.       CHCB                 = 21*21*1 = 441
6.       CHCX                 = 21*21*1 = 441
7.       CHCMB                = 21*21*3*1 = 1323
8.       CHCMX                = 21*21*3*1 = 1323
9.       All combinations:    = 4956

Table 17: Total number of Akshar combinations

There occurs a total of 4956 conjunct consonant combinations modified by matras and vowel modifiers that are identical and can be labeled for variant labels between the Telugu and Kannada scripts. These combinations are covered by the variant code points in Section 6, Table 10 and Table 15.
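The arithmetic behind Table 17 can be reproduced with a few lines of Python. The sketch below is only a cross-check of the figures; the names `CLASS_SIZES`, `PATTERNS` and `combinations` are hypothetical.

```python
from math import prod

# Number of code points per symbol, as stated for Table 16:
# 21 consonants, 3 matras, and one each of anusvara, visarga and halant.
CLASS_SIZES = {"C": 21, "M": 3, "B": 1, "X": 1, "H": 1}

# The eight akshar patterns listed in Table 17.
PATTERNS = ["CM", "CB", "CX", "CHCM", "CHCB", "CHCX", "CHCMB", "CHCMX"]


def combinations(pattern: str) -> int:
    """Count the sequences a pattern can generate (product of class sizes)."""
    return prod(CLASS_SIZES[symbol] for symbol in pattern)


TOTAL = sum(combinations(p) for p in PATTERNS)
```

Summing the eight rows gives 63 + 21 + 21 + 1323 + 441 + 441 + 1323 + 1323 = 4956, in agreement with row 9 of the table.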

6.4.2 Other Combinations

NBGP creates the possible combinations of Telugu code points and cross-checks them with other Neo-Brahmi scripts for candidate variants. The possible combinations are:

1. CHCMB, CHCMX
2. CHCM, CHCB, CHCX
3. VB, VX, V
4. CHC, CM, CB, CX, C

Where,

C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant/Virama

NBGP concludes that, besides those identical code points defined as variants in Section 6, Table 10 and Table 15, there are no other variant code points between Telugu combinations and other scripts' code points or code point combinations.

6.5 Variant disposition

As the variants mentioned in Section 6, Table 10 and Table 15 can result in whole label variants, they may be considered for "blocked" disposition. There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
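The first-come behavior of the "blocked" disposition can be sketched in Python as follows. This is a simplified illustration and not a registry implementation: the class `VariantRegistry` and its methods are hypothetical, and the example uses only two of the variant pairs from Table 10.

```python
class VariantRegistry:
    """Allocate labels first-come; block later labels that are variants."""

    def __init__(self, variant_pairs):
        # Map both members of each variant pair to one representative.
        self._representative = {}
        for first, second in variant_pairs:
            self._representative[first] = first
            self._representative[second] = first
        self._allocated = {}

    def _key(self, label: str) -> str:
        # Labels that differ only by variant code points share one key.
        return "".join(self._representative.get(ch, ch) for ch in label)

    def apply(self, label: str) -> bool:
        """Return True if the label is allocated, False if it is blocked."""
        key = self._key(label)
        if key in self._allocated:
            return False
        self._allocated[key] = label
        return True


# Two sample pairs from Table 10: NA (0C28/0CA8) and GA (0C17/0C97).
SAMPLE_PAIRS = [("\u0C28", "\u0CA8"), ("\u0C17", "\u0C97")]
```

With these pairs, applying for the Telugu label U+0C28 U+0C17 first succeeds, and a later application for the Kannada label U+0CA8 U+0C97 is blocked; the outcome would be reversed had the Kannada label been applied for first, which reflects the absence of a preferred variant.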


7. Whole Label Evaluation Rules (WLE)

In this section we provide the WLEs that are required by the language. A number of rules have been formulated so that they can be adopted for LGR specification. Below are the symbols used in the WLE rules for each of the "Indic Syllabic Category" values mentioned in Table 7 (code point repertoire). For the details of syllable formation, see Appendix B.

C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant/Virama
Nasal-C → Nasal Consonant

Rule 1. H must be preceded by C (Ref. Appendix B: Syllable formation Rule 4)
Rule 2. M must be preceded by C (Ref. Appendix B: Syllable formation Rule 6)
Rule 3. X must be preceded by V or M or C (Ref. Appendix B: Syllable formation Rules 3c, 5c and 7c)
Rule 4. B must be preceded by V or M or C (Ref. Appendix B: Syllable formation Rules 3b, 5b and 7b)
Rule 5. H cannot follow Nasal-C (Ref. Section 6.2 Type 1)
Rule 6. V cannot be preceded by H

For Rule 6, there could be cases involving multi-word domains where V may need to be allowed to follow an H. This is the case where two different words are joined together, the first of which ends with a Halant and the second of which begins with a Vowel. Some sections of the linguistic usage require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without the H is considered enough for full representation of the sound intended, as in the following examples:

'house of knowledge': దార్అల్ఉలూమ్ da:rHalHulu:mH / దార్అలులూమ్ da:rHalulu:mH
'The Qor'an': ఖుర్ఆన్ KhurHa:nH / ఖురాన్ Khura:nH
'in Telangana Rashtra Samiti': టీఆర్ఎస్లో ti:a:rHesHlo / టీఆరెస్లో ti:a:rHesHlo
'Y.S.R.C. party': వైఎస్ఆర్సీపీ vaiesHa:rHsi:pi / వైఎస్సార్సీపీ vaiesHa:rHsi:pi
'British India': బ్రిటిష్ఇండియా bHritiShHiMdiya / బ్రిటిషిండియా britiShiMdiya

Of the two representations, one where V is preceded by H and one where V is not preceded by H, the latter is awkward and the former is in demand in modern usage.
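The six WLE rules listed above can be sketched as a small checker. This is an illustration under stated assumptions, not the normative XML rule set: the code point ranges for C and V follow the lists in Appendix B, the range used for M is the Unicode Telugu vowel-sign range (the LGR repertoire may be narrower), and the set of nasal consonants used for Rule 5 (ఙ ఞ ణ న మ) is an assumption, since Section 6.2 is what actually defines Nasal-C. The function names are hypothetical.

```python
# Assumed Nasal-C set for Rule 5: ఙ ఞ ణ న మ (see Section 6.2 for the real one).
NASAL_C = {0x0C19, 0x0C1E, 0x0C23, 0x0C28, 0x0C2E}


def category(ch):
    """Return the WLE symbol (C, V, M, B, X, H) for a Telugu code point."""
    cp = ord(ch)
    if 0x0C05 <= cp <= 0x0C0B or 0x0C0E <= cp <= 0x0C10 or 0x0C12 <= cp <= 0x0C14:
        return "V"
    if 0x0C15 <= cp <= 0x0C28 or 0x0C2A <= cp <= 0x0C33 or 0x0C35 <= cp <= 0x0C39:
        return "C"
    if 0x0C3E <= cp <= 0x0C44 or 0x0C46 <= cp <= 0x0C48 or 0x0C4A <= cp <= 0x0C4C:
        return "M"  # assumption: full Unicode vowel-sign range
    return {0x0C02: "B", 0x0C03: "X", 0x0C4D: "H"}.get(cp, "?")


def wle_violations(label):
    """Return a list of (rule_number, index) pairs for every violated rule."""
    errors = []
    prev_ch, prev = None, None
    for i, ch in enumerate(label):
        cat = category(ch)
        if cat == "H":
            if prev != "C":
                errors.append((1, i))          # Rule 1: H must follow C
            elif ord(prev_ch) in NASAL_C:
                errors.append((5, i))          # Rule 5: H cannot follow Nasal-C
        elif cat == "M" and prev != "C":
            errors.append((2, i))              # Rule 2: M must follow C
        elif cat == "X" and prev not in ("V", "M", "C"):
            errors.append((3, i))              # Rule 3: X must follow V, M or C
        elif cat == "B" and prev not in ("V", "M", "C"):
            errors.append((4, i))              # Rule 4: B must follow V, M or C
        elif cat == "V" and prev == "H":
            errors.append((6, i))              # Rule 6: V cannot follow H
        prev_ch, prev = ch, cat
    return errors
```

An empty list means the label passes all six rules. A label such as క + ్ + అ (consonant, halant, vowel) is reported under Rule 6, which is exactly the multi-word case discussed in this section.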

This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. However, permitting this may create two labels (with and without H) that are perceptually dissimilar but phonetically and semantically similar for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.

8. Contributors

Gangadhar Panday
Uma Maheshwara Rao, G.
NBGP members

9. References

[MSR-3] Integration Panel, "Maximal Starting Repertoire - MSR-3 Overview and Rationale", 28 March 2018. https://www.icann.org/sites/default/files/packages/lgr/msr/msr-3-wle-rules-28mar18-en.html

[101] Disanayaka, J.B. 2017. Encyclopedia of Sinhala Language and Culture. Colombo: Sumitha Publishers. First edition 2012.

[102] Krishnamurti, Bhadriraju, Ed., 2000. Telugu bhaashaa charitra. Hyderabad: P.S. Telugu University. First edition 1974.

[103] Krishnamurti, Bhadriraju and J P L Gwynn. 1985. A Grammar of Modern Telugu. New Delhi: Oxford University Press. ISBN 978-0-19-561664-4.

[104] Sarma, I.K. 1980. Coinage of Satavahana Empire. Delhi: Agam Kala Prakashan.

[105] Sridhar, S.N. 1980. Kannada. New York: Routledge.

[106] Suresh, Kolichala. 2012. Proposal to encode Telugu LLLA, Telugu ೞ: http://eemaata.com/unicode-proposal/telugu-llla-proposal.pdf. Accessed on 9 July 2018.

[107] Suresh, Kolichala. 2012. Divergent developments of alveolar stop *ṯ in Telugu. http://kolichala.com/dravidian/Divergent_developments_of_alveolar_stop_in_Telugu.pdf. Accessed on 9 July 2018.

[108] Telugu Unicode Chart, Telugu Range: 0C00–0C7F. The Unicode Standard, Version 10.0. http://www.unicode.org/Public/10.0.0/charts. Accessed on 9 July 2018.

[109] Uma Maheshwara Rao, G. 2012. Telugu bhaasha-saMgaNanaM. Hyderabad: P.S. Telugu University. ISBN: 81-86073-372-9.

[110] Uma Maheshwara Rao, G. 2003. Standard Telugu Written Language. VIDYULLIPI-4. pp. 1-14. Hyderabad: SCIL.

[111] Usha Devi, A. and Chandra Sekhara Reddy. D. 2015. Peoples Linguistic Survey of India. Andhra Pradesh and Telangana rAshtraala bhaashalu, vol. 3, part 1. ISBN: 978-93-85231-05-6. Hyderabad: emesco.


Appendix A: Confusable Code Points Analysis

A-1. Telugu and Kannada

The following table defines Telugu and Kannada code points which are confusable.

No.   Telugu CP   Telugu Glyph   Kannada CP   Kannada Glyph
1     0C35        వ              0CB5         ವ
2     0C36        శ              0CB6         ಶ
3     0C38        స              0CB8         ಸ

Table A-1: Confusable code points of Telugu and Kannada script

The following table lists other code points which have been analyzed and concluded to be distinguishable.

No.   Telugu CP   Telugu Glyph   Kannada CP   Kannada Glyph   NBGP resolution
1     0C0E        ఎ              0C8E         ಎ               distinguishable
2     0C18        ఘ              0C98         ಘ               distinguishable
3     0C19        ఙ              0C99         ಙ               distinguishable
4     0C1A        చ              0C9A         ಚ               distinguishable
5     0C1B        ఛ              0C9B         ಛ               distinguishable
6     0C2A        ప              0CAA         ಪ               distinguishable
7     0C2B        ఫ              0CAB         ಫ               distinguishable
8     0C37        ష              0CB7         ಷ               distinguishable
9     0C4C        ◌ౌ             0CCC         ◌ೌ              distinguishable

Table A-2: Other NBGP resolutions on Telugu and Kannada script

A-2. Telugu and Malayalam

Beside those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Malayalam.

A-3. Telugu and Sinhala

Beside those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Sinhala.


Appendix B: Syllable formation in the Telugu Script

The Telugu script grammar allows us to state the nature and structure of the graphic syllables in the formation of words. The extended notion of syllable is often used to characterize orthographies of South-Asian scripts, especially Brahmi derived scripts, where words are composed of sequences of one or more orthographic aksharas or syllables. These aksharas are again composed of sequences of certain characters from the alphabet. The Telugu alphabet has the following types of characters (encoded into the Unicode) that either on their own or by entering larger combinations form aksharas as shown here. There are 12 different types of syllables possible in Telugu.

The following variables are involved in the formation of a syllable [$]:

• C = Consonants, which are standalone characters or graphemes with an inherent vowel `a' and can function as syllables;

  Stops: క ఖ గ ఘ ఙ చ ఛ జ ఝ ఞ ట ఠ డ ఢ ణ త థ ద ధ న ప ఫ బ భ మ;
  Fricatives: శ ష స హ;
  Sonorants: య ర ఱ ల ళ వ

• V = Vowels, which stand alone, are represented by the following graphic signs, and may function as syllables;

  అ ఆ ఇ ఈ ఉ ఊ ఎ ఏ ఐ ఒ ఓ ఔ ఋ

• M = Matras or dependent vowel signs, which may function as syllables when they occur with a consonant (they characteristically delete the inherent vowel of the consonant);

  Example: కా కి కీ కు కూ కె కే కై కొ కో కౌ; etc.

• H = Halant or virama = ◌్; it may occur with one of the consonants represented by C to form CH syllables;

  Example: క్ ఖ్ గ్ ఘ్ ఙ్

• B = Pūrṇānusvāra, the homorganic nasal and an archiphoneme = ◌ం; it may occur with one of the C, V, and the combined CM to form CB, CMB, VB, and C([HC]*)B

• X = Visarga or the glottal check = ◌ః; it may occur with one of the C, V, and the combined CM to form CX, CMX, VX

The operators used: The following four operators are employed to define the delimitation of the graphic syllables in Telugu.


No.   Symbol   Function
1.    |        Alternative
2.    []       Encloses optional elements
3.    *        Variable occurrence
4.    ()       The sequence cluster

Table B-1: Symbols and functions

An Akshara in Telugu can be defined as any C or V and a combination of M (dependent vowels) and the vowel modifiers, as in the following. The following syllable formation rules derive all possible graphic syllables in Telugu.

1. Syllable formation rule-1, $ = V; Every standalone vowel character can function as a syllable.

Example: అ, ఆ, ఇ, ఈ, ఉ, ఊ, ఎ, ఏ, ఐ, ఒ, ఓ, ఔ, ఋ;

After the exclusion of obsolete vowels, 13 syllables are possible.

2. Syllable formation rule-2, $ = C; Every standalone consonant character can function as a syllable.

Example:
క ఖ గ ఘ ఙ,
చ ఛ జ ఝ ఞ,
ట ఠ డ ఢ ణ,
త థ ద ధ న,
ప ఫ బ భ మ,
య ర ఱ ల ళ వ,
శ ష స హ;

35 such syllables are possible.

3. Syllable formation rule-3, $ = V(B|X);

Example:
3a = V + B = $; అం ఆం ఇం ఈం ఉం ఊం ఎం ఏం ఐం ఒం ఓం ఔం;
3b = V + X = $; అః ఆః ఇః ఈః ఉః ఊః ఎః ఏః ఐః ఒః ఓః ఔః;

In combination with V and one of the two modifiers, B or X, a total of 12*2 = 24 syllables are possible. Syllable combinations with vocalic R are not used.

4. Syllable formation rule-4, $ = CH; A standalone consonant may be appended by the halant marker H to form the corresponding graphic syllables as shown here.


Example:
క్ ఖ్ గ్ ఘ్ ఙ్
చ్ ఛ్ జ్ ఝ్ ఞ్
ట్ ఠ్ డ్ ఢ్ ణ్
త్ థ్ ద్ ధ్ న్
ప్ ఫ్ బ్ భ్ మ్
య్ ర్ ఱ్ ల్ ళ్ వ్
శ్ ష్ స్ హ్

35 such graphic syllables are possible.

5. Syllable formation rule-5, $ = C(B|X); Standalone consonants can take one of the two vowel modifiers and form the corresponding syllables as shown below.

Example:
5a. $ = CB: కం ఖం గం ఘం ఙం చం ఛం జం ఝం ఞం టం ఠం etc.
5b. $ = CX: కః ఖః గః ఘః ఙః చః ఛః జః ఝః ఞః టః ఠః etc.

2*35 = 70 graphic consonant modifier syllables are possible.

6. Syllable formation rule-6, $ = CM; A consonant may get attached with a vowel modifier or the dependent vowel diacritic to form the corresponding syllables.

Example: కా కి కీ కు కూ కృ క కె కే కై కొ కో కౌ; etc.

A total of 35*13 consonant + vowel diacritic combinations may derive 455 graphic syllables in Telugu.

7. Syllable formation rule-7, $ = CM(B|X); A consonant with a dependent vowel, when followed by one of the two modifiers, may derive the following graphic syllables.

Example:
7a. కాం కిం కీం కుం కూం కెం కేం కైం కొం కోం కౌం
7b. కాః కిః కీః కుః కూః కెః కేః కైః కొః కోః కౌః

A total of 35*12*2 combinations of a consonant plus a dependent vowel and one of the two modifiers derive 840 possible graphic syllables in Telugu.
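The counts quoted for rules 4 to 7, and the cluster counts given under rules 8 and 9, are simple products of the inventory sizes. The short sketch below re-derives them as a cross-check. The constants are taken from the text as quoted (35 consonants, two modifiers, 13 vowel signs in rule 6 but 12 in rules 7 and 9); the variable names are illustrative only.

```python
CONSONANTS = 35        # rule 2
MODIFIERS = 2          # B (anusvara) and X (visarga)
SIGNS_RULE_6 = 13      # vowel signs as counted in rule 6
SIGNS_OTHER = 12       # vowel signs as counted in rules 7 and 9

counts = {
    "rule 4: CH": CONSONANTS,
    "rule 5: C(B|X)": CONSONANTS * MODIFIERS,
    "rule 6: CM": CONSONANTS * SIGNS_RULE_6,
    "rule 7: CM(B|X)": CONSONANTS * SIGNS_OTHER * MODIFIERS,
    "rule 8: CHC": CONSONANTS ** 2,
    "rule 8: CHCHC": CONSONANTS ** 3,
    "rule 9: CHCM": CONSONANTS ** 2 * SIGNS_OTHER,
    "rule 9: CHCHCM": CONSONANTS ** 3 * SIGNS_OTHER,
}

for name, value in counts.items():
    print(f"{name}: {value:,}")
```

These are upper bounds on orthographically constructible syllables, not counts of syllables that actually occur in the language.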

8. Syllable formation rule-8, $ = CH[(C)*C]; Any consonant followed by the halant marker may combine with another consonant or consonants to form complex graphic syllables. Example:

2consonantclusters:ÍÎ గÏ ,ÐÑ ,ఙÒ ,చR,ఛÓ,జÔ ,ÕÖ ,ఞ× ,టV ,ఠØ ,డÙ ,ÚÛ ,ణÜ ,etc.

3consonantclusters:రÝÞ,షV ß,సY à,నá ß,ఙÑ ß షâã,త}~,త]ä etc.

4consonantclusters:త]ä~ ;


A total of 35*1*35 = 1225 CHC syllables involving two consonant clusters are possible. Further, a total of 35*1*35*1*35 = 42,875 CHCHC syllables involving three consonant clusters are possible. Four consonant clusters are extremely rare but theoretically possible, as shown above.

9. Syllable formation rule-9, $ = CH(CH[CH])CM; Any consonant followed by the halant marker and a consonant or consonants may be appended by one of the dependent vowels to form complex graphic syllables involving two to three consonant clusters. Example:

క$N åÎ æçÏ ,èéÑ ,ఙêÒ ,O�R,ఛూÓ,జÔ ,ëìÖ ,ఞí× ,ట�V ,ఠîØ ,§�Ù ,ÚూÛ ,ణ�Ü ,etc.

రïÝÞ ,ªV ß,^Y à,ðá ß,ఙ¥Ñ ß ñ âã,!�} ~,తò]äetc.

!�]ä~

A total of 35*1*35*1*12 = 14,700 complex syllables involving two consonant clusters followed by dependent vowels are possible.

A total of 35*1*35*1*35*12 = 5,14,500 complex syllables involving three consonant clusters followed by dependent vowels are possible.

The following is a summary of possible syllable types with the glyphs in Telugu:

$ = V([B|X]) | CM([B|X]) | CH(CH[C])M([B|X])

As per our definition the following 21 subtypes of graphic syllables are possible, which however can be grouped under 8 rules as discussed above.

$ = V | VB | VX | C | CB | CX | CM | CH | CHC | CHCB | CHCX | CHCM | CHCH | CHCHC | CHCHCB | CHCHCX | CHCHCM

Therefore, typologically 8 distinct types of graphic syllables can be derived in the language.
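One way to make the syllable formation rules of this appendix concrete is a regular expression over the category symbols. The sketch below is one reading of rules 1 to 9, not the authors' formal grammar: it accepts V with an optional B or X, or a consonant cluster C(HC)* that either ends in H or takes an optional M and an optional B or X. It allows clusters of any length, which is more permissive than the two-to-four consonant clusters discussed above. The code point ranges follow the lists in this appendix, except the vowel-sign range, which is assumed to be the full Unicode Telugu range; the function names are hypothetical.

```python
import re


def category(ch):
    """Return the category symbol (C, V, M, B, X, H) for a Telugu code point."""
    cp = ord(ch)
    if 0x0C05 <= cp <= 0x0C0B or 0x0C0E <= cp <= 0x0C10 or 0x0C12 <= cp <= 0x0C14:
        return "V"
    if 0x0C15 <= cp <= 0x0C28 or 0x0C2A <= cp <= 0x0C33 or 0x0C35 <= cp <= 0x0C39:
        return "C"
    if 0x0C3E <= cp <= 0x0C44 or 0x0C46 <= cp <= 0x0C48 or 0x0C4A <= cp <= 0x0C4C:
        return "M"  # assumption: full Unicode vowel-sign range
    return {0x0C02: "B", 0x0C03: "X", 0x0C4D: "H"}.get(cp, "?")


# V[BX]?            rules 1 and 3
# C(HC)*H           rule 4 and halant-final clusters
# C(HC)*M?[BX]?     rules 2, 5, 6, 7, 8 and 9
AKSHARA = re.compile(r"V[BX]?|C(?:HC)*(?:H|M?[BX]?)")


def segment(label):
    """Split a label into graphic syllables, or return None if it cannot be split."""
    cats = "".join(category(ch) for ch in label)
    pos, syllables = 0, []
    while pos < len(cats):
        match = AKSHARA.match(cats, pos)
        if match is None:
            return None
        syllables.append(label[match.start():match.end()])
        pos = match.end()
    return syllables
```

For instance, `segment("తెలుగు")` yields the three CM syllables తె, లు and గు, while a label that begins with a dependent vowel sign cannot be segmented and returns `None`.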


Appendix C: NBGP Cross-script Variant Inclusion Policy

If, in any two given scripts, all the potential cross-script variants consist of dependent (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc.) characters ONLY, then that entire set can be ignored and no cross-script variants be proposed between those two scripts.

If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g. Consonant, Vowel etc.) cross-script variant character/sequence present, all the potential cross-script variants be considered and proposed between the two scripts.

This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, depending on various factors, the shapes of the characters changed. This change in shape was not uniform across all the characters and the scripts. Some character shapes did change significantly whereas some of them still retained similarity. The cross-script similarity analysis also aims to identify such cases where the same character retained almost the same shape despite being part of different scripts. These sets of characters are variants of each other in the true sense, rather than merely coincidentally similar in appearance. Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them as blocked variants.

NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as NBGP does not have any exposure to the actual usage of those characters in those particular scripts, NBGP desisted from including them in the analysis. As NBGP has already considered all the related scripts under the cross-script variant analysis, the similarity of the characters belonging to NBGP scripts with other scripts not under the NBGP ambit may be of a mere coincidental visual nature.

Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts under the scope of the Root LGR procedure. Carrying out this analysis can practically be done only with the Generation Panels that exist while the NBGP is active. This still leaves out of scope those scripts which may not have a Generation Panel established yet. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by the "String Similarity Assessment Panel" of ICANN.
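The two-part inclusion policy stated at the start of this appendix amounts to a single all-or-nothing decision per script pair. A minimal sketch, assuming each potential cross-script variant is supplied as a pair of its code point (or sequence) and a flag saying whether it is a dependent character; the function name and the input shape are hypothetical:

```python
def cross_script_variants_to_propose(candidates):
    """Apply the inclusion policy to the candidates for one script pair.

    candidates: list of (code_point_or_sequence, is_dependent) pairs.
    Returns the variants to propose: all of them if at least one candidate
    is non-dependent, otherwise none.
    """
    if any(not is_dependent for _, is_dependent in candidates):
        return [item for item, _ in candidates]
    return []
```

With only dependent signs as candidates the result is empty; adding a single consonant or vowel to the candidate list causes every candidate, including the dependent signs, to be proposed.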
