Proposal for a Telugu Script Root Zone Label Generation Ruleset (LGR)
LGR Version: 3.0
Date: 2018-08-08
Document version: 2.6
Authors: Neo-Brahmi Generation Panel [NBGP]
1. General Information/Overview/Abstract
This document lays down the Label Generation Rule Set for the Telugu script. Three main components of the Telugu Script LGR, viz. Code point repertoire, Variants and Whole Label Evaluation Rules, are described in detail here. All these components have been incorporated in a machine-readable format in the accompanying XML file: "Proposal-LGR-Telu-20180808.xml".
In addition, a list of test labels has been provided in the following file, which covers the repertoire, variant code points and the whole label evaluation rules, providing examples for valid and invalid labels: "telugu-test-labels-20180808.txt".
2. Script for which the LGR is proposed
ISO 15924 Code: Telu
ISO 15924 Key N°: 340
ISO 15924 English Name: Telugu
Latin transliteration of native script name: telɯgɯ
Native name of the script: తెలుగు
Maximal Starting Repertoire [MSR] version: 3
The Unicode Standard, Version: 6.3
Telugu Unicode Range: 0C00–0C7F
3. Background of the Script and Principal Languages Using It
The Telugu language uses the Telugu script, which is written in the form of sequences of orthographic syllables. Each orthographic syllable is formed of one or more Telugu characters placed from left to right and top to bottom. Telugu is one of the 22 scheduled languages of India. The Telugu script is immediately related to Kannada and closely related to the Sinhala script.
3.1 The Evolution of the Script
The origins of the Telugu script can be traced to the Brahmi alphabet of ancient India, often known as Asokan Brahmi. Historically the script is derived from the Southern Brahmi or Bhattiprolu Brahmi, alternatively known as the Telugu Brahmi alphabet of the 3rd century BCE. Later, by the 5th century during the Chalukyan period, it developed into a common alphabet used for Telugu and Kannada. The Telugu-Kannada common alphabet split into two separate alphabets during the 12th and 13th centuries AD, to be called the Telugu and Kannada scripts. In addition to the common origin, a longer period of shared political and cultural confederation of the Telugu and Kannada speaking regions has ultimately resulted in a considerable proportion of shared identical character signs between the two scripts (34 out of 63 characters, see Table 10).
The earliest known inscriptions containing Telugu words appear on the bilingual coins of the Satavahanas that date back to the 2nd century AD [104]. The first inscription entirely in Telugu was made in 575 AD and was probably made by the Renati Cholas, who started writing royal proclamations in Telugu instead of Sanskrit. Telugu developed as a poetical and literary language during the 11th century AD. Until the 20th century Telugu was written in Granthic style, very different from the colloquial language. During the second half of the 20th century, a modern written style emerged based on the modern colloquial language. In 2008 Telugu was designated as a classical language by the Indian government.
Figure 1: Evolution of Telugu script
3.2 Notable Features
The Telugu orthography superficially appears as a series of circles and semi-circles. Most consonants carry a tick mark called Talakattu. The writing system is classified as an abugida type that employs alpha-syllabaries. The alphabet consists of vowels, consonants and modifiers. Each of these vowels and consonants has one or more secondary allographs. The secondary allographs always appear as dependent symbols on the first character of a syllable. Each syllable is formed of a single standalone vowel or one or more consonants. Each of these consonants may occur with an inherent vowel or modified by a secondary vowel. A consonant cluster may be formed with a single standalone character followed by one or more secondary forms of consonants. The order of composition of syllabaries does not match the reading order. There are rules to learn to read orthographic sequences into phonetic sequences, whether simple or complex syllables.
3.3 The Telugu (తెలుగు) Language
The Telugu language is a Dravidian language spoken by about 75 million (ca. 2001) people, mainly in the southern Indian states of Andhra Pradesh and Telangana, where it is the official language. It is also spoken in such neighboring states as Karnataka, Tamil Nadu, Orissa, Maharashtra and Chattisgarh, and is one of the 22 scheduled languages of India. There are also quite a few Telugu speakers in Canada, the USA, South Africa, Malaysia, Mauritius, Myanmar, Sri Lanka and Réunion.
3.4 Languages that Use the Telugu Script
The script is also used for ten other languages, viz. Gondi, Koya, Konda, Kuvi, Kolavar or Kolami, Yerukala, Banjara or Lambadi, Savara or Sora, Adivasi Odiya and also Sanskrit. In the Telugu speaking region, the tradition of writing Sanskrit in the Telugu script has remained a common practice. During the last few decades, a considerable number of publications in the form of textbooks, dictionaries and other reading material has been produced in the Telugu script in Gondi, Koya, Konda, Kuvi, Kolami, Yerukala, Banjara, Savara and Adivasi Odiya.
No. | Name of the language (ISO 639 Code) | Language family | Status | EGIDS Scale
1 | Telugu (tel) | Dravidian | Scheduled and Classical | 2
2 | Gondi (gon) | Dravidian | Modern Tribal | 5
3 | Koya (kff) | Dravidian | Modern Tribal | 5
4 | Konda (knd) | Dravidian | Modern Tribal | 6b
5 | Kuvi (kxv) | Dravidian | Modern Tribal | 5
6 | Kolavar or Kolami (kfb) | Dravidian | Modern Tribal | 5
7 | Yerukala (yeu) | Dravidian | Modern Tribal | 6
8 | Banjara or Lambadi (lmn) | Indo-Aryan | Modern Tribal | 5
9 | Savara or Sora (srb) | Austro-Asiatic | Modern Tribal | 5
10 | Adivasi Odiya (ort) | Indo-Aryan | Modern Tribal | 5
11 | Sanskrit (san) | Indo-Aryan | Scheduled and Classical | 4
Table 1: Main languages considered under Telugu LGR
3.5 The Structure of Written Telugu
The Telugu script as it is used for the Telugu language consists of a total of 72 characters [102], comprising 40 consonants, 16 characters representing vowels that can stand alone and 16 dependent signs, each corresponding to one of the sixteen vowels excepting /a/ అ; no explicit dependent symbol exists for that sound; instead it is inherent in the consonants in the absence of a dependent sign. Besides these, there are six additional dependent symbols, of which five always occur with the vowels, as extensions. The sixth, the halant sign ◌్ U+0C4D, occurs with consonants. The following subsections give further details.
3.5.1 The vowels and vowel modifiers
There are fourteen vowel characters, viz. అ [a], ఆ [ā], ఇ [i], ఈ [ī], ఉ [u], ఊ [ū], ఋ [r], ఌ [l], ఎ [e], ఏ [ē], ఐ [ai], ఒ [o], ఓ [ō], ఔ [au], in the common inventory [103] for all the languages using the Telugu script [111] specified above, and two (ౠ [r], ౡ [ḹ]) to write Sanskrit loan words. For these vowels there are fifteen corresponding marks; అ [a] has none because it is inherent. These are listed in Table 2 below.
There are six modifiers for vowels: ◌ఁ [~], ◌ం [ṃ], ◌ః [ḥ], ◌ఀ [~] (a special symbol not common in standard Telugu writings), ఽ [:.] (the avagraha sign, commonly used to indicate doubling the vowel length; it follows only long vowels), and ◌్ [H] (the halant sign, which, when appended to a consonant, deducts the inherent vowel /a/ from it). The halant sign is similar to a secondary vowel sign in that both delete the inherent vowel [a] when added to consonants.
R1. Inherent vowel deletion rule: An inherent vowel of a consonant gets deleted either before a matra sign or before the halant sign.
C[ca] + M[◌ా, ◌ి, …] | H[◌్] -> C[c◌ా, c◌ి, …] | H[◌్]
C[ca] + M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D] -> C[c] M[0C3E-3F, 0C40-44, 0C62-63, 0C46-48, 0C4A-4C] | [0C4D]
C = Consonant, ca = a consonant with an inherent 'a', M = Secondary vowel
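Rule R1 can be illustrated with a small Python sketch. The matra and halant code points are taken directly from the rule; the consonant range and the function name `has_inherent_a` are assumptions made for this illustration and are not part of the LGR.

```python
# Matra code points exactly as listed in rule R1.
MATRAS = (
    set(range(0x0C3E, 0x0C45))      # U+0C3E..U+0C44
    | {0x0C62, 0x0C63}
    | set(range(0x0C46, 0x0C49))    # U+0C46..U+0C48
    | set(range(0x0C4A, 0x0C4D))    # U+0C4A..U+0C4C
)
HALANT = 0x0C4D

# Assumption: consonants are U+0C15..U+0C39, minus the two positions
# that are unassigned in Unicode 6.3.
CONSONANTS = set(range(0x0C15, 0x0C3A)) - {0x0C29, 0x0C34}


def has_inherent_a(text: str, i: int) -> bool:
    """Return True if the consonant at index i keeps its inherent /a/."""
    if ord(text[i]) not in CONSONANTS:
        raise ValueError("character at index %d is not a consonant" % i)
    if i + 1 == len(text):
        return True                 # nothing follows, so /a/ is kept
    following = ord(text[i + 1])
    # R1: a following matra or halant deletes the inherent vowel.
    return following not in MATRAS and following != HALANT
```

For example, the function reports that క alone keeps its /a/, while క followed by ◌ా or by the halant does not.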
No. | Independent vowels (primary allographs) with code points | Dependent vowels (secondary allographs) with code points
1. | అ U+0C05 | No explicit sign recognized or encoded
2. | ఆ U+0C06 | ◌ా U+0C3E
3. | ఇ U+0C07 | ◌ి U+0C3F
4. | ఈ U+0C08 | ◌ీ U+0C40
5. | ఉ U+0C09 | ◌ు U+0C41
6. | ఊ U+0C0A | ◌ూ U+0C42
7. | ఋ U+0C0B | ◌ృ U+0C43
8. | ౠ U+0C60 | ◌ౄ U+0C44
9. | ఌ U+0C0C | ◌ౢ U+0C62
10. | ౡ U+0C61 | ◌ౣ U+0C63
11. | ఎ U+0C0E | ◌ె U+0C46
12. | ఏ U+0C0F | ◌ే U+0C47
13. | ఐ U+0C10 | ◌ై U+0C48
14. | ఒ U+0C12 | ◌ొ U+0C4A
15. | ఓ U+0C13 | ◌ో U+0C4B
16. | ఔ U+0C14 | ◌ౌ U+0C4C
Table 2: Vowels and the corresponding dependent signs
No. | Modifier signs | Code Points | Common name
1. | ◌ఀ | U+0C00 | Candrabindu
2. | ◌ఁ | U+0C01 | Ardhānusvāra or Arasunna
3. | ◌ం | U+0C02 | Pūrṇanusvāra or Sunna
4. | ◌ః | U+0C03 | Visarga
5. | ఽ | U+0C3D | Avagraha
6. | ◌్ | U+0C4D | Halant
Table 3: Vowel modifiers and the consonantal modifiers
3.5.2 The Anusvāra or sunna (◌ం - U+0C02)
The Anusvāra or sunna represents a homorganic nasal before the corresponding consonant, and also serves as a substitute to transcribe word-final /mu/. Essentially it substitutes a cluster of a Nasal Consonant + Halant before a consonant. Writing alternatively with a nasal consonant + Halant + Consonant is rare and often occurs while transcribing Sanskrit words. Otherwise the writing practice with nasal consonant + Halant + Consonant of the latter type is virtually absent in Telugu.
No. | Homorganic nasal = Archiphoneme /M/ | Homorganic nasal + Halant | Gloss
1. | లంక /laMka/ | లఙ్క /laŋka/ | 'island'
2. | కంచె /kaMce/ | కఞ్చె [kaɲce] | 'fence'
3. | పంట /paMTa/ | పణ్ట /paṇTa/ | 'harvest'
4. | కంత /kaMta/ | కన్త /kanta/ | 'hole'
5. | కంప /kaMpa/ | కమ్ప /kampa/ | 'thorny bush'
6. | కంస /kaMsa/ | కమ్స /kansa/ | 'king Kansa'
7. | సింహ /siMha/ | సిమ్హ /simha/ | 'lion'
Table 4: Homorganic nasal and Homorganic nasal + Halant
3.5.3 Nasalization: Candrabindu (◌ఀ U+0C00) or arasunna (◌ఁ U+0C01)
Candrabindu, which denotes nasalization of the preceding vowel, is used in the Prakrit texts transcribed in the Telugu script, and the arasunna as in old Telugu తెలుఁగు /telũgu/ 'telugu'. Present-day Telugu users do not use the candrabindu frequently, except to bring special emphasis as in hãã, hũũ, etc.
3.5.4 The Consonants
The Telugu consonants have an implicit vowel /a/ included in them. As per the traditional classification they are categorized according to their phonetic properties. There are 5 varga groups (classes) and one non-varga group. Each varga corresponds to a particular set of stops characterized by a particular place of articulation. Each varga contains four oral stops and one nasal stop ordered by the complexity of their manner from left to right as [-vd, -asp, -nas], [-vd, +asp, -nas], [+vd, -asp, -nas], [+vd, +asp, -nas], [+vd, -asp, +nas] (where vd = voiced, asp = aspirated, nas = nasal). Each feature set defines the character by the varga. Each varga from top to bottom is defined by an additional place feature of articulation. The non-varga set is again divided into two subsets, each characterized by the absence or presence of sonority, i.e. [+/-son]. The obstruents characterized by [–son] are fricatives, viz. శ [ś], ష [ṣ], స [s], హ [h], while the remaining carry the feature of sonority, i.e. [+son].

No. | Place of Articulation | -asp -vd -nas | ISO | +asp -vd -nas | ISO | -asp +vd -nas | ISO | +asp +vd -nas | ISO | -asp +vd +nas | ISO
1. | Velar | క | k | ఖ | kh | గ | g | ఘ | gh | ఙ | ṅ
2. | Palatal | చ | c | ఛ | ch | జ | j | ఝ | jh | ఞ | ñ
3. | Retroflex | ట | ṭ | ఠ | ṭh | డ | ḍ | ఢ | ḍh | ణ | ṇ
4. | Dental | త | t | థ | th | ద | d | ధ | dh | న | n
5. | Bilabial | ప | p | ఫ | ph | బ | b | భ | bh | మ | m
Table 5: Classification of stop consonants
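The regular layout of Table 5 means the feature set of a stop consonant can be read off its position in the grid. The following Python sketch assumes the standard Unicode code points for the five vargas (for example, క is U+0C15 and ప is U+0C2A); the names `VARGA_BASE`, `COLUMNS` and `stop_features` are hypothetical and used only for illustration.

```python
# First code point of each varga row in Table 5 (standard Unicode values).
VARGA_BASE = {
    "velar": 0x0C15,      # క ఖ గ ఘ ఙ
    "palatal": 0x0C1A,    # చ ఛ జ ఝ ఞ
    "retroflex": 0x0C1F,  # ట ఠ డ ఢ ణ
    "dental": 0x0C24,     # త థ ద ధ న
    "bilabial": 0x0C2A,   # ప ఫ బ భ మ
}

# Column order within each varga, as (voiced, aspirated, nasal).
COLUMNS = [
    (False, False, False),
    (False, True, False),
    (True, False, False),
    (True, True, False),
    (True, False, True),
]


def stop_features(ch: str):
    """Return the Table 5 features of a stop consonant, or None otherwise."""
    cp = ord(ch)
    for place, base in VARGA_BASE.items():
        if base <= cp < base + 5:
            vd, asp, nas = COLUMNS[cp - base]
            return {"place": place, "vd": vd, "asp": asp, "nas": nas}
    return None  # non-varga consonant or not a consonant at all
```

Calling `stop_features("గ")` yields a velar, voiced, unaspirated, non-nasal stop, while a non-varga consonant such as య yields `None`.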
Sonorants: య y, ర r, ఱ ṛ, ల l, ళ ḷ, వ v
Fricatives: శ ś, ష ṣ, స s, హ h
Table 6: Non-stop consonants
4. The Development Process and Methodology
The Neo-Brahmi Generation Panel involves a number of different scripts with distinct Unicode blocks. Each of these scripts usually will have a separate LGR. However, a common thread runs through the neo-Brahmi scripts in the process of LGR development. A number of guiding principles that are laid out will be used in the development of the scheme. As specified elsewhere, the NBGP adopts the following principles in the selection of code points from the code point repertoire for the Telugu language script. The first, the Inclusion principle, deals with whether the character is regularly used in the language, besides its unambiguous nature. The second important principle, the Exclusion principle, deals with the use of the code point repertoire for the root zone and does not allow every character that is tabulated in the Unicode chart. A baseline layer of restriction is set for the Domain Name System by the protocol known as IDNA (Internationalized Domain Names in Applications). IDNA excludes some characters from the Unicode repertoire for the concerned script. An additional layer is added for the root zone, called the Maximal Starting Repertoire (MSR). Telugu does not have many such characters that are restricted. One such character, for example, is the Avagraha "ఽ" (U+0C3D), which is restricted by MSR even if allowed by the IDNA protocol.
Similarly, certain punctuation marks that were used in the traditional texts are not assigned any code points and hence need not be included here. Other cases such as symbols and abbreviations are not permitted. In addition to the above, rare and obsolete characters, though recognized in the Unicode chart of Telugu, will not be permitted in the root zone LGR.
4.1 Zero Width Joiner and Zero Width Non-Joiner in Telugu Domain Names
MSR excludes invisible characters like Zero Width Non-Joiner (U+200C) and Zero Width Joiner (U+200D), as they require ad hoc representation in different ways. These are required in certain cases where a specific visual shape of an akshar is desired. There are contrastive usages of written forms derived from the use of Zero Width Joiner (ZWJ) and Zero Width Non-Joiner (ZWNJ). They have special roles in the writing system of Telugu. ZWNJ is used in sequences like Consonant (C) + Halant (U+0C4D) + Consonant, where the second C is prevented from taking the usual dependent allograph (vattu) form after (below) the first consonant, as in the following example:
1. క (U+0C15) + ◌్ (U+0C4D) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్స్వా – without using ZWNJ
Example: |z]{తంత}~ం
2. క (U+0C15) + ◌్ (U+0C4D) + ZWNJ (U+200C) + స (U+0C38) + ◌్ (U+0C4D) + వ (U+0C35) + ◌ా (U+0C3E) = క్‌స్వా – using the ZWNJ
Example: |� ��తంత}~ం
Both forms of the words, though written with different graphic signs, mean the same and are also pronounced the same. Though the second form was not previously common, its usage is gaining ground due to the influence of English and Hindi. It is frequently used in transcribing many English words into Telugu, such as 'software' (సాఫ్ట్‌వేర్, using ZWNJ). The word 'software' will become సాఫ్ట్వేర్ if ZWNJ is not used.
4.2 How to Avoid Duplicate Domain Names Involving ZWJ and ZWNJ?
ZWJ and ZWNJ are used mainly to write two distinct displays of the same consonant cluster or sequence, a distinction which does not have any semantic or phonetic significance. When ZWJ and ZWNJ are allowed in domain names for Telugu, they create two distinct forms of the same domain name. To make browsers and the DNS treat them as equal, ZWJ and ZWNJ have to be ignored when comparing two words. The same procedure is usually followed by the spell-checkers of the language. Accepting ZWJ and ZWNJ in domain names creates confusion for a majority of the linguistic community, and joiner characters are prohibited for the Root Zone; hence this is explicitly prohibited by the NBGP.
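The comparison procedure described in Section 4.2, and the resulting root zone restriction, can be sketched in Python as follows. The helper names are hypothetical; only the two code points U+200C and U+200D come from the text.

```python
ZWNJ = "\u200c"  # Zero Width Non-Joiner
ZWJ = "\u200d"   # Zero Width Joiner


def strip_joiners(label: str) -> str:
    """Remove ZWJ and ZWNJ, as a spell-checker style comparison would."""
    return label.replace(ZWNJ, "").replace(ZWJ, "")


def same_ignoring_joiners(a: str, b: str) -> bool:
    """Two labels count as equal if they differ only in joiner characters."""
    return strip_joiners(a) == strip_joiners(b)


def has_no_joiners(label: str) -> bool:
    """Root zone labels must not contain either joiner at all."""
    return ZWNJ not in label and ZWJ not in label
```

With these helpers, the two spellings of క + ◌్ + స + ◌్ + వ + ◌ా from Section 4.1 compare as equal, and only the one without ZWNJ passes `has_no_joiners`.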
5. The Repertoire
In this section, we present the discussion on the code points that would form the repertoire of code points licensed by the [MSR-3] to be validated and used in the root zone label generation rules. Section 5.1 provides the section of the [MSR-3] applicable to the Telugu script on which the Telugu code point repertoire is based. Section 5.2 details the code point repertoire that the Neo-Brahmi Generation Panel proposes to be included in the Telugu LGR.
5.1 Telugu section of Maximal Starting Repertoire [MSR] Version 3
Color convention [1]:
• All characters that are included in the [MSR] are highlighted in Yellow background
• PVALID in IDNA2008 but excluded from the [MSR] are highlighted in Pinkish background
• Not PVALID in IDNA2008 are in White background
Figure 2: Telugu Code Page from [MSR-3]
[1] This document needs to be printed in color for this to be read correctly.
5.2 Code Points Repertoire
In the following, the Telugu Script Unicode code points have been presented and discussed with reference to the principles that constrain the label generation rules. It is important to note that the purpose of this document is to state unambiguously the Telugu code points that can be used in the root zone repertoire. The following table lists 63 code points for the Telugu LGR, out of a total number of 67 code points listed in MSR-3, excluding four code points which are obsolete.
No. | Unicode Code Point | Glyph | Character Name | EGIDS status | Indic Syllabic Category | Reference
1. | 0C02 | ◌ం | TELUGU SIGN ANUSVARA | 2 Tel, 4 San, 5 Others [2] | Anusvāra | 102, 103
2. | 0C03 | ◌ః | TELUGU SIGN VISARGA | 2 Tel, 4 San, 5 Others | Visarga | 102, 103
3. | 0C05 | అ | TELUGU LETTER A | 2 Tel, 5 Others | Vowel | 102, 103
4. | 0C06 | ఆ | TELUGU LETTER AA | 2 Tel, 5 Others | Vowel | 102, 103
5. | 0C07 | ఇ | TELUGU LETTER I | 2 Tel, 5 Others | Vowel | 102, 103
6. | 0C08 | ఈ | TELUGU LETTER II | 2 Tel, 5 Others | Vowel | 102, 103
7. | 0C09 | ఉ | TELUGU LETTER U | 2 Tel, 5 Others | Vowel | 102, 103
8. | 0C0A | ఊ | TELUGU LETTER UU | 2 Tel, 5 Others | Vowel | 102, 103
9. | 0C0B | ఋ | TELUGU LETTER VOCALIC R | 2 Tel, 5 Others | Vowel | 102, 103
10. | 0C0E | ఎ | TELUGU LETTER E | 2 Tel, 5 Others | Vowel | 102, 103
11. | 0C0F | ఏ | TELUGU LETTER EE | 2 Tel, 5 Others | Vowel | 102, 103
12. | 0C10 | ఐ | TELUGU LETTER AI | 2 Tel, 5 Others | Vowel | 102, 103
13. | 0C12 | ఒ | TELUGU LETTER O | 2 Tel, 5 Others | Vowel | 102, 103
14. | 0C13 | ఓ | TELUGU LETTER OO | 2 Tel, 5 Others | Vowel | 102, 103
15. | 0C14 | ఔ | TELUGU LETTER AU | 2 Tel, 5 Others | Vowel | 102, 103
16. | 0C15 | క | TELUGU LETTER KA | 2 Tel, 5 Others | Consonant | 102, 103
17. | 0C16 | ఖ | TELUGU LETTER KHA | 2 Tel, 5 Others | Consonant | 102, 103
18. | 0C17 | గ | TELUGU LETTER GA | 2 Tel, 5 Others | Consonant | 102, 103
19. | 0C18 | ఘ | TELUGU LETTER GHA | 2 Tel, 5 Others | Consonant | 102, 103
20. | 0C19 | ఙ | TELUGU LETTER NGA | 2 Tel, 5 Others | Consonant, Nasal-Consonant | 102, 103
21. | 0C1A | చ | TELUGU LETTER CA | 2 Tel, 5 Others | Consonant | 102, 103
22. | 0C1B | ఛ | TELUGU LETTER CHA | 2 Tel, 5 Others | Consonant | 102, 103
23. | 0C1C | జ | TELUGU LETTER JA | 2 Tel, 5 Others | Consonant | 102, 103
24. | 0C1D | ఝ | TELUGU LETTER JHA | 2 Tel, 5 Others | Consonant | 102, 103
25. | 0C1E | ఞ | TELUGU LETTER NYA | 2 Tel, 5 Others | Consonant, Nasal-Consonant | 102, 103
26. | 0C1F | ట | TELUGU LETTER TTA | 2 Tel, 5 Others | Consonant | 102, 103
27. | 0C20 | ఠ | TELUGU LETTER TTHA | 2 Tel, 5 Others | Consonant | 102, 103
28. | 0C21 | డ | TELUGU LETTER DDA | 2 Tel, 5 Others | Consonant | 102, 103
29. | 0C22 | ఢ | TELUGU LETTER DDHA | 2 Tel, 5 Others | Consonant | 102, 103
30. | 0C23 | ణ | TELUGU LETTER NNA | 2 Tel, 5 Others | Consonant, Nasal-Consonant | 102, 103
31. | 0C24 | త | TELUGU LETTER TA | 2 Tel, 5 Others | Consonant | 102, 103
32. | 0C25 | థ | TELUGU LETTER THA | 2 Tel, 5 Others | Consonant | 102, 103
33. | 0C26 | ద | TELUGU LETTER DA | 2 Tel, 5 Others | Consonant | 102, 103
34. | 0C27 | ధ | TELUGU LETTER DHA | 2 Tel, 5 Others | Consonant | 102, 103
35. | 0C28 | న | TELUGU LETTER NA | 2 Tel, 5 Others | Consonant, Nasal-Consonant | 102, 103
36. | 0C2A | ప | TELUGU LETTER PA | 2 Tel, 5 Others | Consonant | 102, 103
37. | 0C2B | ఫ | TELUGU LETTER PHA | 2 Tel, 5 Others | Consonant | 102, 103
38. | 0C2C | బ | TELUGU LETTER BA | 2 Tel, 5 Others | Consonant | 102, 103
39. | 0C2D | భ | TELUGU LETTER BHA | 2 Tel, 5 Others | Consonant | 102, 103
40. | 0C2E | మ | TELUGU LETTER MA | 2 Tel, 5 Others | Consonant, Nasal-Consonant | 102, 103
41. | 0C2F | య | TELUGU LETTER YA | 2 Tel, 5 Others | Consonant | 102, 103
42. | 0C30 | ర | TELUGU LETTER RA | 2 Tel, 5 Others | Consonant | 102, 103
43. | 0C32 | ల | TELUGU LETTER LA | 2 Tel, 5 Others | Consonant | 102, 103
44. | 0C33 | ళ | TELUGU LETTER LLA | 2 Tel, 5 Others | Consonant | 102, 103
45. | 0C35 | వ | TELUGU LETTER VA | 2 Tel, 5 Others | Consonant | 102, 103
46. | 0C36 | శ | TELUGU LETTER SHA | 2 Tel, 5 Others | Consonant | 102, 103
47. | 0C37 | ష | TELUGU LETTER SSA | 2 Tel, 5 Others | Consonant | 102, 103
48. | 0C38 | స | TELUGU LETTER SA | 2 Tel, 5 Others | Consonant | 102, 103
49. | 0C39 | హ | TELUGU LETTER HA | 2 Tel, 5 Others | Consonant | 102, 103
50. | 0C3E | ◌ా | TELUGU VOWEL SIGN AA | 2 Tel, 5 Others | Matra | 102, 103
51. | 0C3F | ◌ి | TELUGU VOWEL SIGN I | 2 Tel, 5 Others | Matra | 102, 103
52. | 0C40 | ◌ీ | TELUGU VOWEL SIGN II | 2 Tel, 5 Others | Matra | 102, 103
53. | 0C41 | ◌ు | TELUGU VOWEL SIGN U | 2 Tel, 5 Others | Matra | 102, 103
54. | 0C42 | ◌ూ | TELUGU VOWEL SIGN UU | 2 Tel, 5 Others | Matra | 102, 103
55. | 0C43 | ◌ృ | TELUGU VOWEL SIGN VOCALIC R | 2 Tel, 5 Others | Matra | 102, 103
56. | 0C44 | ◌ౄ | TELUGU VOWEL SIGN VOCALIC RR | 2 Tel, 5 Others | Matra | 102, 103
57. | 0C46 | ◌ె | TELUGU VOWEL SIGN E | 2 Tel, 5 Others | Matra | 102, 103
58. | 0C47 | ◌ే | TELUGU VOWEL SIGN EE | 2 Tel, 5 Others | Matra | 102, 103
59. | 0C48 | ◌ై | TELUGU VOWEL SIGN AI | 2 Tel, 5 Others | Matra | 102, 103
60. | 0C4A | ◌ొ | TELUGU VOWEL SIGN O | 2 Tel, 5 Others | Matra | 102, 103
61. | 0C4B | ◌ో | TELUGU VOWEL SIGN OO | 2 Tel, 5 Others | Matra | 102, 103
62. | 0C4C | ◌ౌ | TELUGU VOWEL SIGN AU | 2 Tel, 5 Others | Matra | 102, 103
63. | 0C4D | ◌్ | TELUGU SIGN VIRAMA | 2 Tel, 5 Others | Virama | 102, 103
[2] Others are the EGIDS 5 languages listed in Table 1: Main languages considered under Telugu LGR.
Table 7: Included code points
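As a minimal sketch, the repertoire of Table 7 can be expressed as a set of inclusive code point ranges and used to flag characters that fall outside it. The ranges below are transcribed from the table; the names `RANGES`, `REPERTOIRE` and `outside_repertoire` are hypothetical, and the authoritative definition remains the accompanying XML file.

```python
# Inclusive code point ranges transcribed from Table 7.
RANGES = [
    (0x0C02, 0x0C03),  # anusvara, visarga
    (0x0C05, 0x0C0B),  # vowels A..VOCALIC R
    (0x0C0E, 0x0C10),  # vowels E, EE, AI
    (0x0C12, 0x0C14),  # vowels O, OO, AU
    (0x0C15, 0x0C28),  # consonants KA..NA
    (0x0C2A, 0x0C30),  # consonants PA..RA
    (0x0C32, 0x0C33),  # consonants LA, LLA
    (0x0C35, 0x0C39),  # consonants VA..HA
    (0x0C3E, 0x0C44),  # vowel signs AA..VOCALIC RR
    (0x0C46, 0x0C48),  # vowel signs E, EE, AI
    (0x0C4A, 0x0C4D),  # vowel signs O, OO, AU and the virama
]

REPERTOIRE = frozenset(
    cp for low, high in RANGES for cp in range(low, high + 1)
)


def outside_repertoire(label: str) -> list:
    """List the code points of a label that are not in Table 7."""
    return ["U+%04X" % ord(ch) for ch in label if ord(ch) not in REPERTOIRE]
```

The set built this way contains 63 code points, matching the count stated in Section 5.2; a label is repertoire-clean when `outside_repertoire` returns an empty list.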
5.3 Code Points Not Included
Referring to the principle in section 4, the code points to be excluded from the repertoire are the following, for the reasons listed.
The following code points are not in widespread use.
• 0C00 ◌ఀ TELUGU LETTER CANDRABINDU
• 0C01 ◌ఁ TELUGU LETTER ARASUNNA
• 0C0C ఌ TELUGU LETTER VOCALIC L
• 0C31 ఱ TELUGU LETTER RRA
Various signs: Allographs of vowel diacritics /a:/ and part of a diacritic specific to a particular consonant /h/.
• 0C55 ◌ౕ TELUGU LENGTH MARK
• 0C56 ◌ౖ TELUGU AI LENGTH MARK
Historic phonetic variants: Phonological variants shall not be permitted. They are not in MSR-3.
• 0C58 ౘ TELUGU LETTER TSA
• 0C59 ౙ TELUGU LETTER DZA
The two additional vowels listed below to transcribe Sanskrit are not permitted. They are not in MSR-3.
• 0C60 ౠ TELUGU LETTER VOCALIC RR
• 0C61 ౡ TELUGU LETTER VOCALIC LL
The following two dependent vowels used to transcribe Sanskrit sounds are not permitted. They are not in MSR-3.
• 0C62 ◌ౢ TELUGU VOWEL SIGN VOCALIC L
• 0C63 ◌ౣ TELUGU VOWEL SIGN VOCALIC LL
Starting from the MSR-3, there are four code points to be excluded.
No. | Unicode Code Point | Glyph | Character Name | EGIDS status | Indic Syllabic Category | Reference | Note
1. | 0C0C | ఌ | TELUGU LETTER VOCALIC L | 2 Telu, 5 Gon, 6b other | Vowel | 103, 108, 109 | It is not used in modern Telugu
2. | 0C31 | ఱ | TELUGU LETTER RRA | 2 Telu, 5 Gon, 6b other | Consonant | 103, 108, 109 | It is not used in modern Telugu
3. | 0C55 | ◌ౕ | TELUGU LENGTH MARK | 2 Telu, 5 Gon, 6b other | Matra | 103, 108, 109 | It is not available on general keyboard.
4. | 0C56 | ◌ౖ | TELUGU AI LENGTH MARK | 2 Telu, 5 Gon, 6b other | Matra | 103, 108, 109 | It is not used in modern Telugu
Table 8: Excluded code points
6. Variants
Telugu code points representing the basic simple stand-alone characters and some dependent characters may enter into different combinations to form syllables. There are no characters in the Telugu Unicode chart that either in simple form or in combined form are deemed similar by NBGP. However, Telugu has a small number of variants that have identical values but derive from different character combinations. The NBGP categorizes these confusingly similar variants in two groups.
6.1 Type 1: Similarity within the Script
Certain vowels [o, ō] display different shapes in combination with certain consonants, though they have shared sound and code point values. For example:
i. Ca + e + u(:) -> mo(:)
ii. Ca + o(:) -> ko(:)
These variants, which are often confusing and of variable acceptance, arise because sequences with the same sound value are rendered with different shapes.
These cases are interesting in that they present no similarity in their forms but have similar phonetic output. It is not unusual to find such regional variations and they are regularly used by Telugu users. These may not cause confusion but become annoying to learners. However, ◌ె + ◌ు (U+0C46 + U+0C41) is a matra + matra sequence, which is not allowed by the WLE rules in section 7. Therefore, these are not defined as variant sequences by NBGP.

Class | Character seq. [Ca + e + u] | Rendered | Disposition | Rendered | Character seq. Ca + o
1 | క + ◌ె + ◌ు (0C15 + 0C46 + 0C41) | కెు | Blocked | కొ | క + ◌ొ (0C15 + 0C4A)
2 | మ + ◌ె + ◌ు (0C2E + 0C46 + 0C41) | మెు | Blocked | మొ | మ + ◌ొ (0C2E + 0C4A)
2 | య + ◌ె + ◌ు (0C2F + 0C46 + 0C41) | యెు | Blocked | యొ | య + ◌ొ (0C2F + 0C4A)
2 | ఝ + ◌ె + ◌ు (0C1D + 0C46 + 0C41) | ఝెు | Blocked | ఝొ | ఝ + ◌ొ (0C1D + 0C4A)
2 | ఘ + ◌ె + ◌ు (0C18 + 0C46 + 0C41) | ఘెు | Blocked | ఘొ | ఘ + ◌ొ (0C18 + 0C4A)
Class 1 includes other consonants like kha, ga, nga, ca, cha, ja, nya, ta, tha, da, dha, na, ta, tha, da, dha, na, pa, pha, ba, bha, ra, la, va, Sa, sha, sa, and ha.
Table 9a: Similarity within the script
6.2 Type 1: Variants within Script due to Alternative Spelling
Similar to the above, there is a set of representations in Telugu syllable formation where a homorganic nasal (anusvāra) in a syllable has an alternate spelling which is visually different, as shown below.

No. | Homorganic nasal (anusvāra) + consonant | Homorganic nasal consonant + halant + consonant | Gloss
1. | లంక /laMka/ | లఙ్క /laŋka/ | 'island'
2. | కంచె /kaMce/ | కఞ్చె [kaɲce] | 'fence'
3. | పంట /paMTa/ | పణ్ట /paNTa/ | 'harvest'
4. | కంత /kaMta/ | కన్త /kanta/ | 'hole'
5. | కంప /kaMpa/ | కమ్ప /kampa/ | 'thorny bush'
6. | కంస /kaMsa/ | కమ్స /kansa/ | 'king Kansa'
7. | సింహ /siMha/ | సిమ్హ /simha/ | 'lion'
Table 9b: Variants with anusvāra alternating with nasal consonants

Writing alternatively with a nasal consonant + halant + consonant is rare in Telugu and often occurs while transcribing Sanskrit words. Since the variants have exactly the same pronunciation, the rarer representation of nasal consonant + halant + consonant is disallowed in order to avoid this source of confusion.
Nasal Consonants are:
1. U+0C19 TELUGU LETTER NGA (ఙ)
2. U+0C1E TELUGU LETTER NYA (ఞ)
3. U+0C23 TELUGU LETTER NNA (ణ)
4. U+0C28 TELUGU LETTER NA (న)
5. U+0C2E TELUGU LETTER MA (మ)
Similarly and very frequently, the word-final ము [mu] is represented alternatively by the variant anusvāra ◌ం [M], as in the following:
కలం kalaM | కలము kalamu | 'pen'
పుస్తకం pustakaM | పుస్తకము pustakamu | 'book'
ఆముదం a:mudaM | ఆముదము a:mudamu | 'castor oil'
దేశం deSaM | దేశము deSamu | 'country'
In such cases, one of the confusable variants must be disallowed. This can be disallowed by the WLE rule: H cannot follow a nasal consonant.
6.3 Type 2: Shared Similarity with the Other Related Scripts
There are many Brahmi-derived scripts, particularly in the Southern part of India, Sri Lanka, and South East Asia. Some of the characters of these scripts display similarity with each other. Such cases, relevant for the Telugu script, are given below.
6.3.1 Type 2: Cross-Script Variants for Telugu and Kannada
A number of characters of the Kannada script are almost similar to characters of the Telugu script, except for the flattened head-stroke in Kannada contrasting with a tick mark on the top of the character in Telugu. Out of the total, there are 34 such cases which are categorized as variant sets, as shown in the following table.
Variant Set | Telugu Code Point | Kannada Code Point
1 | ◌ం (0C02) | ◌ಂ (0C82)
2 | ◌ః (0C03) | ◌ಃ (0C83)
3 | అ (0C05) | ಅ (0C85)
4 | ఆ (0C06) | ಆ (0C86)
5 | ఇ (0C07) | ಇ (0C87)
6 | ఈ (0C08) | ಈ (0C88)
7 | ఐ (0C10) | ಐ (0C90)
8 | ఒ (0C12) | ಒ (0C92)
9 | ఓ (0C13) | ಓ (0C93)
10 | ఔ (0C14) | ಔ (0C94)
11 | ఖ (0C16) | ಖ (0C96)
12 | గ (0C17) | ಗ (0C97)
13 | జ (0C1C) | ಜ (0C9C)
14 | ఝ (0C1D) | ಝ (0C9D)
15 | ఞ (0C1E) | ಞ (0C9E)
16 | ట (0C1F) | ಟ (0C9F)
17 | ఠ (0C20) | ಠ (0CA0)
18 | డ (0C21) | ಡ (0CA1)
19 | ఢ (0C22) | ಢ (0CA2)
20 | ణ (0C23) | ಣ (0CA3)
21 | థ (0C25) | ಥ (0CA5)
22 | ద (0C26) | ದ (0CA6)
23 | ధ (0C27) | ಧ (0CA7)
24 | న (0C28) | ನ (0CA8)
25 | బ (0C2C) | ಬ (0CAC)
26 | భ (0C2D) | ಭ (0CAD)
27 | మ (0C2E) | ಮ (0CAE)
28 | య (0C2F) | ಯ (0CAF)
29 | ర (0C30) | ರ (0CB0)
30 | ల (0C32) | ಲ (0CB2)
31 | ళ (0C33) | ಳ (0CB3)
32 | ◌ి (0C3F) | ◌ಿ (0CBF)
33 | ◌ు (0C41) | ◌ು (0CC1)
34 | ◌ృ (0C43) | ◌ೃ (0CC3)
Table 10: Cross-script variant code points for Telugu and Kannada

The Telugu and Kannada variant sets in Table 10 are cross-script variant code points. The details of various akshar combinations and variant disposition can be found in section 6.4. Code points which have been analyzed and found to be similar, but not considered as variants, are listed in Appendix A.
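Every pair in Table 10 differs by the same offset (0x80) between the Telugu and Kannada blocks, so the variant mapping can be sketched compactly in Python. The list of Telugu code points is transcribed from Table 10; the names and the decision to return `None` for labels with no whole-label Kannada variant are assumptions of this illustration.

```python
# Telugu members of the 34 variant sets in Table 10.
TELUGU_VARIANT_CPS = [
    0x0C02, 0x0C03, 0x0C05, 0x0C06, 0x0C07, 0x0C08, 0x0C10, 0x0C12,
    0x0C13, 0x0C14, 0x0C16, 0x0C17, 0x0C1C, 0x0C1D, 0x0C1E, 0x0C1F,
    0x0C20, 0x0C21, 0x0C22, 0x0C23, 0x0C25, 0x0C26, 0x0C27, 0x0C28,
    0x0C2C, 0x0C2D, 0x0C2E, 0x0C2F, 0x0C30, 0x0C32, 0x0C33, 0x0C3F,
    0x0C41, 0x0C43,
]

# Each Kannada counterpart sits 0x80 above the Telugu code point.
BLOCK_OFFSET = 0x0C82 - 0x0C02

TELUGU_TO_KANNADA = {cp: cp + BLOCK_OFFSET for cp in TELUGU_VARIANT_CPS}


def kannada_variant_label(label: str):
    """Return the Kannada variant of a Telugu label, or None when some
    code point has no Kannada variant and no whole-label variant exists."""
    mapped = []
    for ch in label:
        target = TELUGU_TO_KANNADA.get(ord(ch))
        if target is None:
            return None
        mapped.append(chr(target))
    return "".join(mapped)
```

A label such as గద maps entirely to Kannada code points and would therefore have a blocked variant, whereas any label containing క (U+0C15), which is not in Table 10, has none.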
6.3.2 Type 2: Cross-Script Variants for Telugu and Devanagari
Visarga is the only identical code point that exhibits shape similarity between the Telugu and Devanagari scripts. However, as there are no other variant code points between the two scripts, it is not defined as a variant code point.

Devanagari Code Point | Telugu Code Point
◌ः (0903) | ◌ః (0C03)
Table 11: Candidate cross-script variant code point for Telugu and Devanagari
6.3.3 Type 2: Cross-Script Variants for Telugu and Gujarati
Visarga is the only identical code point that exhibits shape similarity between the Telugu and Gujarati scripts. However, as there are no other identical code points between the two scripts, it is not defined as a variant code point.

Gujarati Code Point | Telugu Code Point
◌ઃ (0A83) | ◌ః (0C03)
Table 12: Candidate cross-script variant code point for Telugu and Gujarati
6.3.4 Type 2: Cross-Script Variants for Telugu and Oriya
The following code points exhibit similarity between the Telugu and Oriya scripts.

Telugu Code Point | Oriya Code Point
ం (0C02) ANUSVĀRA | ଠ (0B20) LETTER TTHA
ః (0C03) SIGN VISARGA | ଃ (0B03) SIGN VISARGA
ర (0C30) LETTER RA | ଠ (0B20) LETTER TTHA
Table 13: Candidate cross-script variant code points for Telugu and Oriya

The first two (U+0C02 – U+0B20 and U+0C03 – U+0B03) are dependent signs and U+0C30 is a stand-alone character in Telugu. NBGP discussions concluded that there is no need to recognize cross-script variant code points between the Oriya and the Telugu scripts. This is because U+0C30 and U+0B20 are distinguishable and there are not enough other variant code points in each script to form labels that look the same. Therefore, these are not defined as variant code points.
6.3.5 Type 2: Cross-Script Variants for Telugu and Malayalam
The two code points, viz. the anusvāra and the visarga, are the only identical signs between the Telugu and Malayalam scripts. However, as there are not enough other variant code points to form labels, they are not defined as variant code points between the two scripts.

Telugu Code Point | Malayalam Code Point
◌ం (0C02) | ം (0D02)
◌ః (0C03) | ഃ (0D03)
Table 14: Candidate cross-script variant code points for Telugu and Malayalam
6.3.6 Type 2: Cross-Script Variants for Telugu and Sinhala
The following three pairs of characters, represented by the corresponding code points in Telugu and Sinhala, might be considered as merely similar if the similarity between 0C30 and 0DBB were not sustainable. However, NBGP, in consultation with Sinhala, concludes that 0C30 and 0DBB could cause confusion from the script user's point of view. Therefore, they are proposed as cross-script variants between the two scripts and the disposition is blocked. This analysis follows the NBGP Cross-script Variant inclusion policy available in Appendix C.

Telugu Code Point | Sinhala Code Point
◌ం (0C02) | ං (0D82)
◌ః (0C03) | ඃ (0D83)
ర (0C30) | ර (0DBB)
Table 15: Cross-script variant code points for Telugu and Sinhala
6.4 Cross Script Variants of Various Akshar Combinations
6.4.1 Conjunct Consonant Combinations
Cross-script variants of various akshar combinations (consonant-consonant-dependent characters) common between the Telugu and Kannada scripts include the following:

Variant Set | Telugu Code Point | Kannada Code Point
1 | ◌ం (0C02) | ◌ಂ (0C82)
2 | ◌ః (0C03) | ◌ಃ (0C83)
3 | ఖ (0C16) | ಖ (0C96)
4 | గ (0C17) | ಗ (0C97)
5 | జ (0C1C) | ಜ (0C9C)
6 | ఝ (0C1D) | ಝ (0C9D)
7 | ఞ (0C1E) | ಞ (0C9E)
8 | ట (0C1F) | ಟ (0C9F)
9 | ఠ (0C20) | ಠ (0CA0)
10 | డ (0C21) | ಡ (0CA1)
11 | ఢ (0C22) | ಢ (0CA2)
12 | ణ (0C23) | ಣ (0CA3)
13 | థ (0C25) | ಥ (0CA5)
14 | ద (0C26) | ದ (0CA6)
15 | ధ (0C27) | ಧ (0CA7)
16 | న (0C28) | ನ (0CA8)
17 | బ (0C2C) | ಬ (0CAC)
18 | భ (0C2D) | ಭ (0CAD)
19 | మ (0C2E) | ಮ (0CAE)
20 | య (0C2F) | ಯ (0CAF)
21 | ర (0C30) | ರ (0CB0)
22 | ల (0C32) | ಲ (0CB2)
23 | ళ (0C33) | ಳ (0CB3)
24 | ◌ి (0C3F) | ◌ಿ (0CBF)
25 | ◌ు (0C41) | ◌ು (0CC1)
26 | ◌ృ (0C43) | ◌ೃ (0CC3)
Table 16: Cross-script variants between Telugu and Kannada for conjunct consonant combination analysis
Table 16 includes 26 distinct Telugu code points that occur in the formation of conjunct consonant combinations in Telugu and Kannada. Excluding the standalone vowels from the total common akshar combinations of cross-script variants, there is a set of 21 consonants (C), three vowel matras (M) and two vowel modifiers that enter into the formation of the following combinations:

Sl. No. | Akshar combinations | Number
1. | CM | = 21*3 = 63
2. | CB | = 21*1 = 21
3. | CX | = 21*1 = 21
4. | CHCM | = 21*21*3 = 1323
5. | CHCB | = 21*21*1 = 441
6. | CHCX | = 21*21*1 = 441
7. | CHCMB | = 21*21*3*1 = 1323
8. | CHCMX | = 21*21*3*1 = 1323
9. | All combinations | = 4956
Table 17: Total number of akshar combinations
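The arithmetic of Table 17 can be reproduced with a few lines of Python. The inventory sizes (21 consonants, 3 matras, 1 anusvāra, 1 visarga, 1 halant) come from the text above; the dictionary layout and variable names are merely one illustrative way to write the calculation.

```python
from math import prod

# Number of variant code points per category, from Table 16.
SIZES = {"C": 21, "M": 3, "B": 1, "X": 1, "H": 1}

# The eight akshar patterns counted in Table 17.
PATTERNS = ["CM", "CB", "CX", "CHCM", "CHCB", "CHCX", "CHCMB", "CHCMX"]

# Each pattern contributes the product of its category sizes.
COUNTS = {
    pattern: prod(SIZES[symbol] for symbol in pattern)
    for pattern in PATTERNS
}

TOTAL = sum(COUNTS.values())

for pattern in PATTERNS:
    print(pattern, COUNTS[pattern])
print("All combinations", TOTAL)
```

The per-pattern products and their sum agree with the rows of Table 17.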
There occurs a total of 4956 conjunct consonant combinations modified by matras and vowel modifiers that are identical and can be labeled for variant labels between Telugu and Kannada scripts. These combinations are covered by the variant code points in Section 6, Table 10 and Table 15.
6.4.2 Other Combinations
NBGP creates the possible combinations of Telugu code points and cross-checks them with other Neo-Brahmi scripts for candidate variants. The possible combinations are:
1. CHCMB, CHCMX
2. CHCM, CHCB, CHCX
3. VB, VX, V
4. CHC, CM, CB, CX, C
Where,
C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant / Virama
NBGP concludes that besides those identical code points defined as variants in Section 6, Table 10 and Table 15, there are no other variant code points between Telugu combinations and other scripts' code points or code point combinations.
6.5 Variant disposition
As variants mentioned in Section 6, Table 10 and Table 15 can result in whole label variants, they may be considered for "blocked" disposition. There is no preference among these variants. Whichever label containing either of these variants is chosen earlier, the other equivalent variant label should be blocked.
7. Whole Label Evaluation Rules (WLE)
In this section we provide the WLEs that are required by the language. A number of rules have been formulated so that they can be adopted for the LGR specification. Below are the symbols used in the WLE rules for each "Indic Syllabic Category" mentioned in Table 7: Code point repertoire. For the details of syllable formation, see Appendix B.
C → Consonant
M → Matra
V → Vowel
B → Anusvāra (Bindu)
X → Visarga
H → Halant / Virama
Nasal-C → Nasal Consonant
Rule 1. H must be preceded by C (Ref. Appendix B: Syllable formation Rule 4)
Rule 2. M must be preceded by C (Ref. Appendix B: Syllable formation Rule 6)
Rule 3. X must be preceded by V or M or C (Ref. Appendix B: Syllable formation Rules 3c, 5c and 7c)
Rule 4. B must be preceded by V or M or C (Ref. Appendix B: Syllable formation Rules 3b, 5b and 7b)
Rule 5. H cannot follow Nasal-C (Ref. Section 6.2 Type 1)
Rule 6. V cannot be preceded by H
For Rule 6, there could be cases involving multi-word domains where V may need to be allowed to follow an H. This is the case where two different words are joined together, the first of which ends with a Halant and the second of which begins with a Vowel. Some sections of the linguistic usage require the explicit presence of H for full representation of the sound intended. However, by and large, the form of the first word without the H is considered enough for full representation of the sound intended, as in the following examples:
‘houseofknowledge’:��� అ�ఉల��da:rHalHulu:mH/��� అల$ల��da:rHalulu:mH‘TheQor’an’:ఖు� ఆ� KhurHa:nH/ఖు�� Khura:nH‘inTelanganaRashtraSamiti’:ట�ఆ� ఎ� ల�ti:a:rHesHlo/ట�ఆ���� ti:a:rHesHlo‘Y.S.R.C.party’:|¡ఎ� ఆ� ¢vaiesHa:rHsi:pగ/|¡ఎ��£]¢vaiesHa:rHsi:pi ‘BritishIndia’:¤}ట¥¦ఇం§య©bHritiShHiMdiya /¤}ట¥ªం§య©britiShiMdiya
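Rules 1 to 6 listed earlier in this section lend themselves to a single left-to-right scan over a label. The Python sketch below is one possible reading of those rules, not the normative XML; it assumes the label has already been restricted to the Table 7 repertoire, and the function names and coarse category ranges are hypothetical.

```python
NASAL_C = {0x0C19, 0x0C1E, 0x0C23, 0x0C28, 0x0C2E}  # nasal consonants, Section 6.2


def category(ch: str) -> str:
    """Map a repertoire code point to its WLE symbol (coarse ranges)."""
    cp = ord(ch)
    if cp == 0x0C02:
        return "B"
    if cp == 0x0C03:
        return "X"
    if cp == 0x0C4D:
        return "H"
    if 0x0C05 <= cp <= 0x0C14:
        return "V"
    if 0x0C15 <= cp <= 0x0C39:
        return "C"
    if 0x0C3E <= cp <= 0x0C4C:
        return "M"
    return "?"  # outside the categories used by the rules


def wle_violations(label: str) -> list:
    """Return (index, rule) pairs for every rule the label breaks."""
    problems = []
    prev_cat, prev_ch = None, None
    for i, ch in enumerate(label):
        cat = category(ch)
        if cat == "H":
            if prev_cat != "C":
                problems.append((i, "Rule 1"))
            elif ord(prev_ch) in NASAL_C:
                problems.append((i, "Rule 5"))
        elif cat == "M" and prev_cat != "C":
            problems.append((i, "Rule 2"))
        elif cat == "X" and prev_cat not in ("V", "M", "C"):
            problems.append((i, "Rule 3"))
        elif cat == "B" and prev_cat not in ("V", "M", "C"):
            problems.append((i, "Rule 4"))
        elif cat == "V" and prev_cat == "H":
            problems.append((i, "Rule 6"))
        prev_cat, prev_ch = cat, ch
    return problems
```

Under this reading, the matra + matra sequence of Section 6.1 is caught by Rule 2, a nasal consonant followed by the halant is caught by Rule 5, and a vowel directly after a halant (the multi-word case discussed for Rule 6) is caught by Rule 6.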
In the Rule 6 examples, of the two kinds of representation, those with V preceded by H and those where V is not preceded by H, the latter is awkward and the former is in demand in modern usage.
This is a unique situation necessitated by the lack of hyphen, space or the Zero Width Non-joiner character in the permissible set of characters in the Root zone repertoire. Otherwise, V is never required to be allowed to follow an H. However, permitting this may create labels (with and without H) that are perceptually dissimilar but phonetically and semantically the same for the majority of the linguistic community; hence this is explicitly prohibited by the NBGP.
8. Contributors
Gangadhar Panday
Uma Maheshwara Rao, G.
NBGP members
9. References

[MSR-3] Integration Panel, "Maximal Starting Repertoire - MSR-3 Overview and Rationale", 28 March 2018. https://www.icann.org/sites/default/files/packages/lgr/msr/msr-3-wle-rules-28mar18-en.html
[101] Disanayaka, J. B. 2017. Encyclopedia of Sinhala Language and Culture. Colombo: Sumitha Publishers. First edition 2012.
[102] Krishnamurti, Bhadriraju, Ed., 2000. Telugu bhaashaa charitra. Hyderabad: P.S. Telugu University. First edition 1974.
[103] Krishnamurti, Bhadriraju and J P L Gwynn. 1985. A Grammar of Modern Telugu. New Delhi: Oxford University Press. ISBN 978-0-19-561664-4.
[104] Sarma, I. K. 1980. Coinage of Satavahana Empire. Delhi: Agam Kala Prakashan.
[105] Sridhar, S. N. 1980. Kannada. New York: Routledge.
[106] Suresh, Kolichala. 2012. Proposal to encode Telugu LLLA, Telugu ೞ: http://eemaata.com/unicode-proposal/telugu-llla-proposal.pdf. Accessed on 9 July 2018.
[107] Suresh, Kolichala. 2012. Divergent developments of alveolar stop *ṯ in Telugu. http://kolichala.com/dravidian/Divergent_developments_of_alveolar_stop_in_Telugu.pdf. Accessed on 9 July 2018.
[108] Telugu Unicode Chart, Telugu Range: 0C00–0C7F. The Unicode Standard, Version 10.0. http://www.unicode.org/Public/10.0.0/charts. Accessed on 9 July 2018.
[109] Uma Maheshwara Rao, G. 2012. Telugu bhaasha-saMgaNanaM. Hyderabad: P.S. Telugu University. ISBN: 81-86073-372-9.
[110] Uma Maheshwara Rao, G. 2003. Standard Telugu Written Language. VIDYULLIPI-4. pp. 1-14. Hyderabad: SCIL.
[111] Usha Devi, A. and Chandra Sekhara Reddy, D. 2015. Peoples Linguistic Survey of India. Andhra Pradesh and Telangana rAshtraala bhaashalu, vol. 3, part 1. ISBN: 978-93-85231-05-6. Hyderabad: emesco.
Appendix A: Confusable Code Points Analysis

A-1. Telugu and Kannada

The following table defines Telugu and Kannada code points which are confusable.

No.   Telugu CP   Telugu Glyph   Kannada CP   Kannada Glyph
1     0C35        వ              0CB5         ವ
2     0C36        శ              0CB6         ಶ
3     0C38        స              0CB8         ಸ

Table A-1: Confusable code points of Telugu and Kannada script

The following table lists other code points which have been analyzed and found to be distinguishable.

No.   Telugu CP   Telugu Glyph   Kannada CP   Kannada Glyph   NBGP resolution
1     0C0E        ఎ              0C8E         ಎ               distinguishable
2     0C18        ఘ              0C98         ಘ               distinguishable
3     0C19        ఙ              0C99         ಙ               distinguishable
4     0C1A        చ              0C9A         ಚ               distinguishable
5     0C1B        ఛ              0C9B         ಛ               distinguishable
6     0C2A        ప              0CAA         ಪ               distinguishable
7     0C2B        ఫ              0CAB         ಫ               distinguishable
8     0C37        ష              0CB7         ಷ               distinguishable
9     0C4C        ◌ౌ             0CCC         ◌ೌ              distinguishable

Table A-2: Other NBGP resolutions on Telugu and Kannada script

A-2. Telugu and Malayalam

Besides those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Malayalam.

A-3. Telugu and Sinhala

Besides those identical code points defined as variants in Section 6, there are no other similar code points between Telugu and Sinhala.
Appendix B: Syllable formation in the Telugu Script

The Telugu script grammar allows us to state the nature and structure of the graphic syllables in the formation of words. The extended notion of syllable is often used to characterize orthographies of South-Asian scripts, especially Brahmi-derived scripts, where words are composed of sequences of one or more orthographic aksharas or syllables. These aksharas are again composed of sequences of certain characters from the alphabet. The Telugu alphabet has the following types of characters (encoded in Unicode) that either on their own or by entering larger combinations form aksharas as shown here. There are 12 different types of syllables possible in Telugu.

The following variables are involved in the formation of a syllable [$]:

• C = Consonants, which are standalone characters or graphemes with an inherent vowel 'a' and can function as syllables;
Stops: క ఖ గ ఘ ఙ చ ఛ జ ఝ ఞ ట ఠ డ ఢ ణ త థ ద ధ న ప ఫ బ భ మ;
Fricatives: శ ష స హ
Sonorants: య ర ఱ ల ళ వ
• V = Vowels, which stand alone, are represented by the following graphic signs and may function as syllables;
అ ఆ ఇ ఈ ఉ ఊ ఎ ఏ ఐ ఒ ఓ ఔ ఋ
• M = Matras, or the dependent vowel signs, which may function as syllables when they occur with a consonant (characteristically deleting the inherent vowel of the consonant);
Example: కా కి కీ కు కూ కె కే కై కొ కో కౌ; etc.
• H = Halant or virama = ◌్; it may occur with one of the consonants represented by C to form CH syllables;
Example: క్ ఖ్ గ్ ఘ్ ఙ్
• B = Pūrṇānusvāra, the homorganic nasal and an archiphoneme = ◌ం; it may occur with one of the C, V, and the combined CM to form CB, CMB, VB, and C([HC]*)B
• X = Visarga or the glottal check = ◌ః; it may occur with one of the C, V, and the combined CM to form CX, CMX, VX

The operators used: The following four operators are employed to define the delimitation of the graphic syllables in Telugu.
No.   Symbol   Function
1.    |        Alternative
2.    []       Encloses optional elements
3.    *        Variable occurrence
4.    ()       The sequence cluster

Table B-1: Symbols and functions

An Akshara in Telugu can be defined as any C or V and a combination of M (dependent vowels) and the vowel modifiers, as in the following. The following syllable formation rules derive all possible graphic syllables in Telugu.

1. Syllable formation rule-1, $ = V; Every standalone vowel character can function as a syllable. Ex.
అ, ఆ, ఇ, ఈ, ఉ, ఊ, ఎ, ఏ, ఐ, ఒ, ఓ, ఔ, ఋ;
After the exclusion of obsolete vowels, 13 syllables are possible.

2. Syllable formation rule-2, $ = C; Every standalone consonant character can function as a syllable. Ex.
క ఖ గ ఘ ఙ, చ ఛ జ ఝ ఞ,
ట ఠ డ ఢ ణ, త థ ద ధ న, ప ఫ బ భ మ, య ర ఱ ల ళ వ,
శ ష స హ;
There are 35 such syllables possible.

3. Syllable formation rule-3, $ = VB|X; Example:
3a = V+B = $; అం ఆం ఇం ఈం ఉం ఊం ఎం ఏం ఐం ఒం ఓం ఔం;
3b = V+X = $; అః ఆః ఇః ఈః ఉః ఊః ఎః ఏః ఐః ఒః ఓః ఔః;
In combination with V and one of the two, B or X, a total of 36 syllables are possible. Syllable combinations with vocalic R are not used.

4. Syllable formation rule-4, $ = CH; A standalone consonant may be followed by the halant marker H to form the corresponding graphic syllables as shown here.
Example: క్ ఖ్ గ్ ఘ్ ఙ్, చ్ ఛ్ జ్ ఝ్ ఞ్, ట్ ఠ్ డ్ ఢ్ ణ్, త్ థ్ ద్ ధ్ న్, ప్ ఫ్ బ్ భ్ మ్, య్ ర్ ఱ్ ల్ ళ్ వ్, శ్ ష్ స్ హ్
There are 35 such graphic syllables possible.

5. Syllable formation rule-5, $ = CB|X; Standalone consonants can take one of the three vowel modifiers and form the corresponding syllables as shown below. Example:
5a. $ = CB: కం ఖం గం ఘం ఙం చం ఛం జం ఝం ఞం టం ఠం etc.
5b. $ = CX: కః ఖః గః ఘః ఙః చః ఛః జః ఝః ఞః టః ఠః etc.
There are 2*35 = 70 graphic consonant-modifier syllables possible.

6. Syllable formation rule-6, $ = CM; A consonant may be combined with a vowel modifier or the dependent vowel diacritic to form the corresponding syllables. Example:
కా కి కీ కు కూ కృ కౄ కె కే కై కొ కో కౌ; etc.
A total of 35*13 consonant + vowel diacritic combinations may derive 455 graphic syllables in Telugu.

7. Syllable formation rule-7, $ = CMB|X; A consonant with a dependent vowel, when followed by one of the three modifiers, may derive the following graphic syllables. Example:
7a. కాం కిం కీం కుం కూం కెం కేం కైం కొం కోం కౌం
7b. కాః కిః కీః కుః కూః కెః కేః కైః కొః కోః కౌః
A total of 35*12*2 combinations of a consonant plus a dependent vowel and one of the three modifiers derive 840 possible graphic syllables in Telugu.
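The counts quoted for rules 2, 5, 6 and 7 can be re-derived by enumeration. The sketch below is illustrative only: the helper `chars` and the code point ranges are assumptions, chosen to match the 35 consonants listed under rule 2 and the 13 dependent vowels counted in rule 6. Rule 7 is only multiplied out, because the text does not say which 12 dependent vowels it counts.

```python
from itertools import product


def chars(*ranges):
    """Expand inclusive (first, last) code point ranges into characters."""
    return [chr(cp) for first, last in ranges for cp in range(first, last + 1)]


# Assumed ranges: 20 + 10 + 5 = 35 consonants, 7 + 3 + 3 = 13 dependent vowels.
CONSONANTS = chars((0x0C15, 0x0C28), (0x0C2A, 0x0C33), (0x0C35, 0x0C39))
MATRAS = chars((0x0C3E, 0x0C44), (0x0C46, 0x0C48), (0x0C4A, 0x0C4C))
MODIFIERS = ["\u0C02", "\u0C03"]  # B (anusvara) and X (visarga)

rule5 = [c + m for c, m in product(CONSONANTS, MODIFIERS)]  # CB | CX
rule6 = [c + m for c, m in product(CONSONANTS, MATRAS)]     # CM
rule7_count = len(CONSONANTS) * 12 * len(MODIFIERS)         # CMB | CMX

print(len(CONSONANTS), len(rule5), len(rule6), rule7_count)  # 35 70 455 840
```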
8. Syllable formation rule-8, $ = CH[(C)*C]; Any consonant followed by the halant marker may combine with another consonant or consonants to form complex graphic syllables. Example:
2 consonant clusters: ÍÎ గÏ ,ÐÑ ,ఙÒ ,చR,ఛÓ,జÔ ,ÕÖ ,ఞ× ,టV ,ఠØ ,డÙ ,ÚÛ ,ణÜ , etc.
3 consonant clusters: రÝÞ,షV ß,సY à,నá ß,ఙÑ ß షâã,త}~,త]ä etc.
4 consonant clusters: త]ä~ ;
A total of 35*1*35 = 1225 CHC syllables involving two-consonant clusters are possible. Further, a total of 35*1*35*1*35 = 42,875 CHCHC syllables involving three-consonant clusters are possible. Four-consonant clusters are extremely rare but theoretically possible, as shown above.

9. Syllable formation rule-9, $ = CH(CH[CH])CM; Any consonant followed by the halant marker and a consonant or consonants may be followed by one of the dependent vowels to form complex graphic syllables involving two to three consonant clusters. Example:
క$N åÎ æçÏ ,èéÑ ,ఙêÒ ,O�R,ఛూÓ,జÔ ,ëìÖ ,ఞí× ,ట�V ,ఠîØ ,§�Ù ,ÚూÛ ,ణ�Ü ,etc.
రïÝÞ ,ªV ß,^Y à,ðá ß,ఙ¥Ñ ß ñ âã,!�} ~,తò]äetc.
!�]ä~
A total of 35*1*35*1*12 = 14,700 complex syllables involving two-consonant clusters followed by dependent vowels are possible.
A total of 35*1*35*1*35*12 = 514,500 complex syllables involving three-consonant clusters followed by dependent vowels are possible.

The following is a summary of possible syllable types with the glyphs in Telugu:
$ = V([B|X]) | CM([B|X]) | CH(CH[C])M([B|X])
As per our definition, the following 21 subtypes of graphic syllables are possible, which however can be grouped under 8 rules as discussed above.
$ = V | VB | VX | C | CB | CX | CM | CH | CHC | CHCB | CHCX | CHCM | CHCH | CHCHC | CHCHCB | CHCHCX | CHCHCM
Therefore, typologically 8 distinct types of graphic syllables can be derived in the language.
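One way to test whether a sequence of category symbols forms a single graphic syllable is a regular expression over those symbols. The pattern below is an assumed consolidation of rules 1 to 9, not the summary formula given above, and the names `SYLLABLE` and `is_syllable` are hypothetical. It allows up to four consonants in a cluster, following the remark under rule 8 that four-consonant clusters are theoretically possible.

```python
import re

# Assumed consolidation of rules 1-9 over category symbols:
#   V[B|X]  |  C (HC){0,3} ( H | [M][B|X] )
SYLLABLE = re.compile(r"V[BX]?|C(?:HC){0,3}(?:H|M?[BX]?)")


def is_syllable(categories: str) -> bool:
    """True if the whole category string, e.g. "CHCM", is one syllable."""
    return SYLLABLE.fullmatch(categories) is not None


# Subtypes that appear in the list above.
LISTED = ("V VB VX C CB CX CM CH CHC CHCB CHCX CHCM CHCH "
          "CHCHC CHCHCB CHCHCX CHCHCM").split()

print(all(is_syllable(s) for s in LISTED))  # True
print(is_syllable("CHV"))                   # False
```

The pattern also accepts CMB and CMX from rule 7, which do not appear in the list above, and it rejects a vowel that directly follows a halant, in line with WLE Rule 6.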
Appendix C: NBGP Cross-script Variant Inclusion Policy

If, in any two given scripts, all the potential cross-script variants consist of dependent characters ONLY (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc.), then that entire set can be ignored and no cross-script variants are proposed between those two scripts.

If, in any two given scripts, there is AT LEAST ONE non-dependent cross-script variant character or sequence present (e.g. Consonant, Vowel etc.), all the potential cross-script variants are considered and proposed between the two scripts.

This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, the shapes of the characters changed depending on various factors. This change in shape was not uniform across all the characters and scripts. Some character shapes changed significantly, whereas others still retained their similarity. The cross-script similarity analysis also aims to identify such cases where the same character retained almost the same shape despite being part of different scripts. This set of characters are variants of each other in the true sense, rather than merely of coincidental visual similarity. Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them as blocked variants.
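The two policy statements at the start of this appendix amount to an all-or-nothing decision per script pair, which can be sketched as below. The `Candidate` record and the function `variants_to_propose` are hypothetical names, and the code points in the usage lines are illustrative values only, not resolutions taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A potential cross-script variant pair (hypothetical record type)."""
    source_cp: int
    target_cp: int
    dependent: bool  # True for vowel signs, anusvara, visarga, etc.


def variants_to_propose(candidates):
    """Apply the inclusion policy to the candidates between two scripts."""
    if any(not c.dependent for c in candidates):
        return list(candidates)  # at least one non-dependent: propose all
    return []                    # dependent characters only: ignore the set


# Illustrative values: an anusvara pair and a consonant pair.
sign_only = [Candidate(0x0C02, 0x0C82, dependent=True)]
mixed = sign_only + [Candidate(0x0C35, 0x0CB5, dependent=False)]

print(len(variants_to_propose(sign_only)))  # 0
print(len(variants_to_propose(mixed)))      # 2
```

Note that one non-dependent candidate is enough to bring the dependent candidates of the same script pair into the proposal as well.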
NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as NBGP does not have any exposure to the actual usage of those characters in those particular scripts, NBGP desisted from including them in the analysis. As NBGP has already considered all the related scripts under the cross-script variant analysis, the similarity of the characters belonging to NBGP scripts with other scripts not under the NBGP ambit may be of a merely coincidental visual nature.

Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts under the scope of the Root LGR procedure. Carrying out this analysis can practically be done only with the Generation Panels that exist while the NBGP is active. This still leaves out of scope those scripts which may not have a Generation Panel established yet. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by the "String Similarity Assessment Panel" of ICANN.