Swaran Lata, Director and HoD [email protected] Technology Development for Indian Languages Programme (TDIL) Dept of Information Technology , Govt. of India


Challenges of development of Language Technology and services in multicultural and multilingual Indian Scenario

Page 1: Swaran Lata , Director and  HoD slata@mit

Swaran Lata, Director and HoD

[email protected]

Technology Development for Indian Languages Programme (TDIL)

Dept of Information Technology , Govt. of India

Page 2: Swaran Lata , Director and  HoD slata@mit

Organization of Presentation

• India – cultural diversity
• Linguistic Diversity in India
• Present Knowledge Society and Indian Scenario
• ICT scenario in India
• Internet penetration – Haves & Have-Nots
• Mind-set – Still an inhibition
• Bridging the gap – Service delivery – reaching the citizens' doorsteps
• Localization – Key enabler
• Challenges and Issues
• TDIL's efforts
• National Roll Out Plan – A big step forward
• Localization of Applications
• Putting Standards in place
• Collaboration and Hand-holding

Page 3: Swaran Lata , Director and  HoD slata@mit

India – A civilization more than 5000 years old

• Vast ancient knowledge base
• Diverse culture and heritage – probably one of the most spectacular in the world
• One of the largest economies in the present world
• Rapid strides in information and communications technology
• Yet a widening divide in terms of knowledge amongst various strata of citizens

Page 4: Swaran Lata , Director and  HoD slata@mit

Linguistic Diversity in India

• According to Census 2001, India has 122 major languages and 2371 dialects.
• Of the 122 languages, 22 are constitutionally recognized.
• Linguistic diversity in India is very rich and wide.
• One language, many scripts
• Many languages, one script
• Cultures differ by region even where different languages use the same script.
• Even the same language differs widely across countries.

Page 5: Swaran Lata , Director and  HoD slata@mit

Though Hindi and Marathi share the same script, Devanagari, their content varies, reflecting cultural and linguistic differences.

(Figure panels: Marathi, Hindi)

Page 6: Swaran Lata , Director and  HoD slata@mit

Present ICT scenario in India

• Despite a reputation as an emerging technology powerhouse, India’s scores on the 2009 Connectivity Scorecard are poor in the vital consumer and business segments.

• These poor scores should not be surprising, since many of the individual metrics that we utilise are effectively measuring “penetration rates.”

• This means that India is judged as a whole, and not by the pockets of ICT excellence that it undoubtedly possesses.

• India scores especially low on broadband and Internet penetration rates.

• Broadband penetration in India is below 2 percent of households, compared to 20 percent of households or more in Turkey, Chile, and Mexico.

• On the consumer usage front, India is not a strong performer in terms of Internet usage, with below 10 percent of the population regularly using the Internet. The country is hampered by a relatively low literacy rate.

Page 7: Swaran Lata , Director and  HoD slata@mit

India still in low broad-band penetration region

http://www.itu.int

Global Broadband divide

Page 8: Swaran Lata , Director and  HoD slata@mit

Low rural tele-density compared to urban tele-density

Page 9: Swaran Lata , Director and  HoD slata@mit

Mind-set: Still favouring English as the medium of excellence

• English and Hindi serve as link languages.
• English learning is viewed as a passport to better economic and social prospects, now even by people from low-income strata.
• Owing to the recent surge in ICT and ICT-enabled services, English has become the 2nd highest medium of instruction from school level.
• Study by the National University of Educational Planning and Administration (NUEPA): under Sarva Shiksha Abhiyan, the number of students opting for English grew by 150% between 2003-08, while the corresponding figure for Hindi is only 32%.
• Example: Uttar Pradesh, West Bengal and .. are now using English as the medium of instruction for schools and colleges.

Primary school students in English-medium schools (in lakhs)

Region     2005-06   2007-08   Growth (%)
Haryana       0.19      1.56      721
WB            0.29      2.31      704
Punjab        0.93      2.78      197
UP            0.12      0.37      193
India        52.00    153.70      196

Page 10: Swaran Lata , Director and  HoD slata@mit

Result: Though Hindi (ranked 3rd) and Bengali (ranked 8th) are among the top 10 languages spoken across the world, no Indian language is among the top 10 languages used on the Internet.

Minuscule Internet usage in Indian Languages

Confinement of Knowledge

Low usage of knowledge sources and applications

Page 11: Swaran Lata , Director and  HoD slata@mit

Language constitutes the foundation of communication and is fundamental to cultural and historical heritage.

Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development.

Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one’s language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society.

Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues.

Huge sections of the world’s population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development.

UNESCO’s VISION for Multilingualism in Cyberspace

Page 12: Swaran Lata , Director and  HoD slata@mit

An uneven growth

The Indian software export industry is growing its global presence at a very fast pace.

However, the root is not expanding its base within the country.

Fallout: Domestic requirements in Indian languages are not being addressed within the country.

Result: Information and knowledge are not available to a vast section of citizens.

Expanding Software Export

Low penetration in Indian Market

Page 13: Swaran Lata , Director and  HoD slata@mit

Requirements :

Reaching out to the door steps of citizens offering better services for wider dissemination of knowledge .

Localization of Software Solutions , contents and services as per local requirements .

Page 14: Swaran Lata , Director and  HoD slata@mit

Common Services Centre –Its objectives

CSC is a strategic cornerstone of the National e-Governance Plan (NeGP) – Front end service Interface for major G2C services

CSC is one of the three infrastructure pillars of e-governance which the government is committed to building, to ensure “anytime anywhere” web enabled delivery of government services.

• To provide e-governance services
• 100,000 CSCs for 600,000 village clusters
• To cater to service needs of major rural areas
• Being implemented in PPP model

Page 15: Swaran Lata , Director and  HoD slata@mit

Local Language Interface – Not merely desirable but an essential component

The success of CSC hinges upon effective delivery of the G2C applications to rural masses

Since most of the citizens communicate in their local languages – Local Language Interface to G2C solutions at CSC is essential

Hosting of content in local languages helps citizens to interact in a better way in today’s knowledge society

Thus, the Local Language Interface is not merely desirable but an essential component.

Page 16: Swaran Lata , Director and  HoD slata@mit

NeGP – Mission Mode Projects

(Diagram labels: Land Records, Road Transport, Police, Land Registration, Treasuries, Commercial Taxes, Agriculture, Gram Panchayats, Municipalities, Employment Exchanges, Civil Supplies, Education, Income Tax, Passport/Visa, MCA21, Insurance, Banking, National ID, Central Excise, Pensions, GIS, e-Posts, Common Service Centres, Gateway, e-Procure, e-Office, eBiz, EDI, e-Courts, India Portal, Core Policies)

Initiatives already taken to enable G2C applications such as Land Records , Civil Supplies and Municipal applications with Indian Language Interface

Page 17: Swaran Lata , Director and  HoD slata@mit

Service Delivery Model of CSC

Requires Language Interface

Page 18: Swaran Lata , Director and  HoD slata@mit

Localization Requirements for Service Delivery Applications

• To ensure seamless access to services, a language component (localization) and interface are required at:
  • Storage level – server end
  • Data exchange – traffic (language tags need to be properly embedded)
  • Display & rendering
  • Language interface for differently-abled citizens, for more inclusive societal benefits
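The storage and data-exchange requirements above can be illustrated with a minimal sketch. It assumes a JSON payload with hypothetical field names `lang` and `text`; the slide does not prescribe any particular wire format, only that text is stored in a consistent encoding and that the language tag travels with the data.

```python
import json


def pack_record(text, lang_tag):
    """Serialize text together with its language tag as UTF-8 encoded JSON."""
    payload = {"lang": lang_tag, "text": text}  # field names are illustrative
    # ensure_ascii=False keeps Devanagari characters as-is instead of \u escapes
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def unpack_record(raw):
    """Decode the UTF-8 bytes and return (language tag, text)."""
    payload = json.loads(raw.decode("utf-8"))
    return payload["lang"], payload["text"]


record = pack_record("भूमि अभिलेख", "hi")  # "land records" in Hindi
lang, text = unpack_record(record)
```

Because the tag is embedded in the record itself, the receiving end can choose fonts, rendering and speech output for the right language without guessing from the characters.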

Page 19: Swaran Lata , Director and  HoD slata@mit

Globalization of IT

• Web based applications
• Dynamic & static websites with search & cross-lingual access
• Operating systems
• Tools
• Office suites
• Handheld devices
• Mobile devices
• Stand-alone applications

Page 20: Swaran Lata , Director and  HoD slata@mit

Globalization & Localization

Internationalization (I18N): Process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for re-design.

Localization (L10N): Taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.
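The split between the two terms can be sketched in a few lines. This is an illustrative example only: the `CATALOGS` table and the `translate` helper are hypothetical names, not part of any TDIL tool. The lookup function is the internationalized part (written once, no hard-coded user-visible strings); each catalog entry is the localized part (added per locale without touching the code).

```python
# Localization: per-locale resources, added without changing the code below.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}!"},
    "hi": {"greeting": "स्वागत है, {name}!"},
}


def translate(key, locale, **params):
    """Internationalized lookup: pick the message for a locale, else English."""
    catalog = CATALOGS.get(locale, CATALOGS["en"])
    template = catalog.get(key, CATALOGS["en"][key])
    return template.format(**params)


hindi_greeting = translate("greeting", "hi", name="Asha")
fallback_greeting = translate("greeting", "mr", name="Asha")  # no Marathi catalog yet
```

Supporting Marathi then means adding an `"mr"` catalog, which is a localization task rather than a redesign.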

Page 21: Swaran Lata , Director and  HoD slata@mit

Localization – Key Enablers

• Locale Data Repository
• Linguistic Resources
• Standards
• Certification
• Localization Tools
• Training
• Awareness
• Technologies

Page 22: Swaran Lata , Director and  HoD slata@mit

The Tree of Localization Complexities

• Presentation of dates, times, numbers, lists, and other values
• Collation and sorting
• Alternate calendars, which may include holidays, work rules, weekday/weekend
• Currency
• Tax or regulatory regime

• Machine Translation
• Optical Character Recognition
• Speech Technologies
• Cross Lingual Information Retrieval

• Project Management
• Translation Memory
• Translation Tools
• Natural language for text processing: parsing, spell checking, and grammar checking etc.
• Automatic Testing Tools

• Encoding Standards
• Multimodal input device standards
• Fonts & Rendering Engines
• Transliteration & Translation

• Guidelines
• Best Practices
• Case Studies
• Consultancy
• Showcasing of Tools & Technologies

• Parallel Corpora
• Speech Corpora
• Lexical resources
• Ontologies
• Dictionaries
• Thesaurus
• Reference Terminologies

• Certified Localization professionals
• PG Specialization in Localization
• PhD Programmes

• Minimizing time lag
• Benchmarking w.r.t. English version
• Political sensitivity
• Pricing issues

• Testing methodologies
• Metrics for Linguistic Testing
• Certification by Government for linguistic compliance

Complexities

Page 23: Swaran Lata , Director and  HoD slata@mit

Globalization and Localization Issues

Language Issues

Language issues result from differences in how languages around the world handle display, alphabets, grammar, and syntactical rules.

• Bidirectional scripts
• Capitalization, Uppercasing and Lowercasing
• Code Pages
• Complex Script Awareness
• Fonts
• Input Method Editors
• Keyboards
• Line and Word Breaks
• Mirroring Awareness
• Unicode
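Two of the language issues listed above, case mapping and complex script awareness, can be seen with Python's standard `unicodedata` module. This is a small illustration rather than a complete treatment; the sample word is spelled out as explicit code points so the counts are unambiguous.

```python
import unicodedata

# "Hindi" in Devanagari: HA, vowel sign I, NA, virama, DA, vowel sign II
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Devanagari has no upper or lower case, so case mapping leaves text unchanged.
unchanged_by_upper = word.upper() == word

# Vowel signs and the virama are combining marks that attach to a base
# consonant, so the number of code points exceeds the number of base letters.
marks = [ch for ch in word if unicodedata.category(ch).startswith("M")]
base_letters = [ch for ch in word if unicodedata.category(ch) == "Lo"]
```

Code that truncates, reverses or measures such strings one code point at a time can separate a mark from its base letter, which is one reason complex script awareness appears on the list.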

Page 24: Swaran Lata , Director and  HoD slata@mit

Formatting Issues

From the user's perspective, formatting issues are the primary source of discrepancies when working with applications originally written for another language or culture/locale. Developers should use the National Language Support (NLS) APIs in Windows or the System.Globalization namespace to handle most of these issues automatically.

• Addresses
• Currency
• Dates
• Numerals
• Paper Sizes
• Telephone Numbers
• Time
• Units of Measure
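As one concrete instance of a numeral formatting issue, Indian convention groups digits as three, then twos (lakh, crore), unlike the thousands grouping used in most Western locales. The helper below is a hand-written sketch with a hypothetical name, `group_indian`; production code would normally rely on a locale-aware library instead.

```python
def group_indian(n):
    """Format a non-negative integer with Indian digit grouping (3, then 2s)."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two digits at a time off the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])


downloads = group_indian(4100000)   # 41 lakh
shipments = group_indian(700000)    # 7 lakh
```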

Page 25: Swaran Lata , Director and  HoD slata@mit

Localization- Tool for increasing Financial Sustainability

• Training of local youth in Localized Content Creation

• Working with Self Help Groups to up-lift their business

• Identify Dynamically changing Local Content which helps in their local professions

• E-Tutor

• Entertainment during non-official hours

Page 26: Swaran Lata , Director and  HoD slata@mit

TDIL’s Efforts

• More than a decade of sustained, major national initiative
• Leading to development and consolidation of various language tools, resources and components
• Continuous and untiring representation in various international and national standards bodies: ISO, UNICODE, W3C, IETF, ELRA and BIS
• Represented and included 22 Indian languages in UNICODE
• First in India to launch consortium-mode projects in the technology-intensive areas of Machine Translation, Cross-lingual Information Access, Text to Speech etc., to develop state-of-the-art technologies in Indian languages
• Promotes futuristic research in Language Technology

Page 27: Swaran Lata , Director and  HoD slata@mit

National Roll-Out Plan – A Big Step Forward

CDs containing Software Tools and Fonts for all 22 Officially Recognized Languages released in public domain for free use

Contains Fonts, Localized Open Office, Keyboard drivers, E-mail clients and Firefox browsers in Indian languages

Freely downloadable from Indian Language Data centre – http://www.ildc.gov.in

Already crossed ~41 lakh downloads and 7.0 lakh shipments

NASSCOM may take active role towards proliferating the benefits of these language CDs

These free CDs would also benefit NGOs and CSC operators for developing and promoting local language contents.

Page 28: Swaran Lata , Director and  HoD slata@mit

CDs containing Indian Language Software Tools

Page 29: Swaran Lata , Director and  HoD slata@mit

Putting Standards in place

UNICODE

• UNICODE – default text encoding standard, compatible with ISO 10646
• Seamless data storage and search if data is stored in UNICODE
• All 22 officially recognized Indian languages, including Vedic Sanskrit, represented in UNICODE
• Declared as the text encoding standard for all e-governance applications
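Storing text in Unicode is what makes search across applications possible, but the same visible letter can still arrive as different code point sequences. A common way to handle this, shown here as a sketch and not as something the slide specifies, is to normalize text before storing and before searching. The `search_key` helper is a hypothetical name.

```python
import unicodedata

precomposed = "\u0958"        # DEVANAGARI LETTER QA as a single code point
decomposed = "\u0915\u093c"   # DEVANAGARI LETTER KA followed by the NUKTA sign


def search_key(text):
    """Normalize to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


raw_match = precomposed == decomposed
normalized_match = search_key(precomposed) == search_key(decomposed)
```

Without normalization the two spellings look identical on screen yet fail a byte-for-byte comparison, so a record keyed one way would not be found by a query typed the other way.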

Page 30: Swaran Lata , Director and  HoD slata@mit

Extracting Knowledge from our vast ancient knowledge base

UNICODE Encoding for Vedic Sanskrit , Grantha scripts : Key towards computerization of knowledge base

Page 31: Swaran Lata , Director and  HoD slata@mit

Capturing Region Specific Requirements : Common Locale Data Repository (CLDR)

• The Unicode CLDR provides key building blocks for software to support the world's languages.

• CLDR is by far the largest and most extensive standard repository of locale data.

• This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; etc.

• Locale Data for Indian Languages are in the process of modification

• CLDR data for six languages (Hindi, Nepali, Bengali, Assamese, Malayalam and Gujarati) are finalized.

• Other languages in process
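To show how software consumes locale data of this kind, here is a minimal sketch. The `LOCALE_DATA` table is a tiny hand-typed stand-in with simplified patterns, not an extract of the actual CLDR files, and `format_date` is a hypothetical helper; real applications would normally use a library that reads CLDR directly.

```python
from datetime import date

# Hand-typed illustrative subset: month names plus a date pattern per locale.
LOCALE_DATA = {
    "hi": {
        "months": ["जनवरी", "फ़रवरी", "मार्च", "अप्रैल", "मई", "जून",
                   "जुलाई", "अगस्त", "सितंबर", "अक्तूबर", "नवंबर", "दिसंबर"],
        "pattern": "{day} {month} {year}",
    },
    "en": {
        "months": ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November",
                   "December"],
        "pattern": "{month} {day}, {year}",
    },
}


def format_date(d, locale):
    """Format a date using the month names and field order of the locale."""
    data = LOCALE_DATA[locale]
    return data["pattern"].format(
        day=d.day, month=data["months"][d.month - 1], year=d.year
    )


hindi_date = format_date(date(2009, 8, 15), "hi")
english_date = format_date(date(2009, 8, 15), "en")
```

The formatting code never mentions a language by name; adding a locale means adding data, which is the role a shared repository such as CLDR plays at scale.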

Page 32: Swaran Lata , Director and  HoD slata@mit

All Region specific requirements have been captured and put in Hindi Locale repository

Example of CLDR: Hindi

Page 33: Swaran Lata , Director and  HoD slata@mit

Putting Standards in place… Contd.

W3C

• The World Wide Web Consortium (W3C) develops web standards for interoperable web solutions across platforms, devices and access methodologies
• Ensures interoperability across major browsers: IE, Firefox, Opera etc.
• Work has already started to represent all Indian languages in W3C standards
• Desirable: pro-active participation by industry and industry bodies like NASSCOM

Page 34: Swaran Lata , Director and  HoD slata@mit

Putting Standards in place… Contd.

• Keyboard layouts
• Open Type fonts: Sakal Bharti fonts
• Locale data
• Language tag (for language negotiation on the Internet)
• Domain names in Indian languages
• IT terminology

… and standards for major linguistic resources and tools
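Language negotiation with language tags can be sketched as follows. The example assumes the browser sends an HTTP `Accept-Language` header and that the server knows which languages it has content for. The function names and the fallback to English are illustrative choices, and the matching is deliberately simpler than the full lookup rules in the relevant IETF specifications.

```python
def parse_accept_language(header):
    """Return (tag, quality) pairs sorted by descending quality."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so equal-quality tags keep the order sent by the client
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header, supported, default="en"):
    """Pick the best supported language, trying the primary subtag as well."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


chosen = negotiate("mr-IN, hi;q=0.8, en;q=0.5", {"hi", "en"})
```

Here the client prefers Marathi, the server has none, and the user is served Hindi ahead of English, which is the behaviour a citizen-facing portal would want.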

Page 35: Swaran Lata , Director and  HoD slata@mit

Collaboration and Hand Holding

Collaborative efforts are required for wider proliferation and sustained initiatives.

Govt., industry bodies and academia need to join hands to address the challenges of local language computing and to bring services closer to the doorsteps of millions of citizens in their own languages.

Page 36: Swaran Lata , Director and  HoD slata@mit

धन्यवाद

Thank You

Swaran Lata, Director and HoD

[email protected]

Contact:011-24364365