Sanskrit and Natural Language Processing

Dr. Srinivasa Varakhedi
Center for Advanced Studies and Research in Shabdabodha and NLP
RASHTRIYA SANSKRIT VIDYAPEETHA, DEEMED UNIVERSITY
Tirupati (A.P.)

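Dream of a bee…

रात्रिः गमिष्यति भविष्यति सुप्रभातम् ।
भास्वान् उदेष्यति हसिष्यति पङ्कजश्रीः ॥
इत्थं विचिन्तयति कोशगते द्विरेफे ।
हा हन्त हन्त नलिनीं गज उज्जहार ॥

(The night will pass and a fine dawn will come; the sun will rise and the lotus will bloom. While the bee shut inside the bud was thinking so, alas, an elephant uprooted the lotus.)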

Present situation of Sanskrit
Sanskrit colleges are like a 'zoo'!
No Govt. support unless we are productive
Humanities and Languages are being neglected
How far will this support continue?
A great tradition of learning is being lost
No scope for novel research

Innovation is the key
Sanskrit Shastras are competent enough to enter the science world
Move out of Humanities and get merged with science
Analogy: Maths, Psychology, Logic
We must find a practical approach for these Sanskrit sciences.

We have lost 80%
Meemamsa - No practical approach!
Nyaya - No use in modern dialectics?
Vyakarana - No application??
What to do?

Relevance of Sanskrit Shastras in Modern Technology
Fortunately, these shastras are found relevant in today’s technology:
Computing ideas in Panini
Text processing principles in Meemamsa
Formal languages in Nyaya
What we lack is the technology and the application area
Story of Babbage!!!

Message of Acharya Shankara Bhagavatpada

“avidyayaa mrtyum tiirtvaa.. vidyayaa amrtamashnute..” (Crossing over death through avidyaa, one attains immortality through vidyaa.)

- Ishavasya Upanishad

Sri Shankara Bhagavatpada comments on this …..

avidyaa = karma ; vidyaa = knowledge

Opportunity
Emerging information technology has provided a great opportunity to survive
गृह्णीयात् तिन्तृणीशाखां शिग्रुशाखाग्रहेण किम्? (Hold on to the tamarind branch; what use is grasping a drumstick-tree branch?)
Solve a major contemporary problem like MT on the basis of the shastras
Get new openings for Sanskritists
Open a new avenue for research

Know How…
Ultimate aim: finding an appropriate place for Sanskrit Shastras
Method: solutions to contemporary problems, adopting modern technology
Resource needed: adequate manpower to act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other.

Change the scenario
[Diagram labels: Technology; Western Theories; Indian Theories]

Opportunities missed
Industrial revolution: we missed this through some hasty decisions
IT revolution: Indians are serving at the coding level, not at the design level!
Knowledge revolution: we should take advantage of this one

Need of the hour
We need
  to understand how technology works
  to understand the contemporary problems
Then we will be able to give solutions in the light of the shastras and show the relevance of Indian theories

History and Progress
A conference held at Bangalore in Dec 1987 on “Knowledge Representation and Sanskritam” generated tremendous interest
Not much has been achieved since, apart from some small-scale efforts and projects here and there, and those mostly in technical institutions
Time is running out! What progress has been made since then?

Complexity of the problem
Different goals: the two disciplines, Technology and Shastras, developed in different contexts
Paradigm difference: modern scholars are accustomed to visual teaching methods; traditional pandits, on the other hand, prefer the oral tradition
Language barrier: neither side understands the other’s language!
Tuning in to the dialogue will take time

Who would bell the cat?
It needs a long interaction between technologists and traditional Sanskrit scholars
Technical institutions are always ready for such activities
Not much interest is seen in Sanskrit institutions
It is we Sanskritists who should bell the cat

Long process, like the extraction of ghee from milk
No miracle happens in the initial stage
It’s a big challenge; one or two persons are not enough
We need hundreds of dedicated persons to achieve even a small goal
A person can climb a small hill; a team can climb Everest

Identifying the “problem”
Analogy: Brahman in the Upanishads
What is Brahman?
We cannot show it, as it is imperceivable.
We cannot describe it, as it is beyond words.
Hence we can direct you towards it by way of negating what we know (अपोह, apoha):
शाखाचन्द्रमसारुन्धतीन्यायः (the maxims of pointing out the moon by way of a branch and of pointing out the star Arundhati)

Possible areas
Machine Translation
Speech Processing
Summary Extraction from huge texts
Indo Wordnet as a base for IL-wordnets
Developing Tools for IL Researchers
Knowledge Representation schemes

Machine Translation
English to Indian Languages
  Word sense disambiguation
  Karaka & Syntax Relation
  Word-grouping
  Idiomatic Expression
  Shabdasutra
MT among Indian Languages
  Bi-language Electronic Dictionaries
  Karaka & Vibhakti Relation

Major MT systems in India
Angla-Bharati, IIT Kanpur
Shakti, IIIT Hyderabad
Mantra, CDAC Pune
SaHiT (Sanskrit Hindi Translator), CSS, JNU
Anusaaraka (RSV, HCU, IIIT)

Major MT systems outside India
UNITRAN
BabelFish AltaVista (Systran)
ATR (bimodal, Japan)
JANUS (bimodal, US-Germany)
SLT (SRI, Cambridge)
VERBMOBIL (Germany)
DIPLOMAT (Carnegie-Mellon)

Get a 125 page directory of available MT systems at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf

Summary Extraction
Meemamsa principles are applied to extract the summary of a text
The Upakramaadi Tatparya Lingas (the indicators of purport, beginning with the opening of a text) are used for summary extraction at the Indian Institute of Science, Bangalore, under our consultancy.
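As a rough illustration of the idea, the sketch below approximates just two of the six traditional indicators: upakrama-upasamhara (agreement between the opening and the close of a text) and abhyasa (repetition). It is a toy under stated assumptions, not the system built at the Indian Institute of Science; the function names, the stopword list and the scoring weights are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the English demo text.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "by"}

def content_words(sentence):
    """Lowercased words of a sentence, minus the stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def theme_terms(sentences, top_n=2):
    """Rank candidate theme terms using two indicators.

    abhyasa: the score starts from how often the term is repeated.
    upakrama-upasamhara: a term present in both the first and the last
    sentence gets a bonus (the weight 2 is an arbitrary assumption).
    """
    opening = set(content_words(sentences[0]))
    closing = set(content_words(sentences[-1]))
    counts = Counter(w for s in sentences for w in content_words(s))
    scores = {}
    for term, freq in counts.items():
        bonus = 2 if term in opening and term in closing else 0
        scores[term] = freq + bonus
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

def summarize(sentences, k=1):
    """Return the k sentences mentioning the most theme terms, in text order."""
    themes = set(theme_terms(sentences))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -len(themes & set(content_words(sentences[i]))),
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

The remaining indicators (apurvata, phala, arthavada, upapatti) need semantic judgment and are not attempted here.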

Wordnet / Concept-net based on NN ontology
Wordnet is an electronic lexical reference system designed on the basis of semantic relations between words
Synonymy {Griha, nivaasa, …}
Hypernymy {Amra, vriksha, vanaspati, …}
Antonymy {Shreemaan, akinchana}
Meronymy {nAsika, mukha, shariira, …}
Gradation {Shushka, …tara, …tama}

Knowledge Engineering Representation

For data representation, several database management systems are available.

For representing and retrieving useful information, there are various well-developed methodologies.

Finally, Knowledge Representation needs special treatment, and this is where Indian knowledge systems can be applied.

Knowledge and its importance in AI
AI researchers are interested in building intelligent systems
Web technologies are looking forward to semantic webs instead of the syntactic web
Knowledge is more valuable than data and information
  Data - a simple DoB.
  Info - the age calculated from it.
  Knowledge - the judgment about suitability for the job at hand, etc. This requires a lot of inputs from various K-sources.
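The data, information, knowledge ladder above can be sketched in a few lines. The suitability rule and its thresholds are invented purely for illustration; a real judgment would draw on many more knowledge sources.

```python
from datetime import date

def age_on(dob, today):
    """Information: derive age in whole years from the raw date of birth."""
    years = today.year - dob.year
    # Subtract one if this year's birthday has not yet arrived.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

def suitable_for_job(age, years_experience, min_age=21, min_experience=2):
    """Knowledge: a judgment that combines age with another input.

    The thresholds are hypothetical placeholders.
    """
    return age >= min_age and years_experience >= min_experience

dob = date(1990, 6, 15)                   # data
age = age_on(dob, date(2020, 6, 15))      # information
print(age, suitable_for_job(age, years_experience=5))  # knowledge
```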

Computational Linguistics and Panini’s Grammar
The structure of Paninian Grammar is nothing but a computer program - Babbage!
It has captured the base of the universal principles of all languages
CL requires formal rules for the analysis and generation of language
Slowly, Chomsky and others are turning towards Panini…

The System of Panini
Phonetic component
  Phonemes
  Pratyahara
Rule base
  Vidhi (operations)
  Samjna
  Paribhasha (metarules)
  Adhikara (headings)
  Atidesha (extension)
  Niyama (restriction)
Lexicon
  Dhatupaatha
  Ganapaatha
Lists
  Affixes
  Rule-specific items
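The pratyahara device in the phonetic component lends itself to a short sketch: a pratyahara names the run of sounds from a starting sound up to a marker letter in the Shiva Sutras. The function name and the transliteration are assumptions of this example; where a marker occurs twice (ṇ), the sketch simply stops at the first occurrence after the start, which is the usual but not the only traditional reading.

```python
# The fourteen Shiva Sutras as (sounds, final marker) pairs.
# Consonants are listed without the inherent vowel "a".
SHIVA_SUTRAS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]

def expand_pratyahara(start, marker):
    """Return the sounds from `start` up to the first sutra ending in `marker`."""
    sounds = []
    collecting = False
    for group, end_marker in SHIVA_SUTRAS:
        for sound in group:
            if not collecting and sound == start:
                collecting = True
            # "h" is listed twice in the sutras; keep it only once.
            if collecting and sound not in sounds:
                sounds.append(sound)
        if collecting and end_marker == marker:
            return sounds
    raise ValueError(f"no pratyahara starting at {start!r} with marker {marker!r}")

print(expand_pratyahara("a", "c"))  # the pratyahara "ac": the vowels
print(expand_pratyahara("i", "k"))  # the pratyahara "ik"
```

A rule that mentions "ac" can then be applied to every vowel without listing them, which is the economy the rule base depends on.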

Paninian Model for Sentence Analysis
Action - central theme
Karakas - syntactico-semantic roles
Visheshana-Visheshyabhava
Concept of anabhihite… in switching to a different voice
Vivakshaa - intention of the speaker
Form and meaning
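The role of anabhihite in voice switching can be sketched as follows. Under the heading rule anabhihite (Ashtadhyayi 2.3.1), a karaka gets its own case ending only when the verb ending has not already expressed it. The table covers just karta and karma in active and passive voice, and the function and table names are hypothetical.

```python
# Which karaka the verb ending itself expresses (abhihita) in each voice.
EXPRESSED_BY_VERB = {"active": "karta", "passive": "karma"}

# Case taken by a karaka that the verb has NOT expressed (anabhihita).
CASE_WHEN_UNEXPRESSED = {"karta": "instrumental", "karma": "accusative"}

def assign_case(karaka, voice):
    """Return the case of a noun filling `karaka` in a sentence of `voice`."""
    if EXPRESSED_BY_VERB[voice] == karaka:
        # Already expressed by the verb: the noun stays in the nominative.
        return "nominative"
    return CASE_WHEN_UNEXPRESSED[karaka]

# "Rama cooks rice" versus "Rice is cooked by Rama"
for voice in ("active", "passive"):
    print(voice, assign_case("karta", voice), assign_case("karma", voice))
```

The karaka roles stay fixed across the two sentences; only the surface cases change, which is why the karaka level is a convenient pivot for analysis and translation.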

Navya Nyaya -> AI ?

Classify Nyaya into five parts:
1. Ontology
2. Epistemology
3. Technical Language
4. Semantics
5. Art of debate and fallacies

Ontology
Includes…
  Categories - Substance, Quality, etc.
  Relations - SamavAya, SvarUpa, …
  Universals - types or classes…
Ontology helps in various areas like NLP, K-Repr and K-Engg, and especially in the cognitive sciences.

Epistemology

Deals with…
  Cognitive process
  Cognitive structure
It helps to solve the problems of the cognitive sciences and K-Repr.

Technical Language
NNL is a restricted language that has both features: the power of mechanism of artificial languages and the power of expression of natural languages.
The basic ideas behind this language will be helpful in Knowledge Representation.

Semantics
The way of analysing semantics shown by the Navya Naiyayikas has been found crucially helpful in NLP and Machine Translation
E.g.
  Classification of words - rUdha, yoga
  Syntactical analysis
  Power of definitions

KR & NN
Semantics in MT
Lexicography
Word/concept nets based on NN ontology

Classification of padas (words)
Rudha - the word has a conventional meaning, i.e. names…
Yaugika - the word has an etymological meaning… cook, driver
Yoga-rudha - the word has etymology as well as convention… CD-driver
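The three-way classification reduces to two yes/no properties of a word, which a lexicon entry could record. The sketch below is only an illustration; the enum and function names are hypothetical.

```python
from enum import Enum

class PadaType(Enum):
    RUDHA = "rudha"            # meaning fixed by convention only
    YAUGIKA = "yaugika"        # meaning derived from etymology only
    YOGA_RUDHA = "yoga-rudha"  # etymological meaning narrowed by convention

def classify_pada(has_etymology, has_convention):
    """Classify a word from the two sources of its meaning."""
    if has_etymology and has_convention:
        return PadaType.YOGA_RUDHA
    if has_etymology:
        return PadaType.YAUGIKA
    if has_convention:
        return PadaType.RUDHA
    raise ValueError("a word needs at least one source of meaning")

# "driver" (one who drives) is yaugika; "CD-driver" keeps the etymology
# but is restricted by convention to a device, so it is yoga-rudha.
print(classify_pada(True, False), classify_pada(True, True))
```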

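WSD - using different techniques
Definitions of Karaka relations without any overlap
Kartrtvam = kriyAnukUla-kritimattvam (agenthood: possessing the effort conducive to the action)
Karmatvam = para-samaveta-kriyA-janya-phala-Ashrayatvam (objecthood: being the locus of the result produced by an action that inheres in another)
Going - Rama and the forest
Who is going where?
The result, contact, is possible in Rama too.
To avoid such an overlap, this definition is useful.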
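The point of the qualifier para-samaveta ("inhering in another") can be shown with a small sketch. The data model below is a hypothetical simplification: an action records where it inheres and where its result resides, and the two functions compare a naive definition of karma with the refined one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    inheres_in: str         # entity in which the action itself resides
    result: str             # the result (phala) the action produces
    result_loci: frozenset  # entities in which that result resides

def is_karma_naive(entity, action):
    """Naive definition: any locus of the result counts as karma."""
    return entity in action.result_loci

def is_karma(entity, action):
    """Refined definition: locus of the result of an action that
    inheres in something OTHER than the entity itself."""
    return entity in action.result_loci and entity != action.inheres_in

# Rama goes to the forest: going inheres in Rama, and the resulting
# contact resides in both Rama and the forest.
going = Action("going", "Rama", "contact", frozenset({"Rama", "forest"}))

print(is_karma_naive("Rama", going))  # True: the overlap the slide warns about
print(is_karma("Rama", going))        # False
print(is_karma("forest", going))      # True
```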

Refinement of karaka relations
Classification of Karma
  Karma - reachable, understandable and so on.
Analysis of root semantics
  Leave - He left the place / left from the place
Analysis of expectancy (AkAnkshA)
  Rats killed cats

To-infinitive relation
I stand up to speak
I want to speak
He goes to London to study law
He wants to study law in London
To walk in the mornings is good for health

Special thanks to the authorities of
Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram

Namaste!
