Sanskrit and Natural Language Processing
Dr. Srinivasa Varakhedi
Center for Advanced Studies and Research in Shabdabodha and NLP
RASHTRIYA SANSKRIT VIDYAPEETHA, DEEMED UNIVERSITY, Tirupati (A.P.)



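Dream of a bee...

रात्रिः गमिष्यति भविष्यति सुप्रभातम् ।
भास्वान् उदेष्यति हसिष्यति पङ्कजश्रीः ॥
इत्थं विचिन्तयति कोशगते द्विरेफे ।
हा हन्त हन्त नलिनीं गज उज्जहार ॥

("The night will pass and a fine dawn will come; the sun will rise and the lotus will bloom in splendour." While the bee shut inside the bud was thinking thus, alas, an elephant uprooted the lotus.)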


Present situation of Sanskrit
Sanskrit colleges are like a 'zoo'!
No Govt. support unless we are productive
Humanities and languages are being neglected
How far will this support continue?
A great tradition of learning is being lost
No scope for novel research


Innovation is the key
Sanskrit Shastras are competent enough to enter the world of science
Move out of the Humanities and merge with science
Analogy: Maths, Psychology, Logic
We must find a practical approach for these Sanskrit sciences.


We have lost 80%
Meemamsa - no practical approach!
Nyaya - no use in modern dialectics?
Vyakarana - no application??
What to do?


Relevance of Sanskrit Shastras in Modern Technology
Fortunately, these shastras are found relevant in today's technology:
  Computing ideas in Panini
  Text processing principles in Meemamsa
  Formal languages in Nyaya
We lack the technology and the application area
Story of Babbage!!!


Message of Acharya Shankara Bhagavatpada
"avidyayaa mrtyum tiirtvaa ... vidyayaa amrtamashnute ..."
(Having crossed death by avidyaa, one attains immortality by vidyaa.)
- Ishavasya Upanishad
Sri Shankara Bhagavatpada comments on this:
avidyaa = karma; vidyaa = knowledge


Opportunity
Emerging information technology has provided a great opportunity to survive
गृह्णीयात् तिन्तृणीशाखां शिग्रुशाखाग्रहेण किम् ? (Hold on to the tamarind branch; what is gained by grasping the branch of a drumstick tree?)
Solve a major contemporary problem like MT on the basis of the shastras
Get new openings for Sanskritists
Open a new avenue for research


Know How...
Ultimate aim: finding an appropriate place for the Sanskrit Shastras
Method: solutions to contemporary problems, adopting modern technology
Resource needed: adequate manpower to act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other.


Change the scenario

[Diagram labels: Technology; Western Theories; INDIAN THEORIES]


Opportunities missed
Industrial revolution
  We missed this through some hasty decisions
IT revolution
  Indians are serving at the coding level, not at the design level!
Knowledge revolution
  We should take advantage of this


Need of the hour
We need
  to understand how technology works
  to understand the contemporary problems
Then we will be able to give solutions in the light of the shastras and show the relevance of Indian theories


History and Progress
A conference held at Bangalore in Dec 1987 on "Knowledge Representation and Sanskritam" generated tremendous interest
Nothing much has been achieved, except some small-scale efforts and projects here and there, and those too in technical institutions
Time is running out! What progress has been made since then?


Complexity of the problem
Different goals: the two disciplines, Technology and Shastras, developed in different contexts
Paradigm difference: modern scholars are accustomed to visual teaching methods; traditional Pandits, on the other hand, prefer the oral tradition
Language barrier: neither understands the other's language!
Tuning in the dialogue will take time


Who would bell the cat?
It needs a long interaction between technologists and traditional Sanskrit scholars
Technical institutions are always ready for such activities
Not much interest is seen in Sanskrit institutions
It is we Sanskritists who should bell the cat


Long process, like extraction of ghee from milk
No miracle happens in the initial stage
It's a big challenge; one or two persons are not enough
We need hundreds of dedicated persons to achieve even a small goal
A person can climb a small hill; a team can climb Everest


Identifying the "problem"
Analogy: Brahman in the Upanishads
What is Brahman?
  We can NOT show it, as it is imperceptible.
  We can NOT describe it, as it is beyond words.
Hence, we can direct you towards it by way of negating what we know (अपोह, exclusion) - शाखाचन्द्रमसारुन्धतीन्यायः (the maxim of the branch and the moon, and of the star Arundhati: pointing at something hard to see by way of something nearby)


Possible areas
Machine Translation
Speech Processing
Summary Extraction from huge texts
Indo Wordnet as a base for IL-wordnets
Developing Tools for IL Researchers
Knowledge Representation schemes


Machine Translation
English to Indian Languages
  Word sense disambiguation
  Karaka & Syntax Relation
  Word-grouping
  Idiomatic Expression
  Shabdasutra
MT among Indian Languages
  Bi-language Electronic Dictionaries
  Karaka & Vibhakti Relation


Major MT systems

India
  Angla-Bharati, IIT Kanpur
  Shakti, IIIT Hyderabad
  Mantra, CDAC Pune
  SaHiT (Sanskrit Hindi Translator), CSS, JNU
  Anusaaraka (RSV, HCU, IIIT)


Major MT systems
Outside India
  UNITRAN
  BabelFish AltaVista (Systran)
  ATR (bimodal, Japan)
  JANUS (bimodal, US-Germany)
  SLT (SRI, Cambridge)
  VERBMOBIL (Germany)
  DIPLOMAT (Carnegie-Mellon)
Get a 125-page directory of available MT systems at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf


Summary Extraction
Meemamsa principles are applied to extract the summary of a text
Upakramaadi Tatparya Lingas are used to extract the summary of a text at the Indian Institute of Science, Bangalore, under our consultancy.
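As a rough illustration of how such indicators could drive extraction, the sketch below scores sentences with only two of the traditional tatparya-lingas: upakrama-upasamhara (opening and conclusion) and abhyasa (repetition). It is not the system built at the Indian Institute of Science; the function names, the `position_bonus` weight and the crude word filter are all assumptions made for this example.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Lower-cased words longer than three letters (a crude stop-word filter)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) > 3]

def summarize(text, k=2, position_bonus=2.0):
    """Return the k best-scoring sentences in their original order.

    Hypothetical scoring, assumed for illustration only:
      abhyasa  -> average document frequency of the sentence's words
      upakrama-upasamhara -> fixed bonus for the first and last sentence
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))
    scored = []
    for i, s in enumerate(sentences):
        words = content_words(s)
        abhyasa = sum(freq[w] for w in words) / max(len(words), 1)
        is_edge = i == 0 or i == len(sentences) - 1
        scored.append((abhyasa + (position_bonus if is_edge else 0.0), i))
    best = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda t: t[1])]
```

A fuller treatment would also need the remaining indicators (apurvata, phala, arthavada, upapatti), which call for semantic judgments that simple counting cannot supply.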


Wordnet / Concept-net based on NN ontology
Wordnet is an electronic lexical reference resource system designed on the basis of semantic relations of words
  Synonymy {Griha, nivaasa, ...}
  Hypernymy {Amra, vriksha, vanaspati ...}
  Antonymy {Shreemaan, akinchana}
  Meronymy {nAsika, mukha, shariira ...}
  Gradation {Shushka, ...tara, ...tama}
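The relations listed above can be pictured as links between words. The following minimal sketch stores two of them, hypernymy and meronymy, as plain dictionaries and walks a chain upward. The dictionary layout and the `chain` helper are assumptions for illustration, not the structure of any actual wordnet; the entries are the slide's own examples.

```python
# Each dictionary maps a word to its immediate "parent" under one relation.
HYPERNYM = {"Amra": "vriksha", "vriksha": "vanaspati"}      # mango -> tree -> plant
MERONYM_OF = {"nAsika": "mukha", "mukha": "shariira"}       # nose -> face -> body

def chain(word, relation):
    """Follow a relation upward from `word` until no further link exists."""
    result = []
    while word in relation:
        word = relation[word]
        result.append(word)
    return result

print(chain("Amra", HYPERNYM))      # hypernym chain of "Amra"
print(chain("nAsika", MERONYM_OF))  # wholes that contain "nAsika"
```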


Knowledge Engineering Representation

For data representation, several database management systems are available.
For representing and retrieving useful information, there are various worked-out methodologies.
Finally, Knowledge Representation needs special treatment, and this is where Indian knowledge systems can be applied.


Knowledge and its importance in AI
AI researchers are interested in building intelligent systems
Web technologies are looking forward to a semantic web instead of a syntactic web
Knowledge is more valuable than data and information
  Data - a simple DoB (date of birth).
  Info - the age calculated from it.
  Knowledge - the judgment about suitability for the job at hand, etc. This requires a lot of inputs from various K-sources.
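The three levels can be made concrete with a small sketch: a stored date of birth (data), an age derived from it (information), and a suitability judgment that also draws on other sources (knowledge). The `suitable_for_job` rule and the fields of the `job` dictionary are hypothetical, chosen only to show that the last step needs inputs beyond the raw data.

```python
from datetime import date

def age_on(dob, today):
    """Information: completed years between a date of birth and a given day."""
    years = today.year - dob.year
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1  # birthday not yet reached this year
    return years

def suitable_for_job(age, years_experience, job):
    """Knowledge (toy rule): combines age with inputs from other sources."""
    in_age_range = job["min_age"] <= age <= job["max_age"]
    return in_age_range and years_experience >= job["min_experience"]

dob = date(1990, 6, 15)                      # data
age = age_on(dob, date(2020, 6, 15))         # information
job = {"min_age": 21, "max_age": 35, "min_experience": 3}  # assumed requirements
print(age, suitable_for_job(age, 5, job))
```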


Computational Linguistics and Panini's Grammar
The structure of Paninian Grammar is nothing but a computer program - Babbage!

It has captured the base of universal principles of all languages

CL requires formal rules for analysis and generation of language

Slowly Chomsky and others are turning towards Panini…


The System of Panini
Phonetic component
  Phonemes
  pratyahara
Rule base
  Vidhi (operations)
  Samjna
  paribhasha (metarules)
  adhikara (headings)
  atidesha (extension)
  niyama (restriction)
Lexicon
  Dhatupaatha
  Ganapaatha
Lists
  Affixes
  Rule specific items
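The pratyahara device in the phonetic component lends itself to a short sketch. A pratyahara names a class of sounds by a starting sound and a marker letter; it expands to every sound from the start up to that marker in the Shiva Sutras. The data layout, the function name and the `occurrence` parameter (needed because the marker Ṇ occurs twice) are assumptions of this example; markers are written in upper case and consonants without their inherent vowel.

```python
# The fourteen Shiva Sutras as (sounds, marker) pairs.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "Ṇ"),
    (["ṛ", "ḷ"], "K"),
    (["e", "o"], "Ṅ"),
    (["ai", "au"], "C"),
    (["h", "y", "v", "r"], "Ṭ"),
    (["l"], "Ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "M"),
    (["jh", "bh"], "Ñ"),
    (["gh", "ḍh", "dh"], "Ṣ"),
    (["j", "b", "g", "ḍ", "d"], "Ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "V"),
    (["k", "p"], "Y"),
    (["ś", "ṣ", "s"], "R"),
    (["h"], "L"),
]

def expand_pratyahara(first, marker, occurrence=1):
    """List the sounds from `first` up to the given occurrence of `marker`."""
    result, started, seen = [], False, 0
    for sounds, sutra_marker in SHIVA_SUTRAS:
        for sound in sounds:
            if not started and sound == first:
                started = True
            if started and sound not in result:  # 'h' is listed twice; keep it once
                result.append(sound)
        if started and sutra_marker == marker:
            seen += 1
            if seen == occurrence:
                return result
    raise ValueError(f"no pratyahara {first}-{marker}")

print(expand_pratyahara("a", "C"))  # the vowels
print(expand_pratyahara("i", "K"))
```

Starting from the first listed occurrence of a sound is a simplification; the tradition settles such ambiguities case by case.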


Paninian Model for Sentence Analysis

Action - central theme
Karakas - syntactico-semantic roles
Visheshana-Visheshyabhava
Concept of anabhihite... in switching to a different voice
Vivakshaa - intention of the speaker
Form and meaning
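The interplay of karakas, case endings and voice can be sketched as a lookup from a noun's case to its candidate karaka roles. The table below gives only the default correspondences, and the passive branch reflects the anabhihite idea: a role already expressed by the verb ending takes the nominative, while the unexpressed agent takes the instrumental. The English case names, the romanized role names and the function itself are assumptions of this simplified example.

```python
# Default case -> karaka correspondences in the active voice (simplified).
ACTIVE = {
    "nominative": ["karta"],        # agent, expressed by the verb ending
    "accusative": ["karma"],        # object
    "instrumental": ["karana"],     # instrument
    "dative": ["sampradana"],       # recipient
    "ablative": ["apadana"],        # point of departure
    "locative": ["adhikarana"],     # locus
}

def candidate_karakas(case, voice="active"):
    """Return the karaka roles a noun in `case` may fill under `voice`."""
    if voice == "passive":
        if case == "nominative":
            return ["karma"]            # the verb ending now expresses the object
        if case == "instrumental":
            return ["karta", "karana"]  # agent or instrument: still ambiguous
    return list(ACTIVE.get(case, []))   # cases outside the table yield no karaka

print(candidate_karakas("nominative"))             # active voice
print(candidate_karakas("nominative", "passive"))  # passive voice
```

The ambiguity left in the passive instrumental is the kind of case where further semantic information is needed to choose a role.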


Navya Nyaya -> AI ?

Classify Nyaya into five parts:
1. Ontology
2. Epistemology
3. Technical Language
4. Semantics
5. Art of debate and fallacies


Ontology
Includes ...
  Categories - Substance, Quality, etc.
  Relations - SamavAya, SvarUpa ...
  Universals - types or classes ...
Ontology helps in various areas like NLP, K-Repr and K-Engg, and especially in the cognitive sciences.


Epistemology

Deals with ...
  Cognitive process
  Cognitive structure

It helps to solve the problems of cognitive sciences and K-repr.


Technical Language
NNL is a restricted language that has both features: the power of mechanism of artificial languages and the power of expression of natural languages.
The basic ideas behind this language will be helpful in Knowledge Representation.


Semantics
The way of analysing semantics shown by the Navya Naiyayikas has been found crucially helpful in NLP and Machine Translation
E.g.
  Classification of words - rUdha, yoga
  Syntactical analysis
  Power of definitions
KR & NN


Semantics in MT
Lexicography
  Word/concept nets based on NN ontology
Classification of padas (words)
  Rudha - word has a conventional meaning, i.e. names ...
  Yaugika - word has an etymological meaning ... cook, driver
  Yoga-rudha - word has etymology as well as convention ... CD-driver
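The three-way classification above turns on two yes/no questions: does the word's meaning follow from its derivation, and is its meaning fixed by convention? A minimal sketch of that decision, with hypothetical names and using the slide's own examples in the comments, might look like this:

```python
from enum import Enum

class PadaClass(Enum):
    RUDHA = "rudha"            # conventional only, e.g. a proper name
    YAUGIKA = "yaugika"        # etymological only, e.g. "cook", "driver"
    YOGA_RUDHA = "yoga-rudha"  # both, e.g. "CD-driver"

def classify(has_etymological_meaning, has_conventional_meaning):
    """Map the two properties of a word to its class."""
    if has_etymological_meaning and has_conventional_meaning:
        return PadaClass.YOGA_RUDHA
    if has_etymological_meaning:
        return PadaClass.YAUGIKA
    if has_conventional_meaning:
        return PadaClass.RUDHA
    raise ValueError("a word needs at least one source of meaning")

print(classify(True, False).value)   # e.g. "driver"
print(classify(True, True).value)    # e.g. "CD-driver"
```

In a lexicon for MT, the two flags would have to be recorded per word sense by a lexicographer; the function only encodes the decision once they are known.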


WSD - using different techniques
Definitions of Karaka relations without any overlap
  Kartrtvam = kriyAnukUlakritimattvam
  Karmatvam = para-samaveta-kriyA-janya-phala-Ashrayatvam
Going - Rama and forest
  Who is going where?
  The result, contact, is possible in Rama too ...
  To avoid such overlap, this definition is useful


Refinement of karaka relations
Classification of Karma
  Karma - reachable, understandable and so on.
Analysis of root semantics
  Leave - He left the place / left from the place
Analysis of expectancy (AkAnkshA)
  Rats killed cats


To-infinitive relation
I stand up to speak
I want to speak
He goes to London to study law
He wants to study law in London
To walk in the mornings is good for health


Special thanks to the authorities of Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram

Namaste!