Sanskrit and Natural Language Processing
Dr. Srinivasa Varakhedi, Center for Advanced Studies and Research in Shabdabodha and NLP, Rashtriya Sanskrit Vidyapeetha (Deemed University), Tirupati (A.P.)


  • Sanskrit and Natural Language Processing

    Dr. Srinivasa Varakhedi, Center for Advanced Studies and Research in Shabdabodha and NLP, Rashtriya Sanskrit Vidyapeetha (Deemed University), Tirupati (A.P.)

  • Dream of a bee...

  • Present situation of Sanskrit

    Sanskrit colleges are like a 'zoo'!

    No Govt. support unless we are productive

    Humanities and languages are being neglected

    How far will this support continue?

    The great tradition of learning is being lost

    No scope for novel research

  • Innovation is the key

    Sanskrit Shastras are competent enough to enter the world of science

    Move out of the Humanities and merge with science

    Analogy: Maths, Psychology, Logic

    We must find a practical approach for these Sanskrit sciences

  • We have lost 80%

    Meemamsa: no practical approach!

    Nyaya: no use in modern dialectics?

    Vyakarana: no application?

    What to do?

  • Relevance of Sanskrit Shastras in Modern Technology

    Fortunately, these shastras are found relevant in today's technology:

    Computing ideas in Panini

    Text processing principles in Meemamsa

    Formal languages in Nyaya

    We lack the technology and the application area

    Story of Babbage!

  • Message of Acharya Shankara Bhagavatpada

    avidyayaa mrtyum tiirtvaa .. vidyayaa amrtamashnute .. (Ishavasya Upanishad)

    Sri Shankara Bhagavatpada comments on this: avidyaa = karma; vidyaa = knowledge

  • Opportunity

    Emerging information technology has provided a great opportunity to survive

    Solve a major contemporary problem like MT on the basis of the shastras

    Get new openings for Sanskritists

    Open a new avenue for research

  • Know How

    Ultimate aim: finding an appropriate place for the Sanskrit Shastras

    Method: solutions to contemporary problems, adopting modern technology

    Resource needed: adequate manpower to act as a bridge between modern scientists and technologists on one side and Sanskrit scholars on the other

  • Change the scenario

    Technology

    Western Theories INDIAN THEORIES

  • Opportunities missed

    Industrial revolution: we missed this with some hasty decisions

    IT revolution: Indians are serving at the level of coding, not at the level of design!

    Knowledge revolution: we should take advantage of this

  • Need of the hour

    We need to understand how technology works

    We need to understand the contemporary problems

    Then we will be able to give solutions in the light of the shastras and show the relevance of Indian theories

  • History and Progress

    A conference held at Bangalore in Dec 1987 on Knowledge Representation and Sanskritam generated tremendous interest

    Nothing much has been achieved since, except some small-scale efforts and projects here and there, and those in technical institutions

    Time is running out! What progress has been made since then?

  • Complexity of the problem

    Different goals: the two disciplines, Technology and Shastras, developed in different contexts

    Paradigm difference: modern scholars are accustomed to visual teaching methods; traditional Pandits, on the other hand, prefer the oral tradition

    Language barrier: neither understands the other's language!

    Tuning in the dialogue will take time

  • Who would bell the cat?

    It needs long interaction between technologists and traditional Sanskrit scholars

    Technical institutions are always ready for such activities

    Not much interest is seen in Sanskrit institutions

    It is we Sanskritists who should bell the cat

  • Long process, like extraction of ghee from milk

    No miracle happens in the initial stage

    It's a big challenge; one or two persons are not enough

    We need hundreds of dedicated persons to achieve a small goal

    A person can climb a small hill; a team can climb Everest

  • Identifying the problem

    Analogy: Brahman in the Upanishads

    What is Brahman?

    We can NOT show it, as it is imperceptible.

    We can NOT describe it, as it is beyond words.

    Hence, we can direct you towards it by way of negating what we know.

  • Platform For Innovation

    To achieve this, Rashtriya Sanskrit Vidyapeetha has set up a new innovative centre for advanced study and research in shabdabodha and language technology

    The Center has faculty from shabdabodha (Nyaya, Vyakarana, Meemamsa), NLP and computer science

    The Center has a full-fledged computer lab

  • Possible areas

    Machine Translation

    Speech Processing

    Summary Extraction from huge texts

    Indo Wordnet as a base for IL-wordnets

    Developing Tools for IL Researchers

    Knowledge Representation schemes

  • Machine Translation

    English to Indian Languages: word sense disambiguation, Karaka & syntax relation, word-grouping, idiomatic expressions, Shabdasutra

    MT among Indian Languages: bi-language electronic dictionaries, Karaka & Vibhakti relation

  • Major MT systems: India

    Angla-Bharati, IIT Kanpur

    Shakti, IIIT Hyderabad

    Mantra, CDAC Pune

    SaHiT (Sanskrit Hindi Translator), CSS, JNU

    Anusaaraka (RSV, HCU, IIIT)

  • Major MT systems: Outside India

    UNITRAN

    BabelFish AltaVista (Systran)

    ATR (bimodal, Japan)

    JANUS (bimodal, US-Germany)

    SLT (SRI, Cambridge)

    VERBMOBIL (Germany)

    DIPLOMAT (Carnegie-Mellon)

    Get a 125 page directory of available MT systems at http://ourworld.compuserve.com/homepages/WJHutchins/Compendium-11.pdf

  • Summary Extraction

    Meemamsa principles are applied to extract the summary of a text

    Upakramaadi Tatparya Lingas are used to extract the summary of a text at the Indian Institute of Science, Bangalore, under our consultancy

  • Wordnet / Concept-net based on NN ontology

    Wordnet is an electronic lexical reference system designed on the basis of semantic relations between words

    Synonymy {Griha, nivaasa, ...}

    Hypernymy {Amra, vriksha, vanaspati}

    Antonymy {Shreemaan, akinchana}

    Meronymy {nAsika, mukha, shariira, ...}

    Gradation {Shushka, -tara, -tama}
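The relations listed on this slide can be pictured as labelled links between words. The following is only a minimal sketch, assuming a plain dictionary representation; the names `HYPERNYM`, `HOLONYM`, `chain` and `is_a` are hypothetical and are not taken from any existing wordnet software. It uses only the example words given on the slide.

```python
# Toy lexical network built from the slide's own examples.
# Each entry links a word to its immediate parent in one relation.
HYPERNYM = {"amra": "vriksha", "vriksha": "vanaspati"}  # mango -> tree -> plant
HOLONYM = {"nAsika": "mukha", "mukha": "shariira"}      # nose -> face -> body


def chain(word, relation):
    """Follow one relation upward from `word`; return all ancestors in order."""
    ancestors = []
    seen = {word}
    while word in relation:
        word = relation[word]
        if word in seen:  # guard against accidental cycles in the data
            break
        ancestors.append(word)
        seen.add(word)
    return ancestors


def is_a(word, candidate):
    """True if `candidate` is a (direct or indirect) hypernym of `word`."""
    return candidate in chain(word, HYPERNYM)


if __name__ == "__main__":
    print(chain("amra", HYPERNYM))
    print(chain("nAsika", HOLONYM))
    print(is_a("amra", "vanaspati"))
```

A real wordnet would store synsets rather than single words and would carry many more relation types; the dictionary form is chosen here only because it keeps the traversal visible.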

  • Sanskrit Corpus

    Annotating the relations in Sanskrit texts

    Tagging Samasas

    Identifying the topics of the texts

    Making Sanskrit texts available, along with simple translations, on the web and in CD-R form

    Statistical analysis of Sanskrit texts

  • Knowledge Engineering

    Representation

    For data representation, several database management systems are available

    For representing and retrieving useful information, there are various worked-out methodologies

    Finally, knowledge representation needs special treatment, where Indian knowledge systems can be applied

  • Knowledge and its importance in AI

    AI researchers are interested in building intelligent systems

    Web technologies are looking forward to the Semantic Web instead of the syntactic web

    Knowledge is more valuable than data and information

    Data: a simple DoB. Information: the age calculated from it. Knowledge: the judgment about suitability for the job at hand, etc. This requires a lot of inputs from various knowledge sources.
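The data / information / knowledge distinction on this slide can be made concrete with a small sketch. It assumes, purely for illustration, that "suitability" depends on an age range and one extra input; the function names and the rule itself are hypothetical and stand in for the many knowledge sources the slide mentions.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Information: derive the age in whole years from the raw date of birth."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def suitable_for_job(age: int, min_age: int, max_age: int, has_required_skill: bool) -> bool:
    """Knowledge (toy rule): a judgment that combines age with another input."""
    return min_age <= age <= max_age and has_required_skill


if __name__ == "__main__":
    dob = date(1990, 6, 15)                # data
    age = age_on(dob, date(2020, 6, 14))   # information
    print(age, suitable_for_job(age, 21, 35, True))  # knowledge-level judgment
```

The first step is mechanical; the second already needs inputs that are not in the record itself, which is the point the slide makes about knowledge.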

  • Computational Linguistics and Panini's Grammar

    The structure of Paninian grammar is nothing but a computer program (Babbage!)

    It has captured the base of the universal principles of all languages

    CL requires formal rules for the analysis and generation of language

    Slowly, Chomsky and others are turning towards Panini

  • The System of Panini

    Phonetic component: phonemes, pratyahara

    Rule base: vidhi (operations), samjna, paribhasha (metarules), adhikara (headings), atidesha (extension), niyama (restriction)

    Lexicon: Dhatupaatha, Ganapaatha

    Lists: affixes, rule-specific items
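The pratyahara device in the phonetic component lends itself to a short illustration. The sketch below lists the fourteen Shiva Sutras in an ITRANS-style romanization (an assumption of this example, e.g. `Ri` for vocalic r, `~N` for the velar nasal) and expands an abbreviation such as "ik" or "ac" into the sounds it covers. The function name `expand_pratyahara` is hypothetical. Because the marker `N` occurs twice in the sutras, this sketch simply stops at the first matching marker after the start sound.

```python
# Each tuple: (sounds in the sutra, final marker sound that closes it).
SHIVA_SUTRAS = [
    (["a", "i", "u"], "N"),
    (["Ri", "Li"], "k"),
    (["e", "o"], "~N"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "T"),
    (["l"], "N"),
    (["~n", "m", "~N", "N", "n"], "m"),
    (["jh", "bh"], "~n"),
    (["gh", "Dh", "dh"], "Sh"),
    (["j", "b", "g", "D", "d"], "sh"),
    (["kh", "ph", "Ch", "Th", "th", "ch", "T", "t"], "v"),
    (["k", "p"], "y"),
    (["sh", "Sh", "s"], "r"),
    (["h"], "l"),
]


def expand_pratyahara(start, marker):
    """Return the sounds from `start` up to the first sutra closed by `marker`."""
    collected = []
    started = False
    for sounds, closing in SHIVA_SUTRAS:
        for sound in sounds:
            if not started and sound == start:
                started = True
            if started:
                collected.append(sound)
        if started and closing == marker:
            return collected
    raise ValueError(f"no pratyahara {start!r} + {marker!r}")


if __name__ == "__main__":
    print(expand_pratyahara("i", "k"))  # the abbreviation "ik"
    print(expand_pratyahara("a", "c"))  # the abbreviation "ac" (vowels)
    print(expand_pratyahara("y", "N"))  # the abbreviation "yaN" (semivowels)
```

Two letters thus stand for a whole class of sounds, which is the kind of compact encoding that makes the grammar look like a program.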

  • Paninian Model for Sentence Analysis

    Action: central theme

    Karakas: syntactico-semantic roles

    Visheshana-Visheshyabhava

    Concept of anabhihite in switching to a different voice

    Vivakshaa: intention of the speaker

    Form and meaning
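How karaka roles follow from case endings, and how the anabhihite idea handles a change of voice, can be sketched as below. This is a heavy simplification under stated assumptions: nouns arrive already tagged with their vibhakti, only the default vibhakti-to-karaka pairings are covered, and in the passive the first instrumental noun is taken as the agent. The names `DEFAULT_ROLE` and `assign_karakas` are hypothetical.

```python
# Default karaka expressed by each vibhakti when the role is NOT already
# expressed by the verb ending (the anabhihite condition).
DEFAULT_ROLE = {
    "dvitiya": "karma",
    "tritiya": "karana",
    "chaturthi": "sampradana",
    "panchami": "apadana",
    "saptami": "adhikarana",
}


def assign_karakas(nouns, voice):
    """nouns: list of (word, vibhakti) pairs; voice: 'active' or 'passive'."""
    roles = {}
    for word, vibhakti in nouns:
        if vibhakti == "prathama":
            # The verb ending expresses the karta in the active voice and the
            # karma in the passive, so that noun takes the nominative.
            role = "karta" if voice == "active" else "karma"
        elif vibhakti == "tritiya" and voice == "passive" and "karta" not in roles.values():
            # In the passive the unexpressed karta surfaces in the instrumental.
            role = "karta"
        else:
            role = DEFAULT_ROLE.get(vibhakti, "unknown")
        roles[word] = role
    return roles


if __name__ == "__main__":
    # rAmaH vanam gachChati (active)
    print(assign_karakas([("rAmaH", "prathama"), ("vanam", "dvitiya")], "active"))
    # rAmeNa vanam gamyate (passive)
    print(assign_karakas([("rAmeNa", "tritiya"), ("vanam", "prathama")], "passive"))
```

In both sentences Rama comes out as karta and the forest as karma, although the case endings differ; that invariance is what makes karakas useful as an intermediate representation.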

  • Navya Nyaya -> AI?

    Classify Nyaya into five parts:

    1. Ontology

    2. Epistemology

    3. Technical Language

    4. Semantics

    5. Art of debate and fallacies

  • Ontology

    Includes:

    Categories: Substance, Quality, etc.

    Relations: SamavAya, SvarUpa

    Universals: types or classes

    Ontology helps in various areas like NLP, K-Repr and K-Engg, and especially in the cognitive sciences.

  • Epistemology

    Deals with: cognitive process, cognitive structure

    It helps to solve the problems of the cognitive sciences and K-Repr.

  • Technical Language

    NNL is a restricted language that combines two features: the mechanical power of artificial languages and the expressive power of natural languages.

    The basic ideas behind this language will be helpful in Knowledge Representation.

  • Semantics

    The way of analysing semantics shown by the Navya Naiyayikas has been found crucially helpful in NLP and Machine Translation

    E.g.:

    Classification of words: rUdha, yoga

    Syntactical analysis

    Power of definitions: KR & NN

  • Semantics in MT

    Lexicography

    Word/concept nets based on NN ontology

    Classification of padas (words):

    Rudha: a word with a conventional meaning, i.e. names

    Yaugika: a word with an etymological meaning, e.g. cook, driver

    Yoga-rudha: a word that has an etymology as well as a convention, e.g. CD-driver

  • WSD using different techniques

    Definitions of the Karaka relations without any overlap:

    Kartrtvam = kriyAnukUlakritimattvam

    Karmatvam = para-samaveta-kriyA-janya-phala-Ashrayatvam

    Going: Rama and forest. Who is going where?

    The result, contact, is possible in Rama too. This definition is useful to avoid such overlap.
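The way the two definitions keep the roles apart in the "Rama goes to the forest" example can be sketched as set operations. This is one simplified reading, assuming the karta is the locus of the effort conducive to the action, and the karma is a locus of the result that is not itself a locus of the action (the "para-samaveta" qualifier). The class `Event` and the functions `karta_of` and `karma_of` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    action: str
    effort_in: str                 # locus of the effort conducive to the action
    action_inheres_in: frozenset   # loci in which the action itself inheres
    result: str
    result_resides_in: frozenset   # loci in which the result resides


def karta_of(event):
    """Karta: whatever possesses the effort conducive to the action."""
    return {event.effort_in}


def karma_of(event):
    """Karma: loci of the result, excluding loci of the action itself."""
    return set(event.result_resides_in - event.action_inheres_in)


# Going: the action inheres in Rama; its result (contact) resides in both
# Rama and the forest, which is the overlap noted on the slide.
going = Event(
    action="going",
    effort_in="Rama",
    action_inheres_in=frozenset({"Rama"}),
    result="contact",
    result_resides_in=frozenset({"Rama", "forest"}),
)

if __name__ == "__main__":
    print(karta_of(going))
    print(karma_of(going))
```

Without the exclusion step, Rama would qualify as karma as well, since contact resides in him too; the set difference plays the part of the qualifier in the definition.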

  • Refinement of Karaka Relations

    Classification of Karma: reachable, understandable and so on

    Analysis of root semantics: leave (He left the place / left from the place)

    Analysis of expectancy (AkAnkshA): Rats killed cats

  • To-infinitive relation

    I stand up to speak

    I want to speak

    He goes to London to study law

    He wants to study law in London

    To walk in the mornings is good for health

  • Computer as a Tool

    Story of Greek research

    Not only the sciences: humanities subjects also benefit from the aid of computers

    We can use computers to improve our education methods and the quality of research

  • Power of computers

    Memory: store any amount of data on discs

    Processing speed: access it fast

    Search / Replace / Edit / Add

    Get statistical information

    Create hyperlinks

    Present it in a better way

    Reproduce it several times at less cost

    Distribute it in easy ways

  • Sansk-Net

    An online, gigantic electronic library of Sanskrit works

    More than 500 works (3,00,00,000 pages of e-content)

    www.sansknet.ac.in

    Dhaturatnakara is available on the web at http://sanskrit.nic.ac.in

  • CD-R Production

    Paniniya Udaharanakosha is now available in CD form

    'Koshas' will be made available in CD form: Vachaspathyam, Sabdakalpadruma

    Dhaturatnakara: all the forms of all roots will be made available on CD-R

    Morphological analyzer for Sanskrit

  • Valmiki Ramayana on the Net

    - Valmiki Ramayana moolam in all Indian scripts

    - Audio recording

    - Translation in five foreign languages

    - Eight Sanskrit commentaries

    - English translation and commentaries

    - Summary, Glossary

    - Beautiful picture gallery

    http://www.rsvpramayana.ac.in

  • Machine translation

    English to Sanskrit

    Circular translation

    English Sanskrit dictionaries

    Sanskrit wordnet

  • Sanskrit readers (accessors)

    Ramayana accessor

    Bhagavadgeeta reader

    Nyaya Classics Reader

    Vyakarana Reader

  • Sanskrit language processing tools

    Sandhi concatenator: ready

    Morphological analyser: hosted on the web

    Sandhi splitter (in progress)

    Samasa tag interpreter: ready
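To give a sense of what a sandhi concatenator does, here is a minimal sketch covering only a few vowel sandhi rules (like vowels lengthen; a/A followed by i, u, e, o or a diphthong; i and u before an unlike vowel). It is not the Center's tool. It assumes the transliteration used elsewhere in these slides, with capital A, I, U for long vowels; the function name `join_vowel_sandhi` is hypothetical, and combinations outside these rules return `None`.

```python
LONG = {"a": "A", "i": "I", "u": "U"}
SHORT = {long_v: short_v for short_v, long_v in LONG.items()}
DIPHTHONGS = ("ai", "au")
# Result of a/A followed by each unlike vowel (guna and vriddhi).
A_PLUS = {"i": "e", "u": "o", "e": "ai", "o": "au", "ai": "ai", "au": "au"}


def join_vowel_sandhi(first, second):
    """Join two words with vowel sandhi, or return None if no rule here applies."""
    if not first or not second or first.endswith(DIPHTHONGS):
        return None
    stem, final = first[:-1], SHORT.get(first[-1], first[-1])
    if second.startswith(DIPHTHONGS):
        initial, rest = second[:2], second[2:]
    else:
        initial, rest = second[0], second[1:]
    initial_short = SHORT.get(initial, initial)
    if final not in LONG or (initial_short not in LONG and initial_short not in A_PLUS):
        return None  # consonant boundary or a vowel this sketch does not cover
    if final == initial_short:
        merged = LONG[final]                 # like vowels merge into the long vowel
    elif final == "a":
        merged = A_PLUS[initial_short]       # a/A + unlike vowel
    else:
        merged = ("y" if final == "i" else "v") + initial  # i/u become y/v
    return stem + merged + rest


if __name__ == "__main__":
    for pair in [("deva", "Alaya"), ("mahA", "indra"), ("sUrya", "udaya"), ("su", "Agata")]:
        print(pair, "->", join_vowel_sandhi(*pair))
```

A sandhi splitter has to run such rules in reverse, where one surface form may have several candidate splits; that is one reason splitting is the harder of the two tasks.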

  • Future Projects

    Text-to-speech for Sanskrit texts

    High quality search engine for Sanskrit E-library

    Hypertext archive for Sanskrit Literature

  • Dream Projects

    Paninian Grammar for English (MT): ground work is done; a national symposium was conducted

    Validity checking of the Paninian system through computing: basic teaching material is ready

    Sanskrit Wordnet: a prototype project has been undertaken by a student

  • Namaste!

    Special thanks to the authorities of Sri Chandrashekharendra Sarasvati Vishvamahavidyalaya, Kanchipuram