CS 671 ICT For Development, 19th Sep 2008, Vishal Vachhani, CFILT and DIL, IIT Bombay


  • CS 671 ICT For Development, 19th Sep 2008

    Vishal Vachhani, CFILT and DIL,

    IIT Bombay

  • Agro Explorer

    A Meaning Based Multilingual Search Engine

  • Web site for Indian farmers

    Farmers can submit their problems related to their crops.

    Queries are answered by agricultural experts at KVK, Baramati.

    Languages supported: Marathi, Hindi, English

  • Why Need Multilingual Search

    Vast amount of information available on the Web

    Almost 70% of the information is in English

    The Indian rural populace is not English-literate

    A big language barrier

    Information has to be made available to them in their local languages.

  • Why Need Meaning Based Search

    Most of the current search engines are keyword based.

    They do not consider the semantics of the query.

    The result set contains a large number of extraneous documents.

    Search based on the meaning of the query will help narrow down on the desired information quickly.

  • [Figure: the system takes a query in Hindi, searches English and Marathi documents, and returns the result in Hindi.]

  • Same Keywords, Different Semantics

    Moneylenders Exploit Farmers: Found 1 Result

    Farmers Exploit Moneylenders: Found 0 Results
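
The contrast between the two queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-document collection, the hand-written (relation, head, dependent) triples, and the function names `keyword_match` and `relation_match` are all hypothetical and not taken from Agro Explorer.

```python
# Illustrative sketch only: the document and the triples below are made up.

def keyword_match(query: str, document: str) -> bool:
    """Bag-of-words match: every query word occurs somewhere in the document."""
    return set(query.lower().split()) <= set(document.lower().split())


def relation_match(query_relations: set, doc_relations: set) -> bool:
    """Meaning-based match: every (relation, head, dependent) triple must agree."""
    return query_relations <= doc_relations


document = "Moneylenders exploit farmers"
# Hand-written UNL-style triples for the document (agt = agent, obj = object).
doc_relations = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}

q1 = "moneylenders exploit farmers"
q1_relations = {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")}

q2 = "farmers exploit moneylenders"
q2_relations = {("agt", "exploit", "farmer"), ("obj", "exploit", "moneylender")}

# Keyword search cannot tell the two queries apart.
print(keyword_match(q1, document), keyword_match(q2, document))
# Comparing relations separates them: only the first query matches.
print(relation_match(q1_relations, doc_relations),
      relation_match(q2_relations, doc_relations))
```

Both queries share the same keywords, so the keyword test accepts both, while the relation test accepts only the query whose agent and object roles agree with the document.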

  • Provides both

    Meaning Based Search

    Cross-Lingual Information Access

  • System Architecture


  • Conclusion

    Provides two independent features: Multi-Linguality and Meaning Based Search.

    Because of UNL, both multi-lingual and meaning based properties can be incorporated together rather than using separate language translators in search engines. The scheme lends itself to integration of multiple languages in a seamless, scalable manner.

  • UNL: Universal Networking Language

  • [Figure: UNL at the center, connected to Hindi, English, French, Tamil and Marathi.]

  • Direct translation: translation is done directly between each pair of languages.

    - N*(N-1) translators are needed to translate among N languages.

    Intermediate language: an intermediate language is used for translation.

    - Only 2*N translators are required.
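
A quick way to see how the two counts diverge is to tabulate them. The function names below are hypothetical; the formulas are the ones stated on the slide.

```python
def direct_translators(n: int) -> int:
    """One translator per ordered language pair: N*(N-1)."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One converter into and one out of the intermediate language per language: 2*N."""
    return 2 * n


for n in (3, 5, 10):
    print(n, direct_translators(n), interlingua_translators(n))
```

For 3 languages both approaches need 6 translators; for 10 languages the direct approach needs 90 while the intermediate-language approach needs 20.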

  • UNL is an acronym for Universal Networking Language.

    UNL is a computer language that enables computers to process information and knowledge across language barriers.

    UNL is a language for representing information and knowledge provided by natural languages.

    Unlike natural languages, UNL expressions are unambiguous.

  • Although UNL is a language for computers, it has all the components of a natural language.

    It is composed of Universal Words (UWs), Relations and Attributes.

    Knowledge is represented as a semantic graph:

    Nodes: concepts

    Arcs: relations between concepts

  • A UW represents simple or compound concepts. There are two classes of UWs:

    unit concepts

    compound structures of binary relations grouped together (indicated with Compound UW-Ids)

    A UW is made up of a character string (an English-language word) followed by a list of constraints.

    <UW> ::= <head word> [<constraint list>]

    Examples: state(icl>express), state(icl>country)

  • A relation label is represented as a string of 3 characters or less.

    The relations between UWs are binary:

    rel(UW1, UW2)

    They have different labels according to the different roles they play.

    At present, there are 46 relations in UNL.

    For example: agt (agent), ins (instrument), pur (purpose), etc.

  • Attribute labels express additional information about the Universal Words that appear in a sentence.

    They show what is said from the speaker's point of view: how the speaker views what is said (time, reference, emphasis, attitude, etc.).

    @entry, @present, @progressive, @topic, etc.

  • Example: Ram eats rice.

    {unl}
    agt(eat.@entry.@present, Ram)
    obj(eat.@entry.@present, rice(icl>eatable))
    {/unl}
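
The binary relations in the example above can be read into a simple data structure. The following is a minimal sketch, not the parser used in the system; `Relation`, `split_top_level` and `parse_relation` are hypothetical names, and the regular expression only assumes what the slides state (a label of at most 3 letters, an optional scope id such as `:01`, and two arguments).

```python
import re
from typing import NamedTuple, Tuple


class Relation(NamedTuple):
    label: str   # relation label, e.g. "agt"
    source: str  # first UW, with its attributes still attached
    target: str  # second UW


# label of 1 to 3 letters, optional scope id, then the parenthesised arguments
REL_RE = re.compile(r"^([a-z]{1,3})(?::\w+)?\((.*)\)$")


def split_top_level(args: str) -> Tuple[str, str]:
    """Split 'a, b' at the comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(args):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return args[:i].strip(), args[i + 1:].strip()
    raise ValueError(f"expected two arguments in {args!r}")


def parse_relation(line: str) -> Relation:
    match = REL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a binary relation: {line!r}")
    source, target = split_top_level(match.group(2))
    return Relation(match.group(1), source, target)


unl = """agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))"""

relations = [parse_relation(line) for line in unl.splitlines()]
for relation in relations:
    print(relation)
```

Each parsed relation is one arc of the semantic graph: the label names the arc, and the two UWs are the nodes it connects.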

  • [Figure: UNL graph for "Ram eats rice", with node eat connected by arcs to Ram and rice.]

  • Example: The boy who works here went to school.

    {unl}
    agt(go(icl>move).@entry.@past, :01)
    plt(go(icl>occur).@entry.@past, school(icl>institution))
    agt:01(work(icl>do), boy(icl>person.@entry))
    plc:01(work(icl>do), here)
    {/unl}

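  • [Figure: UNL graph for "The boy who works here went to school". Node go has an agt arc to scope :01 and a plt arc to school; inside :01, node work has an agt arc to boy and a plc arc to here.]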

  • EnConverter: source language to intermediate language

    DeConverter: intermediate language to target language

  • It's a language independent generator.

    It can deconvert UNL expressions into a variety of native languages, using linguistic data such as the Word Dictionary and the Grammatical Rules of each language.

    The DeConverter transforms the sentence represented by a UNL expression into a natural language sentence.

  • [Figure: deconverter architecture. A UNL document passes through the UNL Parser, the Case Marking Module, the Morphology Module and the Syntax Planning Module to produce a Hindi document. Resources shown: Dictionary, Case Marking Rules, Morphology Rules, Syntax Planning Rules. The modules are labelled as language dependent or language independent.]

  • The UNL parser module does the following tasks:

    Check the input format of the UNL document

    Separate attributes from UWs

    Separate attributes from dictionary entries

    Replace UWs with Hindi root words
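
The parser tasks listed above can be sketched as follows. This is a simplified illustration, not the module's actual code: the function names are hypothetical, the format check only looks for the `{unl}` and `{/unl}` tags, and `HINDI_ROOTS` is a tiny made-up dictionary of transliterated Hindi roots standing in for the real lexicon.

```python
from typing import List, Tuple

# Hypothetical stand-in for the UW-to-Hindi dictionary (transliterated roots).
HINDI_ROOTS = {"eat": "kha", "rice(icl>eatable)": "chawal", "Ram": "Ram"}


def check_format(doc: str) -> bool:
    """Minimal input check: the document is wrapped in {unl} ... {/unl}."""
    lines = [line.strip() for line in doc.strip().splitlines()]
    return len(lines) >= 2 and lines[0] == "{unl}" and lines[-1] == "{/unl}"


def split_uw(node: str) -> Tuple[str, List[str]]:
    """Separate 'eat.@entry.@present' into the UW and its attribute list."""
    uw, *attrs = node.split(".@")
    return uw, ["@" + attr for attr in attrs]


def to_hindi_root(node: str) -> Tuple[str, List[str]]:
    """Replace the UW with its Hindi root word, keeping the attributes."""
    uw, attrs = split_uw(node)
    return HINDI_ROOTS[uw], attrs


doc = "{unl}\nagt(eat.@entry.@present, Ram)\n{/unl}"
print(check_format(doc))
print(split_uw("eat.@entry.@present"))
print(to_hindi_root("rice(icl>eatable)"))
```

Keeping the attributes next to the root word matters because the later case marking and morphology steps are driven by them.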

  • Category of morpho-syntactic properties which distinguish the various relations that a noun phrase may bear to a governing head.

    A rule base based on:

    UNL attributes

    lexical attributes from dictionary

  • Case marking is implemented using rules.

    We analyze all UNL as well as dictionary attributes and decide the next and previous case markers.

    We also use the relation with the parent to extract the right case marker.

  • Example rule: agt:null:null:null::@past#V:VINT:N:null

    Structure: relName : parent previous case marker : parent next case marker : child previous case marker : child next case marker : ...

    The remaining four fields are in the form attr 'REL' relationname. Attributes are separated by #, and relation names are also separated by #.
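
Reading such a rule into named fields might look like the sketch below. It relies only on the field order given on the slide; the class and function names are hypothetical, the four trailing fields are kept as generic `conditions` because their exact meaning is not spelled out here, and the child next case marker is empty in the example as given, so it is kept as an empty string.

```python
from typing import List, NamedTuple, Optional


class CaseRule(NamedTuple):
    rel_name: str
    parent_prev_marker: Optional[str]
    parent_next_marker: Optional[str]
    child_prev_marker: Optional[str]
    child_next_marker: Optional[str]
    conditions: List[List[str]]  # the remaining four fields, each split on '#'


def parse_case_rule(rule: str) -> CaseRule:
    fields = rule.split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 colon-separated fields, got {len(fields)}")
    # 'null' means "no marker"; anything else is kept verbatim.
    markers = [None if field == "null" else field for field in fields[1:5]]
    conditions = [[] if field == "null" else field.split("#") for field in fields[5:]]
    return CaseRule(fields[0], *markers, conditions)


rule = parse_case_rule("agt:null:null:null::@past#V:VINT:N:null")
print(rule)
```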

  • What is Morphology

    Study of morphemes

    Their formation into words, including inflection, derivation and composition

  • Noun, Verb and Adjective Morphology

    Depends on the phonetic properties of the Hindi word

    Noun Morphology

    Depends on gender, number and vowel ending of the noun

    Adjective Morphology

    Adjective changes, lexical attribute AdjA

    Verb Morphology

    Depends upon tense, gender, number, person, etc.

  • Verbs are categorized by

    Tense (past, present, future)

    Gender (male, female)

    Person (1st, 2nd, 3rd)

    Number (sg, pl)

    Example: Ladaka khana kha raha hai.

    It contains present continuous tense, male, sg, and 3rd person.
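
As a small illustration of how these categories select a verb form, the sketch below covers only one cell of the paradigm: present continuous tense, 3rd person, in transliterated Hindi. The table and the function name are assumptions made for this example; a real morphology module would be driven by rules over all tenses and persons.

```python
# (gender, number) -> auxiliary for present continuous tense, 3rd person.
PRESENT_CONTINUOUS_3RD = {
    ("male", "sg"): "raha hai",
    ("female", "sg"): "rahi hai",
    ("male", "pl"): "rahe hain",
    ("female", "pl"): "rahi hain",
}


def present_continuous(root: str, gender: str, number: str) -> str:
    """Attach the auxiliary chosen by gender and number to the verb root."""
    return f"{root} {PRESENT_CONTINUOUS_3RD[(gender, number)]}"


# Verb group of the slide example "Ladaka khana kha raha hai."
print(present_continuous("kha", "male", "sg"))
```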

  • Arranging words according to the language structure

    Rule based module

    It is a priority based graph traversal

  • Algorithm for Syntax Planning:

    1) Start traversing the UNL graph from the entry node.

    2) If a node has no children, then add this node to the final string.

    3) If a node has more than one child, then sort the children based on the priority of the relations. The relation having the highest priority will be traversed first.

    4) Mark that node as visited.

    5) Repeat steps 3 and 4 until all the children of that node are visited.

    6) If all the children of that node are visited, then add that node to the final string.

    7) Repeat steps 2 to 4 until all the nodes are traversed.

  • Example: Also, spray 5% Neemark solution.

    [Figure: UNL graph with relation priorities. The entry node spray has an obj:17 arc to solution and a man:9 arc to also; solution has mod:5 arcs to percent and Neemark; percent has a qua:5 arc to 5.]

  • [Figure sequence: the traversal starts at the entry node spray, follows obj:17 to solution, then mod:5 to percent, then qua:5 to 5. A node is written to the output once all of its children have been visited.]

    Output : 5

    Output : 5 percent

    Output : 5 percent Neemark

    Output : 5 percent Neemark solution

    Output : 5 percent Neemark solution also

    Output : 5 percent Neemark solution also spray

  • Output: 5 percent Neemark solution also spray
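
The traversal walked through above can be sketched as a priority-ordered, children-first graph walk. This is a minimal sketch rather than the system's implementation: `GRAPH`, `PRIORITY` and `syntax_plan` are hypothetical names, the graph and priorities are copied from the example, and ties between equal priorities (the two mod arcs) are assumed to be broken by input order.

```python
from typing import Dict, List, Tuple

# node -> outgoing (relation, child) arcs, in input order
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "spray": [("man", "also"), ("obj", "solution")],
    "solution": [("mod", "percent"), ("mod", "Neemark")],
    "percent": [("qua", "5")],
}
PRIORITY = {"obj": 17, "man": 9, "mod": 5, "qua": 5}


def syntax_plan(entry: str,
                graph: Dict[str, List[Tuple[str, str]]],
                priority: Dict[str, int]) -> List[str]:
    output: List[str] = []
    visited = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        # Higher-priority relations are traversed first. sorted() is stable,
        # so arcs with equal priority keep their input order (an assumption).
        arcs = sorted(graph.get(node, []), key=lambda arc: -priority[arc[0]])
        for _, child in arcs:
            visit(child)
        # A node is added to the final string once all its children are done.
        output.append(node)

    visit(entry)
    return output


print(" ".join(syntax_plan("spray", GRAPH, PRIORITY)))
```

Because children are emitted before their parent, the verb ends up last, which matches the verb-final word order of the output shown in the slides.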

  • Input sentence: Its roots are affected by bacterial infection.

    Module            Output
    Input             Its roots are affected by bacterial infection.
    UNL parser
    Case marking
    Morphology
    Syntax Planning
    Output

  • UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/

    S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani, Hindi generation from interlingua, MT Summit 2007 (www.cse.iitb.ac.in/~vishalv)

    Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya, AgroExplorer Group: AgroExplorer: a Meaning Based Multilingual Search Engine, International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.

    Agro Explorer: http://agro.mlasia.iitb.ac.in

    aAQUA: http://www.aaqua.org