CS 671 ICT For Development, 19th Sep 2008
Vishal Vachhani
CFILT and DIL,
IIT Bombay
Agro Explorer
A Meaning Based Multilingual Search Engine
Website for Indian farmers
Farmers can submit their problems related to their crops
Queries are answered by agricultural experts at KVK, Baramati
Languages supported: Marathi, Hindi, English
Why Is Multilingual Search Needed?
A vast amount of information is available on the Web.
Almost 70% of this information is in English.
The Indian rural populace is not English-literate.
This is a big language barrier: the information has to be made available to them in their local languages.
Why Is Meaning Based Search Needed?
Most current search engines are keyword based.
They do not consider the semantics of the query.
The result set contains a large number of extraneous documents.
Searching on the meaning of the query helps narrow down to the desired information quickly.
[Figure: a query in Hindi goes into the system, which searches English and Marathi documents and returns the result in Hindi.]
Same Keywords, Different Semantics
Query "Moneylenders exploit farmers": found 1 result
Query "Farmers exploit moneylenders": found 0 results
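The contrast can be made concrete with a minimal Python sketch. It assumes that each document and query has already been analysed into a set of (relation, head, dependent) triples, as a UNL-style analysis would provide; the sample document, the triples and the function names are hypothetical and hand-written for illustration, not AgroExplorer's actual index format.

```python
def keywords(text):
    """Bag of lower-cased words: all that a keyword engine looks at."""
    return {word.lower() for word in text.split()}

def keyword_match(query, document):
    # A document matches if it contains every query word, in any order.
    return keywords(query) <= keywords(document)

def meaning_match(query_relations, document_relations):
    # A document matches only if it contains every query relation,
    # so who-did-what-to-whom must agree.
    return query_relations <= document_relations

# Hypothetical document and its hand-written semantic triples.
DOCUMENT = "In many villages moneylenders exploit farmers"
DOCUMENT_RELATIONS = {
    ("agt", "exploit", "moneylender"),
    ("obj", "exploit", "farmer"),
    ("plc", "exploit", "village"),
}

QUERIES = {
    "Moneylenders exploit farmers": {("agt", "exploit", "moneylender"), ("obj", "exploit", "farmer")},
    "Farmers exploit moneylenders": {("agt", "exploit", "farmer"), ("obj", "exploit", "moneylender")},
}

for query, relations in QUERIES.items():
    print(f"{query!r}: keyword={keyword_match(query, DOCUMENT)}, "
          f"meaning={meaning_match(relations, DOCUMENT_RELATIONS)}")
```

Both queries match on keywords, but only the first matches on meaning, which mirrors the 1 result versus 0 results above.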
Agro Explorer provides both:
Meaning Based Search
Cross-Lingual Information Access
System Architecture
Conclusion
Provides two independent features: multilinguality and meaning based search.
Because of UNL, the multilingual and meaning based properties can be incorporated together, rather than using separate language translators in search engines. The scheme lends itself to the integration of multiple languages in a seamless, scalable manner.
UNL: Universal Networking Language
[Figure: UNL at the centre, connected to Hindi, English, French, Tamil and Marathi.]
Direct translation: each language is translated directly into every other language.
N*(N-1) translators are needed for N languages.
Intermediate language: an intermediate language is used for translation.
Only 2*N translators are required.
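A quick check of the two formulas. Reading 2*N as one converter into the intermediate language and one out of it per language is an assumption, consistent with the EnConverter/DeConverter pair described in this talk.

```python
def direct_translators(n):
    """One translator per ordered pair of distinct languages."""
    return n * (n - 1)

def interlingua_translators(n):
    """One converter into and one converter out of the interlingua per language."""
    return 2 * n

for n in (2, 3, 5, 10):
    print(n, direct_translators(n), interlingua_translators(n))
```

The two schemes tie at N = 3. Beyond that the intermediate-language approach needs far fewer components: for the five languages in the UNL figure it is 20 direct translators versus 10 converters, and for ten languages 90 versus 20.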
UNL is an acronym for Universal Networking Language.
UNL is a computer language that enables computers to process information and knowledge across language barriers.
It is a language for representing the information and knowledge provided by natural languages.
Unlike natural languages, UNL expressions are unambiguous.
Although UNL is a language for computers, it has all the components of a natural language.
It is composed of Universal Words (UWs), Relations and Attributes.
Knowledge is represented as a semantic graph: nodes are concepts, and arcs are relations between concepts.
A UW represents a simple or compound concept. There are two classes of UWs:
unit concepts
compound structures of binary relations grouped together (indicated with Compound UW-IDs)
A UW is made up of a character string (an English-language word) followed by a list of constraints.
<UW> ::= <Head Word> [<Constraint List>]
Examples: state(icl>express) and state(icl>country) share the head word "state" but denote different concepts.
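A minimal Python sketch of splitting a UW into its head word and constraint list, assuming the UW is written as a head word followed by an optional parenthesized, comma-separated constraint list, and that any @attributes have already been removed. The function names are hypothetical, and this is not a full UNL parser (quoting and escapes are ignored).

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts

def parse_uw(uw):
    """Return (head word, list of constraints) for a UW string."""
    uw = uw.strip()
    if "(" not in uw:
        return uw, []  # bare head word, no constraint list
    if not uw.endswith(")"):
        raise ValueError(f"malformed UW: {uw!r}")
    head, body = uw[:-1].split("(", 1)
    return head.strip(), split_top_level(body)

for uw in ("state(icl>express)", "state(icl>country)", "Ram"):
    print(uw, "->", parse_uw(uw))
```

The two "state" UWs come out with the same head word and different constraints, which is exactly what keeps them apart as concepts.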
A relation label is represented as a string of 3 characters or less. The relations between UWs are binary:
rel(UW1, UW2)
They have different labels according to the different roles they play. At present, there are 46 relations in UNL, for example agt (agent), ins (instrument), pur (purpose), etc.
Attribute labels express additional information about the Universal Words that appear in a sentence.
They show what is said from the speaker's point of view: how the speaker views what is said (time, reference, emphasis, attitude, etc.).
Examples: @entry, @present, @progressive, @topic, etc.
Example: Ram eats rice.
{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
[Figure: UNL graph of "Ram eats rice." The entry node eat has an agt arc to Ram and an obj arc to rice.]
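The UNL expression above can be turned into the semantic graph in a few lines. This is a minimal sketch, assuming one relation per line, attributes attached to a UW with ".@", and that the UW string (including its constraint list) identifies a node; the function names are hypothetical, and compound UWs (scope IDs such as :01) are not handled here.

```python
from collections import defaultdict

def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        depth += (ch == "(") - (ch == ")")
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts

def split_attributes(node):
    """'eat.@entry.@present' -> ('eat', ['@entry', '@present'])."""
    uw, *attrs = node.strip().split(".@")
    return uw, ["@" + attr for attr in attrs]

def parse_relation(line):
    """'agt(eat.@entry.@present, Ram)' -> ('agt', ('eat', [...]), ('Ram', []))."""
    label, _, body = line.strip().partition("(")
    if not body.endswith(")"):
        raise ValueError(f"malformed relation: {line!r}")
    args = split_top_level(body[:-1])
    if len(args) != 2:
        raise ValueError(f"UNL relations are binary: {line!r}")
    return label.strip(), split_attributes(args[0]), split_attributes(args[1])

def build_graph(lines):
    """Return (entry node, child list per node, attributes per node)."""
    children = defaultdict(list)
    attributes = defaultdict(set)
    entry = None
    for line in lines:
        rel, (uw1, attrs1), (uw2, attrs2) = parse_relation(line)
        children[uw1].append((rel, uw2))  # arc from UW1 to UW2
        attributes[uw1].update(attrs1)
        attributes[uw2].update(attrs2)
        for uw, attrs in ((uw1, attrs1), (uw2, attrs2)):
            if "@entry" in attrs:
                entry = uw
    return entry, dict(children), dict(attributes)

entry, children, attributes = build_graph([
    "agt(eat.@entry.@present, Ram)",
    "obj(eat.@entry.@present, rice(icl>eatable))",
])
print(entry, children[entry], sorted(attributes[entry]))
```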
Example: The boy who works here went to school.
{unl}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/unl}
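[Figure: UNL graph of this sentence. The entry node go has an agt arc to the compound UW :01 and a plt arc to school. Inside :01, work has a plc arc to here and an agt arc to boy.]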
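One simple way to handle a compound UW is to group relations by scope ID and, where a flat view is needed, replace a reference such as :01 by the entry node of that scope. The sketch below does this for the example above, taking boy as the entry of scope :01 from its @entry marker. The tuple format and the flattening step are simplifying assumptions for illustration, not a procedure given in this talk; note that flattening loses the fact that "the boy who works here" is a single unit.

```python
from collections import defaultdict

# Relations of the example as (relation, scope, uw1, uw2); scope None means top level.
# Constraint lists and attributes are dropped to keep the sketch short.
RELATIONS = [
    ("agt", None, "go", ":01"),
    ("plt", None, "go", "school"),
    ("agt", "01", "work", "boy"),
    ("plc", "01", "work", "here"),
]
# Entry node of each compound UW, i.e. the node carrying @entry inside that scope.
SCOPE_ENTRY = {"01": "boy"}

def group_by_scope(relations):
    """Collect the relations that belong to each scope."""
    scopes = defaultdict(list)
    for rel, scope, uw1, uw2 in relations:
        scopes[scope].append((rel, uw1, uw2))
    return dict(scopes)

def resolve(node, scope_entry):
    """Replace a compound UW reference such as ':01' by that scope's entry node."""
    return scope_entry[node[1:]] if node.startswith(":") else node

def flatten(relations, scope_entry):
    """Rewrite every relation so that it links plain nodes only."""
    return [(rel, resolve(uw1, scope_entry), resolve(uw2, scope_entry))
            for rel, _scope, uw1, uw2 in relations]

for scope, rels in group_by_scope(RELATIONS).items():
    print(scope, rels)
print(flatten(RELATIONS, SCOPE_ENTRY))
```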
[Figure: Source language → EnConverter → Intermediate language → DeConverter → Target language]
The DeConverter is a language independent generator.
It can deconvert UNL expressions into a variety of native languages, using linguistic data such as a word dictionary and the grammatical rules of each language.
It transforms the sentence represented by a UNL expression into a natural language sentence.
[Figure: Hindi generation architecture. UNL Doc → UNL Parser → Case Marking Module → Morphology Module → Syntax Planning Module → Hindi Doc, with the Dictionary, Case Marking Rules, Morphology Rules and Syntax Planning Rules as resources; part of the pipeline is labelled as the language dependent module.]
Language Independent Module
The UNL parser module performs the following tasks:
Check the input format of the UNL document.
Separate attributes from UWs.
Separate attributes from dictionary entries.
Replace UWs with Hindi root words.
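Two of these tasks, checking the input format and replacing UWs with Hindi root words, can be sketched as follows. This assumes the document is a single {unl} ... {/unl} block with one relation per line, and uses a hypothetical three-entry Python dict in place of the real dictionary; the Hindi roots (खा "eat", राम "Ram", चावल "rice") are ordinary words, while the category labels "N" and "V" are illustrative.

```python
# Hypothetical dictionary: UW -> (Hindi root word, lexical attributes).
DICTIONARY = {
    "eat": ("खा", {"V"}),
    "Ram": ("राम", {"N"}),
    "rice(icl>eatable)": ("चावल", {"N"}),
}

def extract_relations(document):
    """Check the {unl} ... {/unl} wrapper and return the relation lines inside it."""
    lines = [line.strip() for line in document.strip().splitlines() if line.strip()]
    if len(lines) < 2 or lines[0] != "{unl}" or lines[-1] != "{/unl}":
        raise ValueError("a UNL document must be wrapped in {unl} ... {/unl}")
    return lines[1:-1]

def lexicalize(uw, dictionary=DICTIONARY):
    """Replace a UW (attributes already removed) by its Hindi root and lexical attributes."""
    if uw not in dictionary:
        raise KeyError(f"no dictionary entry for UW {uw!r}")
    root, lexical_attributes = dictionary[uw]
    return root, set(lexical_attributes)  # copy so callers cannot mutate the dictionary

doc = """
{unl}
agt(eat.@entry.@present, Ram)
obj(eat.@entry.@present, rice(icl>eatable))
{/unl}
"""
print(extract_relations(doc))
print(lexicalize("rice(icl>eatable)"))
```

Keeping the lexical attributes next to the root word matters because the case marking rules consult them, as the next slides describe.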
Case: a category of morpho-syntactic properties which distinguish the various relations that a noun phrase may bear to a governing head. In Hindi these relations are marked by case markers (postpositions).
Case marking uses a rule base built on:
UNL attributes
lexical attributes from the dictionary
Case marking is implemented using rules.
We analyze all UNL attributes as well as dictionary attributes and decide the next and previous case markers.
We also use the relation with the parent to extract the right case marker.
Rule example: agt:null:null:null::@past#V:VINT:N:null
Structure: relName : parent previous case marker : parent next case marker : child previous case marker : child next case marker : the remaining four fields are of the form attr'REL'relationname, where attributes are separated by # and relation names are also separated by #.
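The rule string can be split mechanically into the fields listed above. The sketch below is one reading of that format, not the system's actual rule loader: it assumes ":" separates nine fields, "null" means no case marker, and each of the last four fields is a "#"-separated list that it keeps as opaque strings, since the slide does not spell out how they are matched. The child-next field of the example rule is empty as given here, so it is kept as an empty string rather than guessed. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CaseRule:
    relation: str
    parent_prev: Optional[str]   # case marker before the parent
    parent_next: Optional[str]   # case marker after the parent
    child_prev: Optional[str]    # case marker before the child
    child_next: Optional[str]    # case marker after the child
    conditions: List[List[str]]  # remaining four fields, each split on '#'

def _marker(value):
    # "null" means no marker; any other value (even "") is kept verbatim.
    return None if value == "null" else value

def parse_case_rule(line):
    fields = line.strip().split(":")
    if len(fields) != 9:
        raise ValueError(f"expected 9 ':'-separated fields, got {len(fields)}")
    relation, pp, pn, cp, cn, *rest = fields
    conditions = [[] if f in ("", "null") else f.split("#") for f in rest]
    return CaseRule(relation, _marker(pp), _marker(pn), _marker(cp), _marker(cn), conditions)

rule = parse_case_rule("agt:null:null:null::@past#V:VINT:N:null")
print(rule)
```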
What is Morphology?
The study of morphemes and their formation into words, including inflection, derivation and composition.
Noun, verb and adjective morphology depend on the phonetic properties of the Hindi word.
Noun morphology: depends on the gender, number and vowel ending of the noun.
Adjective morphology: adjectives with the lexical attribute AdjA change form.
Verb morphology: depends upon tense, gender, number, person, etc.
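The dependence of noun forms on gender and vowel ending can be illustrated with the regular Hindi patterns, such as ladaka (boy), ghar (house), ladaki (girl) and kitab (book). This is a minimal sketch in the same simple romanization as the slides (nasalization not marked); it covers only regular nouns, since some masculine nouns ending in -a, such as raja (king), do not change in the plural, and a real system would take such information from the dictionary. Treating AdjA as the class of inflecting adjectives ending in -a, such as accha (good), is an assumption.

```python
def noun_forms(noun, gender):
    """Direct/oblique, singular/plural forms of a regular Hindi noun ('m' or 'f')."""
    if gender == "m" and noun.endswith("a"):
        stem = noun[:-1]  # ladaka -> ladak-
        return {"sg_direct": noun, "sg_oblique": stem + "e",
                "pl_direct": stem + "e", "pl_oblique": stem + "on"}
    if gender == "m":
        return {"sg_direct": noun, "sg_oblique": noun,
                "pl_direct": noun, "pl_oblique": noun + "on"}
    if gender == "f" and noun.endswith("i"):
        stem = noun[:-1]  # ladaki -> ladak-
        return {"sg_direct": noun, "sg_oblique": noun,
                "pl_direct": stem + "iyan", "pl_oblique": stem + "iyon"}
    if gender == "f":
        return {"sg_direct": noun, "sg_oblique": noun,
                "pl_direct": noun + "en", "pl_oblique": noun + "on"}
    raise ValueError(f"unknown gender: {gender!r}")

def adjective_form(adjective, gender, number, oblique=False):
    """Agreement of an inflecting adjective ending in -a: accha / acche / acchi."""
    stem = adjective[:-1]
    if gender == "f":
        return stem + "i"
    if number == "pl" or oblique:
        return stem + "e"
    return adjective

for noun, gender in (("ladaka", "m"), ("ghar", "m"), ("ladaki", "f"), ("kitab", "f")):
    print(noun, noun_forms(noun, gender))
print(adjective_form("accha", "m", "pl"), noun_forms("ladaka", "m")["pl_direct"])
```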
Verbs are categorized by:
Tense (past, present, future)
Gender (male, female)
Person (1st, 2nd, 3rd)
Number (sg, pl)
Example: Ladaka khana kha raha hai. ("The boy is eating food.")
It is in the present continuous tense, male, singular and 3rd person.
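For the present continuous, the verb group is the root followed by a progressive marker that agrees in gender and number and an auxiliary that agrees in person and number. A minimal sketch in the slides' romanization; it assumes the familiar tum form (ho) for the 2nd person and covers only this one tense.

```python
# Progressive marker agrees with the subject in gender and number.
PROGRESSIVE = {("m", "sg"): "raha", ("m", "pl"): "rahe",
               ("f", "sg"): "rahi", ("f", "pl"): "rahi"}
# Present-tense auxiliary agrees in person and number.
# Assumption: 2nd person uses the familiar 'tum' form ('ho') for both numbers.
AUX_PRESENT = {(1, "sg"): "hoon", (1, "pl"): "hain",
               (2, "sg"): "ho", (2, "pl"): "ho",
               (3, "sg"): "hai", (3, "pl"): "hain"}

def present_continuous(root, gender, number, person):
    """Build the present continuous verb group, e.g. 'kha raha hai'."""
    return f"{root} {PROGRESSIVE[(gender, number)]} {AUX_PRESENT[(person, number)]}"

print("Ladaka khana", present_continuous("kha", "m", "sg", 3))
```

With male, singular, 3rd person this reproduces the example sentence; switching to female plural gives "kha rahi hain".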
Syntax planning arranges words according to the structure of the language.
It is a rule based module.
It is a priority based graph traversal.
Algorithm for Syntax Planning:
1) Start traversing the UNL graph from the entry node.
2) If a node has no children, add it to the final string.
3) If a node has more than one child, sort the children by the priority of their relations. The relation with the highest priority is traversed first.
4) Mark that node as visited.
5) Repeat steps 3 and 4 until all the children of that node are visited.
6) Once all the children of that node are visited, add that node to the final string.
7) Repeat steps 2 to 4 until all the nodes are traversed.
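A compact way to read these steps is a recursive traversal: children are visited in decreasing order of relation priority, and a node is emitted only after all its children, i.e. a post-order traversal. This is a minimal sketch under stated assumptions: the priorities are the ones shown in the example graph that follows (obj 17, man 9, mod 5, qua 5), unlisted relations default to 0, and ties keep their input order because Python's `sorted` is stable. The edges of "Also, spray 5% Neemark solution." are hand-transcribed, and `syntax_plan` is a hypothetical name.

```python
from collections import defaultdict

# Relation priorities as shown in the worked example; others default to 0 (assumption).
PRIORITY = {"obj": 17, "man": 9, "mod": 5, "qua": 5}

def syntax_plan(edges, entry):
    """Order graph nodes for output: highest-priority child first, parent after children."""
    children = defaultdict(list)
    for rel, parent, child in edges:
        children[parent].append((rel, child))
    visited, output = set(), []

    def visit(node):
        visited.add(node)
        # sorted() is stable, so children with equal priority keep their input order.
        for rel, child in sorted(children[node], key=lambda rc: -PRIORITY.get(rc[0], 0)):
            if child not in visited:
                visit(child)
        output.append(node)  # all children done: emit this node

    visit(entry)
    return output

# "Also, spray 5% Neemark solution." as (relation, parent, child) edges.
EDGES = [
    ("obj", "spray", "solution"),
    ("man", "spray", "also"),
    ("mod", "solution", "percent"),
    ("mod", "solution", "Neemark"),
    ("qua", "percent", "5"),
]
print(" ".join(syntax_plan(EDGES, "spray")))
```

The printed order matches the final output of the step-by-step traversal in the example.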
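Example: Also, spray 5% Neemark solution.
[Figure: UNL graph. The entry node spray has an obj arc to solution and a man arc to also; solution has mod arcs to percent and Neemark; percent has a qua arc to 5. Relation priorities: obj 17, man 9, mod 5, qua 5.]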
Step-by-step traversal of this graph:
1) Start at the entry node spray. Its children are sorted by priority: solution (obj:17) comes before also (man:9).
2) Visit solution. Its children percent (mod:5) and Neemark (mod:5) have equal priority; percent is visited first.
3) Visit percent and then its child 5 (qua:5). Node 5 has no children, so it is added to the output. Output: 5
4) All children of percent are visited, so percent is added. Output: 5 percent
5) Neemark has no children. Output: 5 percent Neemark
6) All children of solution are visited. Output: 5 percent Neemark solution
7) also has no children. Output: 5 percent Neemark solution also
8) All children of spray are visited. Output: 5 percent Neemark solution also spray
Example input sentence: Its roots are affected by bacterial infection.
[Table: output of each module (UNL parser, case marking, morphology, syntax planning) for this input.]
References
UNL 2005 Specifications: http://www.undl.org/unlsys/unl/unl2005/
S. Singh, M. Dalal, V. Vachhani, P. Bhattacharyya and O. Damani. Hindi generation from interlingua. MT Summit 2007. (www.cse.iitb.ac.in/~vishalv)
Mrugank Surve, Sarvjeet Singh, Satish Kagathara, Venkatasivaramasastry K, Sunil Dubey, Gajanan Rane, Jaya Saraswati, Salil Badodekar, Akshay Iyer, Ashish Almeida, Roopali Nikam, Carolina Gallardo Perez, Pushpak Bhattacharyya. AgroExplorer: a Meaning Based Multilingual Search Engine. International Conference on Digital Libraries (ICDL), New Delhi, India, Feb 2004.
Agro Explorer: http://agro.mlasia.iitb.ac.in
aAQUA: http://www.aaqua.org