Artificial Intelligence
Rishabh Nigam
Shubhdeep Kochhar
Computational Modelling of Grammar Acquisition
The Problem
● Computational framework for grammar acquisition
● Unsupervised Learning from a real corpus
● Why this problem?
● The algorithm can learn complex syntax and generate novel grammatical sentences, and it can prove useful in other fields that call for structure discovery from raw data, such as bioinformatics.
The Algorithm
● ADIOS – Automatic Distillation Of Structure
● What it does:
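The MEX (Motif EXtraction) criterion: it builds a matrix M[i,j] holding the right-moving probabilities PR[i,j] on one side of the diagonal and the left-moving probabilities PL[i,j] on the other. PR[i,j] is the probability that a path through words i..j-1 continues to word j; PL[i,j] is the same quantity moving leftward. The matrix is then searched for steep decreases in PR and PL. These mark the likely boundaries of a significant pattern, and the words between the two drops form a candidate pattern.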
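The boundary search can be sketched in a few lines of Python. This is a simplification under stated assumptions, not the ADIOS code: it counts contiguous occurrences of a word sequence in a list of tokenized sentences instead of paths through the ADIOS graph, and it omits the significance test that produces the p-values in the outputs below. The function names and the cutoff eta = 0.65 are hypothetical choices for illustration.

```python
from typing import List, Sequence

Corpus = List[List[str]]


def count(corpus: Corpus, sub: Sequence[str]) -> int:
    """Occurrences of `sub` as a contiguous run anywhere in the corpus.
    (Simplification: ADIOS counts paths through its graph.)"""
    target = list(sub)
    n = len(target)
    return sum(
        1
        for sent in corpus
        for k in range(len(sent) - n + 1)
        if sent[k:k + n] == target
    )


# `path` must be one of the corpus sentences, so every count below is at least 1.
def p_right(corpus: Corpus, path: List[str], i: int, j: int) -> float:
    """PR for j > i: share of occurrences of path[i..j-1] that continue with path[j]."""
    return count(corpus, path[i:j + 1]) / count(corpus, path[i:j])


def p_left(corpus: Corpus, path: List[str], i: int, j: int) -> float:
    """PL for i < j: share of occurrences of path[i+1..j] that are preceded by path[i]."""
    return count(corpus, path[i:j + 1]) / count(corpus, path[i + 1:j + 1])


def right_end(corpus: Corpus, path: List[str], i: int, eta: float = 0.65) -> int:
    """Scan right from path[i]; return the last index before PR(j)/PR(j-1) falls below eta."""
    if i >= len(path) - 1:
        return i
    prev = p_right(corpus, path, i, i + 1)
    for j in range(i + 2, len(path)):
        cur = p_right(corpus, path, i, j)
        if cur / prev < eta:  # steep decrease: the pattern ends at j - 1
            return j - 1
        prev = cur
    return len(path) - 1


def left_start(corpus: Corpus, path: List[str], j: int, eta: float = 0.65) -> int:
    """Scan left from path[j]; return the first index after PL's ratio falls below eta."""
    if j <= 0:
        return 0
    prev = p_left(corpus, path, j - 1, j)
    for i in range(j - 2, -1, -1):
        cur = p_left(corpus, path, i, j)
        if cur / prev < eta:  # steep decrease: the pattern starts at i + 1
            return i + 1
        prev = cur
    return 0


def candidate_pattern(corpus: Corpus, path: List[str], i: int, eta: float = 0.65) -> List[str]:
    """Words between the right-moving and left-moving drops, scanning from path[i]."""
    end = right_end(corpus, path, i, eta)
    start = left_start(corpus, path, end, eta)
    return path[start:end + 1]
```

With a toy corpus of "we wait to see it", "you wait to see them" and "they wait to see him", calling candidate_pattern(corpus, corpus[0], 1) yields ['wait', 'to', 'see']: PR stays at 1 through "wait to see" and falls to 1/3 at "it", while PL falls to 1/3 at "we".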
Code Used
● MEX criterion
● Training scripts
● Generating scripts
--> Shimon Edelman and Zach Solan made this code available.
Work done so far
● Converted the CHILDES database and the Hindi database (WordNet) into a format readable by the ADIOS algorithm.
● Ran the algorithm on the CHILDES database and the Hindi database.
● Corresponded briefly with Shushobhan Nayak and ran the algorithm on his small commentary database.
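The CHILDES conversion step can be sketched as follows. This assumes the transcripts are in CHILDES's CHAT format (utterances on main-tier lines that begin with `*` and a speaker code, such as `*MOT:`) and assumes the ADIOS input is one sentence per line wrapped in begin/end symbols. The `*` and `#` symbols and both helper names are hypothetical and should be checked against the distributed ADIOS scripts.

```python
import re
from typing import Iterable, List

# ASSUMED delimiters: one sentence per line, wrapped in begin/end symbols.
BEGIN, END = "*", "#"


def chat_utterances(lines: Iterable[str]) -> List[List[str]]:
    """Tokenized main-tier utterances from CHAT lines such as '*MOT:\tthere you are .'."""
    utterances = []
    for line in lines:
        if not line.startswith("*") or ":" not in line:
            continue  # skip @headers, %dependent tiers and continuation lines
        text = line.split(":", 1)[1]  # drop the speaker code
        text = re.sub(r"\[[^\]]*\]", " ", text)  # drop bracketed codes like [/]
        tokens = text.split()  # assumes the terminator '.' is space-separated
        if tokens:
            utterances.append(tokens)
    return utterances


def to_adios_lines(utterances: Iterable[List[str]]) -> List[str]:
    """One line per utterance, wrapped in the assumed BEGIN/END symbols."""
    return [" ".join([BEGIN, *tokens, END]) for tokens in utterances]
```

For the line `*MOT:	there you are .` this produces `* there you are . #`. Case is preserved, matching tokens such as "I" and "Emily" in the CHILDES output.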
Running on the CHILDES database
● E6478 {we,you,youse}
● P6479 (I,think) 0.0068258047 1 2 201
● P6480 (E6481,you,are) 0 1 3 18
● E6481 {there,here}
● P6482 (who,E6466) 0.0039371848 1 2 36
● P6483 (P6439,P6434,Emily) 0 0.33333334 5.4000001 4
● P6484 (he,is,E6485) 0.0058915019 1 3 28
● E6485 {.,here}
● P6486 (are,we,P6402) 0.0043362379 1 4 10
● P6487 (wait,to,E6488,E6489) 6.1452389e-05 0.5 4 15
● E6488 {we,you}
● E6489 {hear,see}
For example, P6480 (E6481,you,are) with E6481 = {there,here} covers the sentences "there you are" and "here you are", both found in the corpus used.
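To make the notation concrete, the hypothetical helper below (not part of the ADIOS scripts) enumerates every word string a node generates. It follows the convention visible in the listing: P nodes are ordered sequences and E nodes are sets of alternatives. The actual generating scripts may sample rather than enumerate.

```python
from itertools import product
from typing import Dict, List


def expand(symbol: str, patterns: Dict[str, List[str]], classes: Dict[str, List[str]]) -> List[str]:
    """Every word string that `symbol` can generate."""
    if symbol in patterns:  # P node: one choice per child, in order
        parts = [expand(child, patterns, classes) for child in patterns[symbol]]
        return [" ".join(choice) for choice in product(*parts)]
    if symbol in classes:  # E node: any one of its members
        return [s for member in classes[symbol] for s in expand(member, patterns, classes)]
    return [symbol]  # terminal word


# Entries copied from the CHILDES run above.
PATTERNS = {"P6480": ["E6481", "you", "are"], "P6487": ["wait", "to", "E6488", "E6489"]}
CLASSES = {"E6481": ["there", "here"], "E6488": ["we", "you"], "E6489": ["hear", "see"]}

print(expand("P6480", PATTERNS, CLASSES))  # ['there you are', 'here you are']
```

Applied to P6487, the same function yields four strings, one for each combination of E6488 and E6489.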
Running on the Hindi Database
● ID seq p-value gen len occ
● P3487 (भी�,प्रचलि�त) 0.0042799711 1 2 5
● P3488 (के ,E3489,भीगों�) 0 1 3 11
● E3489 {वि�भिभीन्न,मु��यमु}
● P3490 (E3491,के�,भीषा,मु�) 1.9848347e-05 1 4 6
● E3491 {वि�ज्ञान,बो��-च�}
● P3492 (मु�,E3493,�प,केरन,से) 0.0037000179 1 5 4
● E3493 {मिमु�,घो��केर,प�सेकेर}
● P3494 (वि�षा,नष्ट,हो�त) 0.001850009 1 3 4
● P3495 (सेमुन,भीगों) 0.0059099197 1 2 26
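Listings like these can be loaded for further analysis with a small parser. This sketch assumes exactly the layout shown: a pattern line is an ID, a parenthesized comma-separated sequence, then the p-value, gen, len and occ columns; an equivalence-class line is an ID plus braced members. The class and function names are hypothetical, and a token that itself contains a comma would be split incorrectly.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

PATTERN_RE = re.compile(r"^(P\d+)\s+\((.*)\)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)$")
CLASS_RE = re.compile(r"^(E\d+)\s+\{(.*)\}$")


@dataclass
class Pattern:
    pid: str
    seq: List[str]
    p_value: float
    gen: float
    length: float  # float: some rows (e.g. P6483) show a non-integer len
    occ: int


@dataclass
class EquivClass:
    eid: str
    members: List[str]


def split_items(body: str) -> List[str]:
    return [item.strip() for item in body.split(",")]


def parse_line(line: str) -> Optional[Union[Pattern, EquivClass]]:
    """Parse one listing line; return None for headers or unrecognized lines."""
    line = line.strip().lstrip("●").strip()  # drop the slide bullet, if present
    m = PATTERN_RE.match(line)
    if m:
        pid, body, p, gen, length, occ = m.groups()
        return Pattern(pid, split_items(body), float(p), float(gen), float(length), int(occ))
    m = CLASS_RE.match(line)
    if m:
        return EquivClass(m.group(1), split_items(m.group(2)))
    return None
```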
Running on the Commentary
● P447 (the,E448,square) 0 1 3 65
● E448 {large,big}
● P449 (big,square) 7.212162e-05 1 2 38
● P450 (the,little,E451) 0 1 3 49
● E451 {circle,square}
● P452 (the,big,box) 0 1 3 34
● E455 {opens,closes,enters}
● P456 (the,E457) 0.0055941939 1 2 91
● E457 {bottom,corner,door,entrance}
● P458 (P449,E459,the) 0 1 4 4
● E459 {leaves,closes,enters}
● E461 {--,inside,and,leaves,left,opens,closes,enters}
Precision And Recall
Precision: the proportion of learner-generated sentences (C_learner) accepted by the teacher
Recall: the proportion of target-corpus sentences (C_target) accepted by the learner
Values obtained: about 0.6 precision and 0.5 recall
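The two measures can be computed as below. In this minimal sketch each judge is modelled as a function that accepts or rejects a sentence; the set-membership judges and the toy sentences are invented for illustration and do not reproduce the values reported above.

```python
from typing import Callable, Iterable

Acceptor = Callable[[str], bool]  # returns True if a sentence is judged acceptable


def acceptance_rate(sentences: Iterable[str], accepts: Acceptor) -> float:
    sents = list(sentences)
    if not sents:
        raise ValueError("need at least one sentence")
    return sum(accepts(s) for s in sents) / len(sents)


def precision(learner_sentences: Iterable[str], teacher_accepts: Acceptor) -> float:
    """Share of learner-generated sentences (C_learner) that the teacher accepts."""
    return acceptance_rate(learner_sentences, teacher_accepts)


def recall(target_sentences: Iterable[str], learner_accepts: Acceptor) -> float:
    """Share of target-corpus sentences (C_target) that the learner accepts."""
    return acceptance_rate(target_sentences, learner_accepts)


# Toy stand-ins: both judges are plain set-membership tests (hypothetical data).
teacher_language = {"there you are", "here you are", "wait to see"}
learner_output = ["there you are", "here you are", "you are there", "are you here"]

print(precision(learner_output, teacher_language.__contains__))  # 0.5
print(recall(["there you are", "wait to see"], set(learner_output).__contains__))  # 0.5
```

Keeping the judges as plain functions lets the same code score any teacher (a reference grammar, for instance) and any learned grammar that exposes an accept/reject test.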
References
[1] Heidi R. Waterfall, Ben Sandbank, Luca Onnis, and Shimon Edelman, "An empirical generative framework for computational modeling of language acquisition," Cambridge University Press, 2010.
[2] Zach Solan, PhD thesis, supervised by Prof. David Horn, Prof. Shimon Edelman, and Prof. Eytan Ruppin, Tel Aviv University.
Thank You