Applications of Natural Language Processing

Course 8 – 26 April 2012

Diana Trandabăț (dtrandabat@info.uaic.ro)

Content

Computational lexicography

Ubiquitous computing

Computational lexicography

1. Exploiting published dictionaries for use in new computer programs

2. Using computer programs to create new dictionaries

Using dictionaries for computational purposes

Inventory of the words of a language
◦ for tokenization and lemmatization

Word class recognition (noun vs. verb vs. adj.)
◦ but dictionaries don’t give comparative frequencies

Word sense disambiguation
◦ assumes that dictionary sense distinctions are reliable
◦ dictionaries don’t give comparative frequencies!
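As an illustration of the points above, here is a minimal sketch of a machine-readable dictionary used as a word inventory for lemmatization and word class recognition. The toy lexicon and the helper names (`LEXICON`, `tokenize`, `lookup`, `analyse`) are hypothetical, not taken from any real dictionary. The sketch shows that a dictionary can list every candidate reading but cannot rank them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: inflected form -> list of (lemma, word class) readings.
# Like a printed dictionary, it lists readings but carries no frequency information.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "runs": [("run", "verb"), ("run", "noun")],
    "ran": [("run", "verb")],
    "fast": [("fast", "adj"), ("fast", "adv"), ("fast", "verb"), ("fast", "noun")],
    "dogs": [("dog", "noun"), ("dog", "verb")],
}

PUNCT = ".,;:!?"

def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer: lowercases and strips surrounding punctuation."""
    return [tok.strip(PUNCT).lower() for tok in text.split() if tok.strip(PUNCT)]

def lookup(token: str) -> List[Tuple[str, str]]:
    """Return every dictionary reading of a token (empty list if it is not in the inventory)."""
    return LEXICON.get(token, [])

def analyse(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each token to its candidate (lemma, word class) readings, in dictionary order."""
    return {tok: lookup(tok) for tok in tokenize(text)}

if __name__ == "__main__":
    for token, readings in analyse("The dogs ran fast.").items():
        # Nothing here says which reading is the most likely one.
        print(token, readings)
```

Because the readings come back in dictionary order rather than by frequency, choosing between the noun and verb reading of *dogs* still needs corpus counts or a tagger.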

Dictionaries before corpora

Dictionaries were based on collections of citations (from literary texts).

In some dictionaries, examples were, and still are, based on introspection rather than taken from actual texts.

Definitions in the dictionaries of the future: associate meanings with words in context, not with words in isolation.
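Corpus lexicographers usually look at "words in context" through keyword-in-context (KWIC) concordance lines. The sketch below is our own minimal illustration, not a tool described in the course. The function name `kwic` and the default window of three tokens are assumptions. The sample sentences are citations quoted in the DTLR entry for VIVÁCE.

```python
from typing import List

def kwic(text: str, keyword: str, width: int = 3) -> List[str]:
    """Return one keyword-in-context line per occurrence of `keyword`.

    Each line shows up to `width` tokens of left and right context; the left
    context is right-aligned so that the keyword occurrences line up.
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively and without trailing punctuation.
        if token.strip(".,;:!?").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

if __name__ == "__main__":
    corpus = "Corbul este un animal vivace. Spirit vivace. O datină vivace."
    for line in kwic(corpus, "vivace"):
        print(line)
```

A real concordancer would work on a tokenized, sentence-split corpus. This version only splits on whitespace, so context windows can cross sentence boundaries.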

DTLR

Building and publishing the Thesaurus Dictionary of the Romanian Language (DTLR) took almost a century.

The last volume was published by the Publishing House of the Romanian Academy at the beginning of 2009.

In all, DTLR has 33 volumes, more than 15,000 pages and about 175,000 entries.

The dictionary was created in the traditional pencil-and-paper way, with citations collected from more than 2,500 volumes of written Romanian literature.

eDTLR

eDTLR is the digital form of DTLR, including its sources in digital form and the software to access them.

Steps in building eDTLR:
◦ Preliminary processing of the paper version: scanning, image processing, automatic recognition of symbols (OCR)
◦ Correction phases
◦ Parsing the entries
◦ Correcting the structure
◦ Linking the dictionary entries to sources
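The slides do not say how the correction phases work. One common automatic aid, assumed here and not attested for eDTLR, maps each OCR token that is missing from a word list to its closest known word. Below is a minimal sketch using Python's standard `difflib`. The function name `correct_tokens`, the 0.8 similarity cutoff, the small lexicon (words from the VIVÁCE entry) and the OCR errors are all invented for illustration.

```python
import difflib
from typing import Iterable, List

def correct_tokens(tokens: Iterable[str], lexicon: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each unknown OCR token with its closest lexicon word, if one is similar enough."""
    known = set(lexicon)
    corrected = []
    for tok in tokens:
        if tok in known:
            corrected.append(tok)
            continue
        # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
        matches = difflib.get_close_matches(tok, lexicon, n=1, cutoff=cutoff)
        # Keep the original token when nothing is close enough, so a human can review it.
        corrected.append(matches[0] if matches else tok)
    return corrected

if __name__ == "__main__":
    lexicon = ["vivace", "vioi", "sprinten", "agil"]
    print(correct_tokens(["vlvace", "agil", "sprlnten", "xyz"], lexicon))
```

Leaving unmatched tokens untouched reflects the idea that automatic correction only pre-filters candidates for the later manual correction phases.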


VIVÁCE adj. invar., adv. I. Adj. invar., adv. 1. Adj. invar. (Livresc; despre oameni) Care are o vitalitate deosebită, manifestată prin rapiditate şi uşurinţă în mişcări. V. a g e r, a g i l, s p r i n t e n (1), v i o i (1). Cf. FROLLO, V. 623, LM, GHEŢIE, R. M., BARCIANU, ALEXI, W. Era mică: abia întrecea umărul lui Dănuţ, cu tocuri cu tot – dar prea vivace pentru a da răgaz ochiului să o cuprindă. TEODOREANU, M. II, 15, cf. SCRIBAN, D., DL, DM, M. D. ENC., DEX, DN3, DREV. ♦(Despre manifestările, fizionomia etc. oamenilor) Care dovedeşte, care exprimă vivacitate (1), însufleţire. Copilul îşi spune întîmplările, impresiunile, închipuirile foarte copilăreşte, adică în modul cel mai naiv şi vivace cîteodată. HELIADE, O. II, 65..

2. Adj. invar. (Livresc; despre oameni) Care este înzestrat cu o minte ageră, pătrunzătoare; perspicace, subtil (3); (despre mintea, inteligenţa oamenilor) care dovedeşte agerime, subtilitate. Cf. FROLLO, V. 623. Spirit vivace. LM. Prin mijlocirea iubitului meu profesor, Aron Pumnul, avui fericirea să fac… cunoştinţă cu renumiţii fraţi Hurmuzachi, mai de aproape… cu talentosul, vivaciul şi vîşcătoriul Alesandru. SBIERA, F. S. 106. Nime dintre contimporeni nu ar pute contesta că mişcările politice şi culturale din anii 1848 şi 1871 îşi dătoresc fiinţa şi decursul, în bună parte, spiretelui vivaciu, scînteietoriu şi atîţătoriu al lui A. Hurmuzachi. id. ib. 238..

3. Adj. invar. (Despre tempoul unei bucăţi muzicale sau, p. e x t., despre ritmul versurilor) Foarte rapid, însufleţit. Cu cît se înmulţesc dactilii în exametru, cu atîta versul devine mai răpede, mai vivace şi mai uşor. HELIADE, O. II, 164, cf. DSR. ◊ (Prin extensiune) Şantierul ardea în timp vivace, prin armonizarea tuturor focarelor într-un rug colosal. CĂLINESCU, S. 106. Dar dacă ar fi numai atît, – virtuozitatea absurdului rulat într-un tempo vivace, – în povestirile d-lui Mircea Damian, de bună seamă n-ar fi de ajuns. Şi ar fi mai ales periculos. PERPESSICIUS, M. III, 201.

4. Adv. (Indică modul de executare a unei bucăţi muzicale) În tempo foarte rapid între allegro şi presto; vivo. Cf. ENC. ROM., CADE, DL, DM, M. D. ENC., DEX, DN3, D. MUZ., DSR.

II. Adj. invar. 1. (Astăzi rar; despre fiinţe) Care poate trăi mult timp; (învechit) vieţuielnic (2), (învechit, rar) vieţuial (v. vieţual 2). V. r e z i s t e n t, r o b u s t. Cf. PROT. – POP., N. D., PONTBRIANT, D. Corbul este un animal vivace. COSTINESCU, cf. LM, RESMERIŢĂ, D., ŞĂINEANU, D. U., CADE, SCRIBAN, D. ◊ F i g. Naţionalitatea georgiană, din care mingrelianii sînt o simplă ramură, nu poate fi de aceeaşi ginte cu anticii fasiani,… ci derivă dintr-o altă tulpină mai vînoasă, mai vivace, mai rezistinte, a cării aşezare în văile Caucazului… este posterioară epocei lui Ipocrat. HASDEU, I. C. I, 173.

2. (Despre plante, mai ales despre plantele ierbacee de cultură sau despre părţi ale acestora) Care trăieşte mai mulţi ani (fără a fi nevoie de o nouă însămînţare); care rodeşte timp de mai mulţi ani la rînd; peren. Streliţia reginei, plantă foarte mîndră…, vivace, deşi ierboasă, cere… dese udări vara. BREZOIANU, A. 432/4, cf. 448/7. Vizdeiul... este o plantă prea vivace; dăinirea ei… se întinde adesea pînă la doisprezece sau cincisprezece ani. id. R. 199/3, cf. 207/23. Sînt unele rădăcini care trăiesc numai un an (anuale); altele trăiesc doi ani (bisanuale), pe cînd rădăcinele arburilor care trăiesc mai mulţi ani se numesc vivace. BARASCH, I. N. 107/20, cf. COSTINESCU.

3. F i g. (Astăzi rar) Persistent, durabil. Cîteva din prejudicii sînt vivace, nu se pot lesne desfiinţa. COSTINESCU. Suflarea trecutului e încă atît de vivace în inima ei, încît o va conduce mereu cu aripele destinse către viitor. ODOBESCU, S. II, 261. Prejudiciile sînt vivaci. ŞĂINEANU, D.U. O datină vivace. CADE, cf. SCRIBAN, D.

– Pl.: (neobişnuit) vivaci. – Şi: (învechit, rar) viváci adj. – Din (I) it. vivace, lat. vivax, -acis, (II) fr. vivace, lat. vivax, -acis.

Entry example in DTLR

VIVÁCE adj. invar., adv. I. Adj. invar., adv. 1. Adj. invar. Gloss Examples. ♦ Gloss Examples 2. Adj. invar. Gloss Examples 3. Adj. invar. Gloss Examples 4. Adv. Gloss Examples II. Adj. invar. 1. Gloss Examples 2. Gloss Examples 3. F i g. Gloss Examples – Pl.: (neobişnuit) vivaci. – Şi: (învechit, rar) viváci adj. – Din (I) it. vivace, lat. vivax, -acis, (II) fr. vivace, lat. vivax, -acis.

Entry example in DTLR

Dictionary entry parsing

The dictionary is parsed using the following components:
◦ a set of marker classes (a marker is a boundary for a specific linguistic category);
◦ a hypergraph-like hierarchy that establishes the dependencies among the marker classes;
◦ a searching (parsing) algorithm.

Once a configuration is defined, parsing implies:
◦ identifying markers in the text to be parsed;
◦ recognizing the marked text structures;
◦ classifying them according to the marker sequences within the pre-established hierarchy;
◦ settling the dependencies and correlations among the parsed textual structures.
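The last two parsing steps can be sketched with a stack. Each new sense segment becomes a child of the nearest open segment whose marker class ranks higher. This is a simplification that assumes a strict ranking of the marker classes in the order the course lists them (capital letter > Roman > Arabic > diamond > lowercase letter). The real DTLR hierarchy is hypergraph-like and allows more flexible dependencies. The names `RANK`, `SenseNode`, `build_sense_tree` and `show` are hypothetical, and the input is assumed to be already segmented and classified.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed strict ranking of sense marker classes (smaller = higher in the tree).
RANK = {"capital": 0, "roman": 1, "arabic": 2, "diamond": 3, "lower": 4}

@dataclass
class SenseNode:
    marker: str                 # e.g. "I.", "2.", "♦"
    marker_class: str           # a key of RANK, or "root"
    text: str = ""
    children: List["SenseNode"] = field(default_factory=list)

def build_sense_tree(segments: List[Tuple[str, str, str]]) -> SenseNode:
    """Build a sense tree from (marker_class, marker, text) segments."""
    root = SenseNode(marker="", marker_class="root")
    stack = [root]
    for marker_class, marker, text in segments:
        rank = RANK[marker_class]
        # Close every open node at the same or a lower level than the new marker.
        while stack[-1].marker_class != "root" and RANK[stack[-1].marker_class] >= rank:
            stack.pop()
        node = SenseNode(marker, marker_class, text)
        stack[-1].children.append(node)   # attach under the nearest higher-ranked node
        stack.append(node)
    return root

def show(node: SenseNode, depth: int = 0) -> List[str]:
    """Render the tree as indented lines (the root itself is omitted)."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.marker} {child.text}".rstrip())
        lines.extend(show(child, depth + 1))
    return lines

if __name__ == "__main__":
    # The VIVÁCE skeleton, already segmented and classified.
    segments = [
        ("roman", "I.", "Adj. invar., adv."), ("arabic", "1.", "Adj. invar."),
        ("diamond", "♦", ""), ("arabic", "2.", "Adj. invar."),
        ("arabic", "3.", "Adj. invar."), ("arabic", "4.", "Adv."),
        ("roman", "II.", "Adj. invar."), ("arabic", "1.", ""),
        ("arabic", "2.", ""), ("arabic", "3.", "F i g."),
    ]
    print("\n".join(show(build_sense_tree(segments))))
```

Each marker is handled once and each node is pushed and popped at most once, so the tree is built in linear time in the number of segments.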

Marker classes

Sense tree parsing markers
◦ The capital letter marker class (A., B., etc.)
◦ The Roman numeral marker class (I., II., etc.)
◦ The Arabic numeral marker class (1., 2., etc.)
◦ The filled diamond (♦) and empty diamond (◊) marker class
◦ The lowercase letter marker class: a), b), c), etc.

Definitions parsing markers
◦ Morphological definitions
◦ Gloss definitions
◦ Phrase-based definitions
◦ Collocation definitions
◦ Examples supporting the specific meanings of a definition
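Identifying the sense tree markers in raw entry text can be approximated with regular expressions, one named group per marker class. The patterns below are our own assumption, not the eDTLR configuration. They also show why real parsing needs the hierarchy and context. In a full entry, "V." (for "see") would be read as Roman numeral V, and a source siglum such as "O." as a capital letter marker. The helper names `MARKER_RE`, `find_markers` and `segment` are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical patterns: a marker starts the text or follows whitespace,
# and is followed by whitespace. Roman numerals are tried before capital
# letters, so "I." is classified as Roman.
MARKER_RE = re.compile(
    r"(?:(?<=\s)|^)"
    r"(?:(?P<roman>[IVX]+\.)"
    r"|(?P<capital>[A-Z]\.)"
    r"|(?P<arabic>\d+\.)"
    r"|(?P<diamond>[♦◊])"
    r"|(?P<lower>[a-z]\)))"
    r"(?=\s)"
)

def find_markers(text: str) -> List[Tuple[str, str, int]]:
    """Return (marker_class, marker, offset) for each sense marker found."""
    return [(m.lastgroup, m.group(m.lastgroup), m.start()) for m in MARKER_RE.finditer(text)]

def segment(text: str) -> List[Tuple[str, str, str]]:
    """Split text into (marker_class, marker, text up to the next marker) segments."""
    found = list(MARKER_RE.finditer(text))
    segments = []
    for i, m in enumerate(found):
        end = found[i + 1].start() if i + 1 < len(found) else len(text)
        segments.append((m.lastgroup, m.group(m.lastgroup), text[m.end():end].strip()))
    return segments

if __name__ == "__main__":
    skeleton = ("VIVÁCE adj. invar., adv. I. Adj. invar., adv. 1. Adj. invar. Gloss Examples. "
                "♦ Gloss Examples 2. Adj. invar. Gloss Examples II. Adj. invar. 1. Gloss Examples")
    for seg in segment(skeleton):
        print(seg)
```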


Parsed entry

Ubiquitous computing

The word "ubiquitous" can be defined as "existing or being everywhere at the same time," "constantly encountered," and "widespread."

Applied to technology, the term implies that technology is everywhere and that we use it all the time.

In ubiquitous computing (ubicomp), computers become a helpful but invisible force, assisting users in meeting their needs without getting in the way.

Ubiquitous computing is also described as pervasive computing, ambient intelligence, everyware, or physical computing.


Common examples

A domestic ubiquitous computing environment might interconnect lighting and environmental controls with personal biometric monitors woven into clothing, so that the lighting and heating conditions in a room are modulated continuously and imperceptibly.

Another common scenario posits refrigerators "aware" of their suitably tagged contents, able both to plan a variety of menus from the food actually on hand and to warn users of stale or spoiled food.
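The "warn users of stale or spoiled food" part of the refrigerator scenario can be sketched as a simple check. This assumes that each tag reports an item name and an expiry date, which is an illustrative assumption about the tag data. The names `TaggedItem`, `stale_items` and `warn_days` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

@dataclass
class TaggedItem:
    """A food item as a (hypothetical) tag reader might report it."""
    name: str
    expires: date

def stale_items(items: List[TaggedItem], today: date, warn_days: int = 1) -> List[str]:
    """Names of items already expired or expiring within `warn_days` days of `today`."""
    limit = today + timedelta(days=warn_days)
    return [item.name for item in items if item.expires <= limit]

if __name__ == "__main__":
    fridge = [
        TaggedItem("milk", date(2012, 4, 25)),
        TaggedItem("cheese", date(2012, 5, 10)),
        TaggedItem("yogurt", date(2012, 4, 27)),
    ]
    print(stale_items(fridge, today=date(2012, 4, 26)))
```

In the ubiquitous computing spirit, such a check would run in the background and surface a warning only when something is about to spoil.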

Context-Awareness

Computers are able to understand a user’s current situation and offer services, resources, or information relevant to that particular context.

The attributes of context may include the user’s location, past activity, affective state, the current date and time, other objects, etc.
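A minimal sketch of context-awareness as rule matching. A `Context` record holds some of the attributes listed above (location, past activity, a crude affective state, date and time). Each rule pairs a condition on the context with a service to offer. The rules, the service names and all identifiers are hypothetical. Real context-aware systems typically infer context from sensors instead of receiving it ready-made.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Tuple

@dataclass
class Context:
    location: str      # e.g. "kitchen"
    activity: str      # most recent observed activity
    mood: str          # crude stand-in for the user's affective state
    now: datetime      # current date and time

# A rule pairs a condition on the context with the service it triggers.
Rule = Tuple[Callable[[Context], bool], str]

# Hypothetical rules, for illustration only.
RULES: List[Rule] = [
    (lambda c: c.location == "kitchen" and 11 <= c.now.hour < 14, "suggest lunch recipes"),
    (lambda c: c.location == "bedroom" and c.now.hour >= 22, "dim the lights"),
    (lambda c: c.activity == "running", "start workout playlist"),
    (lambda c: c.mood == "stressed", "play calm music"),
]

def services_for(ctx: Context, rules: List[Rule] = RULES) -> List[str]:
    """Return every service whose condition holds in the given context, in rule order."""
    return [service for condition, service in rules if condition(ctx)]

if __name__ == "__main__":
    ctx = Context("bedroom", "reading", "stressed", datetime(2012, 4, 26, 23, 0))
    print(services_for(ctx))
```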

Natural Interaction

The idea: supply services, resources, or information to users without their having to think about the rules for using the computer to get them.

In this way, users are not preoccupied with the dual task of operating the computer and obtaining the services, resources, or information.

Contemporary devices that lend some support to this idea include mobile phones, digital audio players, RFID tags, GPS receivers, and interactive whiteboards.

Requirements (Team: max 1 person, Deadline: 3 May)

1) Intelligent refrigerator: plan a menu from the food on hand
◦ Input: a set of recipes and a list of available food
◦ Output: the possible recipes using the available food
◦ Bonus: also offer recipes that miss one ingredient

2) Create a list of patterns for at least 15 commands for an intelligent house monitoring system. Examples:
open {window/lights}
close {tv/window/air conditioning}, etc.
◦ Points are given for originality and task fitness.
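A possible starting point for both tasks, written as a sketch under stated assumptions, not a model solution. For task 1, recipes are assumed to be a mapping from recipe name to a set of ingredient names, and the bonus reports recipes missing exactly one ingredient. For task 2, patterns use the `{a/b}` notation from the examples, and matching is plain case-insensitive string equality with no NLP. All function names are hypothetical.

```python
import itertools
import re
from typing import Dict, List, Set, Tuple

# Task 1: recipes from the food on hand (assumed data format).
def plan_menu(recipes: Dict[str, Set[str]], available: Set[str]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (recipes cookable now, (recipe, missing ingredient) pairs for the bonus)."""
    possible, almost = [], []
    for name, ingredients in recipes.items():
        missing = ingredients - available
        if not missing:
            possible.append(name)
        elif len(missing) == 1:
            almost.append((name, next(iter(missing))))
    return possible, almost

# Task 2: command patterns such as "open {window/lights}".
def expand(pattern: str) -> List[str]:
    """Expand every {a/b/...} group in a pattern into all concrete commands."""
    parts = re.split(r"\{([^}]*)\}", pattern)
    # With one capturing group, re.split puts the {...} contents at odd indices.
    choices = [p.split("/") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo).strip() for combo in itertools.product(*choices)]

def match_command(utterance: str, patterns: List[str]) -> bool:
    """True if the utterance equals some expansion of some pattern (case-insensitive)."""
    normalized = " ".join(utterance.lower().split())
    return any(normalized == cmd.lower() for p in patterns for cmd in expand(p))

if __name__ == "__main__":
    recipes = {"omelette": {"eggs", "milk"}, "pancakes": {"eggs", "milk", "flour"}}
    print(plan_menu(recipes, {"eggs", "milk"}))
    patterns = ["open {window/lights}", "close {tv/window/air conditioning}"]
    print(expand(patterns[1]))
    print(match_command("Close  TV", patterns))
```

Exact matching keeps the sketch small. An original submission could add synonyms or optional words to the pattern notation.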

Further reading

Computational lexicography
◦ Marius Răschip, Dan Cristea, Corina Forăscu (2008). eDTLR – Dicţionarul tezaur al limbii române în format electronic. In Lucrările Seminarului Internaţional al Uniunii Latine "Instrumente pentru asistarea traducerii", Academia Română, Bucureşti, 28–29 February 2008.
◦ Neculai Curteanu, Alexandru-Mihai Moruz, Diana Trandabăț (2008). Extracting Sense Trees from the Romanian Thesaurus by Sense Segmentation & Dependency Parsing. In Proceedings of the COLING 2008 Workshop on Cognitive Aspects of the Lexicon, pp. 55–63, Manchester.

Ubiquitous computing
◦ Gregory D. Abowd, Elizabeth D. Mynatt (2000). Charting past, present, and future research in ubiquitous computing. ACM Transactions on Computer-Human Interaction, 7, pp. 29–58, March 2000.

Links

eDTLR project web page: https://consilr.info.uaic.ro/edtlr/wiki/index.php?title=Despre_proiect

Natural Habitat: http://www.informatics.sussex.ac.uk/research/projects/nathab/scenario.htm


Thanks!