Cognitive plausibility in learning algorithms
With application to natural language processing

Arvi Tavast, PhD
Qlaara Labs, UT, TLU
Tallinn, 10 May 2016
Motivation
Why cognitive plausibility?
Objective: best product vs best research
Model the brain
End-to-end learning from raw unlabelled data
Grounded cognition
Cognitive computing, neuromorphic computing
Feedback loop: using the model to better understand the object to be modelled
Outline
Heretical view on language - established learning model - application to NLP
1 Introduction

2 Understanding humans
  Understanding human communication
  Understanding human learning
  Rescorla-Wagner learning model

3 Results

4 Application
  Naive Discriminative Learning
My background
mainly in linguistics
1993 TUT computer systems
1989-2004 IT translation
2000-2006 Microsoft MILS
2002 UT MA linguistics
2008 UT PhD linguistics
2015 University of Tübingen postdoc, quantitative linguistics
Understanding human communication
How do we explain the observation that verbal communication sometimes works?
The channel metaphor

Speaking is like sending things by train, selecting suitable wagons (words) for each thing (thought)
Hearing is like decoding the message
⇒ meanings are properties of words

Communication as uncertainty reduction

Speaking is like sending blueprints for building things, which the receiver will have to follow (subject to their abilities, available materials, etc.)
Hearing is like using hints to reduce our uncertainty about the message
⇒ meanings are properties of people
Understanding human communication
When can the channel metaphor work?

Encoding of a message must contain a set of discriminable states that is greater than or equal to the number of discriminable states in the to-be-encoded message

or:

Encoding thoughts with words can only work if the number of possible thoughts is smaller than or equal to the number of possible words
This is the case only in restricted domains (weather forecasts)
Compare: reconstructing a document based on its hash sum
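A quick illustration of why this fails, assuming the CRAN digest package (my example, not the talk's):

library(digest)

# An MD5 sum has 2^128 possible values; the set of possible documents
# is vastly larger, so many documents necessarily share each hash and
# the mapping cannot be inverted.
digest("Cognitive plausibility in learning algorithms", algo = "md5")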
Understanding human learning
Compositional vs discriminative
Possible ways of conceptualising biological learning
Compositional model: we start as an empty page, adding knowledge like articles in an encyclopedia
Discriminative model: we start by perceiving a single object (the world) and gradually learn to discriminate between its parts
If discriminative:
Human language models cannot be constant across time or subjects
The Rescorla-Wagner learning model
Language acquisition can be described as creating a statistical relationship

The Rescorla-Wagner model: how do we learn that cue Cj means outcome O?

if we see that Cj ⇒ O, the relationship is strengthened (less so, if there are other cues)
if we see that Cj ⇒ ¬O, the relationship is weakened (more so, if there are other cues)
(if we see that ¬Cj ⇒ O, the relationship is weakened)
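In code, one update step looks roughly as follows; a minimal sketch in base R, where rw_update, the cue names, and the default rates are my illustrative choices, not part of the talk:

# One Rescorla-Wagner update for a single outcome O.
# w: named weight vector (one entry per cue); cues: cues present in
# this event; outcome_present: whether O occurred; alpha, beta:
# learning rates; lambda: the maximum amount of learning.
rw_update <- function(w, cues, outcome_present,
                      alpha = 0.1, beta = 1, lambda = 1) {
  total  <- sum(w[cues])                 # summed support from present cues
  target <- if (outcome_present) lambda else 0
  w[cues] <- w[cues] + alpha * beta * (target - total)
  w                                      # absent cues stay unchanged
}

w <- c(mA = 0, tA = 0)
w <- rw_update(w, c("mA", "tA"), TRUE)   # Cj => O: strengthened, but the
                                         # gain is shared by both cues
w <- rw_update(w, "mA", FALSE)           # Cj => not-O: weakened

Because the prediction error (target - total) is shared by all cues present in an event, each cue gains less when it has company, which is exactly the "less so, if there are other cues" above.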
Feature-label-order effect
Creating the relationship between word and concept is only possible in one direction
Feature-label-order effect
If concept ⇒ word, the relationship is strengthened
If word ⇒ concept, the relationship is not strengthened
Number of objects in the world ≫ number of words in language
Abstraction inevitably and irreversibly discards information
Recovering a meaning from a word is necessarily underspecified
Ramscar, M., Yarlett, D., Dye, M., Denny, K., and Thorpe, K. (2010). The effects of feature-label-order and their implications for symbolic learning. Cognitive Science, 34(6), 909-957.
Aging and cognitive decline
Why do our verbal abilities seem to fail around the age of 65?
Ramscar, M., Hendrix, P., Shaoul, C., Milin, P., and Baayen, H. (2014). The myth of cognitive decline: Non-linear dynamics of lifelong learning. Topics in Cognitive Science, 6(1), 5-42.
Morphology
Implicit morphology (without morphemes)
[Figure: network of letter-trigram nodes (e.g. #mA, ki#, #tA, mtA, Aki, ###) with connection weights ranging roughly from 0.1 to 0.59]
Naive Discriminative Learning
The R package: installation and basic usage
ndl: https://cran.r-project.org/web/packages/ndl/index.html
ndl2 (+ incremental learning): contact the authors
library(ndl)                   # install.packages("ndl")
wm <- estimateWeights(events)  # Danks equilibria

library(ndl2)                  # not on CRAN; contact the authors
wm <- learnWeights(events)     # incremental, ndl2 only
Naive Discriminative Learning
Input data for Danks estimation: frequencies
Outcomes     Cues               Frequency
aadress      aadress S SG N     1
aadresse     aadress S PL P     1
aadressil    aadress S SG AD    4
aadressile   aadress S SG ALL   1
aasisid      aasima V SID       1
aasta        aasta S SG G       2
aasta        aasta S SG N       1
aastane      aastane A SG N     48
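A runnable toy version of the table above, assuming the CRAN ndl package; ndl expects one data frame per event set, with the cues of each event joined into an underscore-separated string:

library(ndl)

events <- data.frame(
  Outcomes  = c("aadress", "aadresse", "aadressil"),
  Cues      = c("aadress_S_SG_N", "aadress_S_PL_P", "aadress_S_SG_AD"),
  Frequency = c(1, 1, 4),
  stringsAsFactors = FALSE
)

wm <- estimateWeights(events)  # cue-by-outcome weight matrix
round(wm, 3)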
Naive Discriminative Learning
Input data for incremental learning: single events
Outcomes     Cues               Frequency
aadress      aadress S SG N     1
aadresse     aadress S PL P     1
aadressil    aadress S SG AD    1
aadressil    aadress S SG AD    1
aadressil    aadress S SG AD    1
aadressil    aadress S SG AD    1
aadressile   aadress S SG ALL   1
aasisid      aasima V SID       1
aasta        aasta S SG G       1
aasta        aasta S SG G       1
aasta        aasta S SG N       1
aastane      aastane A SG N     1
aastane      aastane A SG N     1
aastane      aastane A SG N     1
...
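The two input formats carry the same information; a short sketch that expands the frequency format (the events data frame from the previous sketch) into single events:

# Repeat each row Frequency times, then reset all frequencies to 1
single <- events[rep(seq_len(nrow(events)), events$Frequency), ]
single$Frequency <- 1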
Naive Discriminative Learning
Output: weight matrix, cues × outcomes
Cues                Outcomes       Application
letter ngrams       words          reading
character features  words          reading
words               lexomes        POS tagging
lexomes             letter ngrams  morphological synthesis
contexts            words          distributional semantics
audio signal        words          speech recognition
words               audio signal   speech synthesis
Naive Discriminative Learning
About the weight matrix

What we can look at (sketched below):
Similarity of outcome vectors
Similarity of cue vectors
MAD (median absolute deviation) of outcome vector
Competing cues
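A few of these in base R, assuming the weight matrix wm from the earlier sketch (rows = cues, columns = outcomes); cos_sim and the outcome names are illustrative:

# Similarity of two outcome vectors (columns); the same idea applied
# to rows compares cue vectors
cos_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cos_sim(wm[, "aadress"], wm[, "aadressil"])

# MAD of an outcome vector: how unevenly its support spreads over cues
mad(wm[, "aadress"])

# Competing cues: the strongest cues for one outcome
head(sort(wm[, "aadress"], decreasing = TRUE))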
Naive Discriminative Learning
About the weight matrix
Other properties:

No dimensionality reduction (played with 200k x 100k weight matrices)
Danks equations are subject to R's 2^32 element limit (matrix pseudoinverse; sketched below)
Slow (weeks on ca 16 cores, 200 GB RAM)
Performance below word2vec and similar models, but comparable
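For intuition, a sketch of the equilibrium computation behind estimateWeights; danks_weights is a hypothetical helper of mine, assuming 0/1 event-by-cue and event-by-outcome matrices:

library(MASS)  # ginv(): Moore-Penrose pseudoinverse

# Danks equilibrium: solve C %*% W = O, where C holds cue-cue and O
# cue-outcome co-occurrence probabilities estimated from the events.
danks_weights <- function(cues, outs) {
  n <- nrow(cues)
  C <- crossprod(cues) / n        # P(cue_i & cue_j)
  O <- crossprod(cues, outs) / n  # P(cue_i & outcome_k)
  ginv(C) %*% O                   # pseudoinverse, since C is often singular
}

The C matrix is cues x cues, which is where the 2^32 element limit and the cost of the pseudoinverse bite for large cue sets.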
Some NLP tools
How to get started quickly with NLP
Python: NLTK, EstNLTK, Gensim (incl. word2vec), DISSECT
Java: GATE (also web), Stanford NLP, Deeplearning4j (incl. word2vec)
C: word2vec
R: ndl
Language understanding
What's missing from full language understanding
Training material
Inter-annotator agreement is less than perfect
Corpus is heterogeneous
This is not a methodological flaw
Communicative intent and self-awareness
If cues are lexomes (= what the speaker wanted to say), the system must want something.
Thanks for listening
Contacts and recommended reading
Easy reading: blog.qlaara.com

Recommended reading:
Harald Baayen: www.sfs.uni-tuebingen.de/hbaayen/
Michael Ramscar: https://michaelramscar.wordpress.com/