Artificial Cognition Systems Using Simplish

Computational Intelligence Unconference UK 2014

Dr. Marcelo Funes-Gallanzi
The Goodwill Company, Ltd.
Guildford, England.
E-mail: [email protected]

DISTRIBUTION A: Approved for public release; distribution unlimited.


At the computational intelligence unconference 2014, Marcelo Funes-Gallanzi presented Simplish, a system for the conversion of text into Simple English. Here are his slides.



It is generally thought that an AI explosion will not occur in the foreseeable future, given the lack of projects today that aim at building a general human-level AI system and that could lead to systems capable of improving upon themselves.

Our research programme is precisely directed at building a human-level artificial cognition system capable of improving upon itself.

In order to achieve the goal of developing a viable artificial cognition system, we first of all need to develop a self-evolving brain analogue that is able to acquire unstructured knowledge, store and retrieve knowledge, and constantly improve upon itself, irrespective of the field of knowledge.

Knowledge and experience are most easily transferred through language, and at the root of the concept of language lies the very definition of a word. Wittgenstein (1953) suggested that people are aware of the meaning of a word only in the context of its relationship to other words and that there exists a "family resemblance". Following the work of Rosch (1973), it was found that there are in fact about 400 core concepts in western children, which are used intensively in growing up to interpret meaning. Moreover, Rosch argued that there is a "natural" level of categorization that we tend to use to communicate. This level is known as "basic-level categorization".

Using the idea of meaning being defined in terms of a word's relationship to others is attractive but involves deriving a matrix of word relations that implies many more entries than the average number of neurons in a human brain, if a standard vocabulary is used. This fact leads us to two conclusions: first, that it is likely that basic-level categorization is in fact used by the brain, and second that it would be useful to find a way to represent a full vocabulary in terms of a reduced vocabulary, that can act as a proxy for this set of basic-level categorizations.

Background information

Simplish is a tool that converts standard English into a reduced-vocabulary version of 1,000 words (850 basic words, 50 international words and 100 specialized words), and we propose to use this as a proxy for a set of basic-level categorizations. This representation therefore yields an effective means of knowledge acquisition. A reduced vocabulary also has the advantage of reducing ambiguity as a by-product of the translation process.

• Using a reduced-vocabulary representation (say 850 words) enables the mapping of language, through the use of standard multivariate techniques, to a low-dimensionality space where a multidimensional ideogram (a graphic symbol that represents an idea or concept) can be produced as an illustrative point in this subspace, as might be represented in the brain (using variables such as coordinates x, y, z, potential, neurotransmitter, frequency, phase, etc.). The low-dimensionality ideogram is the storage medium. These ideograms will be similar even if different words or grammar are used, because their form and position are given by the relationship of a concept/word to all other members of the basic-level categorization.

• The result of this strategy will be to establish a means to map semantic similarity into spatial proximity, i.e. the distance between two concepts is a measure of how similar their meanings are.

• Spatial proximity can be used to yield a means for knowledge retrieval for a given query, via the association of ideas as a human does, implemented here through the application of nearest-neighbour search algorithms, a method that is well-known in the art.

• This strategy enables concepts to be mapped either as part of a data-driven step or concept-driven contextual information needed for problem-solving. We can plot and step along an evolving path and come across (through intersections of ideograms) both types of information if relevant.

• Thus, we can move from keywords and pattern-matching to concept-matching, for instance in a web search engine, by looking at content directly rather than at the metadata, which preconditions information access.
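The retrieval step in the bullets above can be sketched as a nearest-neighbour search over stored concept positions. This is only an illustration: the coordinates in `CONCEPTS` are invented by hand (in the scheme described they would come from the low-dimensionality mapping), and the names `CONCEPTS` and `nearest` are hypothetical.

```python
import math

# Hypothetical 3-D positions, typed in by hand for illustration only.
# In the scheme described above they would be produced by the mapping step.
CONCEPTS = {
    "humerus": (0.9, 0.1, 0.2),
    "ulna": (0.8, 0.2, 0.3),
    "elbow": (0.85, 0.15, 0.25),
    "sun": (-0.7, 0.6, 0.0),
    "moon": (-0.6, 0.7, 0.1),
}


def nearest(query, concepts, k=2):
    """Return the names of the k stored concepts closest to the query point."""
    ranked = sorted(concepts, key=lambda name: math.dist(query, concepts[name]))
    return ranked[:k]


# A query that lands among the anatomy concepts retrieves anatomy neighbours.
print(nearest((0.88, 0.12, 0.2), CONCEPTS, k=2))
```

A brute-force scan like this is enough for a handful of concepts; a larger store would normally use a spatial index, which is outside the scope of this sketch.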

Background information

[Diagram: Standard English (100,000+ words, incl. 30,000 scientific) to Basic English, annotated (1,000+ words)]

www.simplish.org A standard English to Basic English tool

• Some words such as “name” have arguments (e.g. Jesus) while others do not (e.g. flat).

• Idiomatic phrases, names and many places are also considered

• Personal dictionary allows adapting the translation to the user's vocabulary

• File standards: .doc, .txt, .pdf & .html

• Also aimed at helping East Asian readers, for instance (5m S&T graduates last year), and non-scientists

• Pending improvements: apostrophes, images, spaces, hyphens, etc.

This approach enables: UNIQUE INFINITE EXPRESSIVENESS

The use of basic-level categorization to represent the meaning of a particular word, in relation to all other words, results in a matrix where all words can be related to each other. This matrix kernel is what confers order on the memory process. Each such matrix is equivalent to describing one mind's perception, so the confusion that arises in a corpus of data from many authors' use of language is avoided and ambiguity is reduced. Thus, the shape and position of an ideogram depend on this unique, broad, abstract “association criterion”, with no need for specific training or ontologies (descriptions of the concepts and relationships that can exist for an agent or a community of agents).

Entries are rated from opposite (minimum) to highly related (maximum), with columns ordered by syntax and rows by a rough semantic classification of 50 semantic tags, so word order is important, unlike in, say, Latent Semantic Analysis.

In fact, various models already exist that provide automatic means of determining similarity of meaning by analysis of large text corpora, without any understanding.

Order in Memory...

The matrix of basic broad abstract relations between words as proposed by Wittgenstein can be converted into a low-dimensionality representation using standard singular value decomposition methods, such as Principal Component Analysis. This subspace, conditioned by word relations, is used to display scientific words and all user data, with a position given by the meaning of user-defined data streams.
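As a minimal sketch of this reduction step, the code below projects a small word-relation matrix onto its first two principal components. Everything in it is an assumption made for illustration: the four words and their ratings in `RELATIONS` are invented, and the components are found by plain power iteration on the covariance matrix (with deflation) rather than by a library SVD routine, simply to keep the example dependency-free.

```python
import math

# Invented relation ratings between four words (1.0 = highly related).
WORDS = ["arm", "bone", "sun", "moon"]
RELATIONS = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]


def center(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (d x d) of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return v, sum(v[i] * mv[i] for i in range(d))


def pca(rows, k):
    """Project rows onto their first k principal components."""
    centred = center(rows)
    cov = covariance(centred)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the component just found before seeking the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centred]


for word, point in zip(WORDS, pca(RELATIONS, 2)):
    print(word, [round(x, 3) for x in point])
```

With these invented ratings, "arm" and "bone" land close together and far from "sun" and "moon", which is the property the subspace is meant to have.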

We can then display the semantic relations, converted into spatial distances, between words in a low-dimensionality space as shown (and equivalently for the case of syntax):

We can also display more complex concepts explained in terms of Basic English by assigning a high value to the words used, in an illustrative labelled extra row displayed in the GCE space, such as:

Humerus - The bone of the top part of the arm in man

In some cases, a mapping module converts words into a valid form (e.g. “worked” to “past work” and “unsafe” to “not safe”). It also deals with compound words (e.g. “outline”).
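A toy version of such a mapping step might look as follows. The rules here are assumptions, not the Simplish module itself: only the "un-" prefix, the "-ed" suffix and a two-part compound split are handled, and the function name `to_valid_form` and the tiny `VOCABULARY` are hypothetical.

```python
# Hypothetical stand-in for the reduced vocabulary.
VOCABULARY = {"work", "safe", "out", "line"}


def to_valid_form(word, vocabulary):
    """Rewrite a word using vocabulary entries only, or return None if no rule applies."""
    w = word.lower()
    if w in vocabulary:
        return w
    # Negative prefix: "unsafe" becomes "not safe".
    if w.startswith("un") and w[2:] in vocabulary:
        return "not " + w[2:]
    # Regular past tense: "worked" becomes "past work".
    if w.endswith("ed") and w[:-2] in vocabulary:
        return "past " + w[:-2]
    # Compound of two vocabulary words: "outline" becomes "out line".
    for i in range(1, len(w)):
        if w[:i] in vocabulary and w[i:] in vocabulary:
            return w[:i] + " " + w[i:]
    return None


for example in ("worked", "unsafe", "outline"):
    print(example, "->", to_valid_form(example, VOCABULARY))
```

How the real module treats compounds is not stated in the slides; splitting them into two vocabulary words is just one plausible reading.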

More complex concepts can be displayed that use previously defined simpler concepts:

Elbow - JOINT in the arm between the HUMERUS and the ULNA.

The problem here is that in order to map the word “elbow” we also need to use points defined by “joint” & “ulna” as well as humerus:

Joint - part or structure where two bones or parts of an animal's body are so joined that they have the power of motion in relation to one another.

Ulna - The back one of the two bones of the lower front leg in animals with 4 legs or the arm in man.

So, the solution is that where we need to use previously defined points to display a new more complex concept, we can simply join them together and build trajectories (i.e. ideograms) between a number of points in the GCE. This trajectory definition is done by the mapping module, which segments phrases as required.

Of course, we can build trajectories with many sentences, even with multiple multiplexed sources all updating the GCE in a parallelized scheme, and in that process extrapolate and come across relevant facts, thereby resolving the problem of incorporating contextual data into a data-driven process.

Displaying knowledge in a preconditioned space...

Trajectories - Ideograms

A segmentation module helps to determine fragments that can be assigned to a point and those that must be broken down and displayed as a trajectory.

Elbow - JOINT in the arm between the HUMERUS and the ULNA...

this is really just a machine-generated multidimensional ideogram!

For any segment/point in a trajectory there is a definition:

[JOINT ; in the arm between the ; HUMERUS]

[HUMERUS ; and the ; ULNA]
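The two segments above can be reproduced with a very small segmentation routine. This is a sketch under stated assumptions: previously defined concepts are supplied as a set (`known`, hypothetical), they are matched case-insensitively, and the words between two consecutive concepts are kept as the connector of that segment.

```python
def segment(definition, known):
    """Split a definition into (concept, connector, concept) trajectory segments."""
    segments = []
    start = None       # the most recent previously defined concept seen
    connector = []     # plain words collected since that concept
    for token in definition.split():
        word = token.strip(".,;:")
        key = word.upper()
        if key in known:
            if start is not None:
                segments.append((start, " ".join(connector), key))
            start = key
            connector = []
        elif start is not None:
            connector.append(word)
    return segments


KNOWN = {"JOINT", "HUMERUS", "ULNA"}
for seg in segment("JOINT in the arm between the HUMERUS and the ULNA.", KNOWN):
    print(seg)
```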

Compare with Chinese symbols for example:

“人 man” + “木 tree” = “休 to rest” or...

“日 sun” + “月 moon” = “明 clear, bright”.

or even the Aztec symbol for Mazatlan: Mazatl (deer) + tlantle (teeth)

In order to check whether similar-meaning sentences are in fact displayed near one another, and the computer is actually able to understand language, we can try displaying four similar sentences, taken from the New Testament:

LU - LUKE 23:38 - And these words were put in writing over him, THIS IS THE KING OF THE JEWS.

MT - MATTHEW 27:37 - And they put up over his head the statement of his crime in writing, THIS IS JESUS THE KING OF THE JEWS.

MC - MARK 15:26 - And the statement of his crime was put in writing on the cross, THE KING OF THE JEWS.

JH - JOHN 19:19 - And Pilate put on the cross a statement in writing. The writing was: JESUS THE NAZARENE, THE KING OF THE JEWS.

Notes: 1) where words have arguments, multiple points in the same position are generated (Jesus/Pilate).

2) There is no training, just the sentence being displayed in the GCE module memory space according to meaning!

Demonstration of computer understanding

In the previous graphical display we can see that this method does indeed convert semantic similarity into spatial proximity! Thus, if two phrases have the same meaning they will be mapped close to each other, even if different words or grammar are used.

In the display we can see that Matthew and Luke are closest, with Mark, who mentions the cross, some distance away, while John's sentence, the most dissimilar, lies furthest away. It is possible to do some conventional ascending hierarchical clustering and show these relationships as below:

Grouping into clusters has the effect, more generally, of agglomerating information according to knowledge domain. Thus, information about anatomy, the New Testament, etc. will agglomerate into distinct clusters.

REPRESENTATION OF THE HIERARCHICAL CLASSIFICATION

[Dendrogram: lu and mt join first, then mc, then jh; the test entry (prueba) joins last.]
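The clustering step can be sketched as follows. The 2-D positions in `POINTS` are invented so that they mirror the ordering described above (they are not the GCE coordinates), and single linkage is an assumption, since the slides do not say which linkage rule was used.

```python
import math
from itertools import combinations

# Invented positions chosen to mirror the described ordering; not real GCE output.
POINTS = {
    "lu": (0.0, 0.0),
    "mt": (0.1, 0.0),
    "mc": (0.5, 0.1),
    "jh": (2.0, 1.0),
}


def linkage_gap(a, b, points):
    """Single-linkage gap: the smallest distance between any two members."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage_gap(pair[0], pair[1], points))
        merges.append((sorted(a), sorted(b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


MERGES = agglomerate(POINTS)
for left, right in MERGES:
    print(left, "+", right)
```

Reading the merges from first to last gives the same bottom-up order as a dendrogram: the closest pair joins first and the most distant item joins last.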

One can obviously also do the reverse and create a dummy ideogram for a concept for which a certain response is required. This can include the firing of contextual information, updating or other actions related to a given concept. Note that the reply to a given type of concept, however stated, does not have to be a simple answer. It could have contextual data attached to the dummy ideogram, routines that have to be performed, updating, monitoring, calculations, etc. in order to give an output such as a command, action, answer or a simple statement as a response.

For illustration purposes, the diagnosis of diabetes could be undertaken by uploading 5 common symptoms onto the knowledge base (although the system could have acquired such knowledge itself, for instance by “reading” a book and mapping this knowledge to the correct position and form) and, if they are found to be true, outputting the diagnosis of suspected diabetes. In this specific case some more complicated vocabulary is also required in order to display the relevant ideograms:

Dummy ideograms to serve as contextual knowledge:

[Med.] An increase in thirst or urination in a child is a sign of diabetes.

[Med.] Lethargy in a child is a sign of diabetes.

[Med.] Increased desire for food with sudden or unexplained weight loss in a child is a sign of diabetes.

[Med.] Vision changes in a child are a sign of diabetes.

[Med.] Odor of fruit to the breath in a child is a sign of diabetes.

Diagnosis [answer] Diabetes in a child has five common signs that have to be confirmed.

Responding to a given concept - I

Responding to a given concept - Diagnosis

In this example, if the user enters a sentence that is semantically close to one of the symptoms (“My child is thirsty and goes to urinate all day”), whatever the specific wording or grammar, the mapping process, contextual knowledge and association modules enable the system to identify the suspected symptom and, if all other symptoms are confirmed in the patient, the system is able to confirm the diagnosis as diabetes:
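The response logic can be sketched like this. It assumes that each user sentence has already been mapped to a point, that the five symptom ideograms have known positions, and that "semantically close" means "within a distance threshold". The coordinates, the threshold and the names `SYMPTOMS`, `match_symptom` and `diagnose` are all hypothetical.

```python
import math

# Hypothetical 2-D positions for the five symptom ideograms.
SYMPTOMS = {
    "thirst or urination": (1.0, 0.0),
    "lethargy": (0.0, 1.0),
    "hunger with weight loss": (-1.0, 0.0),
    "vision changes": (0.0, -1.0),
    "fruit odor on breath": (1.0, 1.0),
}


def match_symptom(point, symptoms, max_distance=0.3):
    """Return the symptom whose ideogram is nearest to point, or None if too far."""
    name = min(symptoms, key=lambda s: math.dist(point, symptoms[s]))
    return name if math.dist(point, symptoms[name]) <= max_distance else None


def diagnose(observations, symptoms, max_distance=0.3):
    """Report suspected diabetes only when every symptom has been matched."""
    confirmed = {match_symptom(p, symptoms, max_distance) for p in observations}
    confirmed.discard(None)
    missing = sorted(set(symptoms) - confirmed)
    if missing:
        return "not confirmed", missing
    return "suspected diabetes", []


# One sentence mapped near the first symptom: four signs are still missing.
print(diagnose([(0.9, 0.1)], SYMPTOMS))
```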

In order to test the viability of this approach for scientific material we took the example of some knowledge about anatomy (260 concepts). If we look at the following three phrases:

1) Joint in the arm between the humerus and the ulna.

2) Outgrowth of bone at the top end of the ulna, forming the point of the elbow, to which the muscle pulling the lower arm straight is fixed.

3) A rounded expansion at the end of a bone which goes into the hollow end of another bone forming a joint with limited power of motion.

The 3 concepts lie very close to each other. Thus, unstructured knowledge can be acquired, stored in a form and position related to its meaning and retrieved, with similar semantic units being stored in a similar shape and position, regardless of the specific wording or grammar used, without any kind of further training or association of ideas as in an ontology for example, i.e. the system is able to “understand” the meaning of language and logically stores knowledge accordingly (definition of elbow, olecranon & condyle respectively above).

Thus, the GCE is able to acquire, store and retrieve these three concepts correctly and identify that they are closely related. If more information is input, it can be displayed in the correct semantic position and, where many equivalent sentences are input, the GCE can fuse trajectories together as equivalent if their ideograms are similarly shaped and close.

Specialist knowledge & vocabulary


Comparison of information (I)

As an example of the capability to compare incoming information, 4 paragraphs were compared and, by clustering analysis in the GCE space, those whose information was corroborated by one or more other sources were highlighted. The random example chosen was of the descriptions of the arrival of Christ in Jerusalem in the New Testament (Matthew 21 (1-11), Mark 11 (1-11), Luke 19 (28-40) & John 12 (12-19)), with no other text used either for comparison or to train the engine:

Comparison of information (II)

The Gospel of Matthew is the most corroborated, except for one fragment of text. Mark and Luke increasingly differ, while only one segment of what John reports is corroborated. Based on the corroboration of specific concepts by other Gospels, we can conclude that the most reliable is Matthew, followed by Mark, Luke and lastly John, whose account is the least reliable one; a view consistent with the order in which the Gospels are traditionally held to have been written.
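One way to turn this kind of comparison into a number is to count, for each source, the fraction of its segments that have a close neighbour in some other source. The sketch below does that on invented points. The source names, coordinates and threshold are hypothetical, and scoring by fraction is an assumption rather than the measure used to produce the result above.

```python
import math

# Invented segment positions for three hypothetical sources.
SOURCES = {
    "source_a": [(0.0, 0.0), (1.0, 0.0)],
    "source_b": [(0.05, 0.0), (1.05, 0.0), (3.0, 3.0)],
    "source_c": [(5.0, 5.0), (0.0, 0.1)],
}


def corroboration(sources, max_distance=0.2):
    """Fraction of each source's segments lying near a segment of another source."""
    scores = {}
    for name, segments in sources.items():
        others = [p for other, pts in sources.items() if other != name for p in pts]
        backed = sum(
            1 for s in segments
            if any(math.dist(s, o) <= max_distance for o in others)
        )
        scores[name] = backed / len(segments)
    return scores


SCORES = corroboration(SOURCES)
for name in sorted(SCORES, key=SCORES.get, reverse=True):
    print(name, round(SCORES[name], 2))
```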

Luke 19 (28-40)

And when he had said this, he went on in front of them, going up to Jerusalem. And it came about that when he got near Beth-phage and Bethany by the mountain which is named the Mountain of Olives, he sent two of the disciples, Saying, Go into the little town in front of you, and on going in you will see a young ass fixed with a cord, on which no man has ever been seated; let him loose and take him. And if anyone says to you, Why are you taking him? say, The Lord has need of him. And those whom he sent went away, and it was as he said. And when they were getting the young ass, the owners of it said to them, Why are you taking the young ass? And they said, The Lord has need of him. And they took him to Jesus, and they put their clothing on the ass, and Jesus got on to him. And while he went on his way they put their clothing down on the road in front of him. And when he came near the foot of the Mountain of Olives, all the disciples with loud voices gave praise to God with joy, because of all the great works which they had seen; Saying, A blessing on the King who comes in the name of the Lord; peace in heaven and glory in the highest. And some of the Pharisees among the people said to him, Master, make your disciples be quiet. And he said in answer, I say to you, if these men keep quiet, the very stones will be crying out.

John 12 (12-19)

The day after, a great number of people who were there for the feast, when they had the news that Jesus was coming to Jerusalem, Took branches of palm-trees and went out to him, crying, A blessing on him who comes in the name of the Lord, the King of Israel! And Jesus saw a young ass and took his seat on it; as the Writings say, Have no fear, daughter of Zion: see your King is coming, seated on a young ass. (These things were not clear to his disciples at first: but when Jesus had been lifted up into his glory, then it came to their minds that these things in the Writings were about him and that they had been done to him.) Now the people who were with him when his voice came to Lazarus in the place of the dead, and gave him life again, had been talking about it. And that was the reason the people went out to him, because it had come to their ears that he had done this sign. Then the Pharisees said one to another, You see, you are unable to do anything: the world has gone after him.

Matthew 21 (1-11)

And when they were near Jerusalem, and had come to Beth-phage, to the Mountain of Olives, Jesus sent two disciples, Saying to them, Go into the little town in front of you, and straight away you will see an ass with a cord round her neck, and a young one with her; let them loose and come with them to me. And if anyone says anything to you, you will say, The Lord has need of them; and straight away he will send them. Now this took place so that these words of the prophet might come true, Say to the daughter of Zion, See, your King comes to you, gentle and seated on an ass, and on a young ass. And the disciples went and did as Jesus had given them orders, And got the ass and the young one, and put their clothing on them, and he took his seat on it. And all the people put their clothing down in the way; and others got branches from the trees, and put them down in the way. And those who went before him, and those who came after, gave loud cries, saying, Glory to the Son of David: A blessing on him who comes in the name of the Lord: Glory in the highest. And when he came into Jerusalem, all the town was moved, saying, Who is this? And the people said, This is the prophet Jesus, from Nazareth of Galilee.

Mark 11 (1-11)

And when they came near to Jerusalem, to Beth-phage and Bethany, at the Mountain of Olives, he sent two of his disciples, And said to them, Go into the little town opposite: and when you come to it, you will see a young ass with a cord round his neck, on which no man has ever been seated; let him loose, and come back with him. And if anyone says to you, Why are you doing this? say, The Lord has need of him and will send him back straight away. And they went away and saw a young ass by the door out-side in the open street; and they were getting him loose. And some of those who were there said to them, What are you doing, taking the ass? And they said to them the words which Jesus had said; and they let them go. And they took the young ass to Jesus, and put their clothing on him, and he got on his back. And a great number put down their clothing in the way; and others put down branches which they had taken from the fields. And those who went in front, and those who came after, were crying, Glory: A blessing on him who comes in the name of the Lord: A blessing on the coming kingdom of our father David: Glory in the highest. And he went into Jerusalem into the Temple; and after looking round about on all things, it being now evening, he went out to Bethany with the twelve.

Comparison of information (III) - Abduction

It is also possible to carry out a process of abduction by fusing together those sentences that are judged to be equivalent (i.e. very close) and placing similar text (i.e. close spatially) in a position closest to a sentence within the text which is chosen as the reference. In this case, Matthew's paragraph was chosen as the reference:

And when he had said this he went on in front of them, going up to Jerusalem. And when they were near Jerusalem, and had come to Beth_phage, to the Mountain_of_Olives, Jesus sent two disciples saying to them: go into the little town in front of you, and straight away you will see an ass with a cord round her neck and a young one with her; let them loose and come with them to me. The day after, a great number of people who were there for the feast, when they had the news that Jesus was coming to Jerusalem, took branches of palm trees and went out to him crying: a blessing on him who comes in the name of the Lord, the king of Israel. And Jesus saw a young ass and took his seat on it; as the writings say: have no fear daughter of Zion, see your king is coming, seated on a young ass. And if anyone says anything to you, you will say: the lord has need of them; and straight away he will send them. And those whom he sent went away, and it was as he said. And when they were getting the young ass, the owners of it said to them, why are you taking the young ass, and they said: the Lord has need of him. And some of those who were there said to them, what are you doing taking the ass, and they said to them the words which Jesus had said; and they let them go. Now the people who were with him when his voice came to Lazarus in the place of the dead, and gave him life again, had been talking about it; that was the reason the people went out to him, because it had come to their ears that he had done this sign. Then the pharisees said one to another: you see, you are unable to do anything; the world has gone after him. And they took him to Jesus, and they put their clothing on the ass, and Jesus got on to him. Now this took place so that these words of the prophet might come true: say to the daughter of Zion, see, your king comes to you, gentle and seated on a young ass. And while he went on his way they put their clothing down on the road in front of him. 
And they went away and saw a young ass by the door outside in the open street; and they were getting him loose. And the disciples went and did as Jesus had given them orders, and got the ass and the young one, and put their clothing on them, and he took his seat on it. And all the people put their clothing down in the way; and others got branches from the trees and put them down in the way. And those who went before him, and those who came after, gave loud cries saying: glory to the son of David, a blessing on him who comes in the name of the lord, glory in the highest. And when he came into Jerusalem, all the town was moved saying: who is this. And he went into Jerusalem into the temple, and after looking round about on all things, it being now evening, he went out to Bethany with the twelve. And some of the pharisees among the people said to him: master, make your much friend be quiet. And he said in answer: I say to you, if these men keep quiet, the very stones will be crying out. And the people said: this is the prophet Jesus from Nazareth of Galilee.

Black: Matthew, Green: Mark, yellow: Luke, red: John

Summary: Chelyabinsk meteorite

• A first step towards an artificial cognition system is a system capable of intelligent human-machine interaction.

• Rachael is a conversational agent that can be contacted at www.rachaelrepp.org

• She has memory as previously described, including some recollection of her personal details, history, science and knowledge of a few books.

• She can also work out algebra and logic problems (STUDENT), and perform clustering and syntactic analysis.

• She can understand standard English of 100,000 words using Simplish.

• Rachael can also do some simple semantic associations.

She’ll be competing for the Loebner Prize in 2014.

She has a multimedia interface in Blender.

A practical application: Rachael Repp bot

Potential short term applications

• All-source intelligence analysis (can accommodate contextual data and multiple source trajectories), so that we can arrive at actionable conclusions.

• Human-machine interaction.

• Analysis of large volumes of Internet data.

• Semantic search engines

• Data mining

• Databases

• Games

• Bio-informatics

• Expert systems

• Virtual assistants

Thank You!!

www.thegoodwillcompany.co.uk