BURC: Bootstrapping Using ResearchCyc
By Kino Coursey
Introduction to the Problem
• Goal: to extend Cyc's knowledge base using "relationships implied to be possible, normal or commonplace in the world"
• Prior work on Cyc knowledge entry has been manually oriented
• How will we collect commonsense knowledge without a body and without manual labor…?
Read, Parse, Mine!
• Proposal: read text, parse it into a database, extract relations between words, and propose hypothetical relations between concepts
Common Knowledge
• Using an information channel model, common knowledge is:
  • Information the Sender considers the Receiver to already know
  • If the Sender does send the info, then…
    - The Receiver will consider the Sender to 'lack intelligence or experience' (The sender is stupid).
    - The Receiver will believe the Sender thinks the Receiver 'lacks intelligence or experience' (The sender thinks I'm stupid).
    - Possibly the Sender is clarifying which among many possible common options they mean in this case.
  • Since both parties already know the information, sending it would generate negative information content
• This explains why it is hard to find common sense on the Internet!
Basic Analogy
• The shotgun approach to the Human Genome: extract millions of fragments, then knit them back together by finding commonalities
• Will it work for the Human Menome?
What is Cyc?
• "the world's largest and most complete general knowledge base and commonsense reasoning engine"
• Started in the mid 1980s ("should take only 10 years….")
• Logic based
• LISP oriented
• For WordNet users, each Concept ≈ Synset
• Available from http://www.opencyc.org and http://researchcyc.cyc.com
Big (ResearchCyc v0.8)
• Constants: 89,379
• Assertions: 968,985
• Deductions: 361,185
Sample Collection Extents
• EnglishWord: 18,007
• Event: 6,050
• PartiallyTangible: 24,387
• Microtheory: 1,688
Example of what Cyc currently knows about fingers
Collection: Finger
GAF Arg: 1
Mt: UniversalVocabularyMt
  isa: AnimalBodyPartType
  genls: Digit-AnatomicalPart
  comment: "The collection of all digits of all Hands (q.v.). Fingers are (typically) flexibly jointed and are necessary to enabling the hand (and its owner) to perform grasping and manipulation actions."
Mt: BaseKB
  definingMt: AnimalPhysiologyVocabularyMt
Mt: AnimalPhysiologyMt
  properPhysicalPartTypes: Fingernail
Mt: WordNetMappingMt
  (synonymousExternalConcept Finger WordNet-Version2_0 "N05247839")
  (synonymousExternalConcept Finger WordNet-1997Version "N04312497")
GAF Arg: 2
Mt: UniversalVocabularyMt
  (genls LittleFinger Finger)
  (genls IndexFinger Finger)
  (genls Thumb Finger)
  (genls RingFinger Finger)
  (genls MiddleFinger Finger)
Mt: HumanActivitiesMt
  (bodyPartsUsed-TypeType Typing Finger)
Mt: HumanSocialLifeMt
  (bodyPartsUsed-TypeType PointingAFinger Finger)
Example of what Cyc currently knows about fingers - 2
Mt: AnimalPhysiologyMt
  -(conceptuallyRelated Fingernail Finger)
  (properPhysicalPartTypes Hand Finger)
  (relationAllInstance age Finger (YearsDuration 0 200))
  (relationAllInstance widthOfObject Finger (Meter 0.001 0.2))
  (relationAllInstance heightOfObject Finger (Meter 0.001 0.2))
  (relationAllInstance lengthOfObject Finger (Meter 0.01 0.5))
  (relationAllInstance massOfObject Finger (Kilogram 0.001 1))
GAF Arg: 3
Mt: HumanPhysiologyMt
  (relationAllExists anatomicalParts HomoSapiens Finger)
Mt: VertebratePhysiologyMt
  (relationAllExistsCount physicalParts Hand Finger 5)
Mt: UniversalVocabularyMt
  (relationAllOnly wornOn Ring-Jewelry Finger)
Mt: AnimalPhysiologyMt
  (relationExistsAll physicalParts Hand Finger)
GAF Arg: 4
Mt: GeneralEnglishMt
  (denotation Finger-TheWord CountNoun 0 Finger)
Bootstrapping with ResearchCyc
• Cyc has vocabulary about objects in the world and their relationships
• Cyc could still use more common relationships
• BURC uses what Cyc already has, plus lots of parsed text, to create new Cyc entries for common relationships found in the text
• Lenat's Bootstrap Hypothesis: once Cyc reaches a certain level/scale, it can help in its own development and start using NLP to augment its knowledge base
• BURC should help test this hypothesis
The BURC Process
From seeds… Hypothe-seeds
• Use the Link Grammar Parser for bulk parsing of text, primarily narratives set in 'worlds like ours'. Other text styles could be included.
• Operates in two directions:
  • Forward, from text to CycL
  • Backwards, from existing CycL to the text, to find new forward patterns
BURC Process - 2
• Load the link fragments (1- and 2-link fragments) into a database, and compute the frequency of fragment occurrences. The database will be in SQL format so that multiple queries can be formed dynamically.
• Using Cyc knowledge as a starting point (the seeds), extract knowledge for use in Cyc:
  • Given a set of seed facts in Cyc, identify how those facts are represented as link fragments in the database
  • Generate conjectures about new knowledge AND new knowledge extraction patterns using the fragment patterns
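The following is a minimal sketch of the loading step, using Python's built-in sqlite3 module. The table name LGPTable and the columns NumLinks, Term1, Link1, Term2, Link2 and Term3 come from the queries on later slides. The Freq column name is borrowed from the "(Freq)" annotation on the backwards-model slide. The helper build_fragment_db, the sample fragments and their link names are hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical fragments: a 1-link fragment is (Term1, Link1, Term2) and a
# 2-link fragment is (Term1, Link1, Term2, Link2, Term3). Link names are
# illustrative only.
FRAGMENTS = [
    ("white.a", "a", "blouse.n"),
    ("white.a", "a", "blouse.n"),
    ("long.a", "a", "finger.n"),
    ("finger.n", "Ss", "pressed.v", "Os", "button.n"),
]


def build_fragment_db(fragments):
    """Count identical fragments and load them into an in-memory SQL table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
        "Term2 TEXT, Link2 TEXT, Term3 TEXT, Freq INTEGER)"
    )
    for frag, freq in Counter(fragments).items():
        num_links = (len(frag) - 1) // 2                # 3 fields -> 1, 5 -> 2
        padded = list(frag) + [None] * (5 - len(frag))  # NULL-fill 1-link rows
        db.execute("INSERT INTO LGPTable VALUES (?, ?, ?, ?, ?, ?, ?)",
                   [num_links, *padded, freq])
    db.commit()
    return db


db = build_fragment_db(FRAGMENTS)
one_link = db.execute(
    "SELECT Term1, Link1, Term2, Freq FROM LGPTable "
    "WHERE NumLinks = 1 ORDER BY Freq DESC"
).fetchall()
print(one_link)
```

Counting before inserting keeps one row per distinct fragment, so the frequency queries on later slides can filter directly on Freq.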
BURC Process - 3
• Use Cyc knowledge directly to conjecture new statements:
  • Cyc has lexical knowledge, which can be used as templates against the DB to form new statements
  • For example, common adjectives applied to noun classes
  • Cyc knows "WhiteColor" and "Blouse" but does not know that white is a common blouse color, although this becomes apparent after reading some text
• Optionally, gather supporting background statistics for hypothesis verification from other sources:
  • Perhaps Google Desktop over a corpus larger than the fully parsed one
  • Perhaps check against answer extraction engines
Flow of Processing
[Figure: processing pipeline. Components: BNC Data; Parser1 to Parser5; Frag Files; Merged Frag File; Extractor / DB Manager; Link Fragments DB; Cyc/Rcyc; Hypothesis File]
KNEXT (KNowledge EXtraction from Text)
• Deriving general world knowledge from texts and taxonomies:
  • http://www.cs.rochester.edu/~schubert/projects/world-knowledge-mining.html
  • Lenhart K. Schubert and Matthew Tong, "Extracting and evaluating general world knowledge from the Brown Corpus", Proc. of the HLT-NAACL Workshop on Text Meaning, May 31, 2003, Edmonton, Alberta, pp. 7-13.
• The system extracts commonsense relationships from text
• Limited to the pre-parsed Penn Treebank
• Generated 117,326 propositions (about 2 per sentence)
• About 60% judged reasonable by any given judge
KNEXT (Example)
(BLANCHE KNEW 0 SOMETHING MUST BE CAUSING STANLEY 'S NEW, STRANGE BEHAVIOR BUT SHE NEVER ONCE CONNECTED IT WITH KITTI WALKER.)
A FEMALE-INDIVIDUAL MAY KNOW A PROPOSITION.
SOMETHING MAY CAUSE A BEHAVIOR.
A MALE-INDIVIDUAL MAY HAVE A BEHAVIOR.
A BEHAVIOR CAN BE NEW.
A BEHAVIOR CAN BE STRANGE.
A FEMALE-INDIVIDUAL MAY CONNECT A THING-REFERRED-TO WITH A FEMALE-INDIVIDUAL.
((:I (:Q DET FEMALE-INDIVIDUAL) KNOW[V] (:Q DET PROPOS))
 (:I (:F K SOMETHING[N]) CAUSE[V] (:Q THE BEHAVIOR[N]))
 (:I (:Q DET MALE-INDIVIDUAL) HAVE[V] (:Q DET BEHAVIOR[N]))
 (:I (:Q DET BEHAVIOR[N]) NEW[A])
 (:I (:Q DET BEHAVIOR[N]) STRANGE[A])
 (:I (:Q DET FEMALE-INDIVIDUAL) CONNECT[V] (:Q DET THING-REFERRED-TO)
     (:P WITH[P] (:Q DET FEMALE-INDIVIDUAL))))
Other Extraction Pattern Research
• Towards Terascale Knowledge Acquisition (Pantel, Ravichandran and Hovy, 2004)
• Learning Surface Text Patterns for a Question Answering System (Ravichandran & Hovy, 2002)
  • Defined pattern precision P = Ca/Co
    - Ca = total number of patterns with the answer term present
    - Co = total number of patterns with any term present
• DIRT: Discovery of Inference Rules from Text (Lin & Pantel, 2001)
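The sketch below is one reading of the precision definition above. A surface pattern is turned into a regular expression. Co counts matches with any word in the answer slot, and Ca counts the matches where that word is the known answer. The placeholders <NAME> and <ANSWER>, the helper pattern_precision and the sample sentences are illustrative assumptions, not the paper's code.

```python
import re


def pattern_precision(pattern, name, answer, sentences):
    """Estimate P = Ca / Co for one surface pattern.

    pattern holds the hypothetical placeholders <NAME> and <ANSWER>.
    Co counts matches with any word in the <ANSWER> slot; Ca counts the
    subset whose slot holds the known answer term.
    """
    regex = re.compile(
        re.escape(pattern)
        .replace(re.escape("<NAME>"), re.escape(name))
        .replace(re.escape("<ANSWER>"), r"(\w+)")
    )
    co = ca = 0
    for sentence in sentences:
        for match in regex.finditer(sentence):
            co += 1
            if match.group(1) == answer:
                ca += 1
    return ca / co if co else 0.0


sentences = [
    "Mozart was born in 1756 .",
    "Mozart was born in Salzburg .",
    "Mozart was born in 1756 in Salzburg .",
]
p = pattern_precision("<NAME> was born in <ANSWER>", "Mozart", "1756", sentences)
print(round(p, 3))
```

Here the pattern fires three times and twice with the right year, so P is 2/3. The pattern is useful but noisy, because the same slot also captures place names.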
Other Lexical Knowledge Research
• VerbOcean (Chklovski & Pantel): collecting pairs and searching to verify relationships
• Lexical Acquisition via Constraint Solving (Pedersen & Chen): acquiring syntactic and semantic classification rules for unknown words for the LGP (Link Grammar Parser)
• Information Extraction Using Link Grammar papers
• Automatic Meaning Discovery Using Google
Forward Mining Adjective Relations
• There are 1,941 GAFs on adjSemTrans, the primary lexical adjective predicate
• Find applicable fragments and use their definitions:
  • "Select * from LGPTable Where NumLinks=1 and Link1='a' and Term1 like '%.a' and Term2 like '%.n'"
  • Returns records [Term1.a | a | Term2.n]
  • Potentially test using either an internal or a search-engine-based relevancy metric
  • Query Cyc for "(adjSemTrans <term1>-TheWord ?N RegularAdjFrame (?Pred :NOUN ?Val))"
  • Generate (plausiblePredValueOfType <term2> <?Pred> <?Val>)
  • Possibly generate a parsing rule
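The adjective steps can be sketched as follows. The SQL is the query from the slide, run here with sqlite3. The Cyc lookups (adjSemTrans frames and noun denotations) are replaced by hypothetical hard-coded dictionaries. The table contents and helper names are also made up for illustration.

```python
import sqlite3

# Hypothetical stand-ins for Cyc lookups. BURC would obtain these by asking Cyc
# (adjSemTrans <term1>-TheWord ?N RegularAdjFrame (?Pred :NOUN ?Val)) and by
# mapping the noun word to a Cyc collection; here they are hard-coded.
ADJ_SEM_TRANS = {"white": [("mainColorOfObject", "WhiteColor")]}
NOUN_DENOTATION = {"blouse": "Blouse"}

ADJ_NOUN_QUERY = (
    "SELECT Term1, Term2, Freq FROM LGPTable WHERE NumLinks = 1 "
    "AND Link1 = 'a' AND Term1 LIKE '%.a' AND Term2 LIKE '%.n'"
)


def mine_adjective_hypotheses(db):
    """Turn [Term1.a | a | Term2.n] fragments into plausiblePredValueOfType hypotheses."""
    hypotheses = []
    for term1, term2, freq in db.execute(ADJ_NOUN_QUERY):
        adj = term1.rsplit(".", 1)[0]    # "white.a" -> "white"
        noun = term2.rsplit(".", 1)[0]   # "blouse.n" -> "blouse"
        collection = NOUN_DENOTATION.get(noun)
        if collection is None:
            continue  # no Cyc collection known for this noun
        for pred, val in ADJ_SEM_TRANS.get(adj, []):
            hypotheses.append(
                (freq, f"(plausiblePredValueOfType {collection} {pred} {val})")
            )
    return hypotheses


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
           "Term2 TEXT, Freq INTEGER)")
db.executemany("INSERT INTO LGPTable VALUES (1, ?, 'a', ?, ?)",
               [("white.a", "blouse.n", 12), ("red.a", "tape.n", 3)])
print(mine_adjective_hypotheses(db))
```

With this toy data only the white/blouse fragment yields a hypothesis, matching the white blouse example. The red/tape fragment is dropped because neither lookup knows it.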
Mining Adjective Knowledge Example
• "white blouse" as factoid: [white.a | a | blouse.n]
• Potentially test using an internal or search-engine relevancy metric [GC=70400]
• (adjSemTrans White-TheWord 11 RegularAdjFrame (mainColorOfObject :NOUN WhiteColor))
• Hypothesis: (plausiblePredValueOfType Blouse mainColorOfObject WhiteColor)
Mined Finger Descriptions
000010:(#$plausiblePredValueOfType #$Finger #$feelsSensation (#$PositiveAmountFn #$LevelOfSoreness))
000037:(#$plausiblePredValueOfType #$Finger #$forceCapacity #$Strong)
000025:(#$plausiblePredValueOfType #$Finger #$forceCapacity #$Strong)
000025:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject #$Hard)
000037:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject (#$MediumToVeryHighAmountFn #$Hardness))
000037:(#$plausiblePredValueOfType #$Finger #$hardnessOfObject (#$MediumToVeryHighAmountFn #$Hardness))
000002:(#$plausiblePredValueOfType #$Finger #$hasEvaluativeQuantity (#$MediumToVeryHighAmountFn #$Goodness-Generic))
000002:(#$plausiblePredValueOfType #$Finger #$hasPhysicalAttractiveness #$GoodLooking)
000047:(#$plausiblePredValueOfType #$Finger #$isa (#$LeftObjectOfPairFn :REPLACE))
000015:(#$plausiblePredValueOfType #$Finger #$isa (#$RightObjectOfPairFn :REPLACE))
000155:(#$plausiblePredValueOfType #$Finger #$lengthOfObject (#$RelativeGenericValueFn #$lengthOfObject :REPLACE #$highAmountOf))
000155:(#$plausiblePredValueOfType #$Finger #$lengthOfObject (#$RelativeGenericValueFn #$lengthOfObject :REPLACE #$highToVeryHighAmountOf))
000003:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$BlackColor)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$LightYellowishBrown-Color)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$ModerateYellowishBrown-Color)
000010:(#$plausiblePredValueOfType #$Finger #$mainColorOfObject #$SunTan-FleshColor)
000002:(#$plausiblePredValueOfType #$Finger #$possessiveRelation #$SuddenChange)
Mined Finger Descriptions (continued)
000006:(#$plausiblePredValueOfType #$Finger #$possessiveRelation (#$HighAmountFn #$Speed))
000094:(#$plausiblePredValueOfType #$Finger #$rigidityOfObject (#$HighAmountFn #$Rigidity))
000060:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highAmountOf))
000052:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highToVeryHighAmountOf))
000060:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$highToVeryHighAmountOf))
000285:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$veryLowToLowAmountOf))
000074:(#$plausiblePredValueOfType #$Finger #$sizeParameterOfObject (#$RelativeGenericValueFn #$sizeParameterOfObject :REPLACE #$veryLowToLowAmountOf))
000029:(#$plausiblePredValueOfType #$Finger #$speedOfObject-Underspecified (#$LowAmountFn #$Speed))
000138:(#$plausiblePredValueOfType #$Finger #$surfaceFeatureOfObj #$Slippery)
000074:(#$plausiblePredValueOfType #$Finger #$temperatureOfObject #$Warm)
000004:(#$plausiblePredValueOfType #$Finger #$textureOfObject #$Rough)
000168:(#$plausiblePredValueOfType #$Finger #$thicknessOfObject (#$RelativeGenericValueFn #$thicknessOfObject :REPLACE #$highAmountOf))
000168:(#$plausiblePredValueOfType #$Finger #$thicknessOfObject (#$RelativeGenericValueFn #$thicknessOfObject :REPLACE #$highToVeryHighAmountOf))
000182:(#$plausiblePredValueOfType #$Finger #$wetnessOfObject #$Wet)
Verb Semantic Filtering - 1
Discovering what a finger can do…
• A similar process can be used to find information based on verb semantic parsing frames
• For each potential <NOUNWORD>-<VERB> pair, query Cyc to find basic relationships using the verb semantic templates:
  (#$and
    (#$denotation <NOUNWORD> ?NOUNTYPE ?N ?CYCTERM)
    (#$wordForms ?WORD ?PRED "<VERB>")
    (#$speechPartPreds ?POS ?PRED)
    (#$semTransPredForPOS ?POS ?SEMTRANSPRED)
    (?SEMTRANSPRED ?WORD ?NUM ?FRAME ?TEMPLATE))
• For each potential relationship (<SPRED> <VERBTERM> <CYCTERM>) derivable from ?TEMPLATE, verify that it makes sense in the ontology:
  (#$and
    (#$arg1Isa <SPRED> ?VTYP)
    (#$arg2Isa <SPRED> ?CTYP)
    (#$genls <CYCTERM> ?CTYP)
    (#$genls <VERBTERM> ?VTYP))
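The ontology check can be sketched over a toy genls hierarchy. Two parts come from these slides: the argument types of performedBy, and Finger's generalization Digit-AnatomicalPart. The other links, the assumed argument types for primaryObjectMoving, and the helper names are hypothetical. A real implementation would ask Cyc rather than walk a Python dictionary.

```python
# Toy ontology (hypothetical apart from Finger -> Digit-AnatomicalPart and
# performedBy's argument types, which appear on these slides).
GENLS = {
    "Finger": {"Digit-AnatomicalPart"},
    "Digit-AnatomicalPart": {"PartiallyTangible"},
    "Blouse": {"PartiallyTangible"},
    "Person": {"Agent-Generic"},
    "Agent-Generic": {"PartiallyTangible"},
    "ChangeOfResidence": {"Action"},
    "Action": {"Event"},
    "MovementEvent": {"Event"},
}
ARG_ISA = {  # role predicate -> (arg1 type, arg2 type)
    "performedBy": ("Action", "Agent-Generic"),
    "primaryObjectMoving": ("MovementEvent", "PartiallyTangible"),  # assumed
}


def genls_closure(term):
    """All collections term falls under via genls (reflexive and transitive)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in GENLS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def makes_sense(spred, verb_term, cyc_term):
    """Accept (spred verb_term cyc_term) only if both arguments meet spred's types."""
    vtyp, ctyp = ARG_ISA[spred]
    return vtyp in genls_closure(verb_term) and ctyp in genls_closure(cyc_term)


print(makes_sense("primaryObjectMoving", "MovementEvent", "Finger"))
print(makes_sense("performedBy", "ChangeOfResidence", "Blouse"))
print(makes_sense("performedBy", "ChangeOfResidence", "Person"))
```

The closure walk stands in for the transitive genls reasoning Cyc performs when it answers the query above.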
Verb Semantic Filtering - 2
Templates of Movement…
(verbSemTrans Move-TheWord 0 IntransitiveVerbFrame
  (and (isa :ACTION MovementEvent)
       (primaryObjectMoving :ACTION :SUBJECT)))
(verbSemTrans Move-TheWord 1 IntransitiveVerbFrame
  (and (isa :ACTION ChangeOfResidence)
       (performedBy :ACTION :SUBJECT)))
(verbSemTrans Move-TheWord 2 TransitiveNPFrame
  (and (isa :ACTION CausingAnotherObjectsTranslationalMotion)
       (objectActedOn :ACTION :OBJECT)
       (doneBy :ACTION :SUBJECT)))
(arg1Isa performedBy Action)
(arg2Isa performedBy Agent-Generic)
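The sketch below shows how a template like those above could yield (behaviorCapableOf <noun> <event type> <role>) candidates. It pairs each (isa :ACTION <type>) clause with every role that :SUBJECT fills. The tiny s-expression reader and the helper names are assumptions; BURC's actual template handling is not shown on the slides.

```python
import re


def parse_sexpr(text):
    """Parse a CycL-style s-expression into nested lists of string tokens."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)

    def read(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1
        return tokens[pos], pos + 1

    expr, _ = read(0)
    return expr


def behavior_candidates(noun_term, template):
    """Pair each (isa :ACTION <Event>) with each role the :SUBJECT fills."""
    clauses = parse_sexpr(template)[1:]  # skip the leading 'and'
    events = [c[2] for c in clauses if c[:2] == ["isa", ":ACTION"]]
    roles = [c[0] for c in clauses if c[1:] == [":ACTION", ":SUBJECT"]]
    return [f"(behaviorCapableOf {noun_term} {e} {r})" for e in events for r in roles]


MOVE_2 = ("(and (isa :ACTION CausingAnotherObjectsTranslationalMotion) "
          "(objectActedOn :ACTION :OBJECT) (doneBy :ACTION :SUBJECT))")
print(behavior_candidates("Finger", MOVE_2))
```

Applied to sense 2 of Move-TheWord, this produces a candidate of the same shape as the behaviorCapableOf statements listed for Finger, without the #$ prefixes. The :OBJECT clause is ignored because only the subject's capabilities are being proposed.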
Verb Semantic Filtering - 3
• BURC can use Cyc's knowledge of what things can perform what actions or have what attributes to filter out implausible relationships.
  (#$behaviorCapableOf #$Finger #$CausingAnotherObjectsTranslationalMotion #$doneBy)
  (#$behaviorCapableOf #$Finger #$ChangeOfResidence #$performedBy)
  (#$behaviorCapableOf #$Finger #$Inspecting #$performedBy)
  (#$behaviorCapableOf #$Finger #$Movement-TranslationEvent #$primaryObjectMoving)
  (#$behaviorCapableOf #$Finger #$MovementEvent #$primaryObjectMoving)
  (#$behaviorCapableOf #$Finger #$PushingAnObject #$providerOfMotiveForce)
  (#$behaviorCapableOf #$Finger #$Sliding-Generic #$objectMoving)
  (#$behaviorCapableOf #$Finger #$Sliding-Generic #$primaryObjectMoving)
  (#$behaviorCapableOf #$Finger #$Slipping #$objectMoving)
  (#$behaviorCapableOf #$Finger #$Slipping #$primaryObjectMoving)
• Cyc can help in its own knowledge entry process: 62% of generated hypotheses were filtered out using semantic role filtering.
The General Backwards Model
• Given some Cyc relation Pred(?X,?Y), create an SQL search query:
  • Look up the Cyc lexical entries for X and Y: LX, LY
  • Select * from LGPTable where Term1='<LX>' and Term3='<LY>'
  • The system returns records [LX | Link1 | Term2 | Link2 | LY] (Freq)
• Generate new hypothetical extraction patterns:
  • Select * from LGPTable where Link1='<L1>' and Link2='<L2>' and Term2='<T2>'
  • [* L1 T2 L2 *] generates the hypothetical record ( Pred | ?S1 | ?S3 )
  • Frequency information is propagated forward
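The two backward steps can be sketched with sqlite3. The seed is the wornOn fact shown earlier for fingers, lexicalized by hand as ring.n/finger.n. The fragment rows, link names and helper names are hypothetical. Parameterized queries replace the string-built SQL on the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE LGPTable (NumLinks INTEGER, Term1 TEXT, Link1 TEXT, "
           "Term2 TEXT, Link2 TEXT, Term3 TEXT, Freq INTEGER)")
db.executemany(
    "INSERT INTO LGPTable VALUES (2, ?, ?, ?, ?, ?, ?)",
    [  # hypothetical 2-link fragments with counts
        ("ring.n", "Mp", "on.p", "Js", "finger.n", 40),
        ("watch.n", "Mp", "on.p", "Js", "wrist.n", 25),
        ("book.n", "Mp", "on.p", "Js", "table.n", 60),
        ("ring.n", "Ss", "fell.v", "MVp", "floor.n", 3),
    ],
)


def backward_patterns(db, seed_pairs):
    """Step 1: middles (Link1, Term2, Link2) joining known (LX, LY) word pairs."""
    patterns = {}
    for lx, ly in seed_pairs:
        rows = db.execute(
            "SELECT Link1, Term2, Link2, Freq FROM LGPTable "
            "WHERE NumLinks = 2 AND Term1 = ? AND Term3 = ?", (lx, ly))
        for link1, term2, link2, freq in rows:
            key = (link1, term2, link2)
            patterns[key] = patterns.get(key, 0) + freq
    return patterns


def apply_pattern(db, pred, pattern):
    """Step 2: [* L1 T2 L2 *] -> hypothetical (pred ?S1 ?S3), keeping frequencies."""
    rows = db.execute(
        "SELECT Term1, Term3, Freq FROM LGPTable WHERE NumLinks = 2 "
        "AND Link1 = ? AND Term2 = ? AND Link2 = ? ORDER BY Freq DESC", pattern)
    return [(pred, s1, s3, freq) for s1, s3, freq in rows]


# Seed: (relationAllOnly wornOn Ring-Jewelry Finger), lexicalized as ring/finger.
patterns = backward_patterns(db, [("ring.n", "finger.n")])
for pattern in patterns:
    print(apply_pattern(db, "wornOn", pattern))
```

The top-ranked result, (wornOn book table), shows why generated patterns still need the semantic role filtering described in the Verb Semantic Filtering slides.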
Running the system
• Used a filtered set of the British National Corpus (BNC): 650 MB of data
• 5 parsers running in parallel for 70 hours generated 1.91 GB of output
• Reduced to 1 GB of unique records with counts
• 783 MB, or 22 million fragments
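Below is a sketch of the merge step that reduces the parsers' output to unique records with counts, using collections.Counter. The '|'-separated record format and the helper names are assumptions. At the scale reported here (1.91 GB), an on-disk sort-and-count would likely be needed instead of an in-memory counter.

```python
import io
from collections import Counter


def merge_frag_files(frag_files):
    """Combine fragment records from several parser outputs into unique records with counts."""
    merged = Counter()
    for handle in frag_files:
        merged.update(line.strip() for line in handle if line.strip())
    return merged


def write_merged(merged, out):
    """Write one 'count<TAB>record' line per unique fragment, most frequent first."""
    for record, count in merged.most_common():
        out.write(f"{count}\t{record}\n")


# Stand-ins for two parsers' frag files (hypothetical '|'-separated records).
parser_outputs = [
    io.StringIO("white.a|a|blouse.n\nlong.a|a|finger.n\n"),
    io.StringIO("white.a|a|blouse.n\nwhite.a|a|blouse.n\n"),
]
merged = merge_frag_files(parser_outputs)
out = io.StringIO()
write_merged(merged, out)
print(out.getvalue())
```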
Frequency of Fragments
• The distribution of fragments follows a smooth curve in log space
• Similar to the Zipf distribution for words, characters and n-grams
[Figure: Number of fragments at each frequency level. x-axis: Number of Occurrences; y-axis: Number of Fragments, log scale from 1 to 100,000,000]
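A smooth curve in log space suggests a power law. Its exponent can be estimated with a least-squares line through the log-log points. This sketch assumes the input is each fragment's occurrence count. The synthetic spectrum below is constructed to have a slope of exactly -2 and is not BURC data.

```python
import math
from collections import Counter


def frequency_spectrum(fragment_counts):
    """Map each occurrence count to the number of distinct fragments having it."""
    return Counter(fragment_counts)


def loglog_slope(spectrum):
    """Least-squares slope of log10(#fragments) vs log10(#occurrences)."""
    xs = [math.log10(k) for k in spectrum]
    ys = [math.log10(spectrum[k]) for k in spectrum]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # needs at least two distinct occurrence counts


# Synthetic spectrum following an exact power law: n(k) = 10**6 * k**-2.
spectrum = {1: 1_000_000, 10: 10_000, 100: 100, 1000: 1}
print(loglog_slope(spectrum))
```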
The Hunt for Common Fragments
• Forward mining was run over adjective-link fragments occurring more than once and subject-verb fragments occurring more than twice
• In both cases this was approximately the top 15% of each search class
Reductions
From Fragments into Hypotheses (number of elements at each filtering stage)

Filtering Stage          Adjectives   Subject-Verbs
Raw Fragments               996,810         934,029
Common Fragments            147,074         144,208
Generated Hypotheses         26,690           9,079
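The retention rates implied by the table can be checked directly. The counts below are copied from the table; the helper name is hypothetical.

```python
# Counts copied from the table: (raw fragments, common fragments, hypotheses).
REDUCTIONS = {
    "Adjectives": (996_810, 147_074, 26_690),
    "Subject-Verbs": (934_029, 144_208, 9_079),
}


def retention(raw, common, hypotheses):
    """Fraction surviving each filtering stage."""
    return common / raw, hypotheses / common


for search_class, counts in REDUCTIONS.items():
    common_share, hypothesis_share = retention(*counts)
    print(f"{search_class}: {common_share:.1%} of raw fragments are common; "
          f"{hypothesis_share:.1%} of common fragments became hypotheses")
```

Both classes keep roughly 15% of raw fragments as common fragments, consistent with "approximately the top 15%". Adjectives then convert a much larger share of common fragments into hypotheses than subject-verb pairs do.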
A source of potential knowledge
• The various versions of Cyc have 10 to 20 assertions per constant
• BURC generates 14.29 hypothetical assertions per constant
• Need to quantify the quality of BURC knowledge
[Figure: Hypotheses generated for constants. x-axis: Number of Hypotheses (1 to 49); y-axis: Number of Constants (0 to 350)]
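For comparison with the 10 to 20 range, the ResearchCyc v0.8 counts from the "Big" slide work out as follows. This is a quick arithmetic check, and the variable names are illustrative.

```python
RESEARCHCYC_V08 = {"constants": 89_379, "assertions": 968_985}
BURC_HYPOTHESES_PER_CONSTANT = 14.29  # as reported on the slide


def assertions_per_constant(kb):
    """Average number of assertions per constant in a KB summary."""
    return kb["assertions"] / kb["constants"]


cyc_density = assertions_per_constant(RESEARCHCYC_V08)
relative = BURC_HYPOTHESES_PER_CONSTANT / cyc_density
print(f"ResearchCyc v0.8: {cyc_density:.2f} assertions per constant")
print(f"BURC hypotheses per constant relative to that: {relative:.2f}x")
```

That is about 10.8 assertions per constant. BURC's 14.29 hypotheses per constant would therefore be roughly 1.3 times ResearchCyc's own density.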
Future Work - 1
• Modify Cyc to utilize the extracted knowledge:
  • Question generation (curiosity?)
  • Noticing exceptions
• Update the parser and generate data in other knowledge formats (e.g. OpenMind/ConceptNet)
• Develop better filtering methods for polysemous words in fragments
• Use WordNet synonyms and antonyms to expand hypotheses
• Examine the effect of reporting the unusual instead of the usual
Future Work - 2
• Define admissibility criteria: how much evidence is necessary to consider a fact commonplace enough to add to the KB?
• Determine performance relative to, and in conjunction with, volunteer commonsense knowledge entry projects
• Create an interface for quick human review of hypotheses
• Use the knowledge and experience gained so far on the backwards miner
Can we ever be "Done"?
• Explore definitions of semantic coverage metrics for unmapped domains.
• Applying 2.4K binary predicates to 85K constants gives a search space of roughly 16 trillion combinations, only a fraction of which would be considered part of 'common knowledge'.
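Here is a quick check of the search-space size using the rounded figures from the slide.

```python
BINARY_PREDICATES = 2_400   # "2.4K" on the slide
CONSTANTS = 85_000          # "85K" on the slide

# Each binary predicate can be applied to any ordered pair of constants.
combinations = BINARY_PREDICATES * CONSTANTS * CONSTANTS
print(f"{combinations:,} combinations (about {combinations:.1e})")
```

The rounded counts give about 1.7 × 10^13. That is the same order of magnitude as the 16 trillion quoted above.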