QUIRK: Project Progress Report, December 3-5, 2002
Cycorp / IBM
Notable Progress
• Query decomposition extensions
• Argument-structure approximation
• Syntactic analysis of textual sources
• Introspective justifications
Single-Literal Query Decomposition
Given the rule (Q & R & Z) ⇒ P, the query P? decomposes into the independent single-literal subqueries Q?, R?, and Z?.
Multi-Literal Query Decomposition
Given the same rule (Q & R & Z) ⇒ P, the query P? decomposes so that a group of literals is removed at once, e.g. the multi-literal subquery (Q & R)? plus the single-literal subquery Z?.
Multi-Literal Query Decomposition: Example
Given the rule ((isa ?X French) & (isa ?X Movie) & (likes Amy ?X)) ⇒ (likes Bob ?X), the query (likes Bob ?X)? decomposes into the multi-literal subquery ((isa ?X French) & (isa ?X Movie))? plus the single-literal subquery (likes Amy ?X)? (see the sketch below).
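A minimal Python sketch of the two decomposition modes, assuming a toy encoding of CycL literals as tuples; the function names and the choice to group the two isa literals (as a unit answerable by one external source) are illustrative, not Cyc's inference code:

ANTECEDENT = [("isa", "?X", "French"),
              ("isa", "?X", "Movie"),
              ("likes", "Amy", "?X")]
CONSEQUENT = ("likes", "Bob", "?X")

def single_literal_decompose(query):
    # P? becomes one independent subquery per rule literal.
    if query == CONSEQUENT:
        return [[lit] for lit in ANTECEDENT]
    return []

def multi_literal_decompose(query, groupable):
    # P? becomes one grouped subquery plus single-literal
    # subqueries for the remaining literals.
    if query != CONSEQUENT:
        return []
    group = [lit for lit in ANTECEDENT if groupable(lit)]
    rest = [[lit] for lit in ANTECEDENT if not groupable(lit)]
    return [group] + rest

print(single_literal_decompose(CONSEQUENT))
print(multi_literal_decompose(CONSEQUENT, lambda lit: lit[0] == "isa"))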
Examples
• Joins in external DBs (NIMA, USGS, …)
– Airports in Travis County, TX
– Hospitals located in port cities
– ...
• Web services, e.g. IMDB
– Actors from the ‘50s
• As a bridge between KR formats
Davidsonian KR bridge
Wellington defeated Napoleon in Waterloo.
(thereExists ?EV
  (and (isa ?EV DefeatingAnOpponent)
       (performedBy ?EV Wellington)
       (objectActedOn ?EV Napoleon)
       (eventOccursAt ?EV Waterloo)))
Argument-Type bridge
John lives in a French village.
(thereExists ?V
  (and (isa ?V Village)
       (geographicalSubRegions France ?V)
       (residesInRegion John ?V)))
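The bridge itself can be viewed as a multi-literal rewrite: a group of role literals sharing one event (or object) variable is recognized as a unit and mapped to another format. A minimal Python sketch of the Davidsonian case, assuming tuple literals and a hypothetical flat predicate defeated; this is not Cyc's actual machinery:

def bridge_to_flat(literals):
    # Collapse (isa ?EV DefeatingAnOpponent) plus its role literals
    # into a single flat literal (defeated agent patient).
    roles = {pred: arg for pred, _ev, arg in literals}
    if roles.get("isa") == "DefeatingAnOpponent":
        return ("defeated", roles["performedBy"], roles["objectActedOn"])
    return None

print(bridge_to_flat([("isa", "?EV", "DefeatingAnOpponent"),
                      ("performedBy", "?EV", "Wellington"),
                      ("objectActedOn", "?EV", "Napoleon")]))
# -> ('defeated', 'Wellington', 'Napoleon')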
Registration of multi-literal removal modules
• At present, few enough such modules exist that they can be defined directly in code
• We plan to support declarative registration of such modules in Cyc's KB, so that new modules can be added through run-time KB edits (see the sketch below)
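A minimal sketch of what declarative registration could look like, assuming modules are keyed by the set of predicates they can remove together; register, lookup, and the "IMDB-lookup" module name are hypothetical, not Cyc's actual API:

REGISTRY = {}

def register(predicates, module):
    # Declare that `module` can answer a whole group of literals
    # whose predicate set matches `predicates`.
    REGISTRY[frozenset(predicates)] = module

def lookup(literals):
    # Find a registered module covering this literal group, if any.
    return REGISTRY.get(frozenset(pred for pred, *_rest in literals))

register({"isa"}, "IMDB-lookup")
print(lookup([("isa", "?X", "French"), ("isa", "?X", "Movie")]))
# -> IMDB-lookup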
Argument-based query generation
(thereExists ?EV
  (and (isa ?EV AttackOnObject)
       (maleficiary ?EV Djibouti)
       (performedBy ?EV ?WHO)))
[SUBJ [VERB OBJ]]
@PHR(2 PERSON$ attack *Djibouti)
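A hedged sketch of how such a pattern might be generated from the query's arguments, assuming small lookup tables from event types to verb stems and from variable types to Talent tags; reading the leading 2 as a proximity window is our assumption about the example above:

EVENT_VERB = {"AttackOnObject": "attack"}  # event type -> verb stem
VAR_TYPE = {"?WHO": "PERSON$"}             # free variable -> Talent type tag

def phrase_pattern(event_type, free_var, obj, window=2):
    # Emit a [SUBJ [VERB OBJ]] proximity pattern over the text index.
    return "@PHR(%d %s %s *%s)" % (window, VAR_TYPE[free_var],
                                   EVENT_VERB[event_type], obj)

print(phrase_pattern("AttackOnObject", "?WHO", "Djibouti"))
# -> @PHR(2 PERSON$ attack *Djibouti)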
Secretary
• Input:
  – A CycL query such as (president France ?WHO)
  – A textual paragraph
• Output: a ranked list of CycL terms that
  – represent entities mentioned in the paragraph
  – are type-appropriate as substitutions for the free variables in the query (?WHO: Person)
• Three types of Secretary
Secretary 1
• Use IBM's Talent system to learn new lexical entries
• Tag the paragraph with lexical mappings
• Select type-appropriate CycL tags
• Rank them by proximity to the query focus, as determined by the recorded positions in the paragraph of all the ground terms in the CycL query (see the sketch below)
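A minimal sketch of the proximity ranking, assuming token positions for both the candidate tags and the query's ground terms; the names and data are illustrative only:

def rank_by_proximity(candidates, ground_positions):
    # candidates: (CycL term, token position) pairs for the
    # type-appropriate tags found in the paragraph.
    # ground_positions: token positions of the query's ground terms,
    # e.g. where "France" occurs for (president France ?WHO).
    def distance(pos):
        return min(abs(pos - g) for g in ground_positions)
    return sorted(candidates, key=lambda c: distance(c[1]))

print(rank_by_proximity([("JacquesChirac", 12), ("LionelJospin", 40)],
                        ground_positions=[10]))
# JacquesChirac ranks first: it is closest to the ground term.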
Secretary 2
• Use IBM’s Talent system to learn new lexical entries
• Use the output of UPenn's dependency parser to generate a set of CycL interpretations of the paragraph
• Select “best” interpretation
• Return the CycL entity that stands in the appropriate relationship to the query's predicate (see the sketch below)
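A minimal sketch of that final step, assuming the "best" interpretation is a single ground literal over the query's predicate; illustrative only:

def entity_for_free_arg(query, interpretation):
    # Return the constant that fills the query's free-variable
    # position in the chosen interpretation.
    if query[0] != interpretation[0]:
        return None
    for q, i in zip(query[1:], interpretation[1:]):
        if q.startswith("?"):
            return i
    return None

print(entity_for_free_arg(("president", "France", "?WHO"),
                          ("president", "France", "JacquesChirac")))
# -> JacquesChirac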
Secretary 3
• Use IBM's Talent system to learn new lexical entries
• Use the output of UPenn's dependency parser to generate a set of CycL interpretations of the paragraph
• Select “best” interpretation and turn it into a virtual assertion in Cyc’s KB
• Ask the original query in the KB so obtained
• Return all answers (see the sketch below)
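A minimal sketch of the assert-then-ask flow, assuming a toy KB of ground tuples and a single free variable in the query; real Cyc inference is far more general:

def ask(query, kb):
    # Return bindings for the single free variable (prefix "?").
    answers = []
    for fact in kb:
        if len(fact) != len(query):
            continue
        binding = None
        for q, f in zip(query, fact):
            if q.startswith("?"):
                binding = f
            elif q != f:
                break
        else:
            answers.append(binding)
    return answers

kb = {("president", "France", "JacquesChirac")}  # the virtual assertion
print(ask(("president", "France", "?WHO"), kb))  # ['JacquesChirac']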
General observations
• Secretaries 2 and 3 have better precision than Secretary 1, but much lower recall
  – possibly due to the non-verb-like nature of many Cyc predicates; we need to check whether the same holds true of multi-literal events
• The linear proximity used by Secretary 1 is almost as good as the argument-based analysis of Secretaries 2 and 3
Introspective Justifications
Dialog evaluation
• Basic knowledge representation performed for each of the topics
• Used KRAKEN GUI for interpretation of questions
• Used KRAKEN NL generation for reporting answers to analyst
KRAKEN GUI
• Which paintings about war did Picasso create?
Contextual vs. keyhole approach
• Several questions asked simultaneously:
  – "Need background data on the Cuban dissident Elizardo Sanchez to include birth data, education, work ethics, organization affiliations to name a few."
• Analyst happy with a summary of all facts known about an entity of interest
Lessons learned
• Analysts like to ask questions and see answers in context
• The single-question/single-answer approach could be extended to:
  – a dossier about entity X
  – a preliminary dialog on the desired properties of the dossier, inferred from the properties of entity X
• Justifications become interesting only if answers are sufficiently surprising.
Definitional Questions Evaluation
• Expectations:
  – large answer set
  – both redundancy AND irrelevance
  – opportunity for structuring the answer set by salient features of the question focus
• Actual experience:
  – limited answer set
  – mostly redundancy
Original Plan
• Use appositives to learn the type of an entity:
  – "Massimo Cacciari", "Venice Mayor"
• Use Cyc to:
  – understand the type (a kind of elected official)
  – generate a list of questions salient for the type (see the sketch below):
    • When was he elected?
    • What is his party affiliation?
    • …
• Answer salient questions from textual sources
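A minimal sketch of the question-generation step, assuming a table from Cyc types to salient question templates; the table contents are illustrative, drawn from the examples above:

SALIENT_QUESTIONS = {
    "ElectedOfficial": ["When was {name} elected?",
                        "What is {name}'s party affiliation?"],
}

def salient_questions(entity, cyc_type):
    # Instantiate every template salient for the entity's type.
    return [t.format(name=entity)
            for t in SALIENT_QUESTIONS.get(cyc_type, [])]

print(salient_questions("Massimo Cacciari", "ElectedOfficial"))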
Revised Plan
• Use syntactic analysis to extract appositives and relevant VPs
• Cluster the strings so extracted
• Return one string from each cluster, ranked by the size of the cluster (see the sketch below)
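A minimal sketch of the clustering step, assuming a greedy grouping by word overlap (Jaccard similarity over token sets); the threshold and the similarity measure are illustrative choices:

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

def cluster_and_rank(strings, threshold=0.5):
    clusters = []
    for s in strings:
        for c in clusters:
            if jaccard(s, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    clusters.sort(key=len, reverse=True)
    # One representative per cluster, largest clusters first.
    return [c[0] for c in clusters]

print(cluster_and_rank(["Venice mayor", "mayor of Venice",
                        "Italian philosopher"]))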
Lessons learned
• Punctuation and function words are crucial
• Textual sources don’t always support an analysis by “salient features”
• Semantic analysis is not necessarily useful if the end result is expected to be a string that can be easily interpreted by the analyst.