
Cold Start Knowledge Base Population at TAC 2016
Task Description[1]
Version 1.0 of July 11, 2016

What’s New
Introduction
    Schema
    Document Collection
    Evaluation Queries
Task Output – Knowledge Base Variant
    Entities
    Predicates
Task Output – Slot Filling Variant
Task Output – All Variants
    Provenance
    Confidence Measure
    Comments
    Examples
Differences between 2014 Slot Filling and the 2016 Cold Start Slot Filling Variant
Evaluation
    Slot Filling Assessment
    Slot Filling Scoring
    Entity Discovery Scoring
Submissions
Change History

[1] The TAC organizing committee welcomes comments on this Task Description, or on any aspect of the TAC evaluation. Please send comments to tac-kbp@nist.gov.


What’s New

Cold Start 2016 has two task variants: a full end-to-end Knowledge Base Construction (CSKB) task, and a component Slot Filling (CSSF) task. A component Entity Discovery and Linking (EDL) task is organized completely under the EDL track, but has the same input documents as CSKB and CSSF, to allow the entity discovery component of CSKB systems to be evaluated on the same documents as standalone EDL systems. In order to enable teams with slot filling systems to also participate in the end-to-end KB construction task, two EDL evaluation windows are offered and staged such that teams constructing a KB are given the output of EDL systems participating in the first EDL evaluation window.

This document describes the 2016 Cold Start SF/KB Construction tasks. The detailed task description for EDL is at the EDL 2016 website (http://nlp.cs.rpi.edu/kbp/2016/).

The 2016 Cold Start SF/KB Construction tasks are identical to the 2015 tasks, with the following changes:

1. The Cold Start tasks are cross-lingual; in addition to English, the 2016 source corpus includes Chinese and Spanish documents. Cross-lingual SF/KB Construction systems may return entity mentions, slot fillers and provenance from any combination of English, Chinese, and Spanish documents. Additionally, 3 diagnostic monolingual versions of these tasks are offered (one for each language), in which entity mentions, slot fillers and provenance must come from only the single language.

2. In addition to person (PER), organization (ORG), and geopolitical entity (GPE) types, KB Construction systems must return mentions of location (LOC) and facility (FAC) entities (although the slot inventory will not be modified to include LOC and FAC entities).

3. In addition to named mentions, KB Construction systems must extract and link all nominal mentions of specific individual PER, ORG, GPE, LOC, and FAC entities.

4. Cold Start SF/KB Construction systems may return a nominal mention as a filler if no name mention is available in the source corpus.

Introduction

Since 2009, TAC has evaluated performance on two important aspects of knowledge base population: entity linking and slot filling. The goal of the Cold Start track is to exercise both of these areas, and evaluate the ability of a system to use these technologies to actually construct a knowledge base (KB) from the information provided in a text collection. Cold Start participants build a software system that processes a large text collection and creates a knowledge base that is consistent with and accurately represents the content of that collection. The knowledge base is then evaluated as a single connected resource, using queries that traverse entity nodes and relation (slot) links in the KB to determine if the KB contains correct relations between correct entities.

In 2016, Cold Start has two task variants.

1. In the Knowledge Base variant (CSKB), participants submit entire knowledge bases, without prior knowledge of the evaluation queries.

2. The Slot Filling variant (CSSF) supersedes the 2014 Slot Filling track and is designed to make it easy for sites with slot filling systems to participate in Cold Start. In this variant, the Cold Start evaluation queries are split into Cold Start Slot Filling queries, with one entry point per query, and are distributed at the start of the task evaluation window. Participants do not have to submit entire knowledge bases. Rather, they apply their slot filling system twice, the first time on the entry point for each query, the second time on each of the results of the first round.

The Entity Discovery and Linking task and the Slot Filling task have done a good job of evaluating key components of knowledge base population. They do not, however, evaluate every aspect of an automatically generated knowledge base. Things one might like to know about such a knowledge base include:

• Are the entities in the knowledge base correctly tied to real-world mentions of those entities? TAC Entity Discovery and Linking (EDL) tasks have measured this.

• Are the facts and relations in the knowledge base accurate reflections of the facts and relations described in the source documents? The TAC Slot Filling tasks have measured this, as will TAC Cold Start SF.

• Are entity linking and slot filling correctly coordinated to produce a meaningful knowledge base? The TAC Cold Start KB task measures this.

• Can the knowledge base correctly perform inference over the extracted entities, such as temporal reasoning, confidence estimation, default reasoning, transitive closure, etc.? Cold Start is just beginning to measure this; it is designed to facilitate this kind of evaluation more thoroughly in future years.

We call the task Cold Start Knowledge Base Population to convey two features of the evaluation: it implies both that a knowledge base schema has been established at the start of the task, and that the knowledge base is initially unpopulated. Thus, we assume that a schema exists for the entities, facts, and relations that will compose the knowledge base; it is not part of the task to automatically identify and name facts and relationships present in the text collection. In 2016, we use a schema that combines the entity types from TAC KBP 2016 Entity Discovery and Linking, and the relation types from TAC KBP 2015 Cold Start Knowledge Base Population. Thus, the schema will include five entity types (person, organization, geopolitical entity, facility, and location) and forty-one relation types and their inverses. For relations whose fills are themselves entities (such as per:siblings or org:subsidiaries), CSKB systems will be required to link that slot to the node in the submitted KB representing the correct entity.[2] Slots whose fills are strings (such as per:title or org:website) will simply use strings to represent the information.

Cold Start also implies that the knowledge base is initially empty. To avoid solutions that rely on verifying content already present in Wikipedia or other large data sources about entities, the queries used in Cold Start will be dominated by entities that are not present in Wikipedia.

All participating systems will receive the following input:

1. a document collection;
2. a knowledge base schema

From these, systems participating in the Knowledge Base variant will produce a knowledge base. This KB will be submitted to NIST as a set of augmented triples. Participating KB systems must tie each entity mention in the document collection to a particular KB entity node; in this way, the knowledge base can be queried without first aligning it to a reference knowledge base. Knowledge bases will include mention, nominal_mention, canonical_mention, and type triples, as well as the full range of slot fills (all triples are described more fully below).

Systems participating in the Slot Filling variant will also receive:

3. a set of Cold Start Slot Filling (CSSF) evaluation queries (each evaluation query is a sequence of one or two slot filling queries to be applied in series).

[2] Because facility and location entities are not included in the slot definitions, only person, organization, and geopolitical entity nodes must be linked to the slots.

For both variants, the results will then be evaluated by NIST:

• Systems participating in the Slot Filling variant return slot fillers directly in response to the given CSSF evaluation queries, and the fillers are then assessed and scored for precision and recall.

• Evaluation of the Knowledge Base variant will start by applying the same CSSF evaluation queries to the submitted knowledge base. Each query will start at a named entity mention in a document (identified by the query’s <beg> and <end> tags), identify the knowledge base entity that corresponds to that mention, follow a sequence of one or more relations within the knowledge base, and end in a slot fill. The resulting slot fills will be assessed and scored in the same way as in the Slot Filling variant. For example, a CSSF evaluation query might ask ‘what are the ages of the siblings of the Bart Simpson[3] mentioned in Document 42?’ A system that correctly identified descriptions of Bart’s siblings in the document collection, linked them to the appropriate node in the KB, and also found evidence for and correctly represented the ages of those siblings would receive full credit.

[3] Many of the examples used to illustrate the Cold Start task are drawn from The Simpsons television show. Readers lacking a detailed working knowledge of genealogical relationships in the Bouvier/Simpson family need not agonize over what they have been doing with their lives for the past quarter century, but may simply visit http://simpsons.wikia.com/wiki/Simpson_Family.

Relation                               Inverse(s)
per:children                           per:parents
per:other_family                       per:other_family
per:parents                            per:children
per:siblings                           per:siblings
per:spouse                             per:spouse
per:employee_or_member_of              {org,gpe}:employees_or_members*
per:schools_attended                   org:students*
per:city_of_birth                      gpe:births_in_city*
per:stateorprovince_of_birth           gpe:births_in_stateorprovince*
per:country_of_birth                   gpe:births_in_country*
per:cities_of_residence                gpe:residents_of_city*
per:statesorprovinces_of_residence     gpe:residents_of_stateorprovince
per:countries_of_residence             gpe:residents_of_country*
per:city_of_death                      gpe:deaths_in_city*
per:stateorprovince_of_death           gpe:deaths_in_stateorprovince*
per:country_of_death                   gpe:deaths_in_country*
org:shareholders                       {per,org,gpe}:holds_shares_in*
org:founded_by                         {per,org,gpe}:organizations_founded*
org:top_members_employees              per:top_member_employee_of*
{org,gpe}:member_of                    org:members
org:members                            {org,gpe}:member_of
org:parents                            {org,gpe}:subsidiaries
org:subsidiaries                       org:parents
org:city_of_headquarters               gpe:headquarters_in_city*
org:stateorprovince_of_headquarters    gpe:headquarters_in_stateorprovince*
org:country_of_headquarters            gpe:headquarters_in_country*

Table 1. Entity-valued slots. Slots with asterisks represent inverse relations that will need to be added by participants from previous years' Slot Filling task (2014 and earlier). The type qualifier of each relation (per, org or gpe) is the type of its subject, while the type qualifier for its inverse is the type of its object. A set of types means that any of those types is acceptable for that slot. All submitted slot names must use only a single type specification.

Schema

The schema for Cold Start 2016 combines the entity and mention types from TAC KBP 2016 Entity Discovery and Linking, and the relation types from TAC KBP 2015 Cold Start Knowledge Base Population. Thus, the schema includes five entity types (person, organization, geopolitical entity, facility, and location) and forty-one relation types and their inverses. Annotation/assessment guidelines are available on the TAC website (http://www.nist.gov/tac/2016/KBP/ColdStart/guidelines.html), and are more fully documented in the data packages that can be requested from the LDC upon completion of TAC KBP track registration.

Cold Start entities and entity mentions are defined by DEFT Rich ERE. Full annotation guidelines for DEFT Rich ERE entities are included in the DEFT Rich ERE annotation packages, available from the LDC, but a high-level summary of the five entity types and their mentions is available in Rich ERE Annotation Guidelines Overview. For Cold Start, the entity mention types that must be extracted are limited to named and nominal mentions, and the entities must be specific individual entities (as described in Annotation Guidelines for Individuality of Specific Entities). A Cold Start named entity mention is the same as a named entity mention in Rich ERE; i.e., a Cold Start named entity mention is a mention that uniquely refers to an entity by its proper name, acronym, nickname, alias, abbreviation, or other alternate name, and includes post author names found in the metadata of discussion forum documents. The extent of the named entity mention is the entire string representing the name, excluding the preceding definite article and any other pre-posed or post-posed modifiers. A Cold Start nominal entity mention is the head of the nominal entity mention in Rich ERE; i.e., a Cold Start nominal entity mention is a mention not including the entity's proper name, referring to it by a common noun phrase (but for Cold Start, the nominal mention is only the head noun of the nominal phrase). Entity mentions are allowed to nest or overlap; for example, the string “Philadelphia Eagles” might be a mention of an ORG (the football team), while the first word might simultaneously be a mention of a GPE (the city of Philadelphia).

The Cold Start inventory of slots is described thoroughly in TAC KBP 2015 Slot Descriptions and TAC KBP 2015 Assessment Guidelines available on the TAC Website. Forty-one slots and their inverses are used for the evaluation. Twenty-six of these have fills that are themselves entities, as shown in Table 1. The remaining fifteen slots have string fills, as shown in Table 2.

per:alternate_names     org:alternate_names
per:date_of_birth       org:political_religious_affiliation
per:age                 org:number_of_employees_members
per:origin              org:date_founded
per:date_of_death       org:date_dissolved
per:cause_of_death      org:website
per:title
per:religion
per:charges

Table 2. String-valued slots.

Each entity-valued slot will have an inverse.[4] All inverse relations must be explicitly identified in the submitted knowledge base. That is, if the KB asserts that relation R holds between entities A and B, then it must also assert that relation R⁻¹ holds between B and A. As a convenience, the Cold Start KB validation script can be used to introduce missing inverses into a KB.
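The inverse requirement can be illustrated with a minimal Python sketch. It is not the NIST validation script; the `INVERSES` map below is a hypothetical hand-copied subset of Table 1, the function name `add_missing_inverses` is invented for illustration, and provenance and confidence columns are ignored.

```python
# Hypothetical subset of the Table 1 relation/inverse pairs (assumption:
# a real system would list every entity-valued slot).
INVERSES = {
    "per:children": "per:parents",
    "per:parents": "per:children",
    "per:siblings": "per:siblings",   # symmetric slot
    "per:spouse": "per:spouse",       # symmetric slot
}


def add_missing_inverses(triples):
    """Return the triples plus any inverse assertions that are missing."""
    seen = set(triples)
    result = list(triples)
    for subj, pred, obj in triples:
        inverse = INVERSES.get(pred)
        if inverse is None:
            continue  # not an entity-valued slot known to this sketch
        inverse_triple = (obj, inverse, subj)
        if inverse_triple not in seen:
            seen.add(inverse_triple)
            result.append(inverse_triple)
    return result


kb = [(":e4", "per:siblings", ":e7"), (":e4", "per:children", ":e9")]
completed = add_missing_inverses(kb)
```

In this toy KB the sketch appends `(":e7", "per:siblings", ":e4")` and `(":e9", "per:parents", ":e4")`, leaving the original assertions in their original order.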

Document Collection

The Cold Start 2016 evaluation document collection will be the TAC KBP 2016 Evaluation Source Corpus, which comprises approximately 90,000 documents, roughly equally distributed between English, Spanish, and Chinese, and balanced between newswire (NW) and multi-post discussion forum (MPDF) documents. These documents will be new (previously unreleased) documents that will be distributed by NIST via Web download at the beginning of the Cold Start evaluation window. There will be exactly one file per document, and all files will be parsable as XML. Each file will begin with the opening tag of the <DOC> element (<doc> for MPDF);[5] note that <DOC> can be spelled with either uppercase or lowercase letters, depending on the genre, and may optionally include additional attributes (such as "type" for some newswire data).

Newswire data will use the following markup framework:

<DOC id="{doc_id_string}" type="{doc_type_label}">
<HEADLINE>
...
</HEADLINE>
<DATELINE>
...
</DATELINE>
<TEXT>
<P>
...
</P>
...
</TEXT>
</DOC>

[4] Some slots, such as per:siblings, are symmetric. Others, such as per:parents, have inverses that were already in the 2014 English Slot Filling track (in this case, per:children). The remaining slots (e.g., org:founded_by) had no corresponding slot in the 2014 English Slot Filling track; Cold Start specifies new slot names for these inverses. All such slots are list-valued.

[5] In contrast to some of the KBP source corpora from previous years, the TAC KBP 2016 Source Corpus will not contain any files that begin with xml declarations such as <?xml version="1.0" encoding="utf-8"?>. This is to ensure that offsets align across the various KBP 2016 tracks that are using this same evaluation source corpus, regardless of whether offsets are counted from the beginning of the file, or the beginning of the <DOC> tag.


where the HEADLINE and DATELINE tags are optional (not always present), and the TEXT content may or may not include "<P> ... </P>" tags (depending on whether or not the "doc_type_label" is "story").

Multi-Post Discussion Forum files (MPDFs) are derived from Discussion Forum threads. They consist of a continuous run of posts from a thread but they are only approximately 800 words in length (excluding metadata and text within <quote> elements). When taken from a short thread, an MPDF may comprise the entire thread. However, when taken from longer threads, an MPDF is a truncated version of its source, though it will always start with the preliminary post. The MPDF files will use the following markup framework, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a...>...</a>" anchor tags):

<doc id="{doc_id_string}">
<headline>
...
</headline>
<post ...>
...
<quote ...>
...
</quote>
...
</post>
...
</doc>

All provenance/justifications for all KBP 2016 tasks must be drawn from the documents in the TAC KBP 2016 Evaluation Source Corpus. Each document is represented as a UTF-8 character array and begins with the <DOC> tag, where the “<” character has index 0 for the document. Thus, offsets for provenance are counted before XML tags are removed. Start offsets must be the index of the first character in the corresponding string, and end offsets must be the index of the last character of the string (therefore, the length of the corresponding string is end offset - start offset + 1).
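The inclusive offset convention is easy to get wrong by one, so a small Python sketch may help. It assumes the file has already been read and decoded as UTF-8 so that indices count characters of the raw text, markup included; the document string and the helper name `extract_span` are made up for illustration.

```python
def extract_span(doc_text: str, start: int, end: int) -> str:
    """Return the substring for inclusive character offsets [start, end]."""
    if start < 0 or end < start or end >= len(doc_text):
        raise ValueError("offsets out of range")
    # Python slices exclude the upper bound, so add 1 to keep `end` inclusive.
    return doc_text[start:end + 1]


# Hypothetical document: offsets are counted on the raw text, tags included,
# and the opening "<" sits at index 0.
doc = '<DOC id="doc_42"><TEXT>June McCarthy is 44.</TEXT></DOC>'
start = doc.index("June McCarthy")
end = start + len("June McCarthy") - 1   # index of the last character
mention = extract_span(doc, start, end)
```

Here `len(mention)` equals `end - start + 1`, which matches the length rule stated above.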

All KBP 2016 systems should return extractions from anywhere in the document, including <quote> regions of MPDF documents. However, for the following KBP tasks, in which evaluation is by comparison with gold standard Rich ERE annotations (which will not include annotations of <quote> regions), the track coordinator will automatically filter out <quote> regions from submitted runs before scoring, so as to avoid penalizing runs that include <quote> regions:

(a) EDL
(b) Belief and Sentiment
(c) Event Nuggets
(d) Event Arguments

For the following KBP tasks, in which evaluation is by assessment, assessment and scoring will allow provenance and extractions from anywhere in the document, including <quote> regions:

(a) Cold Start SF
(b) Cold Start KB Construction


Evaluation Queries

CSKB and CSSF systems are evaluated by the same set of Cold Start evaluation queries. A Cold Start evaluation query begins with one or more mentions of the same entity, followed by a sequence of slots to be filled for the entity. Each mention in the query is called an entry point because it can be used to select (at most) one entity node in a KB that is being evaluated; multiple entry points are included for each Cold Start evaluation query in order to increase the chances that the KB will have a response to the query even if it misses one entry point. Each Cold Start evaluation query is split into multiple Cold Start Slot Filling (CSSF) queries, with one entry point per CSSF query (the CSSF queries will request the same slots, but each will have a different entry point).

Participants in the Slot Filling variant of Cold Start will receive the CSSF evaluation queries at the beginning of the CSSF evaluation window, and will apply a script to incrementally convert those queries to a form that looks similar to queries from the 2014 English Slot Filling task. Participants in the Knowledge Base variant will not receive the queries; rather, NIST will apply the evaluation queries to each submitted knowledge base and assess the results. An outline of the NIST assessment process for both Cold Start variants is given below.

All CSSF evaluation queries start with an entry point into the knowledge base being evaluated. The entry point is defined by a named entity mention (name, docid, begin offset, and end offset), and is followed by the entity type and either one or two slots to be extracted for the entity.

Evaluation queries could take one of two forms: single-hop or multiple-hop. For example, here is a sample single-hop CSSF evaluation query that asks “What is the age of the June McCarthy mentioned at offsets 16931-16943 in Document 42?”:

<query id="CSSF16_ENG_00243754cd">
  <name>June McCarthy</name>
  <docid>42</docid>
  <beg>16931</beg>
  <end>16943</end>
  <enttype>PER</enttype>
  <slot>per:age</slot>
  <slot0>per:age</slot0>
</query>

This single-hop query looks very much like a query from the 2014 English Slot Filling task, except that each query in Cold Start asks for a specific slot, rather than all slots for which there is information in the document collection.[6]

A more complex “two-hop” query might ask, “What are the ages of the children of the June McCarthy mentioned at offsets 16931-16943 in Document 42”:

<query id="CSSF16_ENG_002109743e">
  <name>June McCarthy</name>
  <docid>42</docid>
  <beg>16931</beg>
  <end>16943</end>
  <enttype>PER</enttype>
  <slot>per:children</slot>
  <slot0>per:children</slot0>
  <slot1>per:age</slot1>
</query>

[6] Participants in the Slot Filling variant should treat all other slots as if they appear in the <ignore> field of a Slot Filling query from 2013 or earlier.

In general, two-hop queries will start from an entry point (selecting the corresponding KB entity of a CSKB submission), follow a single entity-valued relation (from Table 1), then ask for a single slot value (from either Table 1 or Table 2).[7] Such queries will verify that the knowledge base is well-formed in a way that goes beyond basic entity linking and slot filling, without allowing combinations of errors to drive scores to zero.
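As a rough illustration of the query structure, the following Python sketch reads one query element with the standard library and collects its hops. The field names come from the sample queries above; the function name `parse_query`, the `hops` key, and the dictionary layout are choices made here, not part of any NIST tooling.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """<query id="CSSF16_ENG_002109743e">
  <name>June McCarthy</name>
  <docid>42</docid>
  <beg>16931</beg>
  <end>16943</end>
  <enttype>PER</enttype>
  <slot>per:children</slot>
  <slot0>per:children</slot0>
  <slot1>per:age</slot1>
</query>"""


def parse_query(xml_text):
    """Parse one CSSF query element into a plain dictionary."""
    q = ET.fromstring(xml_text)
    hops = [q.findtext("slot0")]
    slot1 = q.findtext("slot1")  # None for single-hop queries
    if slot1 is not None:
        hops.append(slot1)
    return {
        "id": q.get("id"),
        "name": q.findtext("name"),
        "docid": q.findtext("docid"),
        "beg": int(q.findtext("beg")),
        "end": int(q.findtext("end")),
        "enttype": q.findtext("enttype"),
        "hops": hops,
    }


query = parse_query(QUERY_XML)
```

For the sample two-hop query, `query["hops"]` holds the entity-valued relation followed by the final slot, and the inclusive offsets span exactly the thirteen characters of the name.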

Because two-hop queries do not look like any slot filling queries from KBP 2009-2014, participants in the Cold Start Slot Filling variant must process the CSSF queries in two “rounds” using the CS-GenerateCSQueries.pl script from NIST, which adds the <slot> entry to the NIST-distributed CSSF queries. Participants in the Slot Filling variant must treat <slot> as the slot to be filled. During the first round, <slot> will be identical to <slot0>. The CS-GenerateCSQueries.pl script will then convert a first round output file to a second round query file. Second round queries generated by this script will bear <slot> entries equivalent to <slot1>. Though some of the CSSF queries will differ only in having different mentions (possibly for the same entity) as their entry points, participating CSSF systems are prohibited from using information about one query to inform the processing of another query.

For the Knowledge Base variant, the following rules are applied to map from a CSSF evaluation query to a knowledge base entry: First, form a candidate set of all KB node mentions that have at least one character in common with the evaluation query mention and that have the same type. If this set is empty, the submission does not contain any answers for the evaluation query. Otherwise, for each mention K in the candidate set, calculate:

• COMMON, the number of characters in K that are also in the query mention Q.
• K_ONLY, the number of characters in K that are not in Q.

Execute each of the following eliminations until the candidate set is size one, and select that candidate as the KB node that matches the query:

• Eliminate any candidate that does not have the maximal value of COMMON
• Eliminate any candidate that does not have the minimal value of K_ONLY
• Eliminate all but the candidate that appears first in the submission file
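The elimination rules above can be sketched in Python as a single ordered comparison. This is an illustrative reading of the rules, not the NIST evaluation code: the `Mention` record and the function name `match_entry_point` are hypothetical, offsets are treated as inclusive, and the sketch assumes that candidates must come from the same document as the query mention.

```python
from typing import NamedTuple, Optional


class Mention(NamedTuple):
    node: str    # KB entity node, e.g. ":e4"
    docid: str
    start: int   # inclusive start offset
    end: int     # inclusive end offset
    etype: str   # PER, ORG, GPE, FAC or LOC


def match_entry_point(kb_mentions, docid, beg, end, etype) -> Optional[str]:
    """Pick the KB node for a query mention, or None if nothing overlaps."""
    scored = []
    for order, k in enumerate(kb_mentions):   # order = position in the file
        if k.docid != docid or k.etype != etype:
            continue
        common = min(k.end, end) - max(k.start, beg) + 1   # COMMON
        if common < 1:
            continue                                       # no shared character
        k_only = (k.end - k.start + 1) - common            # K_ONLY
        scored.append((-common, k_only, order, k.node))
    if not scored:
        return None
    # Smallest tuple = maximal COMMON, then minimal K_ONLY, then earliest in file.
    return min(scored)[3]


mentions = [
    Mention(":e1", "42", 16931, 16934, "PER"),  # COMMON 4,  K_ONLY 0
    Mention(":e2", "42", 16925, 16943, "PER"),  # COMMON 13, K_ONLY 6
    Mention(":e3", "42", 16931, 16943, "PER"),  # COMMON 13, K_ONLY 0
    Mention(":e4", "42", 16931, 16943, "ORG"),  # wrong type, never a candidate
]
chosen = match_entry_point(mentions, "42", 16931, 16943, "PER")
```

Sorting on the tuple `(-COMMON, K_ONLY, file position)` gives the same winner as applying the three eliminations in sequence, because each later criterion only matters among candidates tied on the earlier ones.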

The proper specification of mention relations in a KB is therefore important for scoring well; CSKB participants should take care to ensure that every named entity mention in the evaluation collection serves as a mention relation for a node in the KB.

The NIST evaluation of a KB will proceed by finding all entries in the KB that fulfill an evaluation query. For example, if the evaluation query ‘schools attended by the siblings of Bart Simpson’ finds two siblings for the node specified by the entry point, and the KB indicates that those siblings attended two and one schools respectively, then three results would be assessed by NIST. These results will be converted to the same form as the output for the Slot Filling variant. Results will be pooled across all CSKB and CSSF submissions, and assessors will judge the validity of each result. Finally, a scoring script will report a variety of statistics for each submitted run.

[7] In principle, multiple-hop queries could include more than two relations, but we limit ourselves to two in Cold Start 2016.
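The Bart Simpson example can be traced with a short Python sketch that follows each hop through a toy KB. The node names and the helper functions `follow` and `answer_query` are invented for illustration, and provenance, confidence, and pooling are left out.

```python
def follow(kb, node, slot):
    """Return every object linked from `node` by `slot`, in file order."""
    return [obj for subj, pred, obj in kb if subj == node and pred == slot]


def answer_query(kb, entry_node, hops):
    """Follow each hop in turn and return the fills reached by the last one."""
    frontier = [entry_node]
    for slot in hops:
        frontier = [obj for node in frontier for obj in follow(kb, node, slot)]
    return frontier


# Hypothetical KB: two siblings, who attended two schools and one school.
kb = [
    (":bart", "per:siblings", ":lisa"),
    (":bart", "per:siblings", ":maggie"),
    (":lisa", "per:schools_attended", ":school_a"),
    (":lisa", "per:schools_attended", ":school_b"),
    (":maggie", "per:schools_attended", ":school_c"),
]
results = answer_query(kb, ":bart", ["per:siblings", "per:schools_attended"])
```

The traversal yields three fills, matching the three results that the text says would be passed to assessment.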

In creating evaluation queries, LDC will strive to balance even distribution across slot types with productivity of those slots. Single hop queries, which are of greater interest for slot filling, will in many cases ask for multiple slots for a given entity regardless of whether fillers for those slots are attested in the document collection. Multiple hop queries will favor entities and slot sequences that are attested in the document collection (although here too, availability of answers is not guaranteed at any hop level).

Task Output – Knowledge Base Variant

CSKB systems must produce a knowledge base as output. The first line of the output file must contain a unique run ID. The remainder of the KB is represented as a set of augmented triples. Assertions will appear, one-per-line, in tab-separated format. The output file will be automatically converted to RDF statements during evaluation. All output must be encoded in UTF-8.

Each triple appears in the output file in subject-predicate-object order. For example, to indicate that entity-4 has entity-7 as a sibling, the triple might be:

:e4 per:siblings :e7

If entity-4 has siblings in addition to entity-7, these relations should be entered as separate triples.

Entities

Each entity specification begins with a colon, followed by a sequence of letters, digits and underscores. Examples of legal entity specifications include :Entity42, :EE74_R29, and :there_were_two_muffins_in_the_oven. No meaning is ascribed to this sequence by the evaluation software; it is used only as a unique identifier. Any subsequent use of the same colon-preceded sequence will be taken as a reference to the same entity.
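A quick way to check identifiers against this rule is a regular expression, sketched below in Python. The pattern assumes ASCII letters only, which the text does not state explicitly, and the name `is_entity_spec` is hypothetical.

```python
import re

# Assumption: "letters" means ASCII letters; the task description does not
# say whether non-ASCII letters are accepted.
ENTITY_RE = re.compile(r":[A-Za-z0-9_]+")


def is_entity_spec(token: str) -> bool:
    """True if the token is a colon followed by letters, digits, underscores."""
    return ENTITY_RE.fullmatch(token) is not None
```

Under this reading, the three examples from the text are accepted, while a token without the leading colon or with a hyphen is rejected.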

Predicates

The legal predicates are those shown in Table 1 (including inverses) and Table 2, plus type, mention, nominal_mention, and canonical_mention.

PredicatesfoundinTable1musthaveentityspecificationsinboththesubjectandobjectpositions.PredicatesfoundinTable2musthaveanentityspecificationinthesubjectposition,andadoublequote-delimitedstringintheobjectposition;thestringintheobjectpositionwillexactlycorrespondwiththeslotfillforthatrelationintheSlotFillingtask.Abackslashcharactermustprecedeanyoccurrenceofadoublequoteorabackslashinsuchastring.8Atleastoneinstanceofeachuniquesubject-predicate-objecttriplewillbeevaluated.Ifmorethanoneinstanceofagiventripleappearsintheoutput(witheachtriplehavingdifferentprovenance),LDCwillassesstheinstancewiththehighestconfidencevalue(seebelow),andwillassessadditionalinstancesifresourcesallow.Ifmorethanonesuchtriplesharesthesameconfidencevalue,thetriplethatappearsearlierintheoutputwillbeconsideredtohavehigherconfidence.

8 Each backslash used to quote the following character doesn’t itself have to be preceded by another backslash.
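The quoting rule for string-valued objects can be sketched as follows. This is an illustrative helper pair, not part of the official tools; the names `quote_fill` and `unquote_fill` are hypothetical.

```python
def quote_fill(text: str) -> str:
    """Wrap a string fill in double quotes, escaping backslashes and quotes."""
    # Escape backslashes first so the backslashes added for quotes
    # are not themselves escaped a second time.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def unquote_fill(token: str) -> str:
    """Invert quote_fill: drop the delimiters and remove escape backslashes."""
    chars = iter(token[1:-1])
    out = []
    for ch in chars:
        if ch == "\\":
            # A backslash quotes the character that follows it.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```

Escaping backslashes before double quotes matters: doing it in the other order would double the backslashes that were just inserted in front of the quotes.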


type

Each entity will be the subject of exactly one type triple. The object of that triple will be either PER, ORG, GPE, FAC or LOC depending on the type of the entity. It is up to submitting systems to correctly identify and report the type of each entity.

mention and nominal_mention

Each entity will be the subject of one9 or more mention or nominal_mention triples. Together with the provenance information (see below), these triples indicate how the knowledge base is tied to the document collection. Each named entity mention in the collection must be submitted as the object of a mention triple, while each nominal entity mention in the collection must be submitted as the object of a nominal_mention triple. For example, if an entity is mentioned by name five times in a document, five mention triples should be generated. The object of a mention or nominal_mention triple is the double-quoted mention string; document ID and offset appear under provenance information (see below). The definition of what constitutes a named or nominal entity mention for Cold Start is described in the Cold Start schema above.

canonical_mention

For each document that mentions an entity, one of the mentions or nominal_mentions must be identified as the canonical mention for that entity in that document; it is the string that will be seen by the assessor if that entity appears as a slot fill, supported by that document (in Slot Filling task terms, it is the content of Column 5 of a CSSF 2016 submission, and its provenance will serve as Column 7 of the CSSF submission).10 Canonical mentions are expressed using a canonical_mention triple. The arguments for canonical_mention are the same as for mention and nominal_mention. Note that there is no requirement that submissions select a single, global canonical mention for an entity. While such a mention might be useful, here we require that a canonical mention be provided within each document for the assessor to use during assessment. Each canonical_mention is also a mention or nominal_mention. As a convenience, if a submitted KB does not contain a mention or nominal_mention triple for each canonical_mention triple, the missing relations will be inferred (perhaps incorrectly) as named mentions (albeit with a warning). This shortcut is provided to make submitted KBs easier to view, and does not relieve submitters from the requirement to provide each of the required mentions, nominal_mentions, and canonical_mentions.

9 Unlike previous years, Cold Start 2016 requires both named and nominal entity mentions to be extracted and included in the KB.

10 In the Slot Filling task of KBP 2009-2014 (and in the Slot Filling variant of Cold Start), all slot fills are strings. Assessors verify the validity of a slot fill by looking for that string in the specified document, using the provenance information provided in the system response. In a submitted KB, slots that are filled with entities hold not strings, but pointers to the KB structure for the appropriate entity. Thus, a canonical mention must be identified by the Cold Start KB for each entity in each document, so that the assessor can be presented with a string that represents the entity during assessment. A relation provenance (see below) entry may include more than one document, and at least one of those documents must contain a mention of the object of the relation; that document must therefore contain a canonical mention for the object. When selecting a canonical mention for presentation to the assessor, the first document appearing in the relation provenance that contains a mention of the object will be used for the canonical mention.


Task Output – Slot Filling Variant

Output for the Slot Filling variant will be in the form of a tab-separated file. The columns of the submitted file are as follows:

Column 1  Query ID. For the first round, this is taken directly from the <query> XML tag. For the second round, this is drawn from the <query> tag of the query generated from the first round output.

Column 2  The name of the slot being filled.

Column 3  A unique run ID for the submission.

Column 4  Provenance for the relation between the query entity and slot filler, consisting of up to 4 docid:startoffset-endoffset triples separated by commas. Individual spans may comprise at most 150 UTF-8 characters. Unlike the 2014 Slot Filling task, there is no requirement to generate NIL entries when no information about the target entity is available.

Column 5  A slot filler (possibly normalized, e.g., for dates). This is used both to populate the <name> entry of the next round query, and by the assessor to judge the slot fill. The string should be extracted from the filler provenance in Column 7, except that any embedded tabs or newline characters should be converted to a space character and dates must be normalized (therefore, slot fillers should not be translated across languages). If a nominal mention is returned as a slot filler, only the head word of the nominal phrase should be returned (consistent with the EDL definition of nominal mentions). For dates, systems must normalize document text strings to standardized month, day, and/or year values, following the TIMEX2 format of yyyy-mm-dd (e.g., document text “New Year’s Day 1985” would be normalized as “1985-01-01”); if a full date cannot be inferred using document text and metadata, partial date normalizations are allowed using “X” for the missing information.

Column 6  A filler type, selected from {PER, ORG, GPE, STRING}. The STRING filler is used for string-valued slots shown in Table 2.

Column 7  Provenance for the slot filler string. This is either a single span (docid:startoffset-endoffset) from the document where the canonical slot filler string was extracted, or (in the case when the slot filler string in Column 5 has been normalized) a set of up to two comma-separated docid:startoffset-endoffset spans for the base strings that were used to generate the normalized slot filler string. The documents used for the slot filler string provenance must be a subset of the documents provided in Column 4. This column serves two purposes. First, LDC will judge Correct vs. Inexact with respect to the document(s) provided in the slot filler string provenance. Second, this column is used to fill the <docid>, <beg> and <end> entries in second round queries. If more than one provenance triple is provided here, the first one will be used to fill the second round query.

Column 8  Confidence score.
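The Column 5 clean-up rules (whitespace conversion and date normalization) can be illustrated with the minimal sketch below. The function names are hypothetical. Writing one "X" per missing digit (for example "1985-XX-XX") is an assumption about how partial dates are spelled; the text above only says that "X" stands for the missing information. Treating a carriage return as a newline character is likewise an assumption.

```python
import re
from typing import Optional


def clean_filler(text: str) -> str:
    """Convert each embedded tab or newline character to a single space."""
    # Assumption: carriage returns are treated like newlines.
    return re.sub(r"[\t\r\n]", " ", text)


def normalize_date(year: Optional[int] = None,
                   month: Optional[int] = None,
                   day: Optional[int] = None) -> str:
    """Format a (possibly partial) date as yyyy-mm-dd.

    Assumption: each missing digit is written as one 'X'.
    """
    y = f"{year:04d}" if year is not None else "XXXX"
    m = f"{month:02d}" if month is not None else "XX"
    d = f"{day:02d}" if day is not None else "XX"
    return f"{y}-{m}-{d}"
```

For the example in the text, `normalize_date(1985, 1, 1)` gives "1985-01-01"; with only the year known, the sketch yields "1985-XX-XX".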

The process for constructing a Slot Filling variant submission is as follows:

• Download the following from the NIST Web site:
   o The Cold Start evaluation documents
   o CS-GenerateQueries.pl script
   o CS-PackageOutput.pl script
   o CS-ValidateSF.pl script
• [email protected]:
   o The CSSF evaluation queries
• Configure your system to produce results only from the Cold Start evaluation documents.
• Run the CS-GenerateQueries.pl script on the evaluation queries to produce the first round queries your system will run on. Note that the raw evaluation queries might differ from the format given above, so you should not assume that you can use them as input to your system without running this script.
• Run your system, producing a slot-filling submission for the first round queries.
• Run the CS-ValidateSF.pl script on your first round output to verify that it is formatted correctly.
• Run the CS-GenerateQueries.pl script on the evaluation queries and your first round output to produce the second round queries.
• Run your system on the second round queries to produce a second output file.
• Run the CS-PackageOutput.pl script on the two output files to produce your submission.
• Run the CS-ValidateSF.pl script on your submission to verify that it is formatted correctly.
• Upload the submission to NIST.

Task Output – All Variants

Provenance

Each triple in CSKB submissions and each output line in CSSF submissions will include a set of augmentations (again using tabs as separators). Except for the type predicate (which does not require explicit support from a document), the first augmentations will describe the provenance of the assertion. Provenance for submissions for the Slot Filling variant has already been described above; the corresponding provenance for triples in KB variant submissions is detailed here:

For predicates for relations from Table 1 or Table 2, up to four comma-separated justifications will be allowed for each entry, at the submitter’s discretion. Justifications do not need to be explicitly associated with subject, relation or object. Each justification will include a document ID, followed by a colon, followed by two dash-separated offsets (begin and end offsets). The offsets that show the provenance of an extracted relation are used to narrow the assessor’s focus within the documents when assessing the correctness of that relation. Provenance for a single relation may be drawn from more than one document. For the KB variant, when selecting a canonical mention for presentation to the assessor, the first document appearing in the relation provenance that contains a named or nominal mention of the object will be used for the canonical mention. (At least one of the documents in the KB’s relation provenance must contain a named or nominal mention of the object of the relation; that document must therefore contain a canonical mention for the object.) Therefore, participants should be careful to ensure that, if some documents contain nominal canonical mentions and some documents contain named canonical mentions, the document containing a named canonical mention appears as the first document in the provenance.

String-valued slots (from Table 2) whose values do not represent entities place an additional constraint on provenance for Knowledge Base variant participants: the first justification must represent the document ID and offsets of the string fill. (Slot Filling variant participants are already providing this information in Column 7 of their submissions.) This requirement will allow assessors to quickly see the text from which the string fill was extracted.

Unlike entries for Slot Filling relations, the mention, nominal_mention, and canonical_mention predicates will have only a single justification, representing the exact location of the mention in the text. The type predicate requires no provenance.
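A provenance field of the form described above (document ID, colon, begin offset, dash, end offset, with justifications separated by commas) could be parsed along the following lines. This is a sketch only; `Span` and `parse_provenance` are hypothetical names, and the official validation scripts may apply further checks not shown here.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    docid: str
    start: int
    end: int


# Greedy docid match, so the last colon separates the docid from the offsets.
_SPAN = re.compile(r"(.+):(\d+)-(\d+)")


def parse_provenance(field: str, max_spans: int = 4) -> List[Span]:
    """Parse comma-separated docid:start-end justifications.

    Use max_spans=4 for relation predicates and max_spans=1 for
    mention, nominal_mention, and canonical_mention.
    """
    spans = []
    for item in field.split(","):
        match = _SPAN.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed justification: {item!r}")
        spans.append(Span(match.group(1), int(match.group(2)), int(match.group(3))))
    if len(spans) > max_spans:
        raise ValueError(f"at most {max_spans} justifications allowed")
    return spans
```

For string-valued slots in a KB submission, the first element of the returned list would be the span of the string fill itself.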

Confidence Measure

To promote research into probabilistic knowledge bases and confidence estimation, each triple or slot fill may have an associated confidence score. Confidence scores will not be used for any official TAC 2016 measure. However, the scoring system may produce additional measures if confidence scores are included. Confidence scores will be used to induce a total order over the facts being evaluated (ties are broken when two scores are equal by assuming that the assertion appearing earlier in the submission has a higher score). Any submitted confidence score must be a positive real number between 0.0 (exclusive, representing the lowest confidence) and 1.0 (inclusive, representing the highest confidence), and must include a decimal point (no commas, please) to clearly distinguish it from a document offset. Confidence scores, if present, will appear at the end of each output line, separated from the provenance information with a tab. Confidence scores may not be used to qualify two incompatible fills for a single slot; submitter systems must decide amongst such possibilities and submit only one. For example, if the system believes that Bart’s only sibling is Lisa with confidence 0.7 and Milhouse with confidence 0.3, it should submit only one of these possibilities. If both are submitted, it will be interpreted as Bart having two siblings.
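The constraints on confidence scores, and the tie-breaking rule that induces a total order, can be sketched as follows. The function names are hypothetical and the checks reflect only what is stated above.

```python
from typing import List, Sequence


def parse_confidence(token: str) -> float:
    """Validate a confidence score: decimal point required, 0.0 < score <= 1.0."""
    if "." not in token or "," in token:
        raise ValueError(f"confidence must contain a decimal point: {token!r}")
    value = float(token)
    if not (0.0 < value <= 1.0):
        raise ValueError(f"confidence out of range (0.0, 1.0]: {token!r}")
    return value


def rank_by_confidence(scores: Sequence[float]) -> List[int]:
    """Return line indices ordered from highest to lowest confidence.

    Ties are broken in favor of the assertion appearing earlier.
    """
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))
```

For scores `[0.7, 0.9, 0.7]` the resulting order is `[1, 0, 2]`: the 0.9 assertion comes first, and the two 0.7 assertions keep their submission order.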

Comments

Output files may contain comments, which begin at any occurrence of a pound sign (#) and continue through (but do not include) the end of the line. Comments and blank lines will be ignored. The first line of a KB variant output file must contain the unique run ID (i.e., it may not be blank). Submitters may like to add a comment to this line giving further details about the run.
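A reader for submission files might strip comments and blank lines as in the sketch below. It follows the rule literally, so a pound sign anywhere on a line starts a comment; whether the official scripts treat a pound sign inside a quoted string differently is not stated here. Dropping trailing whitespace before a comment is a choice of this sketch, and the function names are hypothetical.

```python
from typing import List


def strip_comment(line: str) -> str:
    """Drop everything from the first '#' to the end of the line."""
    # Trailing whitespace left in front of the comment is removed too
    # (a choice made by this sketch).
    return line.split("#", 1)[0].rstrip()


def content_lines(text: str) -> List[str]:
    """Return the non-blank lines of a submission with comments removed."""
    lines = []
    for raw in text.splitlines():
        line = strip_comment(raw)
        if line.strip():
            lines.append(line)
    return lines
```

In a KB variant file, the first element of `content_lines(...)` would then be the run ID.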

Examples

The following five lines from a Knowledge Base variant submission11 show examples of: one triple without any augmentations, two with only mention extent, one with only relation provenance, and one with both relation provenance and confidence.

:e4    type               PER
:e4    mention            "Bart Simpson"    Doc726:37-48
:e4    nominal_mention    "brother"         Doc726:15-21
:e4    per:siblings       :e7               Doc124:283-288,Doc885:173-179,Doc885:274-281
:e4    per:age            "10"              Doc124:180-181,Doc885:173-179    0.9
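As a rough sketch, one tab-separated KB assertion line can be split into its parts as shown below. The function name and the dictionary keys are hypothetical. The trailing field is taken to be a confidence score when it contains a decimal point and no colon, which relies on the rule that confidence scores must include a decimal point and on every justification containing a colon.

```python
from typing import Any, Dict


def parse_assertion(line: str) -> Dict[str, Any]:
    """Split one tab-separated KB assertion into triple, provenance, confidence."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("an assertion needs subject, predicate and object")
    subject, predicate, obj, *extras = fields
    confidence = None
    # A trailing field with a decimal point and no colon is read as confidence.
    if extras and "." in extras[-1] and ":" not in extras[-1]:
        confidence = float(extras.pop())
    if len(extras) > 1:
        raise ValueError("unexpected extra fields")
    provenance = extras[0].split(",") if extras else []
    return {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "provenance": provenance,
        "confidence": confidence,
    }
```

Applied to the last example line above (with tabs between fields), this yields two justifications and a confidence of 0.9; applied to the type line, it yields an empty provenance list and no confidence.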

11 The first three lines can readily be converted to form part of an EDL submission, which can be evaluated as in the EDL track.

Here are example lines from a Slot Filling variant submission:


Q4    org:city_of_headquarters    myrun1    Doc42:3-8,Doc8:3-11              Baltimore    GPE       Doc8:3-11         1.0
Q5    per:siblings                myrun1    Doc124:283-288,Doc885:173-179    Lisa         PER       Doc124:283-286    0.7
Q6    per:age                     myrun1    Doc124:180-181,Doc885:173-179    10           STRING    Doc124:180-181    0.9
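The eight-column layout, together with the requirement that the Column 7 documents be a subset of the Column 4 documents, can be checked with a sketch like the following. The class, field, and function names are hypothetical; the official CS-ValidateSF.pl script remains the authority on what counts as well-formed.

```python
from typing import NamedTuple, Set


class SFResponse(NamedTuple):
    query_id: str             # Column 1
    slot: str                 # Column 2
    run_id: str               # Column 3
    relation_provenance: str  # Column 4
    filler: str               # Column 5
    filler_type: str          # Column 6
    filler_provenance: str    # Column 7
    confidence: str           # Column 8


def parse_sf_line(line: str) -> SFResponse:
    """Split one tab-separated Slot Filling response into its eight columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    return SFResponse(*fields)


def _docids(provenance: str) -> Set[str]:
    # The docid is everything before the last colon of each span.
    return {span.rpartition(":")[0] for span in provenance.split(",")}


def filler_docs_are_subset(resp: SFResponse) -> bool:
    """Check that Column 7 documents all appear in Column 4."""
    return _docids(resp.filler_provenance) <= _docids(resp.relation_provenance)
```

For the Q5 example line, the filler provenance document Doc124 also appears in Column 4, so the subset check passes.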

Differences between 2014 Slot Filling and the 2016 Cold Start Slot Filling Variant

Slot filling systems that participated in the 2014 Slot Filling task will need to handle the following differences to successfully participate in the 2016 CSSF task:

• Only the slot specified by the <slot> entry is to be filled; all other slots should be ignored. The <slot> entry is added to the queries received from NIST by running the CS-GenerateQueries.pl script.

• Participants will need to do one round of slot filling, run the CS-GenerateQueries.pl script to create the second round queries, then run slot filling again on the new queries. The results of rounds one and two are to be concatenated before submission using the CS-PackageOutput.pl script.

• CSSF requires that participants be able to fill all slots in both directions. For example, the 2014 Slot Filling task required detection of the per:cities_of_residence slot. CSSF also requires systems to be able to detect the inverse of that slot, gpe:residents_of_city.

• Each slot filler must be assigned a type, selected from {PER, ORG, GPE, STRING}. This field represents an additional output column not found in the 2014 Slot Filling or CSSF tasks.

• NIL entries, indicating that no information about a particular slot is available, are not required in CSSF.

• Nominal mentions of slot fillers may be returned if no named entity mention is available in the document collection. (Returning nominal entity mentions is not required, but may improve system recall if done correctly.)

• In addition to English, slot fillers and provenance may also be returned from Chinese and Spanish documents (only if the team is participating in one of the language conditions that isn't mono-lingual English).

Evaluation

The primary evaluation for both Cold Start SF and Cold Start KB construction is the slot filling evaluation, based on assessment of slot fillers found in response to Cold Start evaluation queries. In addition, the entity discovery component of Cold Start KBs is secondarily evaluated using the same set of evaluation documents and annotations as in the EDL track.

Slot Filling Assessment

Cold Start 2015 assessment and scoring will proceed as follows: The responses for each evaluation query (from both task variants and from human-generated results) will be pooled, and each response will be assessed by a person. The result of following the first relation will be assessed as if it were a Slot Filling query (for Knowledge Base variant entries, the canonical mention of the object entity in the first supporting document that mentions that entity will be used for the slot fill). The second relation in the query will also be assessed as a Slot Filling query, but only if the fill for the first relation is correct. If the fill for the first relation is not correct, each fill for the second relation is automatically counted as Wrong. For example, if the query asks for the ages of the siblings of “Bart Simpson,” and the submitted knowledge base gives “Lisa age 8” and “Milhouse age 10” as siblings, then only the reported age of Lisa will be assessed (Milhouse is not Bart’s sibling), and the reported age of Milhouse will automatically be counted as Wrong.
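The dependency between the two hops can be expressed compactly. In this sketch the judgment labels are plain strings and the function name is hypothetical; it only mirrors the rule that second-hop fills are counted as Wrong whenever the first-hop fill is not correct.

```python
from typing import List


def effective_second_hop(first_hop: str, second_hop: List[str]) -> List[str]:
    """Apply the cascade: a non-Correct first hop makes every second hop Wrong."""
    if first_hop != "Correct":
        return ["Wrong"] * len(second_hop)
    return list(second_hop)
```

In the example above, the age reported for Lisa keeps whatever judgment the assessor gives it, while the age reported for Milhouse becomes Wrong regardless of its value.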

Cold Start uses pseudo-slot scoring to evaluate multiple-hop queries, in which each evaluation query is treated as if it selects a single indivisible slot. For example, an evaluation query that asks for the children of the siblings of an entity will be scored as if it were a query about a virtual per:nieces_and_nephews slot.12 The guidelines in TAC KBP 2015 Slot Descriptions specify whether each of the component slots of a pseudo-slot is single-valued (e.g., per:date_of_birth) or list-valued (e.g., per:employee_of, per:children). A pseudo-slot is single-valued if each of its component slots is single-valued, and list-valued otherwise. In contrast to the Slot Filling task, Knowledge Base variant submissions may contain multiple fills for single-valued slots. If such are present in the submission, LDC will assess the slot fill with the highest confidence value, and will assess additional slot fills if resources allow. If more than one such slot fill shares the same confidence value, the slot fill that appears earlier in the output will be considered to have higher confidence.

Each CSSF slot filler response (or CSKB object of each component relation that makes up a single evaluation query response) is assessed as Correct, ineXact, or Wrong, following guidelines in TAC KBP 2015 Assessment Guidelines. For each query, all system responses in which the slot filler is assessed as Correct or ineXact will be partitioned into equivalence classes, where slot fillers in the same equivalence class represent the same entity or value (as in the case of dates). Each Correct or ineXact response will receive an annotation for filler mention type (either NAM or NOM), and each equivalence class will receive an annotation for equivalence class mention type (NAM if the assessor can find a named mention for the filler anywhere in the provenances in any of the responses; otherwise, NOM if only nominal mentions appear in the provenances of all responses).

Pseudo-slots will be scored just as slots in the Slot Filling task, with the additional constraint that both the slot fill and the path leading to that fill must be correct for the entirety to be judged correct. To receive credit for identifying Maggie Simpson as Patty Bouvier’s niece, the knowledge base must not only include Maggie as the slot fill, but must also represent Maggie as Marge’s child, and Marge as Patty’s sibling:13

Evaluation query:  Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:      :PattyBouvier  per:siblings  :MargeSimpson
                   :MargeSimpson  per:children  :MaggieSimpson
Submission:        :PattyBouvier  per:siblings  :MargeSimpson
                   :MargeSimpson  per:children  :MaggieSimpson  ⇒ correct

A KB that indicated that Maggie was Patty’s niece because she was Patty’s sister Selma’s child would be scored as incorrect:

Evaluation query:  Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:      :PattyBouvier  per:siblings  :MargeSimpson
                   :MargeSimpson  per:children  :MaggieSimpson
Submission:        :PattyBouvier  per:siblings  :SelmaBouvier
                   :SelmaBouvier  per:children  :MaggieSimpson  ⇒ incorrect

12 A pseudo-slot is similar to the concept of a role chain, which is supported by some knowledge representation systems based on description logic, including OWL 2.

13 In each of these examples, only the subject, predicate and object are shown, and only a subset of the relevant knowledge base is presented. Each entity is named after the mention that gave rise to it.


A response is inexact if it either includes only a part of the correct answer or includes the correct answer plus extraneous material. Inexact answers are counted as Wrong for the purposes of scoring:

Evaluation query:  Titles of parents of Bart Simpson (per:parents, per:title)
Ground Truth:      :BartSimpson   per:parents  :HomerSimpson
                   :HomerSimpson  per:title    "Attack-dog trainer"
Submission:        :BartSimpson   per:parents  :HomerSimpson
                   :HomerSimpson  per:title    "dog trainer Pitiless Pup"  ⇒ inexact

In addition, the object of the final relation in a pseudo-slot may be rated as redundant if it is equivalent to another fill for the pseudo-slot. Redundant answers are counted as Wrong for the purposes of scoring:

Evaluation query:  Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:      :PattyBouvier   per:siblings         :MargeSimpson
                   :MargeSimpson   per:children         :MaggieSimpson
                   :MaggieSimpson  per:alternate_names  "Margaret Simpson"
Submission:        :PattyBouvier   per:siblings         :MargeSimpson
                   :MargeSimpson   per:children         :MaggieSimpson    ⇒ correct
                   :MargeSimpson   per:children         :MargaretSimpson  ⇒ redundant

However, objects of relations other than the final relation will never be rated as redundant:

Evaluation query:  Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:      :PattyBouvier     per:siblings         :MargeSimpson
                   :MargeSimpson     per:children         :LisaSimpson
                   :MargeSimpson     per:children         :BartSimpson
                   :MargeSimpson     per:alternate_names  "Marjorie Simpson"
Submission:        :PattyBouvier     per:siblings         :MargeSimpson
                   :PattyBouvier     per:siblings         :MarjorieSimpson
                   :MargeSimpson     per:children         :LisaSimpson  ⇒ correct
                   :MarjorieSimpson  per:children         :BartSimpson  ⇒ correct

Here, Marge Simpson and Marjorie Simpson represent the same person in the ground truth, but two distinct entities in the KB. However, because the query is about Marge’s children and not about Marge herself, both responses to the evaluation query are assessed as correct.

Since in Cold Start the facts being evaluated come from sequences of triples, confidence scores would need to be combined if we wanted to generate confidence scores for a derived pseudo-relation. The proper way to combine scores of course depends on the meaning of those scores, and for now, Cold Start is not mandating any particular meaning. Three general score combination functions are min, max and product; we welcome comments from the community on which combination methods to report.
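The three combination functions named above can be sketched as follows for the confidence scores along one path of triples. The function name and the `how` parameter are hypothetical, and nothing here implies which method the scorer will report.

```python
import math
from typing import Sequence


def combine_confidences(scores: Sequence[float], how: str = "product") -> float:
    """Combine per-triple confidences along a path into one pseudo-slot score."""
    if how == "min":
        return min(scores)
    if how == "max":
        return max(scores)
    if how == "product":
        return math.prod(scores)
    raise ValueError(f"unknown combination method: {how!r}")
```

For a two-hop path with confidences 0.9 and 0.7, min gives 0.7, max gives 0.9, and product gives approximately 0.63; the product is the natural choice only if the scores are read as independent probabilities, which Cold Start does not mandate.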

Slot Filling Scoring

Given the above approach to assessment, basic scoring for a given system proceeds as follows:

• Each response assessed as Wrong or ineXact is counted as Spurious

• Each response for Round 2 whose Round 1 parent filler is assessed as Wrong or ineXact is counted as Spurious

• Responses assessed as Correct are grouped into equivalence classes. For each equivalence class, at most one response from the system is counted as Right; all other responses are counted as Spurious (therefore, systems should not return redundant answers to the same query). If the system has a NAM entity mention in the equivalence class, or if the system has only NOM entity mentions and the equivalence class is annotated as NOM, then the one response is counted as Right; otherwise, if the system has only NOM entity mentions in the equivalence class and the equivalence class is annotated as NAM, then the one response is counted as Ignore (i.e., treated as if it was never returned by the system). Thus, named entity mentions are preferred.

• Reference = number of single-valued pseudo-slots with a correct response + number of equivalence classes14 for all list-valued pseudo-slots

• Recall = #Right / Reference

• Precision = #Right / (#Right + #Spurious)

• F1 = 2 * Precision * Recall / (Precision + Recall)
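The three formulas above translate directly into code. In this sketch the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption, since the text does not say how empty cases are handled.

```python
from typing import Tuple


def precision_recall_f1(right: int, spurious: int, reference: int) -> Tuple[float, float, float]:
    """Compute Precision, Recall and F1 from Right, Spurious and Reference counts."""
    # Assumption: a zero denominator yields 0.0 rather than an error.
    returned = right + spurious
    precision = right / returned if returned else 0.0
    recall = right / reference if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For instance, with 3 Right, 1 Spurious and a Reference of 6, precision is 0.75, recall is 0.5, and F1 is 0.6. Responses counted as Ignore contribute to neither Right nor Spurious.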

As in 2015, each Cold Start evaluation query in 2016 may have more than one entry point. Because the number of entry points may differ arbitrarily between Cold Start evaluation queries, we focus on two primary metrics for the 2016 Cold Start Knowledge Base Population system evaluation:

• MAX (micro-average): compute F1 for each entry point as outlined above to select a single "maximal" entry point for each evaluation query, where the selected entry point has a maximal F1 among all entry points for that query. The MAX micro-average Precision, Recall, and F1 for the system are computed by summing the counts across all queries, using only the selected maximal entry point for each query.

• MEAN (macro-average): compute F1 for each entry point as outlined above. The query-level score for a query is the mean of the F1 scores of each of its constituent entry points. The MEAN score for the system is the mean of its query-level scores. The MEAN metric gives equal weight to each query, and (within each query) equal weight to each of its entry points.
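The two metrics can be sketched as follows, with each query represented as a list of entry points and each entry point as a hypothetical (right, spurious, reference) tuple of counts. Two details are assumptions of this sketch: zero denominators yield 0.0, and when several entry points of a query tie for the best F1, the first one is selected.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (right, spurious, reference) for one entry point


def prf(right: int, spurious: int, reference: int) -> Tuple[float, float, float]:
    """Precision, Recall, F1; zero denominators yield 0.0 (an assumption)."""
    precision = right / (right + spurious) if right + spurious else 0.0
    recall = right / reference if reference else 0.0
    total = precision + recall
    return precision, recall, (2 * precision * recall / total if total else 0.0)


def max_micro(queries: List[List[Counts]]) -> Tuple[float, float, float]:
    """MAX: pick the best-F1 entry point per query, then sum counts over queries."""
    right = spurious = reference = 0
    for entry_points in queries:
        # max() keeps the first entry point among ties (an assumption).
        best = max(entry_points, key=lambda c: prf(*c)[2])
        right += best[0]
        spurious += best[1]
        reference += best[2]
    return prf(right, spurious, reference)


def mean_macro(queries: List[List[Counts]]) -> float:
    """MEAN: average F1 over entry points within a query, then over queries."""
    per_query = [sum(prf(*c)[2] for c in eps) / len(eps) for eps in queries]
    return sum(per_query) / len(per_query)
```

With two queries, one having entry points (1, 0, 1) and (0, 1, 1) and the other a single entry point (1, 1, 2), MAX selects (1, 0, 1) and (1, 1, 2), giving summed counts of 2 Right, 1 Spurious and Reference 3, while MEAN averages the query-level scores 0.5 and 0.5.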

Entity Discovery Scoring

The scoring for the Entity Discovery component of submitted Cold Start KBs will be identical to scoring for the 2016 TAC Trilingual Entity Discovery and Linking task, with the exception that no linking to an existing knowledge base is required (that is, all mentions will be treated as NIL entries). Please see TAC KBP 2016 Entity Discovery and Linking Task Description for complete details on scoring.

Submissions

A four-week window from Monday August 1 to Monday August 29 will be available for downloading the TAC KBP 2016 Evaluation Source Corpus, producing CSSF and CSKB system output, and submitting results. Systems should not be modified once the corpus has been downloaded. Starting Monday, August 15, participants in the CSSF task may email NIST to request the CSSF evaluation queries, but teams participating in both the CSSF and CSKB tasks must submit all CSKB runs before requesting the CSSF evaluation queries from NIST. On August 15, automatic EDL output from systems participating in the first EDL evaluation window will also be made available as an optional resource to Cold Start participants.

14 See TAC KBP 2015 Slot Descriptions and TAC KBP 2015 Assessment Guidelines for further information on how and when two slot fills are treated as equivalent.


For each of the Cold Start task variants (CSSF and CSKB), a team may submit up to 5 runs for each of the following 4 language conditions:

1. Monolingual English: entity mentions, slot fills and provenances are extracted only from English documents

2. Monolingual Spanish: entity mentions, slot fills and provenances are extracted only from Spanish documents

3. Monolingual Chinese: entity mentions, slot fills and provenances are extracted only from Chinese documents

4. Cross-lingual: entity mentions, slot fills and provenances are extracted from any combination of English, Spanish, and Chinese documents.

If a team submits a run involving more than one language under the Cross-lingual condition, it must also submit at least one run under the monolingual condition for each language involved (with a description of which monolingual run configurations were used for each cross-lingual run).

Submitted runs must be ranked (1-5) in order of evaluation preference. The number of runs actually evaluated will depend upon resources available to NIST; the 3 top-ranked runs from each team will be assessed for each task and language condition, and lower-ranked submissions may be assessed if resources allow. The run ID included in each team's submission file must be a concatenation of the team's TAC KBP 2016 team ID, the task (KB or SF), the language condition (ENG, CMN, SPA, or XLING), and a rank (1-5); thus "Acme_KB_XLING_1" would be the top-ranked run for the Acme team for the CSKB task variant under the cross-lingual condition.
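A run ID of the form described above could be checked with the sketch below. The use of underscores as separators is inferred from the "Acme_KB_XLING_1" example rather than stated as a rule, and the function and group names are hypothetical.

```python
import re
from typing import Dict

# Team ID, task, language condition and rank, joined by underscores
# (separator inferred from the "Acme_KB_XLING_1" example).
RUN_ID = re.compile(
    r"(?P<team>.+)_(?P<task>KB|SF)_(?P<lang>ENG|CMN|SPA|XLING)_(?P<rank>[1-5])"
)


def parse_run_id(run_id: str) -> Dict[str, str]:
    """Split a run ID into team, task, language condition and rank."""
    match = RUN_ID.fullmatch(run_id)
    if match is None:
        raise ValueError(f"malformed run ID: {run_id!r}")
    return match.groupdict()
```

Applied to "Acme_KB_XLING_1", this yields team "Acme", task "KB", language condition "XLING" and rank "1".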

The top-ranked submission must be made as a ‘closed’ system; in particular, it must not access the Web during the evaluation period. All submissions must obey the following external resource restrictions:

• Structured knowledge bases (e.g., Wikipedia infoboxes, DBPedia, Freebase) may not be used to directly fill slots or directly validate candidate slot fillers.

• Structured knowledge base entries for target entities may not be edited, either during or after the evaluation.

In addition, because Cold Start focuses on the condition where the knowledge base is initially empty, we ask that each participating site submit at least one run that consults external entity knowledge bases only after entities and relations have been extracted from the document collection. Details about submission procedures will be communicated to the track mailing list. Tools to validate formats will be available on the TAC Web site (http://www.nist.gov/tac/2016/KBP/ColdStart/tools.html).

Change History

• Version 1.0
   o Original version, based on the 2015 specification
   o Added description of multi-lingual tasks
   o Aligned definition of entity types and mention types in the KB Construction task with those in the 2016 Entity Discovery and Linking track
   o Added description of nominal entity mentions and slot fillers