Cold Start Knowledge Base Population at TAC 2016 Task Description¹
Version 1.0 of July 11, 2016
What’s New
Introduction
    Schema
    Document Collection
    Evaluation Queries
Task Output – Knowledge Base Variant
    Entities
    Predicates
Task Output – Slot Filling Variant
Task Output – All Variants
    Provenance
    Confidence Measure
    Comments
    Examples
Differences between 2014 Slot Filling and the 2016 Cold Start Slot Filling Variant
Evaluation
    Slot Filling Assessment
    Slot Filling Scoring
    Entity Discovery Scoring
Submissions
Change History
¹ The TAC organizing committee welcomes comments on this Task Description, or on any aspect of the TAC evaluation. Please send comments to tac-kbp@nist.gov.
What’s New
Cold Start 2016 has two task variants: a full end-to-end Knowledge Base Construction (CSKB) task, and a component Slot Filling (CSSF) task. A component Entity Discovery and Linking (EDL) task is organized completely under the EDL track, but has the same input documents as CSKB and CSSF, to allow the entity discovery component of CSKB systems to be evaluated on the same documents as standalone EDL systems. In order to enable teams with slot filling systems to also participate in the end-to-end KB construction task, two EDL evaluation windows are offered and staged such that teams constructing a KB are given the output of EDL systems participating in the first EDL evaluation window.
This document describes the 2016 Cold Start SF/KB Construction tasks. The detailed task description for EDL is at the EDL 2016 website (http://nlp.cs.rpi.edu/kbp/2016/).
The 2016 Cold Start SF/KB Construction tasks are identical to the 2015 tasks, with the following changes:
1. The Cold Start tasks are cross-lingual; in addition to English, the 2016 source corpus includes Chinese and Spanish documents. Cross-lingual SF/KB Construction systems may return entity mentions, slot fillers and provenance from any combination of English, Chinese, and Spanish documents. Additionally, three diagnostic monolingual versions of these tasks are offered (one for each language), in which entity mentions, slot fillers and provenance must come from that single language only.
2. In addition to person (PER), organization (ORG), and geopolitical entity (GPE) types, KB Construction systems must return mentions of location (LOC) and facility (FAC) entities (although the slot inventory will not be modified to include LOC and FAC entities).
3. In addition to named mentions, KB Construction systems must extract and link all nominal mentions of specific individual PER, ORG, GPE, LOC, and FAC entities.
4. Cold Start SF/KB Construction systems may return a nominal mention as a filler if no name mention is available in the source corpus.
Introduction
Since 2009, TAC has evaluated performance on two important aspects of knowledge base population: entity linking and slot filling. The goal of the Cold Start track is to exercise both of these areas, and evaluate the ability of a system to use these technologies to actually construct a knowledge base (KB) from the information provided in a text collection. Cold Start participants build a software system that processes a large text collection and creates a knowledge base that is consistent with and accurately represents the content of that collection. The knowledge base is then evaluated as a single connected resource, using queries that traverse entity nodes and relation (slot) links in the KB to determine if the KB contains correct relations between correct entities.
In 2016, Cold Start has two task variants.
1. In the Knowledge Base variant (CSKB), participants submit entire knowledge bases, without prior knowledge of the evaluation queries.
2. The Slot Filling variant (CSSF) supersedes the 2014 Slot Filling track and is designed to make it easy for sites with slot filling systems to participate in Cold Start. In this variant, the Cold Start evaluation queries are split into Cold Start Slot Filling queries, with one entry point per query, and are distributed at the start of the task evaluation window. Participants do not have to submit entire knowledge bases. Rather, they apply their slot filling system twice, the first time on the entry point for each query, the second time on each of the results of the first round.
The Entity Discovery and Linking task and the Slot Filling task have done a good job of evaluating key components of knowledge base population. They do not, however, evaluate every aspect of an automatically generated knowledge base. Things one might like to know about such a knowledge base include:
• Are the entities in the knowledge base correctly tied to real-world mentions of those entities? TAC Entity Discovery and Linking (EDL) tasks have measured this.
• Are the facts and relations in the knowledge base accurate reflections of the facts and relations described in the source documents? The TAC Slot Filling tasks have measured this, as will TAC Cold Start SF.
• Are entity linking and slot filling correctly coordinated to produce a meaningful knowledge base? The TAC Cold Start KB task measures this.
• Can the knowledge base correctly perform inference over the extracted entities, such as temporal reasoning, confidence estimation, default reasoning, transitive closure, etc.? Cold Start is just beginning to measure this; it is designed to facilitate this kind of evaluation more thoroughly in future years.
We call the task Cold Start Knowledge Base Population to convey two features of the evaluation: it implies both that a knowledge base schema has been established at the start of the task, and that the knowledge base is initially unpopulated. Thus, we assume that a schema exists for the entities, facts, and relations that will compose the knowledge base; it is not part of the task to automatically identify and name facts and relationships present in the text collection. In 2016, we use a schema that combines the entity types from TAC KBP 2016 Entity Discovery and Linking, and the relation types from TAC KBP 2015 Cold Start Knowledge Base Population. Thus, the schema will include five entity types (person, organization, geopolitical entity, facility, and location) and forty-one relation types and their inverses. For relations whose fills are themselves entities (such as per:siblings or org:subsidiaries), CSKB systems will be required to link that slot to the node in the submitted KB representing the correct entity.² Slots whose fills are strings (such as per:title or org:website) will simply use strings to represent the information.
Cold Start also implies that the knowledge base is initially empty. To avoid solutions that rely on verifying content already present in Wikipedia or other large data sources about entities, the queries used in Cold Start will be dominated by entities that are not present in Wikipedia.
All participating systems will receive the following input:
1. a document collection;
2. a knowledge base schema
From these, systems participating in the Knowledge Base variant will produce a knowledge base. This KB will be submitted to NIST as a set of augmented triples. Participating KB systems must tie each entity mention in the document collection to a particular KB entity node; in this way, the knowledge base can be queried without first aligning it to a reference knowledge base. Knowledge bases will include mention, nominal_mention, canonical_mention, and type triples, as well as the full range of slot fills (all triples are described more fully below).
Systems participating in the Slot Filling variant will also receive:
3. a set of Cold Start Slot Filling (CSSF) evaluation queries (each evaluation query is a sequence of one or two slot filling queries to be applied in series).
² Because facility and location entities are not included in the slot definitions, only person, organization, and geopolitical entity nodes must be linked to the slots.
For both variants, the results will then be evaluated by NIST:
• Systems participating in the Slot Filling variant return slot fillers directly in response to the given CSSF evaluation queries, and the fillers are then assessed and scored for precision and recall.
• Evaluation of the Knowledge Base variant will start by applying the same CSSF evaluation queries to the submitted knowledge base. Each query will start at a named entity mention in a document (identified by the query’s <beg> and <end> tags), identify the knowledge base entity that corresponds to that mention, follow a sequence of one or more relations within the knowledge base, and end in a slot fill. The resulting slot fills will be assessed and scored in the same way as in the Slot Filling variant. For example, a CSSF evaluation query might ask ‘what are the ages of the siblings of the Bart Simpson³ mentioned in Document 42?’ A system that correctly identified descriptions of Bart’s siblings in the document collection, linked them to the appropriate node in the KB, and also found evidence for and correctly represented the ages of those siblings would receive full credit.
³ Many of the examples used to illustrate the Cold Start task are drawn from The Simpsons television show. Readers lacking a detailed working knowledge of genealogical relationships in the Bouvier/Simpson family need not agonize over what they have been doing with their lives for the past quarter century, but may simply visit http://simpsons.wikia.com/wiki/Simpson_Family.
Relation                               Inverse(s)
per:children                           per:parents
per:other_family                       per:other_family
per:parents                            per:children
per:siblings                           per:siblings
per:spouse                             per:spouse
per:employee_or_member_of              {org,gpe}:employees_or_members*
per:schools_attended                   org:students*
per:city_of_birth                      gpe:births_in_city*
per:stateorprovince_of_birth           gpe:births_in_stateorprovince*
per:country_of_birth                   gpe:births_in_country*
per:cities_of_residence                gpe:residents_of_city*
per:statesorprovinces_of_residence     gpe:residents_of_stateorprovince
per:countries_of_residence             gpe:residents_of_country*
per:city_of_death                      gpe:deaths_in_city*
per:stateorprovince_of_death           gpe:deaths_in_stateorprovince*
per:country_of_death                   gpe:deaths_in_country*
org:shareholders                       {per,org,gpe}:holds_shares_in*
org:founded_by                         {per,org,gpe}:organizations_founded*
org:top_members_employees              per:top_member_employee_of*
{org,gpe}:member_of                    org:members
org:members                            {org,gpe}:member_of
org:parents                            {org,gpe}:subsidiaries
org:subsidiaries                       org:parents
org:city_of_headquarters               gpe:headquarters_in_city*
org:stateorprovince_of_headquarters    gpe:headquarters_in_stateorprovince*
org:country_of_headquarters            gpe:headquarters_in_country*

Table 1. Entity-valued slots. Slots with asterisks represent inverse relations that will need to be added by participants from previous years’ Slot Filling task (2014 and earlier). The type qualifier of each relation (per, org or gpe) is the type of its subject, while the type qualifier for its inverse is the type of its object. A set of types means that any of those types is acceptable for that slot. All submitted slot names must use only a single type specification.
Schema
The schema for Cold Start 2016 combines the entity and mention types from TAC KBP 2016 Entity Discovery and Linking, and the relation types from TAC KBP 2015 Cold Start Knowledge Base Population. Thus, the schema includes five entity types (person, organization, geopolitical entity, facility, and location) and forty-one relation types and their inverses. Annotation/assessment guidelines are available on the TAC website (http://www.nist.gov/tac/2016/KBP/ColdStart/guidelines.html), and are more fully documented in the data packages that can be requested from the LDC upon completion of TAC KBP track registration.
Cold Start entities and entity mentions are defined by DEFT Rich ERE. Full annotation guidelines for DEFT Rich ERE entities are included in the DEFT Rich ERE annotation packages, available from the LDC, but a high-level summary of the five entity types and their mentions is available in Rich ERE Annotation Guidelines Overview. For Cold Start, the entity mention types that must be extracted are limited to named and nominal mentions, and the entities must be specific individual entities (as described in Annotation Guidelines for Individuality of Specific Entities). A Cold Start named entity mention is the same as a named entity mention in Rich ERE; i.e., a Cold Start named entity mention is a mention that uniquely refers to an entity by its proper name, acronym, nickname, alias, abbreviation, or other alternate name, and includes post author names found in the metadata of discussion forum documents. The extent of the named entity mention is the entire string representing the name, excluding the preceding definite article and any other pre-posed or post-posed modifiers. A Cold Start nominal entity mention is the head of the nominal entity mention in Rich ERE; i.e., a Cold Start nominal entity mention is a mention not including the entity's proper name, referring to it by a common noun phrase (but for Cold Start, the nominal mention is only the head noun of the nominal phrase). Entity mentions are allowed to nest or overlap; for example, the string “Philadelphia Eagles” might be a mention of an ORG (the football team), while the first word might simultaneously be a mention of a GPE (the city of Philadelphia).
The Cold Start inventory of slots is described thoroughly in TAC KBP 2015 Slot Descriptions and TAC KBP 2015 Assessment Guidelines available on the TAC Website. Forty-one slots and their inverses are used for the evaluation. Twenty-six of these have fills that are themselves entities, as shown in Table 1. The remaining fifteen slots have string fills, as shown in Table 2.

per:alternate_names     org:alternate_names
per:date_of_birth       org:political_religious_affiliation
per:age                 org:number_of_employees_members
per:origin              org:date_founded
per:date_of_death       org:date_dissolved
per:cause_of_death      org:website
per:title
per:religion
per:charges

Table 2. String-valued slots.

Each entity-valued slot will have an inverse.⁴ All inverse relations must be explicitly identified in the submitted knowledge base. That is, if the KB asserts that relation R holds between entities A and B, then it must also assert that relation R⁻¹ holds between B and A. As a convenience, the Cold Start KB validation script can be used to introduce missing inverses into a KB.
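The inverse requirement can be illustrated with a minimal Python sketch. It is not the NIST validation script; the function name `add_missing_inverses` and the in-memory tuple representation are assumptions made for illustration, and the mapping covers only a few Table 1 pairs whose inverse name does not depend on the type of the object entity.

```python
# Hypothetical subset of the Table 1 relation/inverse pairs. Inverses written
# with a set of types (e.g. {org,gpe}:subsidiaries) would additionally need the
# entity type of the object in order to pick a single type qualifier.
INVERSES = {
    "per:children": "per:parents",
    "per:parents": "per:children",
    "per:siblings": "per:siblings",
    "per:spouse": "per:spouse",
    "per:other_family": "per:other_family",
}


def add_missing_inverses(triples):
    """Return the input triples plus any inverse assertions that are absent."""
    seen = set(triples)
    result = list(triples)
    for subj, pred, obj in triples:
        inverse = INVERSES.get(pred)
        if inverse is None:
            continue  # predicate not covered by this illustrative mapping
        candidate = (obj, inverse, subj)
        if candidate not in seen:
            seen.add(candidate)
            result.append(candidate)
    return result


kb = [(":e4", "per:siblings", ":e7"), (":e4", "per:children", ":e9")]
completed = add_missing_inverses(kb)
for triple in completed:
    print("\t".join(triple))
```

A real submission would also carry provenance and confidence on each added triple; those columns are omitted here.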
Document Collection
The Cold Start 2016 evaluation document collection will be the TAC KBP 2016 Evaluation Source Corpus, which comprises approximately 90,000 documents, roughly equally distributed between English, Spanish, and Chinese, and balanced between newswire (NW) and multi-post discussion forum (MPDF) documents. These documents will be new (previously unreleased) documents that will be distributed by NIST via Web download at the beginning of the Cold Start evaluation window. There will be exactly one file per document, and all files will be parsable as XML. Each file will begin with the opening tag of the <DOC> element (<doc> for MPDF);⁵ note that <DOC> can be spelled with either uppercase or lowercase letters, depending on the genre, and may optionally include additional attributes (such as "type" for some newswire data).
Newswire data will use the following markup framework:
<DOC id="{doc_id_string}" type="{doc_type_label}">
<HEADLINE>
...
</HEADLINE>
<DATELINE>
...
</DATELINE>
<TEXT>
<P>
...
</P>
...
</TEXT>
</DOC>
where the HEADLINE and DATELINE tags are optional (not always present), and the TEXT content may or may not include "<P> ... </P>" tags (depending on whether or not the "doc_type_label" is "story").
⁴ Some slots, such as per:siblings, are symmetric. Others, such as per:parents, have inverses that were already in the 2014 English Slot Filling track (in this case, per:children). The remaining slots (e.g., org:founded_by) had no corresponding slot in the 2014 English Slot Filling track; Cold Start specifies new slot names for these inverses. All such slots are list-valued.
⁵ In contrast to some of the KBP source corpora from previous years, the TAC KBP 2016 Source Corpus will not contain any files that begin with XML declarations such as <?xml version="1.0" encoding="utf-8"?>. This is to ensure that offsets align across the various KBP 2016 tracks that are using this same evaluation source corpus, regardless of whether offsets are counted from the beginning of the file, or the beginning of the <DOC> tag.
Multi-Post Discussion Forum files (MPDFs) are derived from Discussion Forum threads. They consist of a continuous run of posts from a thread but they are only approximately 800 words in length (excluding metadata and text within <quote> elements). When taken from a short thread, an MPDF may comprise the entire thread. However, when taken from longer threads, an MPDF is a truncated version of its source, though it will always start with the preliminary post. The MPDF files will use the following markup framework, in which there may also be arbitrarily deep nesting of quote elements, and other elements may be present (e.g. "<a...>...</a>" anchor tags):
<doc id="{doc_id_string}">
<headline>
...
</headline>
<post ...>
...
<quote ...>
...
</quote>
...
</post>
...
</doc>
All provenance/justifications for all KBP 2016 tasks must be drawn from the documents in the TAC KBP 2016 Evaluation Source Corpus. Each document is represented as a UTF-8 character array and begins with the <DOC> tag, where the “<” character has index 0 for the document. Thus, offsets for provenance are counted before XML tags are removed. Start offsets must be the index of the first character in the corresponding string, and end offsets must be the index of the last character of the string (therefore, the length of the corresponding string is end offset - start offset + 1).
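As an illustration of the inclusive-offset convention, the following Python sketch extracts a span from raw document text. It assumes the file has been decoded from UTF-8 into a string so that offsets count characters rather than bytes; the helper names `span_text` and `span_length` and the toy document are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length of a span whose end offset is inclusive."""
    return end - start + 1


def span_text(document: str, start: int, end: int) -> str:
    """Return the characters from start to end, with end inclusive.

    Offsets index the raw document text, XML tags included.
    """
    if not 0 <= start <= end < len(document):
        raise ValueError("offsets fall outside the document")
    return document[start:end + 1]


# Toy document; index 0 is the "<" of the opening tag.
doc = '<DOC id="example">\n<TEXT>\nBart Simpson\n</TEXT>\n</DOC>'
start = doc.index("Bart Simpson")
end = start + len("Bart Simpson") - 1
print(start, end, span_text(doc, start, end))
```

The same arithmetic is consistent with the sample queries in this document: offsets 16931-16943 describe a 13-character span, which is the length of "June McCarthy".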
All KBP 2016 systems should return extractions from anywhere in the document, including <quote> regions of MPDF documents. However, for the following KBP tasks, in which evaluation is by comparison with gold standard Rich ERE annotations (which will not include annotations of <quote> regions), the track coordinator will automatically filter out <quote> regions from submitted runs before scoring, so as to avoid penalizing runs that include <quote> regions:
(a) EDL
(b) Belief and Sentiment
(c) Event Nuggets
(d) Event Arguments
For the following KBP tasks, in which evaluation is by assessment, assessment and scoring will allow provenance and extractions from anywhere in the document, including <quote> regions:
(a) Cold Start SF
(b) Cold Start KB Construction
Evaluation Queries
CSKB and CSSF systems are evaluated by the same set of Cold Start evaluation queries. A Cold Start evaluation query begins with one or more mentions of the same entity, followed by a sequence of slots to be filled for the entity. Each mention in the query is called an entry point because it can be used to select (at most) one entity node in a KB that is being evaluated; multiple entry points are included for each Cold Start evaluation query in order to increase the chances that the KB will have a response to the query even if it misses one entry point. Each Cold Start evaluation query is split into multiple Cold Start Slot Filling (CSSF) queries, with one entry point per CSSF query (the CSSF queries will request the same slots, but each will have a different entry point).
Participants in the Slot Filling variant of Cold Start will receive the CSSF evaluation queries at the beginning of the CSSF evaluation window, and will apply a script to incrementally convert those queries to a form that looks similar to queries from the 2014 English Slot Filling task. Participants in the Knowledge Base variant will not receive the queries; rather, NIST will apply the evaluation queries to each submitted knowledge base and assess the results. An outline of the NIST assessment process for both Cold Start variants is given below.
All CSSF evaluation queries start with an entry point into the knowledge base being evaluated. The entry point is defined by a named entity mention (name, docid, begin offset, and end offset), and is followed by the entity type and either one or two slots to be extracted for the entity.
Evaluation queries could take one of two forms: single-hop or multiple-hop. For example, here is a sample single-hop CSSF evaluation query that asks “What is the age of the June McCarthy mentioned at offsets 16931-16943 in Document 42?”:
<query id="CSSF16_ENG_00243754cd">
<name>June McCarthy</name>
<docid>42</docid>
<beg>16931</beg>
<end>16943</end>
<enttype>PER</enttype>
<slot>per:age</slot>
<slot0>per:age</slot0>
</query>
This single-hop query looks very much like a query from the 2014 English Slot Filling task, except that each query in Cold Start asks for a specific slot, rather than all slots for which there is information in the document collection.⁶
A more complex “two-hop” query might ask, “What are the ages of the children of the June McCarthy mentioned at offsets 16931-16943 in Document 42?”:
<query id="CSSF16_ENG_002109743e">
<name>June McCarthy</name>
<docid>42</docid>
<beg>16931</beg>
<end>16943</end>
<enttype>PER</enttype>
<slot>per:children</slot>
<slot0>per:children</slot0>
<slot1>per:age</slot1>
</query>
⁶ Participants in the Slot Filling variant should treat all other slots as if they appear in the <ignore> field of a Slot Filling query from 2013 or earlier.
In general, two-hop queries will start from an entry point (selecting the corresponding KB entity of a CSKB submission), follow a single entity-valued relation (from Table 1), then ask for a single slot value (from either Table 1 or Table 2).⁷ Such queries will verify that the knowledge base is well-formed in a way that goes beyond basic entity linking and slot filling, without allowing combinations of errors to drive scores to zero.
Because two-hop queries do not look like any slot filling queries from KBP 2009-2014, participants in the Cold Start Slot Filling variant must process the CSSF queries in two “rounds” using the CS-GenerateCSQueries.pl script from NIST, which adds the <slot> entry to the NIST-distributed CSSF queries. Participants in the Slot Filling variant must treat <slot> as the slot to be filled. During the first round, <slot> will be identical to <slot0>. The CS-GenerateCSQueries.pl script will then convert a first round output file to a second round query file. Second round queries generated by this script will bear <slot> entries equivalent to <slot1>. Though some of the CSSF queries will differ only in having different mentions (possibly for the same entity) as their entry points, participating CSSF systems are prohibited from using information about one query to inform the processing of another query.
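To make the query structure concrete, here is a small Python sketch that reads one query in the format shown above using the standard library XML parser. It does not replace the NIST script; the function name `parse_query` and the dictionary keys are assumptions for illustration, and real query files may wrap many <query> elements in an enclosing element.

```python
import xml.etree.ElementTree as ET

QUERY_XML = """<query id="CSSF16_ENG_002109743e">
  <name>June McCarthy</name>
  <docid>42</docid>
  <beg>16931</beg>
  <end>16943</end>
  <enttype>PER</enttype>
  <slot>per:children</slot>
  <slot0>per:children</slot0>
  <slot1>per:age</slot1>
</query>"""


def parse_query(xml_text):
    """Parse a single <query> element into a plain dictionary."""
    node = ET.fromstring(xml_text)
    # Collect slot0, slot1, ... in order; these are the hops of the query.
    hops = []
    index = 0
    while node.find(f"slot{index}") is not None:
        hops.append(node.findtext(f"slot{index}"))
        index += 1
    return {
        "id": node.get("id"),
        "name": node.findtext("name"),
        "docid": node.findtext("docid"),
        "beg": int(node.findtext("beg")),
        "end": int(node.findtext("end")),
        "enttype": node.findtext("enttype"),
        "slot": node.findtext("slot"),  # the slot to fill in the current round
        "hops": hops,
    }


query = parse_query(QUERY_XML)
print(query["slot"], query["hops"])
```

In this first-round query the `slot` value matches the first hop, as the text above describes.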
For the Knowledge Base variant, the following rules are applied to map from a CSSF evaluation query to a knowledge base entry: First, form a candidate set of all KB node mentions that have at least one character in common with the evaluation query mention and that have the same type. If this set is empty, the submission does not contain any answers for the evaluation query. Otherwise, for each mention K in the candidate set, calculate:
• COMMON, the number of characters in K that are also in the query mention Q.
• K_ONLY, the number of characters in K that are not in Q.
Execute each of the following eliminations in turn until the candidate set is size one, and select that candidate as the KB node that matches the query:
• Eliminate any candidate that does not have the maximal value of COMMON
• Eliminate any candidate that does not have the minimal value of K_ONLY
• Eliminate all but the candidate that appears first in the submission file
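The elimination rules above can be sketched in Python as a single ordered comparison. This is an illustrative reading, not the NIST implementation: the record fields (`node`, `type`, `docid`, `beg`, `end`) are hypothetical, mentions are assumed to be listed in submission-file order, and candidates are assumed to come from the same document as the query mention.

```python
def match_entry_point(query, mentions):
    """Pick the KB node whose mention best matches the query mention.

    Returns None when no mention of the right type overlaps the query span.
    """
    candidates = []
    for order, k in enumerate(mentions):
        if k["docid"] != query["docid"] or k["type"] != query["type"]:
            continue
        # COMMON: characters shared by K and Q (offsets are inclusive).
        common = min(k["end"], query["end"]) - max(k["beg"], query["beg"]) + 1
        if common < 1:
            continue
        # K_ONLY: characters of K that fall outside Q.
        k_only = (k["end"] - k["beg"] + 1) - common
        # Sorting key: maximal COMMON, then minimal K_ONLY, then file order.
        candidates.append((-common, k_only, order, k["node"]))
    if not candidates:
        return None
    return min(candidates)[3]


query = {"docid": "42", "type": "PER", "beg": 16931, "end": 16943}
mentions = [
    {"node": ":e1", "type": "PER", "docid": "42", "beg": 16931, "end": 16934},
    {"node": ":e2", "type": "PER", "docid": "42", "beg": 16931, "end": 16943},
    {"node": ":e3", "type": "PER", "docid": "42", "beg": 16925, "end": 16943},
    {"node": ":e4", "type": "ORG", "docid": "42", "beg": 16931, "end": 16943},
]
print(match_entry_point(query, mentions))
```

Taking the minimum of the tuple applies the three eliminations in the stated order, because a later component only matters when all earlier components tie.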
The proper specification of mention relations in a KB is therefore important for scoring well; CSKB participants should take care to ensure that every named entity mention in the evaluation collection serves as a mention relation for a node in the KB.
⁷ In principle, multiple-hop queries could include more than two relations, but we limit ourselves to two in Cold Start 2016.
The NIST evaluation of a KB will proceed by finding all entries in the KB that fulfill an evaluation query. For example, if the evaluation query ‘schools attended by the siblings of Bart Simpson’ finds two siblings for the node specified by the entry point, and the KB indicates that those siblings attended two and one schools respectively, then three results would be assessed by NIST. These results will be converted to the same form as the output for the Slot Filling variant. Results will be pooled across all CSKB and CSSF submissions, and assessors will judge the validity of each result. Finally, a scoring script will report a variety of statistics for each submitted run.
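The siblings-and-schools example can be traced with a short Python sketch of a two-hop lookup over an in-memory set of triples. The node names, the `build_index` and `answer_query` helpers, and the toy KB are all hypothetical; the actual evaluation software also tracks provenance and confidence for each result.

```python
from collections import defaultdict


def build_index(triples):
    """Index objects by (subject, predicate) for quick hop traversal."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[(subj, pred)].append(obj)
    return index


def answer_query(index, entry_node, slots):
    """Follow each slot in turn, starting from the entry-point node."""
    frontier = [entry_node]
    for slot in slots:
        frontier = [obj for node in frontier
                    for obj in index.get((node, slot), [])]
    return frontier


kb = [
    (":bart", "per:siblings", ":lisa"),
    (":bart", "per:siblings", ":maggie"),
    (":lisa", "per:schools_attended", ":school_a"),
    (":lisa", "per:schools_attended", ":school_b"),
    (":maggie", "per:schools_attended", ":school_c"),
]
results = answer_query(build_index(kb), ":bart",
                       ["per:siblings", "per:schools_attended"])
print(results)
```

Two siblings with two and one schools respectively yield three results, matching the count given in the example above.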
In creating evaluation queries, LDC will strive to balance even distribution across slot types with productivity of those slots. Single hop queries, which are of greater interest for slot filling, will in many cases ask for multiple slots for a given entity regardless of whether fillers for those slots are attested in the document collection. Multiple hop queries will favor entities and slot sequences that are attested in the document collection (although here too, availability of answers is not guaranteed at any hop level).
Task Output – Knowledge Base Variant
CSKB systems must produce a knowledge base as output. The first line of the output file must contain a unique run ID. The remainder of the KB is represented as a set of augmented triples. Assertions will appear, one-per-line, in tab-separated format. The output file will be automatically converted to RDF statements during evaluation. All output must be encoded in UTF-8.
Each triple appears in the output file in subject-predicate-object order. For example, to indicate that entity-4 has entity-7 as a sibling, the triple might be:
:e4 per:siblings :e7
If entity-4 has siblings in addition to entity-7, these relations should be entered as separate triples.
Entities
Each entity specification begins with a colon, followed by a sequence of letters, digits and underscores. Examples of legal entity specifications include :Entity42, :EE74_R29, and :there_were_two_muffins_in_the_oven. No meaning is ascribed to this sequence by the evaluation software; it is used only as a unique identifier. Any subsequent use of the same colon-preceded sequence will be taken as a reference to the same entity.
Predicates
The legal predicates are those shown in Table 1 (including inverses) and Table 2, plus type, mention, nominal_mention, and canonical_mention.
Predicates found in Table 1 must have entity specifications in both the subject and object positions. Predicates found in Table 2 must have an entity specification in the subject position, and a double quote-delimited string in the object position; the string in the object position will exactly correspond with the slot fill for that relation in the Slot Filling task. A backslash character must precede any occurrence of a double quote or a backslash in such a string.⁸ At least one instance of each unique subject-predicate-object triple will be evaluated. If more than one instance of a given triple appears in the output (with each triple having different provenance), LDC will assess the instance with the highest confidence value (see below), and will assess additional instances if resources allow. If more than one such triple shares the same confidence value, the triple that appears earlier in the output will be considered to have higher confidence.
⁸ Each backslash used to quote the following character doesn’t itself have to be preceded by another backslash.
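A minimal Python sketch of the entity-specification and string-quoting rules follows. The helper names are hypothetical, the regular expression assumes that "letters" means ASCII letters, and the provenance and confidence columns that a real triple carries are left out.

```python
import re

# Assumption: "letters" is read as ASCII letters only.
ENTITY_ID = re.compile(r":[A-Za-z0-9_]+")


def quote_fill(text):
    """Backslash-escape backslashes and double quotes, then add delimiters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def string_triple(subject, predicate, fill):
    """Build the first three tab-separated columns of a string-valued triple."""
    if not ENTITY_ID.fullmatch(subject):
        raise ValueError(f"not a legal entity specification: {subject!r}")
    return "\t".join([subject, predicate, quote_fill(fill)])


print(string_triple(":e4", "per:alternate_names", 'Bart "El Barto" Simpson'))
```

Backslashes are escaped before double quotes so that the backslashes introduced for the quotes are not escaped a second time.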
type
Each entity will be the subject of exactly one type triple. The object of that triple will be either PER, ORG, GPE, FAC or LOC depending on the type of the entity. It is up to submitting systems to correctly identify and report the type of each entity.
mention and nominal_mention
Each entity will be the subject of one⁹ or more mention or nominal_mention triples. Together with the provenance information (see below), these triples indicate how the knowledge base is tied to the document collection. Each named entity mention in the collection must be submitted as the object of a mention triple, while each nominal entity mention in the collection must be submitted as the object of a nominal_mention triple. For example, if an entity is mentioned by name five times in a document, five mention triples should be generated. The object of a mention or nominal_mention triple is the double-quoted mention string; document ID and offset appear under provenance information (see below). The definition of what constitutes a named or nominal entity mention for Cold Start is described in the Cold Start schema above.
canonical_mention
For each document that mentions an entity, one of the mentions or nominal_mentions must be identified as the canonical mention for that entity in that document; it is the string that will be seen by the assessor if that entity appears as a slot fill, supported by that document (in Slot Filling task terms, it is the content of Column 5 of a CSSF 2016 submission, and its provenance will serve as Column 7 of the CSSF submission).¹⁰ Canonical mentions are expressed using a canonical_mention triple. The arguments for canonical_mention are the same as for mention and nominal_mention. Note that there is no requirement that submissions select a single, global canonical mention for an entity. While such a mention might be useful, here we require that a canonical mention be provided within each document for the assessor to use during assessment. Each canonical_mention is also a mention or nominal_mention. As a convenience, if a submitted KB does not contain a mention or nominal_mention triple for each canonical_mention triple, the missing relations will be inferred (perhaps incorrectly) as named mentions (albeit with a warning). This shortcut is provided to make submitted KBs easier to view, and does not relieve submitters from the requirement to provide each of the required mentions, nominal_mentions, and canonical_mentions.
⁹ Unlike previous years, Cold Start 2016 requires both named and nominal entity mentions to be extracted and included in the KB.
¹⁰ In the Slot Filling task of KBP 2009-2014 (and in the Slot Filling variant of Cold Start), all slot fills are strings. Assessors verify the validity of a slot fill by looking for that string in the specified document, using the provenance information provided in the system response. In a submitted KB, slots that are filled with entities hold not strings, but pointers to the KB structure for the appropriate entity. Thus, a canonical mention must be identified by the Cold Start KB for each entity in each document, so that the assessor can be presented with a string that represents the entity during assessment. A relation provenance (see below) entry may include more than one document, and at least one of those documents must contain a mention of the object of the relation; that document must therefore contain a canonical mention for the object. When selecting a canonical mention for presentation to the assessor, the first document appearing in the relation provenance that contains a mention of the object will be used for the canonical mention.
Task Output – Slot Filling Variant
Output for the Slot Filling variant will be in the form of a tab-separated file. The columns of the submitted file are as follows:
Column 1  Query ID. For the first round, this is taken directly from the <query> XML tag. For the second round, this is drawn from the <query> tag of the query generated from the first round output.
Column 2  The name of the slot being filled.
Column 3  A unique run ID for the submission.
Column 4  Provenance for the relation between the query entity and slot filler, consisting of up to 4 docid:startoffset-endoffset triples separated by commas. Individual spans may comprise at most 150 UTF-8 characters. Unlike the 2014 Slot Filling task, there is no requirement to generate NIL entries when no information about the target entity is available.
Column 5  A slot filler (possibly normalized, e.g., for dates). This is used both to populate the <name> entry of the next round query, and by the assessor to judge the slot fill. The string should be extracted from the filler provenance in Column 7, except that any embedded tabs or newline characters should be converted to a space character and dates must be normalized (therefore, slot fillers should not be translated across languages). If a nominal mention is returned as a slot filler, only the head word of the nominal phrase should be returned (consistent with the EDL definition of nominal mentions). For dates, systems must normalize document text strings to standardized month, day, and/or year values, following the TIMEX2 format of yyyy-mm-dd (e.g., document text “New Year’s Day 1985” would be normalized as “1985-01-01”); if a full date cannot be inferred using document text and metadata, partial date normalizations are allowed using “X” for the missing information.
Column 6  A filler type, selected from {PER, ORG, GPE, STRING}. The STRING filler type is used for string-valued slots shown in Table 2.
Column 7  Provenance for the slot filler string. This is either a single span (docid:startoffset-endoffset) from the document where the canonical slot filler string was extracted, or (in the case when the slot filler string in Column 5 has been normalized) a set of up to two comma-separated docid:startoffset-endoffset spans for the base strings that were used to generate the normalized slot filler string. The documents used for the slot filler string provenance must be a subset of the documents provided in Column 4. This column serves two purposes. First, LDC will judge Correct vs. Inexact with respect to the document(s) provided in the slot filler string provenance. Second, this column is used to fill the <docid>, <beg> and <end> entries in second round queries. If more than one provenance triple is provided here, the first one will be used to fill the second round query.
Column 8  Confidence score.
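The column layout above can be checked with a short Python sketch that parses one submission line and applies the constraints stated for Columns 4 and 7. It is not a substitute for the official validation script; the names `Span`, `parse_spans` and `parse_sf_line`, the sample line, and the choice to read the confidence score as a float are assumptions for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    docid: str
    start: int
    end: int  # inclusive


def parse_spans(field):
    """Parse comma-separated docid:startoffset-endoffset items."""
    spans = []
    for item in field.split(","):
        # Split on the last colon in case a document ID itself contains one.
        docid, _, offsets = item.rpartition(":")
        start, _, end = offsets.partition("-")
        spans.append(Span(docid, int(start), int(end)))
    return spans


def parse_sf_line(line):
    """Split one tab-separated line into its eight columns and check limits."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 8:
        raise ValueError("expected 8 tab-separated columns")
    relation = parse_spans(cols[3])
    filler = parse_spans(cols[6])
    if len(relation) > 4 or len(filler) > 2:
        raise ValueError("too many provenance spans")
    if any(s.end - s.start + 1 > 150 for s in relation):
        raise ValueError("relation provenance span longer than 150 characters")
    if not {s.docid for s in filler} <= {s.docid for s in relation}:
        raise ValueError("filler documents must also appear in Column 4")
    return {
        "query_id": cols[0], "slot": cols[1], "run_id": cols[2],
        "relation_provenance": relation, "filler": cols[4],
        "filler_type": cols[5], "filler_provenance": filler,
        "confidence": float(cols[7]),  # assumed numeric
    }


line = "CSSF16_ENG_00243754cd\tper:age\tmyrun1\t42:16900-16990\t34\tSTRING\t42:16950-16951\t0.9"
record = parse_sf_line(line)
print(record["slot"], record["filler"], record["filler_provenance"][0])
```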
The process for constructing a Slot Filling variant submission is as follows:
• Download the following from the NIST Website:
    o The Cold Start evaluation documents
    o CS-GenerateQueries.pl script
    o CS-PackageOutput.pl script
    o CS-ValidateSF.pl script
• [email protected]:
    o The CSSF evaluation queries
• Configure your system to produce results only from the Cold Start evaluation documents.
• Run the CS-GenerateQueries.pl script on the evaluation queries to produce the first round queries your system will run on. Note that the raw evaluation queries might differ from the format given above, so you should not assume that you can use them as input to your system without running this script.
• Run your system, producing a slot-filling submission for the first round queries.
• Run the CS-ValidateSF.pl script on your first round output to verify that it is formatted correctly.
• Run the CS-GenerateQueries.pl script on the evaluation queries and your first round output to produce the second round queries.
• Run your system on the second round queries to produce a second output file.
• Run the CS-PackageOutput.pl script on the two output files to produce your submission.
• Run the CS-ValidateSF.pl script on your submission to verify that it is formatted correctly.
• Upload the submission to NIST.
Task Output – All Variants
Provenance
Each triple in CSKB submissions and each output line in CSSF submissions will include a set of augmentations (again using tabs as separators). Except for the type predicate (which does not require explicit support from a document), the first augmentations will describe the provenance of the assertion. Provenance for submissions for the Slot Filling variant has already been described above; the corresponding provenance for triples in KB variant submissions is detailed here:
For predicates for relations from Table 1 or Table 2, up to four comma-separated justifications will be allowed for each entry, at the submitter’s discretion. Justifications do not need to be explicitly associated with subject, relation or object. Each justification will include a document ID, followed by a colon, followed by two dash-separated offsets (begin and end offsets). The offsets that show the provenance of an extracted relation are used to narrow the assessor’s focus within the documents when assessing the correctness of that relation. Provenance for a single relation may be drawn from more than one document. For the KB variant, when selecting a canonical mention for presentation to the assessor, the first document appearing in the relation provenance that contains a named or nominal mention of the object will be used for the canonical mention. (At least one of the documents in the KB’s relation provenance must contain a named or nominal mention of the object of the relation; that document must therefore contain a canonical mention for the object.) Therefore, participants should take care that, if some documents contain nominal canonical mentions and others contain named canonical mentions, the document containing a named canonical mention appears as the first document in the provenance.
String-valued slots (from Table 2) whose values do not represent entities place an additional constraint on provenance for Knowledge Base variant participants: the first justification must represent the document ID and offsets of the string fill. (Slot Filling variant participants are already providing this information in Column 7 of their submissions.) This requirement will allow assessors to quickly see the text from which the string fill was extracted.
Unlike entries for Slot Filling relations, the mention, nominal_mention, and canonical_mention predicates will have only a single justification, representing the exact location of the mention in the text. The type predicate requires no provenance.
Confidence Measure
To promote research into probabilistic knowledge bases and confidence estimation, each triple or slot fill may have an associated confidence score. Confidence scores will not be used for any official TAC 2016 measure. However, the scoring system may produce additional measures if confidence scores are included. Confidence scores will be used to induce a total order over the facts being evaluated (ties are broken when two scores are equal by assuming that the assertion appearing earlier in the submission has a higher score). Any submitted confidence score must be a positive real number between 0.0 (exclusive, representing the lowest confidence) and 1.0 (inclusive, representing the highest confidence), and must include a decimal point (no commas, please) to clearly distinguish it from a document offset. Confidence scores, if present, will appear at the end of each output line, separated from the provenance information with a tab. Confidence scores may not be used to qualify two incompatible fills for a single slot; submitter systems must decide amongst such possibilities and submit only one. For example, if the system believes that Bart’s only sibling is Lisa with confidence 0.7 and Milhouse with confidence 0.3, it should submit only one of these possibilities. If both are submitted, it will be interpreted as Bart having two siblings.
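The confidence rules lend themselves to a short check. The sketch below is illustrative only; the function names `is_valid_confidence` and `rank_by_confidence` are hypothetical, and the official scorer may treat edge cases differently.

```python
from typing import List, Tuple


def is_valid_confidence(text: str) -> bool:
    """True if text is in (0.0, 1.0], has a decimal point, and has no comma."""
    if "." not in text or "," in text:
        return False
    try:
        value = float(text)
    except ValueError:
        return False
    return 0.0 < value <= 1.0


def rank_by_confidence(assertions: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Order (assertion, confidence) pairs from highest to lowest confidence.

    The input is assumed to be in submission order. Python's sort is stable,
    so assertions with equal scores keep that order, which matches the rule
    that the earlier assertion is treated as having the higher score.
    """
    return sorted(assertions, key=lambda item: -item[1])
```

For instance, "1" is rejected because it lacks a decimal point and could be mistaken for a document offset, while "1.0" is accepted.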
Comments
Output files may contain comments, which begin at any occurrence of a pound sign (#) and continue through (but do not include) the end of the line. Comments and blank lines will be ignored. The first line of a KB variant output file must contain the unique run ID (i.e., it may not be blank). Submitters may like to add a comment to this line giving further details about the run.
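A reader for output files might strip comments as follows. This is a minimal sketch with a hypothetical function name, `strip_comments`; it takes the rule literally, so a pound sign anywhere on a line starts a comment, and it also trims trailing whitespace left in front of a comment.

```python
from typing import Iterable, Iterator


def strip_comments(lines: Iterable[str]) -> Iterator[str]:
    """Yield the content of each line, skipping comments and blank lines."""
    for line in lines:
        content = line.split("#", 1)[0].rstrip()
        if content.strip():
            yield content
```

With this reader, the first item yielded for a KB variant file would be the run ID, even when that line carries a trailing comment describing the run.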
Examples
The following five lines from a Knowledge Base variant submission¹¹ show examples of: one triple without any augmentations, two with only mention extent, one with only relation provenance, and one with both relation provenance and confidence.
:e4	type	PER
:e4	mention	"Bart Simpson"	Doc726:37-48
:e4	nominal_mention	"brother"	Doc726:15-21
:e4	per:siblings	:e7	Doc124:283-288,Doc885:173-179,Doc885:274-281
:e4	per:age	"10"	Doc124:180-181,Doc885:173-179	0.9
Here are example lines from a Slot Filling variant submission:
Q4	org:city_of_headquarters	myrun1	Doc42:3-8,Doc8:3-11	Baltimore	GPE	Doc8:3-11	1.0
Q5	per:siblings	myrun1	Doc124:283-288,Doc885:173-179	Lisa	PER	Doc124:283-286	0.7
Q6	per:age	myrun1	Doc124:180-181,Doc885:173-179	10	STRING	Doc124:180-181	0.9
¹¹ The first three lines can readily be converted to form part of an EDL submission, which can be evaluated as in the EDL track.
Differences between 2014 Slot Filling and the 2016 Cold Start Slot Filling Variant
Slot filling systems that participated in the 2014 Slot Filling task will need to handle the following differences to successfully participate in the 2016 CSSF task:
• Only the slot specified by the <slot> entry is to be filled; all other slots should be ignored. The <slot> entry is added to the queries received from NIST by running the CS-GenerateQueries.pl script.
• Participants will need to do one round of slot filling, run the CS-GenerateQueries.pl script to create the second round queries, then run slot filling again on the new queries. The results of rounds one and two are to be concatenated before submission using the CS-PackageOutput.pl script.
• CSSF requires that participants be able to fill all slots in both directions. For example, the 2014 Slot Filling task required detection of the per:cities_of_residence slot. CSSF also requires systems to be able to detect the inverse of that slot, gpe:residents_of_city.
• Each slot filler must be assigned a type, selected from {PER, ORG, GPE, STRING}. This field represents an additional output column not found in the 2014 Slot Filling or CSSF tasks.
• NIL entries, indicating that no information about a particular slot is available, are not required in CSSF.
• Nominal mentions of slot fillers may be returned if no named entity mention is available in the document collection. (Returning nominal entity mentions is not required, but may improve system recall if done correctly.)
• In addition to English, slot fillers and provenance may also be returned from Chinese and Spanish documents (only if the team is participating in one of the language conditions that isn't monolingual English).
Evaluation
The primary evaluation for both Cold Start SF and Cold Start KB construction is the slot filling evaluation, based on assessment of slot fillers found in response to Cold Start evaluation queries. In addition, the entity discovery component of Cold Start KBs is secondarily evaluated using the same set of evaluation documents and annotations as in the EDL track.
Slot Filling Assessment
Cold Start 2015 assessment and scoring will proceed as follows: The responses for each evaluation query (from both task variants and from human-generated results) will be pooled, and each response will be assessed by a person. The result of following the first relation will be assessed as if it were a Slot Filling query (for Knowledge Base variant entries, the canonical mention of the object entity in the first supporting document that mentions that entity will be used for the slot fill). The second relation in the query will also be assessed as a Slot Filling query, but only if the fill for the first relation is correct. If the fill for the first relation is not correct, each fill for the second relation is automatically counted as Wrong. For example, if the query asks for the ages of the siblings of “Bart Simpson,” and the submitted knowledge base gives “Lisa age 8” and “Milhouse age 10” as siblings, then only the reported age of Lisa will be assessed (Milhouse is not Bart’s sibling), and the reported age of Milhouse will automatically be counted as Wrong.
Cold Start uses pseudo-slot scoring to evaluate multiple-hop queries, in which each evaluation query is treated as if it selects a single indivisible slot. For example, an evaluation query that asks for the children of the siblings of an entity will be scored as if it were a query about a virtual per:nieces_and_nephews slot.¹² The guidelines in TAC KBP 2015 Slot Descriptions specify whether each of the component slots of a pseudo-slot is single-valued (e.g., per:date_of_birth) or list-valued (e.g., per:employee_of, per:children). A pseudo-slot is single-valued if each of its component slots is single-valued, and list-valued otherwise. In contrast to the Slot Filling task, Knowledge Base variant submissions may contain multiple fills for single-valued slots. If such fills are present in the submission, LDC will assess the slot fill with the highest confidence value, and will assess additional slot fills if resources allow. If more than one such slot fill shares the same confidence value, the slot fill that appears earlier in the output will be considered to have higher confidence.
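The two rules in this paragraph can be sketched in a few lines. The code is illustrative: `SINGLE_VALUED_SLOTS` is a deliberately tiny stand-in for the list in the slot descriptions, and both function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Illustrative subset only; the authoritative list is in the slot descriptions.
SINGLE_VALUED_SLOTS: Set[str] = {"per:date_of_birth"}


def pseudo_slot_is_single_valued(components: Iterable[str],
                                 single_valued: Set[str] = SINGLE_VALUED_SLOTS) -> bool:
    """A pseudo-slot is single-valued only if every component slot is."""
    return all(slot in single_valued for slot in components)


def fill_assessed_first(fills: List[Tuple[str, float]]) -> str:
    """Pick the fill assessed first among multiple fills for a single-valued slot.

    fills holds (filler, confidence) pairs in output order. max() returns the
    first maximal item, so the earlier fill wins when confidences are equal.
    """
    return max(fills, key=lambda fill: fill[1])[0]
```

Under these assumptions, a query over per:siblings followed by per:date_of_birth is list-valued, because per:siblings is not in the single-valued set.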
Each CSSF slot filler response (or CSKB object of each component relation that makes up a single evaluation query response) is assessed as Correct, ineXact, or Wrong, following guidelines in TAC KBP 2015 Assessment Guidelines. For each query, all system responses in which the slot filler is assessed as Correct or ineXact will be partitioned into equivalence classes, where slot fillers in the same equivalence class represent the same entity or value (as in the case of dates). Each Correct or ineXact response will receive an annotation for filler mention type (either NAM or NOM), and each equivalence class will receive an annotation for equivalence class mention type (NAM if the assessor can find a named mention for the filler anywhere in the provenances in any of the responses; otherwise, NOM if only nominal mentions appear in the provenances of all responses).
Pseudo-slots will be scored just as slots in the Slot Filling task, with the additional constraint that both the slot fill and the path leading to that fill must be correct for the entirety to be judged correct. To receive credit for identifying Maggie Simpson as Patty Bouvier’s niece, the knowledge base must not only include Maggie as the slot fill, but must also represent Maggie as Marge’s child, and Marge as Patty’s sibling:¹³
Evaluation query: Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :MaggieSimpson
Submission:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :MaggieSimpson ⇒ correct
A KB that indicated that Maggie was Patty’s niece because she was Patty’s sister Selma’s child would be scored as incorrect:
Evaluation query: Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :MaggieSimpson
Submission:
    :PattyBouvier per:siblings :SelmaBouvier
    :SelmaBouvier per:children :MaggieSimpson ⇒ incorrect
¹² A pseudo-slot is similar to the concept of a role chain, which is supported by some knowledge representation systems based on description logic, including OWL 2.
¹³ In each of these examples, only the subject, predicate and object are shown, and only a subset of the relevant knowledge base is presented. Each entity is named after the mention that gave rise to it.
A response is inexact if it either includes only a part of the correct answer or includes the correct answer plus extraneous material. Inexact answers are counted as Wrong for the purposes of scoring:
Evaluation query: Titles of parents of Bart Simpson (per:parents, per:title)
Ground Truth:
    :BartSimpson per:parents :HomerSimpson
    :HomerSimpson per:title "Attack-dog trainer"
Submission:
    :BartSimpson per:parents :HomerSimpson
    :HomerSimpson per:title "dog trainer Pitiless Pup" ⇒ inexact
In addition, the object of the final relation in a pseudo-slot may be rated as redundant if it is equivalent to another fill for the pseudo-slot. Redundant answers are counted as Wrong for the purposes of scoring:
Evaluation query: Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :MaggieSimpson
    :MaggieSimpson per:alternate_names "Margaret Simpson"
Submission:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :MaggieSimpson ⇒ correct
    :MargeSimpson per:children :MargaretSimpson ⇒ redundant
However, objects of relations other than the final relation will never be rated as redundant:
Evaluation query: Nieces and nephews of Patty Bouvier (per:siblings, per:children)
Ground Truth:
    :PattyBouvier per:siblings :MargeSimpson
    :MargeSimpson per:children :LisaSimpson
    :MargeSimpson per:children :BartSimpson
    :MargeSimpson per:alternate_names "Marjorie Simpson"
Submission:
    :PattyBouvier per:siblings :MargeSimpson
    :PattyBouvier per:siblings :MarjorieSimpson
    :MargeSimpson per:children :LisaSimpson ⇒ correct
    :MarjorieSimpson per:children :BartSimpson ⇒ correct
Here, Marge Simpson and Marjorie Simpson represent the same person in the ground truth, but two distinct entities in the KB. However, because the query is about Marge’s children and not about Marge herself, both responses to the evaluation query are assessed as correct.
Since in Cold Start the facts being evaluated come from sequences of triples, confidence scores would need to be combined if we wanted to generate confidence scores for a derived pseudo-relation. The proper way to combine scores of course depends on the meaning of those scores, and for now, Cold Start is not mandating any particular meaning. Three general score combination functions are min, max and product; we welcome comments from the community on which combination methods to report.
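The three combination functions named above are easy to compare side by side. The sketch below uses a hypothetical helper, `combine_confidences`, and makes no claim about which method the scorer reports.

```python
import math
from typing import Sequence


def combine_confidences(scores: Sequence[float], method: str = "product") -> float:
    """Combine per-hop confidence scores into one score for a pseudo-relation."""
    if method == "min":
        return min(scores)
    if method == "max":
        return max(scores)
    if method == "product":
        return math.prod(scores)
    raise ValueError(f"unknown combination method: {method!r}")
```

For a two-hop path with scores 0.9 and 0.5, min gives 0.5, max gives 0.9, and product gives roughly 0.45; product treats the hops as if they were independent probabilities, which is only one possible reading of the scores.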
Slot Filling Scoring
Given the above approach to assessment, basic scoring for a given system proceeds as follows:
• Each response assessed as Wrong or ineXact is counted as Spurious.
• Each response for Round 2 whose Round 1 parent filler is assessed as Wrong or ineXact is counted as Spurious.
• Responses assessed as Correct are grouped into equivalence classes. For each equivalence class, at most one response from the system is counted as Right; all other responses are counted as Spurious (therefore, systems should not return redundant answers to the same query). If the system has a NAM entity mention in the equivalence class, or if the system has only NOM entity mentions and the equivalence class is annotated as NOM, then the one response is counted as Right; otherwise, if the system has only NOM entity mentions in the equivalence class and the equivalence class is annotated as NAM, then the one response is counted as Ignore (i.e., treated as if it was never returned by the system). Thus, named entity mentions are preferred.
• Reference = number of single-valued pseudo-slots with a correct response + number of equivalence classes¹⁴ for all list-valued pseudo-slots
• Recall = #Right / Reference
• Precision = #Right / (#Right + #Spurious)
• F1 = 2 * Precision * Recall / (Precision + Recall)
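The counting rules and formulas above can be sketched as follows. This is one reading of the rules, not the official scorer: the function names are hypothetical, returning 0.0 when a denominator is zero is an assumption, and `count_equivalence_class` assumes that every Correct response beyond the first in a class is Spurious.

```python
from typing import Sequence, Tuple


def count_equivalence_class(system_mention_types: Sequence[str],
                            class_mention_type: str) -> Tuple[int, int, int]:
    """Return (right, spurious, ignored) for one equivalence class.

    system_mention_types lists the mention type (NAM or NOM) of each Correct
    response the system returned in this class; class_mention_type is the
    assessor's annotation for the class as a whole.
    """
    n = len(system_mention_types)
    if n == 0:
        return 0, 0, 0
    credited = "NAM" in system_mention_types or class_mention_type == "NOM"
    right = 1 if credited else 0
    ignored = 0 if credited else 1
    return right, n - 1, ignored


def precision_recall_f1(right: int, spurious: int,
                        reference: int) -> Tuple[float, float, float]:
    """Compute Precision, Recall, and F1 from the counts defined above."""
    # Assumption: a zero denominator yields 0.0 rather than an error.
    recall = right / reference if reference else 0.0
    precision = right / (right + spurious) if (right + spurious) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

With 3 Right responses, 1 Spurious response, and a Reference count of 6, this gives Precision 0.75, Recall 0.5, and F1 0.6.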
As in 2015, each Cold Start evaluation query in 2016 may have more than one entry point. Because the number of entry points may differ arbitrarily between Cold Start evaluation queries, we focus on two primary metrics for the 2016 Cold Start Knowledge Base Population system evaluation:
• MAX (micro-average): compute F1 for each entry point as outlined above to select a single "maximal" entry point for each evaluation query, where the selected entry point has a maximal F1 among all entry points for that query. The MAX micro-average Precision, Recall, and F1 for the system are computed by summing the counts across all queries, using only the selected maximal entry point for each query.
• MEAN (macro-average): compute F1 for each entry point as outlined above. The query-level score for a query is the mean of the F1 scores of each of its constituent entry points. The MEAN score for the system is the mean of its query-level scores. The MEAN metric gives equal weight to each query, and (within each query) equal weight to each of its entry points.
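The difference between the two metrics is easiest to see in code. The sketch below assumes each entry point is summarized by a hypothetical `(right, spurious, reference)` tuple of counts and that a zero denominator yields 0.0; the function names are illustrative, not part of the official scorer.

```python
from statistics import mean
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int]  # (right, spurious, reference) for one entry point


def prf(right: int, spurious: int, reference: int) -> Tuple[float, float, float]:
    """Precision, Recall, F1 from counts (0.0 on a zero denominator)."""
    recall = right / reference if reference else 0.0
    precision = right / (right + spurious) if (right + spurious) else 0.0
    total = precision + recall
    return precision, recall, (2 * precision * recall / total if total else 0.0)


def max_micro(queries: Dict[str, List[Counts]]) -> Tuple[float, float, float]:
    """Sum the counts of each query's best-F1 entry point, then score once."""
    right = spurious = reference = 0
    for entry_points in queries.values():
        best = max(entry_points, key=lambda counts: prf(*counts)[2])
        right += best[0]
        spurious += best[1]
        reference += best[2]
    return prf(right, spurious, reference)


def mean_macro(queries: Dict[str, List[Counts]]) -> float:
    """Average F1 over entry points within a query, then over queries."""
    return mean(mean(prf(*counts)[2] for counts in entry_points)
                for entry_points in queries.values())
```

For example, with `{"Q1": [(1, 1, 2), (2, 0, 2)], "Q2": [(0, 1, 1)]}`, MAX selects the second entry point of Q1 and yields an F1 of 2/3, while MEAN averages 0.75 for Q1 with 0.0 for Q2 to give 0.375.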
Entity Discovery Scoring
The scoring for the Entity Discovery component of submitted Cold Start KBs will be identical to scoring for the 2016 TAC Trilingual Entity Discovery and Linking task, with the exception that no linking to an existing knowledge base is required (that is, all mentions will be treated as NIL entries). Please see TAC KBP 2016 Entity Discovery and Linking Task Description for complete details on scoring.
Submissions
A four-week window from Monday August 1 to Monday August 29 will be available for downloading the TAC KBP 2016 Evaluation Source Corpus, producing CSSF and CSKB system output, and submitting results. Systems should not be modified once the corpus has been downloaded. Starting Monday, August 15, participants in the CSSF task may email NIST to request the CSSF evaluation queries, but teams participating in both the CSSF and CSKB tasks must submit all CSKB runs before requesting the CSSF evaluation queries from NIST. On August 15, automatic EDL output from systems participating in the first EDL evaluation window will also be made available as an optional resource to Cold Start participants.
¹⁴ See TAC KBP 2015 Slot Descriptions and TAC KBP 2015 Assessment Guidelines for further information on how and when two slot fills are treated as equivalent.
For each of the Cold Start task variants (CSSF and CSKB), a team may submit up to 5 runs for each of the following 4 language conditions:
1. Monolingual English: entity mentions, slot fills and provenances are extracted only from English documents
2. Monolingual Spanish: entity mentions, slot fills and provenances are extracted only from Spanish documents
3. Monolingual Chinese: entity mentions, slot fills and provenances are extracted only from Chinese documents
4. Cross-lingual: entity mentions, slot fills and provenances are extracted from any combination of English, Spanish, and Chinese documents.
If a team submits a run involving more than one language under the Cross-lingual condition, it must also submit at least one run under the monolingual condition for each language involved (with a description of which monolingual run configurations were used for each cross-lingual run).
Submitted runs must be ranked (1-5) in order of evaluation preference. The number of runs actually evaluated will depend upon resources available to NIST; the 3 top-ranked runs from each team will be assessed for each task and language condition, and lower-ranked submissions may be assessed if resources allow. The run ID included in each team's submission file must be a concatenation of the team's TAC KBP 2016 team ID, the task (KB or SF), the language condition (ENG, CMN, SPA, or XLING), and a rank (1-5); thus "Acme_KB_XLING_1" would be the top-ranked run for the Acme team for the CSKB task variant under the cross-lingual condition.
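The run ID convention can be sketched as a pair of helpers. The underscore separator is inferred from the "Acme_KB_XLING_1" example rather than stated as a rule, and the names `make_run_id` and `parse_run_id` are hypothetical.

```python
from typing import Tuple

TASKS = {"KB", "SF"}
LANGUAGE_CONDITIONS = {"ENG", "CMN", "SPA", "XLING"}


def make_run_id(team_id: str, task: str, language: str, rank: int) -> str:
    """Build a run ID such as Acme_KB_XLING_1 after checking each part."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}")
    if language not in LANGUAGE_CONDITIONS:
        raise ValueError(f"language must be one of {sorted(LANGUAGE_CONDITIONS)}")
    if not 1 <= rank <= 5:
        raise ValueError("rank must be between 1 and 5")
    return f"{team_id}_{task}_{language}_{rank}"


def parse_run_id(run_id: str) -> Tuple[str, str, str, int]:
    """Split a run ID from the right, so a team ID may itself contain underscores."""
    team_id, task, language, rank = run_id.rsplit("_", 3)
    # Rebuilding the ID re-applies the validation in make_run_id.
    make_run_id(team_id, task, language, int(rank))
    return team_id, task, language, int(rank)
```

Splitting from the right is a design choice that keeps the three fixed fields unambiguous even when the team ID is not a single token.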
The top-ranked submission must be made as a ‘closed’ system; in particular, it must not access the Web during the evaluation period. All submissions must obey the following external resource restrictions:
• Structured knowledge bases (e.g., Wikipedia infoboxes, DBPedia, Freebase) may not be used to directly fill slots or directly validate candidate slot fillers.
• Structured knowledge base entries for target entities may not be edited, either during or after the evaluation.
In addition, because Cold Start focuses on the condition where the knowledge base is initially empty, we ask that each participating site submit at least one run that consults external entity knowledge bases only after entities and relations have been extracted from the document collection. Details about submission procedures will be communicated to the track mailing list. Tools to validate formats will be available on the TAC Web site (http://www.nist.gov/tac/2016/KBP/ColdStart/tools.html).
Change History
• Version 1.0
    o Original version, based on the 2015 specification
    o Added description of multi-lingual tasks
    o Aligned definition of entity types and mention types in the KB Construction task with those in the 2016 Entity Discovery and Linking track
    o Added description of nominal entity mentions and slot fillers