Exploring Information Seeking and Searching Intentions: An Overview of Recent Research at Rutgers University
Nicholas J. Belkin, School of Communication & Information
Rutgers University, New Brunswick, [email protected]
Information Seeking and Searching Situation
• A person, facing a problematic situation with respect to some task or goal, decides that interacting with information could help to achieve the goal or accomplish the task.
• That person makes a decision about how best to carry out that interaction. This is the Seeking decision (Wilson, 1999).
• When the decision is made to interact with information through the means of some system, Searching commences.
The Goal(s) of Information Retrieval
• To support the person(s) in achieving the goal or task which motivated them to engage in information seeking and searching
• To do this through helping the person(s) to resolve their problematic situation
• To do this by supporting effective interaction with the IR system and the information objects within that system
• To do this by responding appropriately to the information searching intentions of the person(s) during the course of an information searching session
Moving from System-Centered to Person-Centered Information Retrieval
• Recognize that information retrieval is inherently interactive
• Recognize that the information retrieval situation is inherently dynamic
• Recognize that people engage in information seeking and searching sessions
• Make the person in the IR system the central actor
• Make interaction with information objects the central process
A Model of Interaction with Information (Belkin, 1996)
Taking Account of the Interactive Nature of IR
• A research program at Rutgers University Department of Library and Information Science
  • Personalization of the Digital Library Experience (POoDLE) – IMLS
  • Automatic Identification of Information Searcher Intentions During an Information Seeking Session – Google
  • Characterizing and Evaluating Whole Session Interactive Information Retrieval (CHEWS-IIR) – NSF (in progress, described today)
General Pattern of our Studies
• Construct work tasks of different types, with associated information searching tasks
• Have participants conduct search for one work task
  • Log behaviors
  • Record search session
• Play back information search session for participant annotation
• Iterate for next work task, to final work task
• Exit interview
Work Tasks and Information Search Tasks
• Journalism domain
  • Any topic
  • Several well-defined types of work tasks, e.g.
    • Advance obituary; Copy editing; Prepare for interview; Story pitch; Prepare story
• Constructed work and search tasks differ on values of specific facets
  • Faceted classification of task (Li & Belkin, 2008)
Li & Belkin (2008) Facet Analysis of Task (modified)
• Source of Task
  • Self, Group, Assigned
• Task Doer
  • Individual, Group
• Time
  • Frequency
  • Length
  • Stage
• Product
  • Physical, Intellectual, Decision, Factual
• Process
  • One-time, Multiple
• Items
  • Named or Not
  • Whole or Part
• Goal
  • Quality
    • Specific, Amorphous, Mixed
  • Quantity
    • Single or multiple goals
• Common attributes of task, e.g.
  • Objective/Subjective task complexity, Urgency, Salience, Difficulty, …
Example Task and Classification
Assignment 1. Copy Editing (CPE)
Your Assignment: You are a copy editor at a newspaper and you have only 20 minutes to check the accuracy of six italicized statements in the excerpt of a piece of news story below.
Your Task: Please find and save an authoritative page that either confirms or disconfirms each statement.
Product: Fact; Items: Named/Part; Goal: Specific

Assignment 2. Story Pitch (STP)
Your Assignment: You are planning to pitch a science story to your editor and need to identify interesting facts about the coelacanth (“see-la-kanth”), a fish that dates from the time of dinosaurs and was thought to be extinct.
Your Task: Find and save web pages that contain the six most interesting facts about coelacanths and/or research about their preservation.
Product: Fact; Items: Not Named/Part; Goal: Specific

Assignment 3. Relationships (REL)
Your Assignment: You are writing an article about coelacanths and conservation efforts. You have found an interesting article about coelacanths but in order to develop your article you need to be able to explain the relationship between key facts you have learned.
Your Task: In the following there are five italicized passages, find an authoritative web page that explains the relationship between two of the italicized facts.
Product: Intellectual; Items: Named/Part; Goal: Mixed (Specific + Amorphous)

Assignment 4. Interview Preparation (INT)
Your Assignment: You are writing an article that profiles a scientist and their research work. You are preparing to interview Mark Erdmann, a marine biologist, about coelacanths and conservation programs.
Your Task: Identify and save authoritative web pages for the following:
• Identify two (living) people who likely can provide some personal stories about Dr. Erdmann and his work.
• Find the three most interesting facts about Dr. Erdmann’s research.
• Find an interesting potential impact of Dr. Erdmann’s work.
Product: Intellectual; Items: Not-Named/Whole; Goal: Amorphous
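The four classifications above can be written down as a small data structure, which makes it easy to see on which facets any two tasks differ. This is only an illustrative sketch: the class name `TaskFacets`, its field names, and the helper `differing_facets` are assumptions made for this example, and only the three facets printed on the slides (Product, Items, Goal) are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskFacets:
    """Facet values for one assigned task (only the facets shown on the slides)."""
    code: str      # task code from the slides, e.g. "CPE"
    product: str   # "Fact" or "Intellectual"
    named: bool    # True if the items to find are named in the task
    extent: str    # "Part" or "Whole"
    goal: str      # "Specific", "Amorphous", or "Mixed"


# Values copied from the four example slides.
TASKS = {
    "CPE": TaskFacets("CPE", "Fact", True, "Part", "Specific"),
    "STP": TaskFacets("STP", "Fact", False, "Part", "Specific"),
    "REL": TaskFacets("REL", "Intellectual", True, "Part", "Mixed"),
    "INT": TaskFacets("INT", "Intellectual", False, "Whole", "Amorphous"),
}


def differing_facets(a: TaskFacets, b: TaskFacets) -> list:
    """Return the names of the facets on which two tasks differ."""
    return [name for name in ("product", "named", "extent", "goal")
            if getattr(a, name) != getattr(b, name)]


print(differing_facets(TASKS["CPE"], TASKS["STP"]))
print(differing_facets(TASKS["CPE"], TASKS["INT"]))
```

Copy Editing and Story Pitch differ only in whether the items are named, while Copy Editing and Interview Preparation differ on every encoded facet.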
Participants and Procedure
• Journalism undergraduate university students
• Entry questionnaire – demographics
• Searches for two (of four) tasks conducted in lab with eye tracker (20 minutes each)
• Pre-search questionnaire (when presented with task description)
  • Familiarity with task, topic
  • Expected difficulty
• Search conducted on Web, any search system, through Coagmento
• Post-search questionnaire
  • Experienced difficulty
  • Confidence in task success
• Play back search for annotation, by Query Segment
  • QS is query n, all that happens up to and including query n+1 (or end)
• Exit interview
  • Comparison of two tasks and two search sessions
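The Query Segment definition above can be sketched as a small log-splitting routine. The event format, a time-ordered list of `(kind, payload)` tuples in which `kind == "query"` marks a submitted query, is an assumption for illustration and not the format of the study's logs. The sketch reads "up to and including query n+1" literally, so the next query both closes one segment and opens the following one.

```python
def query_segments(events):
    """Split a session log into query segments.

    events: time-ordered list of (kind, payload) tuples; kind == "query"
    marks a submitted query, any other kind is some other logged action.
    """
    segments = []
    current = None
    for kind, payload in events:
        if kind == "query":
            if current is not None:
                # The next query closes the running segment and is included
                # in it, following the slide's wording.
                current.append((kind, payload))
                segments.append(current)
            # The same query opens the following segment.
            current = [(kind, payload)]
        elif current is not None:
            current.append((kind, payload))
        # Events logged before the first query belong to no segment.
    if current is not None:
        segments.append(current)  # the last segment runs to the end of the session
    return segments


log = [("query", "coelacanth"), ("click", "page1"),
       ("query", "coelacanth conservation"), ("save", "page2")]
for segment in query_segments(log):
    print(segment)
```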
Annotation
• Play back QS n
• What were you intending to accomplish during this period?
  • Choice of intentions, can be multiple
  • For each intention: Was this intention satisfied? If no, why not?
    • [text entry]
• What were you hoping to accomplish with [query n+1]?
  • [text entry]
• Play back QS n+1
Xie’s (2002) Interactive [Search] Intentions
• Identify search information (Something to start; Something more to search)
• Learn (Domain knowledge; Database content)
• Find (Known item; Specific information; Sharing named characteristic; Without predefined criteria)
• Keep record
• Access item or set of items
• Evaluate (Correctness; Usefulness; Best; Specificity; Duplication)
• Obtain (Specific information; Part of item; Whole item)
Data Analyses (So Far)
• Querying behavior and search intentions
  • Relationships between query reformulation “types” and search intentions
  • Effect of intention satisfaction on query reformulation type
  • Classification of reasons for query reformulation
• Intentions and search behaviors
  • Are the Xie search intentions necessary and sufficient?
  • Sequences of search intentions
  • Prediction of search intention based on search behavior
Query Reformulation Types (Liu et al. 2010, modified)
Type | Definition | Example
Generalization | At least one term in common in two queries; second query contains fewer terms than first query | world economic impact on global warming on Arctic region → global warming on Arctic region
Specialization | At least one term in common in two queries; second query contains more terms than first query | impact Dr. Erdmann → impact Dr. Mark Erdmann
Word Substitution | At least one term in common in two queries; second query has the same length as first query, but contains some terms not in the first query | Igor Semiletov research → igor semiletov methane
Repeat | Exactly the same term(s) repeated from any previous queries within the session | Coelacanths (1st query) → Coelacanths (5th query)
New | No common terms in two queries | where is madagascar → coelacanths live young
Spelling Correction | The second query corrects misspelling of the previous query | methane clarites artic economic immpact → methane clarites arctic economic impact
Stem Identical | Two queries with the same morphological root | methane km → methane kilometers
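The term-overlap definitions in the table translate fairly directly into code. The following is a minimal sketch covering only the five types that can be decided from whitespace-separated, lowercased terms (Repeat, New, Generalization, Specialization, Word Substitution). Spelling Correction and Stem Identical would need a spell checker or stemmer and are deliberately left out, so this sketch would label such pairs as Word Substitution. The function name, the `earlier` parameter, and the fallback label "Other" are assumptions made for this example.

```python
def reformulation_type(previous, current, earlier=()):
    """Classify the move from `previous` to `current` by term overlap only.

    `earlier` holds any queries issued before `previous` in the same session;
    it is used only for the Repeat check.
    """
    prev_terms = previous.lower().split()
    curr_terms = current.lower().split()

    # Repeat: exactly the same terms as any previous query in the session.
    for old in list(earlier) + [previous]:
        if curr_terms == old.lower().split():
            return "Repeat"

    # New: no terms in common.
    if not set(prev_terms) & set(curr_terms):
        return "New"

    # At least one shared term from here on.
    if len(curr_terms) < len(prev_terms):
        return "Generalization"
    if len(curr_terms) > len(prev_terms):
        return "Specialization"
    if set(curr_terms) - set(prev_terms):
        return "Word Substitution"
    return "Other"  # same length, shared terms, nothing new (e.g. reordering)


print(reformulation_type("impact Dr. Erdmann", "impact Dr. Mark Erdmann"))
print(reformulation_type("Igor Semiletov research", "igor semiletov methane"))
print(reformulation_type("where is madagascar", "coelacanths live young"))
```

Checking Repeat first matters: a repeated query also shares terms with its predecessor and would otherwise fall through to one of the overlap-based types.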
Query Analyses
• Data for 24 participants, 48 search sessions
• 434 queries
• 383 query reformulations, therefore 383 instances of reasons for query reformulation
• 1824 search intentions
  • median per QS 4, range 1-16
  • 1575 satisfied, 249 unsatisfied
Total counts for each intention
Query Reformulations and Search Intentions
• RQ1: What types of reformulations are used following any search intention?
• RQ2: What types of reformulations are used when an intention is either satisfied or not satisfied?
• RQ3: What are the subsequent intentions of reformulations?
Rha, E. Y., Belkin, N. J., Mitsui, M. & Shah, C. (2016) Exploring the relationships between search intentions and query reformulations. In: Proceedings of the 79th Annual Meeting of the Association for Information Science and Technology, (9 pp.). Silver Spring, MD: Association for Information Science and Technology
Frequency of satisfied and unsatisfied intentions leading to each reformulation type
Most frequent intentions, most frequent following reformulations, and most frequent subsequent intentions
Previous intention | Satisfied | Most frequent reformulation | Subsequent intention(s) | Second most frequent reformulation | Subsequent intention(s)
Find specific | Y | Specialization | Find specific | Generalization | Find specific
Find specific | N | Specialization | Find specific | Generalization | Find specific
Obtain specific | Y | Specialization | Find specific | Generalization | Obtain specific
Obtain specific | N | Specialization | Obtain specific | Generalization | Find specific
Identify more | Y | Repeat | Identify more | Specialization | Identify more
Identify more | N | Specialization | Learn domain | Repeat | Identify more
Learn domain | Y | Specialization | Find specific | Generalization | Identify more
Learn domain | N | Specialization | Learn domain | Generalization | Learn domain, Learn database
Identify start | Y | Specialization | Find specific | Repeat | Identify more
Identify start | N | Specialization | Find specific, Obtain specific | Generalization | Identify start, Find known
[Figure, appearing twice in the source: flows from FIRST INTENTION through QUERY REFORMULATION to SUBSEQUENT INTENTION. Both intention axes list the 20 intentions: find specific, obtain specific, identify more, learn domain, identify start, evaluate correctness, keep link, find common, evaluate usefulness, access item, learn database, access common, evaluate specificity, find known, evaluate best, access area, obtain part, obtain whole, find without, evaluate duplication. The reformulation axis lists generalization, specialization, repeat, word substitution, new, spelling correction, stem identical.]
Reformulation & Intentions Discussion 1
• RQ1: What types of reformulations are used following any search intention?
  • Specialization is the most common reformulation following 12 of the 20 intentions, then Repeat, then Generalization
  • Intentions have different patterns of subsequent reformulations
• RQ2: What types of reformulations are used when an intention is either satisfied or not satisfied?
  • Inconclusive; too few unsatisfied
• RQ3: What are the subsequent intentions of reformulations?
  • Inconclusive but promising; each subsequent intention has a different pattern of precursor reformulations, despite the domination of Specialization
Reformulation & Intentions Discussion 2
• Despite the nature of the work and search tasks, participants had no difficulty identifying different intentions associated with different query segments
• Given the different nature of the various intentions, this suggests that search support techniques other than query reformulation could be useful in supporting effective interaction
• The degree of satisfaction of intentions may be due to either low expectations, or inventive use of reformulation
Reasons for Query Reformulation
• People reformulate queries, but we don’t know what they are trying to accomplish by doing this
  • RQ1: What are reasons for query reformulation?
• People reformulate queries, but we don’t know how reformulation types relate to reasons for reformulation
  • RQ2: How are types of query reformulation related to users’ reasons for query reformulations?
• People attempt to accomplish different search intentions, but we don’t know how they go about doing that through query reformulation
  • RQ3: How do previous interactive search intentions relate to reasons of following query reformulations?
Rha, E. Y., Wei, S. & Belkin, N. J. (2017) An exploration of reasons for query reformulation. In: Proceedings of the 80th Annual Meeting of the Association for Information Science and Technology, (11 pp.). Silver Spring, MD: Association for Information Science and Technology
Procedure for Addressing RQs
• Open coding of 383 texts written in response to the question: Please explain why you entered this new query, and what you were hoping to accomplish by doing so
• Identification of common structure of reasons, and common elements in that structure
• Development of a faceted classification based on structure and elements
• Analysis of types of reasons in relationship to types of reformulations and types of search intentions
Reasons and Coding Examples
Reason | Open Coding
“Trying to find information that is truthful, and more specific to the subject.” | Find truthful information; Find specific information
“Clarify my original search” | Clarify original search
“Looked up for any recent news regarding Arctic oil and gas to see if I could bolster my argument with any recent facts that were perhaps in the news.” | Look for recent news; Bolster my argument
“I entered this new query because I felt I did not use the right word in my first query referring to people the scientist would have had relations with to provide the answer to the first question of the assignment.” | Use right word
“I used a more general phrase to get more background information on the topic and hopefully find authoritative sources that supported the facts.” | Get background information; Find authoritative sources
Examples of Normalization of Open Coding
Open Coding | Final Combination
Find truthful information | Find-accurate-information
Clarify original search | Clarify-previous-search
Look for recent news | Find-up-to-date-publication
Bolster my argument | Verify-specific-knowledge
Use right word | Correct-previous-query
Get background information | Obtain-background-information
Find authoritative sources | Find-credible-source
Faceted Classification Based on Reason Structure
Facet | Sub-facet | Values
Process | Operational | Find; Obtain; Access; Expand; Combine; Correct; Change; Narrow down; Start
Process | Interpretive | Evaluate; Verify; Focus on; Learn; Clarify; Use; Understand
Aspect | Depth | General; Specific; Background; Basic; Detailed
Aspect | Time | New; Previous; Up-To-Date
Aspect | Quality | Interesting; Accurate; Credible; Better; Useful
Aspect | Quantity | Multiple; Single
Aspect | Relationship | Similar; Different; Relevant; More
Entity | Content | Knowledge; Information; Topic; Definition; Fact; Domain
Entity | Resource | Source; Website; Publication
Entity | Search | Search result; Query; Search
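The normalized combinations shown earlier (for example Find-up-to-date-publication) follow a Process-Aspect-Entity pattern, so a combination can be decomposed and each part looked up in the facet table. The sketch below assumes that pattern holds and that Process and Entity are single words; multi-word values such as "Narrow down" or "Search result" are omitted because the slides do not show how they are written inside a combination. The names `FACETS`, `sub_facet`, and `parse_reason` are invented for this example.

```python
# Facet table from the slide; multi-word Process and Entity values are left out.
FACETS = {
    "Process": {
        "Operational": ["Find", "Obtain", "Access", "Expand", "Combine",
                        "Correct", "Change", "Start"],
        "Interpretive": ["Evaluate", "Verify", "Learn", "Clarify", "Use",
                         "Understand"],
    },
    "Aspect": {
        "Depth": ["General", "Specific", "Background", "Basic", "Detailed"],
        "Time": ["New", "Previous", "Up-To-Date"],
        "Quality": ["Interesting", "Accurate", "Credible", "Better", "Useful"],
        "Quantity": ["Multiple", "Single"],
        "Relationship": ["Similar", "Different", "Relevant", "More"],
    },
    "Entity": {
        "Content": ["Knowledge", "Information", "Topic", "Definition",
                    "Fact", "Domain"],
        "Resource": ["Source", "Website", "Publication"],
        "Search": ["Query", "Search"],
    },
}


def sub_facet(facet, value):
    """Return the sub-facet that lists `value` (case-insensitive), or None."""
    for name, values in FACETS[facet].items():
        if value.lower() in (v.lower() for v in values):
            return name
    return None


def parse_reason(combination):
    """Split 'Process-Aspect-Entity' and attach the sub-facet of each part."""
    process, rest = combination.split("-", 1)   # first hyphen ends the process
    aspect, entity = rest.rsplit("-", 1)        # last hyphen starts the entity
    return {
        "Process": (process, sub_facet("Process", process)),
        "Aspect": (aspect, sub_facet("Aspect", aspect)),
        "Entity": (entity, sub_facet("Entity", entity)),
    }


print(parse_reason("Find-up-to-date-publication"))
print(parse_reason("Verify-specific-knowledge"))
```

Splitting on the first and last hyphen keeps a hyphenated aspect such as "up-to-date" intact.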
Distribution of Reasons for Reformulation
Mapping Reasons to Search Intentions
Xie’s (2002) Search Intention | Reason Combination
Find specific information | Find-specific-information; Find-specific-publication; Find-specific-source; Find-specific-website
Identify more to search | Find-more-information
Evaluate correctness | Verify-specific-fact; Verify-specific-information
Obtain specific information | Obtain-specific-information
Find items without pre-defined criteria | Find-interesting-fact; Find-different-information
Learn domain knowledge | Learn-specific-topic
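The mapping in the table above is many-to-one, which a plain lookup table captures. The sketch below contains only the rows printed on the slide; the dictionary and function names are assumptions for this example, and a result of `None` means only that the combination is not listed on this slide.

```python
# Rows copied from the slide: search intention -> reason combinations.
INTENTION_TO_REASONS = {
    "Find specific information": [
        "Find-specific-information", "Find-specific-publication",
        "Find-specific-source", "Find-specific-website",
    ],
    "Identify more to search": ["Find-more-information"],
    "Evaluate correctness": ["Verify-specific-fact", "Verify-specific-information"],
    "Obtain specific information": ["Obtain-specific-information"],
    "Find items without pre-defined criteria": [
        "Find-interesting-fact", "Find-different-information",
    ],
    "Learn domain knowledge": ["Learn-specific-topic"],
}

# Invert the table so a reason combination can be looked up directly.
REASON_TO_INTENTION = {
    reason.lower(): intention
    for intention, reasons in INTENTION_TO_REASONS.items()
    for reason in reasons
}


def intention_for(reason):
    """Return the listed search intention for a reason combination, or None."""
    return REASON_TO_INTENTION.get(reason.lower())


print(intention_for("Verify-specific-fact"))
print(intention_for("Correct-previous-query"))
```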
Relationship of Reasons to Reformulations
Reasons and Intentions Discussion 1
• RQ1: What are reasons for query reformulation?
  • A faceted classification scheme provides ways to characterize reasons at different levels of granularity, but there are many possible combinations
  • Many of the reasons (but not all) map to Xie’s (2002) interactive search intentions
• RQ2: How are types of query reformulation related to users’ reasons for query reformulations?
  • Participants used different query reformulation types to accomplish the same reasons, and
  • The same reformulation types were used to accomplish multiple reasons
Reasons and Intentions Discussion 2
• RQ3: How do previous interactive search intentions relate to reasons of following query reformulations?
  • Inconclusive; dominance of find-specific-information as a reason, and lack of unsuccessful intentions, did not allow meaningful analysis
• Overall conclusion: People, during the course of an information search session, attempt to do more than just “make a better query”; it seems clear that many of the reasons for query reformulation would be better achieved through other means.
Search Intentions and Search Behaviors
Given that people attempt to accomplish different intentions during the course of an information search session, can a system identify what those intentions are, without intervention?
• RQ1: How is a user’s Web search behavior associated with his or her information seeking intentions in the same query segment?
• RQ2: How is a user’s Web search behavior in the current query segment associated with his or her information seeking intentions in the subsequent query segment?
Procedure, Data and Methods
• Procedure as previously described, but with data for 40 participants
• 80 search sessions, 693 query segments
• Observed search behaviors treated as groups
• Two different analyses, using two slightly different behavior groups
  • Identifying intentions as a binary classification problem – logistic regression
  • Identifying and predicting intentions through significantly different correlations of behaviors when intention is present
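To make the binary-classification framing concrete: for each intention, a query segment is labeled 1 if the intention was annotated and 0 otherwise, and a logistic regression is fit on that segment's behavioral features. The following is a minimal, dependency-free sketch of logistic regression trained by batch gradient descent. The two features and the four training rows are invented toy values for illustration only, not data from the study, and the learning rate and epoch count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this segment.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 if the intention is predicted present, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy segments (invented): [saved item (0/1), dwell time in minutes].
X = [[1, 0.2], [1, 0.4], [0, 0.3], [0, 0.1]]
y = [1, 1, 0, 0]  # 1 = intention annotated for this segment

w, b = train_logistic(X, y)
print([predict(w, b, x) for x in X])
```

Raising `threshold` above 0.5 would make the classifier more conservative about declaring an intention present.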
Observed Behaviors, per Query Segment
• Saved item (binary)
• Number of saved items
• Dwell times on content pages
• Dwell times on SERP viewports
• Query length
• Query reformulation type
• Number of clicks
• Number of sources visited
• Number of pages viewed
• Dwell times are:
  • total dwell time
  • total dwell time until a page is saved
  • total open time
  • total open time until a page is saved
  • first dwell time
  • mean of all dwell times
Behavioral Groups for Binary Classification (Tested singly and in combinations)
• Saving features
  • Saved item (binary)
  • Number of saved items
• Content page features
  • Dwell times
  • Number of content pages, by types: saved, not saved, unsaved, total
• SERP (i.e. viewport on SERP) features
  • Dwell times
• Query features
  • Query length
  • Query reformulation type
Measuring Performance
• Measures (TP = True Positive; FP = False Positive; TN = True Negative; FN = False Negative)
  • Accuracy: $\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
  • Precision for intention present: $P_1 = \frac{TP}{TP + FP}$
  • Precision for intention absent: $P_0 = \frac{TN}{TN + FN}$
• Baselines
  • Stratified sampling of positive/negative labels proportional to their distribution in training data
  • Assigning the most frequent label in the training data
• Tests for Identification
  • Improvement over the better of the two baselines, Kolmogorov-Smirnov
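The three measures and the two baselines can be sketched in a few lines. Labels are assumed to be 1 for intention present and 0 for intention absent. The function names, the fixed random seed, and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
import random
from collections import Counter


def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = intention present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def scores(y_true, y_pred):
    """Return ACC, P1 (precision present) and P0 (precision absent)."""
    tp, fp, tn, fn = confusion(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else 0.0  # convention for empty denominators

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "P1": ratio(tp, tp + fp),
        "P0": ratio(tn, tn + fn),
    }


def most_frequent_baseline(train_labels, n):
    """Predict the most frequent training label for all n test items."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n


def stratified_baseline(train_labels, n, seed=0):
    """Sample labels at random in proportion to their training frequency."""
    rng = random.Random(seed)
    p_present = sum(train_labels) / len(train_labels)
    return [1 if rng.random() < p_present else 0 for _ in range(n)]


y_true = [1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0]
print(scores(y_true, y_pred))
print(scores(y_true, most_frequent_baseline([0, 0, 1], len(y_true))))
```

The most-frequent-label baseline never predicts the minority class. When intention absent is the majority label, that baseline's P0 simply equals the share of segments without the intention.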
Results for Identification by Classification (1)
Results for Identification by Classification (2)
• Accuracy
  • Significant (p<.01) but not large improvement in ACC over better baseline for all intentions but one. For most intentions, using all feature groups was best
• Precision present
  • Significant (p<.01) and meaningful improvement in Ppres (P1) for all intentions. For most intentions, one, or a combination of two, feature groups performed best, rather than combining all
• Precision absent
  • Slight improvements, most non-significant, over best baseline. Scores were uniformly fairly high for both baselines
Classification Discussion
• Doing better than random with a very simple classifier for two out of three measures
• Doing very well in Positive identification, likely because it’s a conservative algorithm
  • Identifying fewer intentions, with more certainty, is probably a win given the problem
• Negative identification may be uninteresting, given the problem
• Interesting start on the problem; next steps are:
  • More and different features
  • Prediction, rather than just identification
Behavioral Groups for Prediction
• Overall search behavior
  • Query length
  • Number of sources visited
  • Number of pages viewed
• Dwell time features
  • Mean dwell time on each SERP viewport
  • Mean dwell time on content pages
• Usefulness judgment
  • Saved item (binary)
  • Number of saved items
Measuring Strength of Relationship
• RQ1: How is a user’s Web search behavior associated with his or her information seeking intentions in the same query segment?
  • Mean value of each search behavior for all query segments
  • Mean value of each search behavior for query segments with given intention
  • Degree of difference between the two indicates strength of relationship
• RQ2: How is a user’s Web search behavior in the current query segment associated with his or her information seeking intentions in the subsequent query segment?
  • Mean value of each search behavior for all query segments
  • Mean value of each search behavior for query segments preceding a query segment with given intention
  • Degree of difference between the two indicates strength of relationship
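Both comparisons above reduce to the same computation with a different alignment between behavior and intention. The sketch below returns the difference between the mean of a behavior over selected segments and its mean over all segments. The segment representation (a dict with `behaviors` and `intentions` keys), the `lag` parameter, and the toy values are assumptions for illustration; it handles a single session, so with several sessions the lagged pairing would have to stay within each session.

```python
from statistics import mean


def behavior_deviation(segments, behavior, intention, lag=0):
    """Mean of `behavior` over selected segments minus its mean over all.

    segments: query segments of ONE session, in order; each is a dict with
              "behaviors" (name -> number) and "intentions" (set of names).
    lag=0 selects segments that carry the intention themselves (RQ1);
    lag=1 selects segments whose following segment carries it (RQ2).
    Returns None if no segment is selected.
    """
    all_values = [s["behaviors"][behavior] for s in segments]
    selected = [
        segments[i]["behaviors"][behavior]
        for i in range(len(segments) - lag)
        if intention in segments[i + lag]["intentions"]
    ]
    if not selected:
        return None
    return mean(selected) - mean(all_values)


# Toy session (invented values).
session = [
    {"behaviors": {"query_length": 2}, "intentions": {"find specific"}},
    {"behaviors": {"query_length": 4}, "intentions": {"learn domain"}},
    {"behaviors": {"query_length": 9}, "intentions": {"find specific"}},
]
print(behavior_deviation(session, "query_length", "find specific"))
print(behavior_deviation(session, "query_length", "learn domain", lag=1))
```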
Methods
• Correlation analysis for each behavior-intention pair
  • Done for all current, and all subsequent, pairs
• Behaviors distributed non-normally
  • Mann-Whitney tests for significant differences
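Because the behaviors are not normally distributed, a rank-based test fits the comparison. As a sketch of what the Mann-Whitney test computes, the function below returns the U statistics for two samples, using average ranks for ties. It stops at the statistic: a p-value would additionally need exact tables or a normal approximation, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used. The function name and the example values are invented for illustration.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, giving tied values average ranks."""
    # Pool the samples, tagging each value with its group (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold equal values and share ranks i+1..j.
        average_rank = (i + 1 + j) / 2.0
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2.0
    return u_x, n_x * n_y - u_x


# Toy dwell times (invented): segments with vs. without a given intention.
with_intention = [1, 2, 3]
without_intention = [4, 5, 6]
print(mann_whitney_u(with_intention, without_intention))
```

A U value of 0 for one group means every value in that group ranks below every value in the other, which is the most extreme separation possible.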
Results for Identification and Prediction by Deviation from Mean
• In general, different behaviors, and patterns of behaviors, are associated with different intentions in the current query segment. Many significant such associations
• In general, different behaviors, and patterns of behaviors, in the current query segment are associated with different intentions in the subsequent query segment. Fewer significant such associations than for current intention, but still some for almost all subsequent intentions
• Next two slides show these results for (1) identification and (2) prediction. Black is significantly above the mean; grey is significantly below the mean
Next Steps
• Add analysis of eye fixation behaviors to the identification and prediction models
  • Based on dissertation work by Michael Cole, and related to results reported in Cole, M. J., Hendahewa, C., Belkin, N. J. & Shah, C. (2015) User activity patterns during information search. ACM Transactions on Information Systems, 33(1): Article No. 1 (39 p.)
• Carry out analyses with respect to task types and facet values
  • Substantial evidence that task type influences search behaviors significantly
  • Strong suspicion that task type influences patterns of intentions
• Carry out in situ study of search behaviors and search intentions
  • Thirty “professional” participants, searches logged and annotated by intentions, for one week
Thanks for Your Attention
• Acknowledgements due to all of the members of the POoDLE and CHEWS-IIR projects, and to our funders
• Work reported here was supported through the National Science Foundation, grant #IIS-1423239.
• Work reported here was supported by a Google Faculty Research Award to N. J. Belkin & C. Shah
• Some work reported here was supported by IMLS grant LG#06-07-0105-07