4
Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities, topology, polarization, fake-news, patterns, Twitter. Context: Understanding the diffusion mechanisms of speeches, opinions, fake news, and rumors in online social networks is a crucial societal issue. Virality and bots detection have been extensively studied for major political events such as the American presidential elections in 2016 [Kollanyi 2016], the Brexit [Howard 2016] or the French presidential election in 2017 [Ferrara 2017]. Understanding these mechanisms raises several questions: - What is the impact of the network topology in the diffusion of a viral message (and its importance with regard to the content of the messages)? - What types of relationships in the network are most likely to increase diffusion? - How do communities, and especially polarized communities, are engaged in the diffusion process? - How do influential or opinion leaders impact the diffusion? - What is the role of bots, their behavior? - Which typical patterns, structures or processes take part in or govern, control, affect large-scale diffusion? Subject: To answer these questions, several algorithms have been developed and validated on specific datasets. For examples to detect communities [Drif 2014, Orman 2012], influential users [Riquelme 2016, Ibnoulouafi 2018], bots [Ferrara 2016], events [Atefeh 2015], viral messages, rumours [Sela 2017, Zubiaga 2018, Hoang 2011], etc. In this work we plan to address two mains issues: - The first one is related to data modeling. We need to study and develop data models [Leclercq 2018] based on a multi-layer network approach [DeDomenico 2013, Kivela 2014]. Indeed, it allows representing the various types of users and relationships and to give them an appropriate semantic. For example, Twitter, through the richness of relationships produced by the various types of operators (follow, retweet, mention, etc.) generates monolayer networks with a hidden semantic. It is therefore a good field of experimentation in order to investigate the efficiency of the multilayer models. Model needs to be validated on various real-world big datasets (from 50Go to 5To). - The second issue concerns the design of the multilayer network analyses in order to answer the questions raised by social science researchers. Results from different inter-related algorithms need to be combined in order to provide end-users (social scientist) with the most complete view of a phenomenon. For example, the study of influence cannot be dissociated from the notion of community [Weng 2013, Kumar 2018, Gupta 2016, Gupta 2015]. Indeed, viral messages can spread differently within a community, than outside of it They can impact and modify the global community structure. The link between the two issues is the computability of such algorithms on real data, algebraic data structure such as tensor and graph embedding techniques [Hongyun 2017, Goyal 2017] are promising researches. From an experimental point of view, you will have to study and implement model extraction algorithms using data sets already collected by the team that have been used to:

Information Diffusion in Online Social Networkssubjectdirs.pdf · Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Information Diffusion in Online Social Networkssubjectdirs.pdf · Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities,

InformationDiffusioninOnlineSocialNetworks

Keywords:Socialnetworks(online),diffusion,virality,communities,topology,polarization,fake-news,patterns,Twitter.Context:

Understanding thediffusionmechanismsof speeches, opinions, fakenews, and rumors inonline socialnetworks isa crucial societal issue.Viralityandbotsdetectionhavebeenextensively studied formajorpoliticalevents suchas theAmericanpresidentialelections in2016 [Kollanyi2016], theBrexit [Howard2016]ortheFrenchpresidentialelectionin2017[Ferrara2017].

Understandingthesemechanismsraisesseveralquestions:-Whatistheimpactofthenetworktopologyinthediffusionofaviralmessage(anditsimportancewithregardtothecontentofthemessages)?-Whattypesofrelationshipsinthenetworkaremostlikelytoincreasediffusion?-Howdocommunities,andespeciallypolarizedcommunities,areengagedinthediffusionprocess?-Howdoinfluentialoropinionleadersimpactthediffusion?-Whatistheroleofbots,theirbehavior?- Which typical patterns, structures or processes take part in or govern, control, affect large-scalediffusion?Subject:Toanswer thesequestions, severalalgorithmshavebeendevelopedandvalidatedonspecificdatasets.For examples to detect communities [Drif 2014, Orman 2012], influential users [Riquelme 2016,Ibnoulouafi2018],bots[Ferrara2016],events[Atefeh2015],viralmessages,rumours[Sela2017,Zubiaga2018,Hoang2011],etc.

Inthisworkweplantoaddresstwomainsissues:

-The firstone is related todatamodeling.Weneedtostudyanddevelopdatamodels [Leclercq2018]basedonamulti-layernetworkapproach[DeDomenico2013,Kivela2014].Indeed,itallowsrepresentingthe various types of users and relationships and to give them an appropriate semantic. For example,Twitter,throughtherichnessofrelationshipsproducedbythevarioustypesofoperators(follow,retweet,mention, etc.) generates monolayer networks with a hidden semantic. It is therefore a good field ofexperimentation in order to investigate the efficiency of the multilayer models. Model needs to bevalidatedonvariousreal-worldbigdatasets(from50Goto5To).

- The second issue concerns the design of the multilayer network analyses in order to answer thequestionsraisedbysocialscienceresearchers.Resultsfromdifferentinter-relatedalgorithmsneedtobecombinedinordertoprovideend-users(socialscientist)withthemostcompleteviewofaphenomenon.For example, the studyof influence cannotbedissociated from thenotionof community [Weng2013,Kumar2018,Gupta2016,Gupta2015].Indeed,viralmessagescanspreaddifferentlywithinacommunity,thanoutsideofitTheycanimpactandmodifytheglobalcommunitystructure.Thelinkbetweenthetwoissues is thecomputabilityof suchalgorithmson realdata,algebraicdata structure suchas tensorandgraphembeddingtechniques[Hongyun2017,Goyal2017]arepromisingresearches.

Fromanexperimentalpointofview,youwillhavetostudyandimplementmodelextractionalgorithmsusingdatasetsalreadycollectedbytheteamthathavebeenusedto:

Page 2: Information Diffusion in Online Social Networkssubjectdirs.pdf · Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities,

1°)measure the audience/impact of a topic or event [Atefeh 2015]. One point to be addressed is theviralityofthediscourseandtheimpactofbotsinthepropagationofthediscourse[Ferrara2016];2°) show the existence of communities in which the discourse circulates. These communitiesmust becharacterized[Basaille2018,Jebabli2014,Jebabli2015]tohighlighttheirspecificity.Butcommunitiesareintertwined,intersectandtheirmutualinfluenceonthecirculationofdiscoursemustalsobestudied;3°)studytheinfluenceofonecommunitytowardsanotherandthepeoplewhomakethelinks(elasticityof borders) and consequently review the notion of influencers [Azaza 2016, Jebabli 2015a], of opinionleadersaccordingtothecirculationofinformation.

Basedon the experimental results,weplan todevelop appropriatediffusionmodels and test themonnewdatasetsusingtheirpredictiveaspecttovalidatetheirexplanatoryaspect[Jebabli2018].

This thesis focusesonthe fundamentalaspectsofdatamodelandanalysis tools forcomplexnetworks.Fromatheoreticalpointofview,oneoftheissuesaddressedinthisworkdealswiththeadaptationandcombination of traditional monolayer algorithms to multi-layer networks and the extension of graphembeddingtechniques,theproofofthepropertiesoftheproposedalgorithmsshouldbeaddressedsuchasconvergence,qualitymeasurementsinrelationtogroundtruth,etc.

From an experimental point of view, the real-data sets definition and interpretation is based well-establishedinstitutionalcollaborations(since2013)withtwolaboratoriesoftheUniversityofBurgundyinHuman and Social Sciences (TIL and CIMEOS) throughmultiple interdisciplinary projects TEE 2014, TEP2017,PEPSCNRSMOMIS,iSiteCOCKTAIL.

QualificationsThe candidate should be able to complete high quality and innovative research. He/She will developresearchwiththegoalofboth:(i)addressingfundamentaltheoreticalproblems,and(ii)designingmodelsandalgorithmsthatcanbeusedtounderstanddiffusionprocessesinmultilayernetworks.

Theidealcandidatehasgoodknowledgeinappliedmathematics, linearalgebra,statisticsandcomputersciences (algorithm). He/she holds aMaster’s degree (or equivalent) in the field of computer science,appliedmathematicsElectricalEngineering,physicsorarelateddisciplineobtainedwithverygoodfinalgrade(withanaveragegradeofBorbetter).

• She/he has experience (or interest) in modeling, quantitative empirical data analysis, andpreferablyworkingwithlarge-scalenetworks.

• Goodcommandofprogramminglanguagesanddatabasesaccessandstorage.• Experience(orinterest)indevelopingefficienttechniquesforlarge-scaledataanalysis.• VerygoodcommandofEnglish(oralandwritten)andexcellentcommunicationskills.• Curiosity,self-reliance,integrity,andcreativity.

DataScienceTeamofthe“Laboratoired’informatiquedeBourgogne”

ThesisDirector:HocineCherifiCo-supervisor:EricLeclercqandMarinetteSavonnetThe Data science team has a strong emphasis on quantitative yet applied research. It has strongcompetenciesinDataManagement,SocialNetworks,DataMining,andNetworkscience.Itoffers:

• TheopportunitytocompleteaPhDintheareaofNetworkScienceandBigDatausing complexsystemstoolsandthemodelingofsocio-economicinteractions.

Page 3: Information Diffusion in Online Social Networkssubjectdirs.pdf · Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities,

• Abroad-range,independentworkaspartofadynamicteaminapositiveworkingatmosphere.• Athoroughcareerdevelopmentprogram(participationinsummerschools,conferences,etc.).

Salary

Thedoctoralcontract isthemainformofsupportthatcanbeawardedtoPhDstudents. It isofferedtodoctoralstudents,foraperiodofthreeyears. Itprovidesallthesocialguaranteesofaworkcontract inaccordancewithFrenchpubliclaw.Thedoctoralcontractsetsaminimumremuneration,indexedontheevolution of the remuneration of the public service: It amounts to 1758 euros gross monthly for aresearchactivityalone.Itcanbehigherwithteachingactivities.

Application

Tobeconsidered,applicationsmustbesentbyemail,enclosingthefollowing:• ACurriculumVitae• Transcriptsforhigher-educationstudiesuntilmostrecentavailable• Acoverletterstatingthepurposeoftheapplicationandabriefstatementofwhyyoubelievethat

yourbackgroundandgoalsarewell-matchedwiththegoalsofthisposition• Contactinformationforthreereferencepersons.

Listofqualificationsandotherdocumentsthattheapplicantwishestorefertoshouldbeenclosedwiththeapplication.ApplicationDeadlineJune04,2019Addressyourcorrespondencewithsubject“ApplicationThesisDIRS”toallthefollowingcontacts:Contacts:[email protected][email protected][email protected]’InformatiquedeBourgogne(LIB)–EA7534ÉquipeSciencedeDonnéesUniversitédeBourgogne9,AvenueAlainSavary21078Dijon–FranceBibliography(inboldpublicationsofthesupervisingteammembers):[Atefeh2015]Atefeh,Farzindar,andWaelKhreich."Asurveyoftechniquesforeventdetectionintwitter."ComputationalIntelligence,31.1(2015):132-164.[Azaza2016]Azaza,Lobna,Kirgizov,Sergey,Savonnet,Marinette,etal.Informationfusion-basedapproachforstudyinginfluenceontwitterusingbelieftheory.ComputationalSocialNetworks,2016,vol.3,no1,p.5-25.[Basaille2018]Basaille Ian,Plateformepour la gestiondesdonnées issuesdes réseaux sociauxdans lecadredelagestiondelarelationclient,Thèsededoctorat,UniversitédeBourgogne,2018.[Jebabli 2018] Jebabli M., Cherifi H., Cherifi C., Hammouda A., “Community detection algorithmevaluation with ground-truth data, Physica A: Statistical Mechanics and its Applications 492, 651-706Elsevier2018[Jebabli2015]JebabliM.,CherifiH.,CherifiC.,HammoudaA.,"Overlappingcommunitydetectionversusground-truth in AMAZON co-purchasing network" in 11th International Conference on Signal Image

Page 4: Information Diffusion in Online Social Networkssubjectdirs.pdf · Information Diffusion in Online Social Networks Keywords: Social networks (online), diffusion, virality, communities,

Technology&Internet-BasedSystems,ProceedingsofIEEE2015[Jebabli2015a]JebabliM.,CherifiH.,CherifiC.,HammoudaA.,“UserandgroupnetworksonYouTube:Acomparativeanalysis, in12th InternationalConferenceonComputerSystemsandApplications (AICCSA),ProceedingsofIEEE2015[Jebabli2014]JebabliM.,CherifiH.,HammoudaA.,"OverlappingCommunityStructure inCo-authorshipNetworks:ACaseStudy," in7th InternationalConferenceonu-ande-Service, ScienceandTechnology(UNESST),ProceedingsofIEEEpp.26,29,2014[Leclercq2018]Leclercq,Eric,Savonnet,Marinette,"Modèletensorielpourl’entreposageetl’analysedesréseauxsociaux–Applicationàl’étudedelaviralitésurTwitter",INFORSID2018.[DeDomenico 2013] De Domenico, Manlio, et al. "Mathematical formulation of multilayer networks."PhysicalReviewX3.4(2013):041022.[Drif2014]Drif,Ahlem,andAbdallahBoukerram."Taxonomyandsurveyofcommunitydiscoverymethodsincomplexnetworks."InternationalJournalofComputerScienceandEngineeringSurvey5.4(2014):1.[Ferrara2016]Ferrara,Emilio,etal."Theriseofsocialbots."CommunicationsoftheACM59.7(2016):96-104.[Ferrara2017]Ferrara,Emilio."Disinformationandsocialbotoperationsintherunuptothe2017Frenchpresidentialelection."(2017).[Goyal2017]PGoyal,EFerrara,"Graphembeddingtechniques,applications,andperformance:Asurvey",arXivpreprintarXiv:1705.02801,(2017).[Gupta 2016] Gupta N., Singh A., Cherifi H. , “Centrality measures for networks with communitystructure”,PhysicaA:StatisticalMechanicsanditsApplications452,46-59,Elsevier2016[Gupta 2015] Gupta N., Singh A., Cherifi H., “Community-based Immunization Strategies for EpidemicControl” in International Conference on Communication Systems and Networks, Proceedings of IEEE,2015[Hoang2011]Hoang,Tuan-Anh,etal."Onmodelingviralityoftwittercontent."InternationalConferenceonAsianDigitalLibraries.Springer,Berlin,Heidelberg,2011.[Hongyun2017]Cai,Hongyun,VincentW.Zheng,andKevinChen-ChuanChang."AComprehensiveSurveyofGraphEmbedding:Problems,TechniquesandApplications."arXivpreprintarXiv:1709.07604(2017).[Howard 2016] Howard, Philip N., and Bence Kollanyi. "Bots,# strongerin, and# brexit: Computationalpropagandaduringtheuk-eureferendum."BrowserDownloadThisPaper(2016).[Ibnoulouafi2018] IbnoulouafiA.ElHassouniM.,Cherifi,“M-Centrality: Identifyingkeynodesbasedonglobal position and local degree variation” In revision for Journal of StatisticalMechanics: Theory andExperiment[Kivela2014]Kivelä,Mikko,etal."Multilayernetworks."Journalofcomplexnetworks2.3(2014):203-271.[Kollanyi 2016] Kollanyi, Bence, Philip N. Howard, and Samuel C. Woolley. "Bots and automation overTwitterduringthefirstUSPresidentialdebate."Compropdatamemo1(2016):1-4.[Kumar2018]KumarM.SinghA.,CherifiH.,“AnefficientImmunizationStrategyusingOverlappingNodesand its neighborhoods” to appear in the proceedings of the Web Conference 2018 (WWW18), Lyon,France[Orman 2012]OrmanG.K, Labatut V., and Cherifi H., “Comparative evaluation of community detectionalgorithms: a topological approach”, Journal of Statistical Mechanics: Theory and Experiment, P08001,august2012.[Riquelme2016]Riquelme,Fabián,andPabloGonzález-Cantergiani."MeasuringuserinfluenceonTwitter:Asurvey."InformationProcessing&Management52.5(2016):949-975.[Sela 2017] Sela, Alon, et al. "Increasing the Flowof Rumors in SocialNetworks by SpreadingGroups."arXivpreprintarXiv:1704.02095(2017).[Varol2018]Varol,Onur.AnalyzingSocialBigDatatoStudyOnlineDiscourseandItsManipulation.Diss.IndianaUniversity,2017.[Weng 2013] Weng, Lilian, Filippo Menczer, and Yong-Yeol Ahn. "Virality prediction and communitystructureinsocialnetworks."Scientificreports3(2013):2522.[Zubiaga2018]Zubiaga,Arkaitz,etal."DetectionandResolutionofRumours inSocialMedia:ASurvey."ACMComputingSurveys(CSUR)51.2(2018):32.