When Communication Meets Computation: The Demands and Benefits of Interdisciplinary
Approach to Communication Research
Tai-Quan (Winson) Peng 1,2
1. WKW School of Communication & Information, NTU, Singapore
2. Web Mining Lab, City University of Hong Kong, Hong Kong
Seven years ago Four years ago
What is communication?
Conceptually,
1. A process
2. Contextualized in physical/social structures
Empirically,
1. Abundant snapshot studies
2. Decontextualized from structures
How to bridge the gap?
What is computation?
• Computation is any type of calculation that follows a well-defined model understood and expressed as, for example, an algorithm. The study of computation is paramount to the discipline of computer science (Wikipedia).
“Computation +”
An emerging field that leverages the capacity to collect and analyze data at a scale that may reveal patterns of individual and group behaviors (Lazer et al., 2009)
Thinking like a computer scientist means more than being able to program a computer. It requires thinking at multiple levels of abstraction. (Wing, 2006)
“Computation +” is not new!
When communication meets computation
• Just big data?
  • Hypes, Myths, Realities about Big Data (Zhu, 2013)
  • DRIP (Data Rich, but Information Poor) syndrome
  • The biggest impact of big data emerges from the tripartite combination of innovative statistical methods, novel computer science, AND original theories in a field of substantive application (King, 2016)
• Interdisciplinary approach is inevitable
  • Interdisciplinary perspectives
  • Interdisciplinary collaboration
Interdisciplinary research is on the rise (Larivière & Gingras, 2014)
Geriatrics and Gerontology
Science Studies
Sociology
International Relations
Social Psychology
Behavioral Science and Complementary
Psychology
Jump on the bandwagon!
• Why?
  • Grant opportunities?
  • Greater impact of your work?
  • Appointment, promotion and tenure?
• The most powerful catalysts of interdisciplinary research are scientific questions which cannot be adequately addressed by just one discipline.
Interdisciplinary approach is viable
• Revisit old questions
  • Measure established concepts with new devices (e.g., Qin & Peng, 2016)
  • Revisit or refine classical theories with new data and new models (e.g., Sun et al., 2014)
• Ask new questions
  • Observe concepts that have been rarely measured (e.g., Peng et al., 2016; Sun et al., 2016; Wang et al., 2016)
  • Ask questions that have never or rarely been conceptually discussed or empirically examined (e.g., Peng et al., in press; Wang et al., in press)
Measurement: New devices, old concepts; New devices, “new” concepts
Public attention and its measurement
• Public attention and its relevant concepts (e.g., public interest, public concern) are key variables in political communication research and policy studies
• Opinion polls: Criticisms of opinion polls
  • Quality of responses solicited in public opinion polls
  • Limited number of predetermined issues that respondents are forced to choose from
  • Static nature of responses
  • Theoretical concerns about construct validity and reliability
• Media coverage as a proxy?
Three milestone studies (2009, 2013, 2014)
Ginsberg et al., 2009, Nature
Butler, 2013, Nature
Lazer et al., 2014, Science
Exemplar research questions
• Health
  • “Prediction of dengue incidence using search query surveillance” (Althouse et al., 2011)
• Technology
  • “How do the trends of searching queries by users from peer-to-peer (P2P) networks change over time?” (Kwok, 2006)
• Social
  • “How to detect search queries regarding breaking news?” (Murata, 2008)
• Economic
  • “Can TV commercials or sponsorships trigger Internet searches by consumers?” (Zigmond & Stipp, 2010)
  • “Can search queries predict consumers’ collective future behavior days or even weeks in advance?” (Goel, Hofman, Lahaie, Pennock, & Watts, 2010)
  • “How to use search queries to predict trading volumes?” (Bordino et al., 2012)
• Research Methods
  • “How to measure public attention with queries? How to validate the results?” (Ripberger, 2011)
Our concern: Are web queries a valid measure of public attention? (Qin & Peng, 2016)
• Validity needs to be empirically assessed
  • Face validity
  • Convergent validity (Campbell & Fiske, 1959)
  • Predictive validity (Cronbach & Meehl, 1955)
• Where is the ground truth?
  • Gallup opinion poll on the Most Important Problem (MIP) (McCombs & Zhu, 1995)
Research design
• Research context: United States
• Time span: January 2008 – June 2013 (monthly)
• Study issues: Environment and energy
• Empirical challenges
  • How to identify valid queries for each issue?
  • How to retrieve search trends of queries from Google Trends?
A bottom-up way to achieve face validity
Eight seed queries on two issues, based on the original Gallup MIP codes (McCombs & Zhu, 1995)
→ (automatic process) 693 correlated queries for the eight seed queries, produced by Google Correlate
→ (manual coding) 55 target queries: 39 for the environment issue and 16 for the energy issue
A benchmark method to retrieve search trends
• Briefly speaking, we use one irrelevant query (i.e., “IT Job” in the study) as a benchmark query and submit the benchmark query and each of the 55 target queries to Google Trends in pairs.
• To “control for artificial trends in substantive queries under study” (Zhu et al., 2012, p. 3), a weekly ratio is calculated for each target query by dividing the magnitude of the target query by that of the benchmark query, as shown in the equation below:
$$ R_{q,w} = \frac{M_{q,w}}{M_{b,w}} $$
where $R_{q,w}$ denotes the magnitude ratio for target query q at week w, $M_{q,w}$ denotes the search magnitude of target query q at week w, and $M_{b,w}$ denotes the search magnitude of the benchmark query at week w.
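The benchmark ratio above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function name `benchmark_ratios` and the weekly magnitudes are hypothetical, and returning `None` for weeks where the benchmark magnitude is zero is an assumption made here.

```python
def benchmark_ratios(target, benchmark):
    """Return R_{q,w} = M_{q,w} / M_{b,w} for each week w.

    Weeks where the benchmark magnitude is 0 yield None (an assumption;
    the slides do not say how such weeks were handled).
    """
    if len(target) != len(benchmark):
        raise ValueError("target and benchmark must cover the same weeks")
    return [t / b if b else None for t, b in zip(target, benchmark)]


# Hypothetical weekly search magnitudes, not data from the study.
target_magnitude = [40, 55, 30, 80]      # M_{q,w} for one target query
benchmark_magnitude = [20, 22, 20, 25]   # M_{b,w} for the benchmark query

ratios = benchmark_ratios(target_magnitude, benchmark_magnitude)
```

Because each target query is submitted together with the same benchmark query, the ratios are on a common scale and can be compared across target queries.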
Convergent & predictive validity assessment
• Public attention to environmental and energy issues, as measured by Google Trends, is compared with that measured by the “most important problem” (MIP) question in Gallup opinion polls.
• A vector autoregressive (VAR) modeling approach, a macroeconometric framework introduced by Christopher Sims (1980), is adopted.
  • Granger-causality analysis and impulse response functions to assess convergent validity
  • Forecast error variance decomposition analysis to assess predictive validity
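The idea behind the Granger-causality step can be sketched as follows. This is a simplified, single-equation, one-lag illustration in plain Python, not the full VAR used in the study: the helper names (`ols_sse`, `granger_f`) and the synthetic series are hypothetical. It compares a restricted model (the effect series regressed on its own lag) with an unrestricted model that adds the lagged candidate cause, and reports the F statistic for the added term.

```python
import random


def ols_sse(X, y):
    """Fit y = X b by least squares (normal equations) and return the residual sum of squares."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    M = [A[i] + [c[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Gauss-Jordan elimination with partial pivoting.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect' using one lag."""
    steps = range(1, len(effect))
    target = [effect[t] for t in steps]
    restricted = [[1.0, effect[t - 1]] for t in steps]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in steps]
    sse_r = ols_sse(restricted, target)
    sse_u = ols_sse(unrestricted, target)
    df = len(target) - 3  # observations minus unrestricted parameters
    return (sse_r - sse_u) / (sse_u / df)


# Synthetic monthly series (hypothetical): y follows x with a one-month lag.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(66)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.1) for t in range(1, 66)]

f_x_to_y = granger_f(x, y)  # large: lagged x helps predict y
f_y_to_x = granger_f(y, x)  # small: lagged y does not help predict x
```

In practice a statistics package would be used to choose the lag order and to estimate impulse responses and the forecast error variance decomposition; this sketch only shows what "x helps predict y beyond y's own past" means.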
Conclusions and discussion
• Convergent and predictive validity are achieved for the environment issue but not for the energy issue.
• Researchers should not unconditionally accept web queries as a new device to observe public attention on various issue domains.
  • Use web queries with caution by taking issue peculiarity and characteristics of web queries into account (Mellon, 2011).
• Web query data are a complement to, rather than a substitute for, opinion polls in measuring public attention.
Measuring interactions, flow, and diffusion
• Time-stamped digital footprints provide an unambiguous recording of the 5W elements of communication (Who-What-Whom-When-Where)
  • Social connection and interaction in special social groups (e.g., members of congress, people living with HIV/AIDS) (Peng et al., in press; Wang et al., in press)
  • Dynamic sentiments towards social issues (Peng et al., 2016)
  • Information flow (Wang et al., 2016)
  • Spatio-temporal diffusion of information (Sun et al., 2016)
Attention and communication networks between PLWHA in a Weibo group (Wang et al., in press)
Follower-followee network between congress members in U.S.A. (Peng et al., in press)
Politically polarized use of Twitter by the 113th congress members

Follow      D       I       R
D         30.9%    0.2%    5.8%
I          0.2%    0.0%    0.1%
R          7.1%    0.1%   55.7%

Retweet     D       I       R
D         35.3%    0.4%    5.9%
I          0.3%    0.0%    0.0%
R          4.6%    0.1%   53.4%

Mention     D       I       R
D         32.6%    0.3%   11.4%
I          0.2%    0.0%    0.1%
R         10.9%    0.1%   44.5%
The assortativity coefficients (Newman, 2002, 2003a) by partisanship for the follower-followee, retweet, and mention networks are 0.71, 0.76, and 0.54, respectively.
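As a rough check, the coefficients can be approximated from the three mixing tables using Newman's formula for categorical attributes, r = (Σ e_ii − Σ a_i b_i) / (1 − Σ a_i b_i), where e is the normalized mixing matrix and a, b are its row and column sums. The sketch below assumes the tables are such mixing matrices (shares of all ties by the parties of the two members); the function name `assortativity` is hypothetical, and small differences from the reported values are expected because the table entries are rounded.

```python
def assortativity(mixing):
    """Newman's assortativity coefficient for a categorical mixing matrix."""
    total = sum(sum(row) for row in mixing)
    e = [[value / total for value in row] for row in mixing]  # normalize to sum to 1
    n = len(e)
    trace = sum(e[i][i] for i in range(n))                    # share of within-group ties
    a = [sum(e[i]) for i in range(n)]                         # row marginals
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]    # column marginals
    expected = sum(ai * bi for ai, bi in zip(a, b))           # within-group share under random mixing
    return (trace - expected) / (1 - expected)


# Percentages copied from the tables above (row and column order: D, I, R).
follow = [[30.9, 0.2, 5.8], [0.2, 0.0, 0.1], [7.1, 0.1, 55.7]]
retweet = [[35.3, 0.4, 5.9], [0.3, 0.0, 0.0], [4.6, 0.1, 53.4]]
mention = [[32.6, 0.3, 11.4], [0.2, 0.0, 0.1], [10.9, 0.1, 44.5]]

r_follow = assortativity(follow)
r_retweet = assortativity(retweet)
r_mention = assortativity(mention)
```

Values near 1 mean ties fall almost entirely within parties; a value of 0 would mean party plays no role in who is connected to whom.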
Lead-lag relationship in congressional communication (Wang et al., 2016)
• Which party leads the discussion of an issue in congressional communication, and how and when does the other party respond?
• How does the lead-lag relationship evolve as the discussion of an issue unfolds?
Spatio-temporal diffusion of information (Sun et al., 2016)
• How and to what extent will the information on social media achieve widespread diffusion across the world?
• How can we quantify the interaction between users from different geolocations in the diffusion process?
• How will the spatial patterns of information diffusion change over time?
Sentiment valence towards social issues (Peng et al., 2016)
Measurement fallacies of digital footprints
• Nobody is barefoot (Lewis, 2015)
  • Digital footprint = Natural behaviour?
  • Expression and interactions are mediated by technologies
• Different beaches have different rules (Golder & Macy, 2014; Lewis, 2015)
  • “Permanent” friendship on Facebook vs. dissolving friendship in real life
  • Expression = Experience? (Golder & Macy, 2014)
  • ……
Dynamic Modelling: Issues coopetition on Twitter
[Figure: issues shown are Environment, Job, Welfare, Health, Law/Order, Foreign Affairs, Economy, and Government-Politics]
Issues coopetition: a portmanteau of cooperation and competition
• What matters in the dynamics of issues coopetition?
• How to quantify the coopetition power of social issues?
• How to visualize the dynamics of issue coopetition and the roles of different groups of individuals?
Issues coopetition: A simple scenario
• Five possible directions of attention flow:
  • A proportion of attention stayed in issue i (carry-over)
  • A proportion of attention shifted from issues j and k to issue i (competitive recruitment)
  • A proportion of attention shifted from issue k to issues i and j due to similarity between issues i and j (cooperative recruitment)
  • A proportion of attention shifted from issue i to issues j and k (competitive distraction)
  • A proportion of attention shifted from issue i to issues j and k due to similarity between issues j and k (cooperative distraction)
[Diagram: attention shares $P_i^{t-1}$, $P_j^{t-1}$, $P_k^{t-1}$ at time t-1 and $P_i^{t}$, $P_j^{t}$, $P_k^{t}$ at time t]
Who matters in the dynamics of issues coopetition?
Social Elites Grassroots on Social Media
• We are more susceptible to the local social structure composed of grassroots than to the global social environment
Issue publics and issue leaders (Krosnick, 1990)
• Information generalists vs. issue publics (Converse, 1964)
• A user is considered a k-issue user if 0.75/k or more of his/her tweets focus on each of the k issues.
  • 1-issue users: 75%+ of tweets on 1 issue
  • 2-issue users: 37.5%+ of their tweets on each of 2 issues
  • ….
• 500 issue leaders are extracted from each type of issue publics based on their Klout scores, a popular measure of influence on social media.
Single-issue Publics, 15%
Multi-issue publics, 4%
No-focus publics, 81%
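The k-issue rule above can be sketched as a small classifier. This is an illustration under stated assumptions, not the study's code: the function name `issue_focus`, the cap `max_k`, the choice to return the smallest qualifying k, and the example tweet counts are all hypothetical. Shares are computed over the tweet counts passed in, so tweets on no issue would need their own entry.

```python
def issue_focus(tweet_counts, max_k=3, threshold=0.75):
    """Return the smallest k such that the user's top-k issues each hold
    at least threshold/k of the user's tweets, or 0 if no k qualifies."""
    total = sum(tweet_counts.values())
    if total == 0:
        return 0
    shares = sorted((count / total for count in tweet_counts.values()), reverse=True)
    for k in range(1, max_k + 1):
        if len(shares) >= k and all(share >= threshold / k for share in shares[:k]):
            return k
    return 0  # no-focus user


# Hypothetical users (tweet counts per issue).
single_issue = issue_focus({"economy": 8, "health": 2})
two_issue = issue_focus({"economy": 4, "health": 4, "jobs": 2})
no_focus = issue_focus({"economy": 3, "health": 3, "jobs": 2, "welfare": 2})
```

Under this reading, single-issue publics correspond to k = 1, multi-issue publics to k ≥ 2, and no-focus publics to users for whom no k satisfies the rule.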
Mathematical model of issues coopetition
• Competition effects: competition recruitment effect and competition distraction effect
• Cooperation effects: cooperation recruitment effect and cooperation distraction effect
Coopetition power of social issues
• The competition and cooperation power of a social issue is defined as the magnitude of how competitive/cooperative an issue is in recruiting attention from the public.
  • 2nd term and 4th term in the model
• $\mathit{Coopetition}_i = \mathit{Cooperation}_i - \mathit{Competition}_i$
• Squared semi-partial correlation ($sr^2$)
  • Additive, normalized, and comparable
• Pair-wise examination of issues coopetition
• Comparison of different groups of issue leaders on coopetition power
Data
• A keyword-based approach to retrieve tweets
  • 3,300+ keywords to retrieve 641 million tweets posted in 2013
  • Gallup MIP → Google Correlate → Manual Coding → Twitter Firehose
• Support vector machine (SVM) classification method to clean up the irrelevant tweets
  • Manual Coding → Model Training → Classification
• Final dataset: 449 million tweets (70%) on 10 issues
Measurement
• Public attention to issue i at time t ($p_i^t$) is measured as the ratio of the number of tweets on issue i at time t to the number of all tweets at time t.
• Issue similarity between issues i and j ($\theta_{i,j}$) is measured by the semantic similarity weighted by the temporal correlation between the two issues.
• The coverage of issue i by issue leaders g at time t ($m_{i,g}^t$) is measured as the ratio of the number of tweets on issue i posted by issue leaders g at time t to the total number of tweets posted by issue leaders g at time t.
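The attention measure can be illustrated with a short sketch. The function name `attention_share` and the counts are hypothetical; treating an empty time bin as zero attention is an assumption made here. The same ratio, computed over the tweets of a group of issue leaders instead of all tweets, gives the coverage measure.

```python
def attention_share(issue_tweets, all_tweets):
    """Share of tweets in one time bin that are about a given issue."""
    return issue_tweets / all_tweets if all_tweets else 0.0


# Hypothetical counts for three consecutive time bins, not data from the study.
issue_counts = [120, 90, 150]      # tweets on issue i in each bin
total_counts = [1000, 900, 1200]   # all tweets in each bin

p_series = [attention_share(i, n) for i, n in zip(issue_counts, total_counts)]
```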
Model estimation and evaluation
• Vector autoregression (VAR) is adopted in the study to estimate the model.
  • 4-hour interval to aggregate the data
  • A moving-window estimation in which each time window spans two weeks and includes 84 time points
• Evaluation by three measures:
  • the overall goodness of fit ($R^2$) of the regression model,
  • the standard error of the estimates ($se_Y$), and
  • the presence of autocorrelation (Durbin-Watson d)
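The window size and the three evaluation measures can be sketched as follows. Two weeks of 4-hour bins give 14 × 24 / 4 = 84 time points per window. The function names, the window step of one bin, and the degrees-of-freedom convention in the standard error (n minus the number of predictors minus 1) are assumptions made for illustration; the slides do not specify them.

```python
import math

POINTS_PER_WINDOW = (14 * 24) // 4  # two weeks of 4-hour bins = 84 time points


def moving_windows(series, size=POINTS_PER_WINDOW, step=1):
    """Split a series into overlapping windows (step of one bin is an assumption)."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def r_squared(observed, fitted):
    """Overall goodness of fit: 1 - SSE / SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1 - sse / sst


def standard_error(observed, fitted, n_predictors):
    """Standard error of the estimate: sqrt(SSE / (n - n_predictors - 1))."""
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(sse / (len(observed) - n_predictors - 1))


def durbin_watson(residuals):
    """Durbin-Watson d: near 2 suggests no first-order autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)
```

Durbin-Watson d ranges from 0 to 4: values well below 2 point to positive autocorrelation in the residuals, values well above 2 to negative autocorrelation.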
EvoRiver: Visualizing dynamics of issues coopetition
EvoRiver: Visualizing roles of issue leaders
Network modelling: Social connection, communication, and vote agreement
Generative mechanisms underlying follower-followee and communication networks on Twitter
• Homophily attributes
  • Same party
  • Shared concern
  • Same state
  • Same chamber
  • Same committee
• The power of the homophily mechanism will be overestimated when other endogenous networking mechanisms are not taken into account (Kossinets & Watts, 2009; Wimmer & Lewis, 2010).
  • Reciprocity
  • Triadic closure
Network multiplexity
• Congress members are linked with one another through multiple social ties, namely,
  • Social connection (e.g., follower-followee relationships on Twitter),
  • Communication (e.g., retweet and mention on Twitter),
  • Behavioral interactions (e.g., co-sponsorship or co-voting of legislative bills)
• The structures of these networks can influence each other “as networks of one type may act as a constraint, an inhibitor, or a catalyst on networks of another type of relations” (Szell et al., 2010, p. 13636).
Data source: Twitter and Offline Records (Peng et al., in press)
• 165,000+ tweets posted by 527 members of the 113th congress from April 2013 to November 2013
• Follower-followee network
  • directed and unweighted
• Retweet and mention networks
  • directed and weighted
• Shared concern: co-hashtags
• Individual (I) and homophily (H) attributes
  • Partisanship (I & H)
  • Chamber affiliation (I & H)
  • State (I & H)
  • Committee membership (H)
  • Committee chairship (I)
  • Seniority (I)
• Roll-call vote data (H.J. Res. 59 and H.R. 2775)
  • Voted by both chambers of the congress
  • Voted at the end of the study period
  • Voting matrix → Vote agreement matrix
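One plausible reading of the step from a voting matrix to a vote agreement matrix is sketched below for a single bill: entry (i, j) is 1 when members i and j cast the same vote and 0 otherwise. The function name, member labels, and votes are hypothetical, and the slides do not say how absences or abstentions were handled.

```python
def vote_agreement_matrix(votes):
    """Return a square 0/1 matrix: 1 where two members cast the same vote on a bill."""
    n = len(votes)
    return [[1 if votes[i] == votes[j] else 0 for j in range(n)] for i in range(n)]


# Hypothetical roll-call votes on one bill (not actual records).
members = ["member_a", "member_b", "member_c"]
votes_on_bill = ["yea", "nay", "yea"]

agreement = vote_agreement_matrix(votes_on_bill)
```

The resulting matrix is symmetric, so it can be treated as an undirected network alongside the directed Twitter networks; the diagonal is trivially 1 and would normally be ignored.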
Exponential random graph modelling (ERGM)
• Four dependent variables are probabilities
  • for two congress members to form a tie in the follower-followee network,
  • for a congress member to retweet or be retweeted by another congress member in the retweet network,
  • for a congress member to mention or be mentioned by another congress member in the mention network, and
  • for two congress members to make the same voting decision on a legislative bill.
$$ \mathcal{P}(Y, \theta) = \frac{\exp\{\theta' \Gamma(Y)\}}{\sum_{Y^* \in \mathcal{Y}} \exp\{\theta' \Gamma(Y^*)\}} $$
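To make the formula concrete, the sketch below evaluates it by brute force on a toy directed network of three nodes, where the sum in the denominator runs over all 2^6 = 64 possible graphs. It assumes, for illustration only, that Γ(Y) contains two statistics (number of ties and number of reciprocated pairs); the function names and parameter values are hypothetical. Real ERGM software estimates θ by simulation because the denominator cannot be enumerated for networks of realistic size.

```python
import math
from itertools import product

NODES = 3
DYADS = [(i, j) for i in range(NODES) for j in range(NODES) if i != j]  # 6 possible directed ties


def statistics(graph):
    """Gamma(Y): (number of ties, number of reciprocated pairs)."""
    edges = len(graph)
    mutual = sum(1 for (i, j) in graph if i < j and (j, i) in graph)
    return (edges, mutual)


def all_graphs():
    """Enumerate every directed graph on NODES nodes as a frozenset of ties."""
    for mask in product([0, 1], repeat=len(DYADS)):
        yield frozenset(dyad for dyad, on in zip(DYADS, mask) if on)


def weight(graph, theta):
    """Unnormalized weight exp(theta' Gamma(Y))."""
    return math.exp(sum(t * s for t, s in zip(theta, statistics(graph))))


def ergm_probability(graph, theta):
    """P(Y, theta): weight of the observed graph over the sum of weights of all graphs."""
    normaliser = sum(weight(g, theta) for g in all_graphs())
    return weight(frozenset(graph), theta) / normaliser


# Hypothetical parameters: ties are costly (-1.0), reciprocity is favoured (+2.0).
theta = (-1.0, 2.0)
p_mutual = ergm_probability({(0, 1), (1, 0)}, theta)     # two ties, reciprocated
p_not_mutual = ergm_probability({(0, 1), (0, 2)}, theta)  # two ties, not reciprocated
```

With a positive reciprocity parameter, a graph containing a reciprocated pair is more probable than a graph with the same number of ties and no reciprocation, which is how an ERGM separates endogenous mechanisms such as reciprocity from homophily.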
ERGM of generative mechanisms
The magnitudes of the homophily attributes (i.e., same-party, same-state, same-chamber, and shared concern) attenuate by 13% to 65% in Model 2.
The influential mechanism of follower-followee network in the retweet network is different from that in the mention network.
Different combinations of endogenous balancing mechanisms.
ERGM of vote agreement
Conclusions
• Bounded homophily effects and different balancing mechanisms in connection and communication networks
• Follower-followee relationships on Twitter can facilitate political discourse and information exchange among congress members
• These online ties can likewise increase the likelihood for the votes of these congress members to agree with one another by breeding familiarity and trust between these officials (Carpenter et al., 2004; Krackhardt, 1992)
(Biased) Concluding Remarks
We need to sing our old songs!
Design
Measurement
Modeling and Analysis
Presentation
Interdisciplinary research: A joy or a curse?
• Become 'T-shaped' researchers (Hansen & von Oetinger, 2001)
  • Able to cultivate their own discipline and to look beyond it.
  • Breadth and depth of knowledge
• Set up shared goals
  • Link theories to practice
• Engage in constructive discussions
  • Find a standard language for communication (e.g., conceptual diagrams and formulas)
• Develop realistic expectations from each other
Unrealistic expectations in interdisciplinary research
• Over-estimation of others
  • Data can be easily accessed and retrieved?
  • Everything can be automatically implemented?
  • Social patterns can emerge from big data?
• Under-estimation of ourselves
  • Sophisticated sampling designs!
  • Human intervention is necessary! Rigorous content analytical procedure.
  • Our training on statistical modelling is solid
  • Theoretical interpretation of analytical results!
Personal risks: Who are you?
• Disciplinary identity
• Research independence
When communication meets computation
• True interdisciplinary science cannot be rushed (see Nature 525, 289-290; 2015)
  • Time cost is really high!
• Reciprocal communication and mutual understanding are crucial
  • Keep your mind focused!
  • Keep your eyes open!
• Interdisciplinary research is a painful but rewarding journey
  • Enjoy it!
Thanks!
Email: [email protected]
Join our new IG on Computational Methods at ICA!
• 50+ founding members from 13 countries/regions
• 1st business meeting at Fukuoka Hilton, Sakura on June 12, 11:00am to 12:15pm