Individual Models:
• A CSM is built for every user
References
Aggregated Models:
• All users' data for a level are aggregated together
• A CSM is built for each level
Detecting and Combating
Social Media Influencers
Agent Based Model
Data
Anthony Weishampel
Department of Statistics
William Rand
Poole College of Management
This project has the following three objectives:
• Develop methods to help identify influencers and bots with malicious intent by analyzing users' Twitter behavior.
• Two approaches: Causal State Models (CSM) & Functional Data Analysis (FDA)
• Explore ways to prevent or reduce the power of negative influence campaigns.
• Construct the framework for a platform to simulate and test theories of influence and influence combat using agent-based models.
Data Sources & User Types:
• Labeled data provided on various types of users
Cresci, Stefano, et al. “The Paradigm-shift of social spambots.” Proceedings of the 26th International Conference on World Wide Web Companion, 2017
• Genuine Users vs. E-commerce Bots (advertisements for mobile apps and Amazon products)
• Data were also collected and analyzed for Spambots and Political Influencers
• Each user's Twitter timeline was required (tweet history of up to 3,200 tweets)
• Social media provides a popular platform for marketers and organizations to diffuse content.
• The fear that malicious influencers (e.g., bots, trolls, and extremists) spread falsehoods or attempt to radicalize the general public on social media has been validated in recent years.
• Some of this influence is perpetrated not directly by humans but by social media bots: artificial agents whose behavior is controlled by predetermined algorithms, often in an attempt to imitate authentic users and to have a particular effect on public discourse on social media.
• These automated users have affected modern political campaigns and have attacked brands, including Nestle, Harley-Davidson, and AMD.
Alessandro Bessi and Emilio Ferrara. Social bots distort the 2016 U.S. Presidential election online discussion. First Monday, 21(11), 2016.
David Darmon, Jared Sylvester, Michelle Girvan, and William Rand. Predictability of user behavior in social media: Bottom-up v. top-down modeling. In Social Computing (SocialCom), 2013 International Conference on, pages 102-107. IEEE, 2013.
Clayton Allen Davis, et al. BotOrNot: A system to evaluate social bots. Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
Jeff Goldsmith, Vadim Zipunnikov, and Jennifer Schrack. Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis. Biometrics, 71: 344-353, 2015.
Eric Johnson. How social media bots could tank your stock price. Recode. https://www.recode.net/2018/7/16/17574512/
S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, and M. Tesconi. The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race. WWW '17: Proceedings of the 26th International Conference on World Wide Web Companion, pages 963-972, 2017.
Figure 3: Example of how the CSMs are made for a level of an Aggregated model.
Time Series Data Structure:
• Data were converted into a time series structure with 30-minute intervals
Binary Time Series Data:
0 – User does not tweet during the interval
1 – User tweets during the interval
Figure 1: Binary Time Series Data Structure
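The conversion described above can be sketched as follows. This is a minimal illustration, assuming tweet timestamps are available as Python datetime objects; the function name binary_series and the sample timestamps are hypothetical and not taken from the project.

```python
from datetime import datetime, timedelta

def binary_series(tweet_times, start, end, bin_minutes=30):
    """Return one 0/1 flag per interval in [start, end): 1 if the user tweeted."""
    width = timedelta(minutes=bin_minutes)
    n_bins = int((end - start) / width)
    series = [0] * n_bins
    for t in tweet_times:
        # Ignore tweets that fall outside the observation window.
        if start <= t < start + n_bins * width:
            series[int((t - start) / width)] = 1
    return series

# Hypothetical example: three tweets on one day, two in the same interval.
day = datetime(2017, 1, 1)
tweets = [
    day + timedelta(hours=9, minutes=5),
    day + timedelta(hours=9, minutes=20),
    day + timedelta(hours=13, minutes=45),
]
series = binary_series(tweets, day, day + timedelta(days=1))
```

One day yields 48 intervals. Several tweets inside the same interval still produce a single 1, because the series records whether the user tweeted, not how often.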
Objective
Background
• Widely applicable to high-resolution individual-level data
• Harada et al. (2015) showed that Causal State Models (CSMs) can be used to forecast a user's activity
• CSMs are found by the Causal State Splitting Reconstruction (CSSR) algorithm (Shalizi and Shalizi, 2004)
• CSMs are minimally complex and maximally predictive for social media (Darmon et al., 2013)
• Assume the time series data are generated by a conditionally stationary stochastic process
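The idea behind a causal state, a set of past histories that predict the same future, can be illustrated with a deliberately simplified sketch. This is not the CSSR algorithm: CSSR grows history lengths incrementally and uses statistical hypothesis tests, whereas the sketch below uses a fixed history length and a plain tolerance. The function names, the history length of 2, and the tolerance are all assumptions made for illustration.

```python
from collections import defaultdict

def next_symbol_probs(series, history_len=2):
    """Estimate P(next = 1 | last `history_len` symbols) from a 0/1 series."""
    counts = defaultdict(lambda: [0, 0])  # history -> [times next=0, times next=1]
    for i in range(history_len, len(series)):
        history = tuple(series[i - history_len:i])
        counts[history][series[i]] += 1
    return {h: c[1] / (c[0] + c[1]) for h, c in counts.items()}

def group_histories(probs, tol=0.1):
    """Greedily merge histories whose P(next = 1) lies within `tol` of the
    first history placed in the current group (histories sorted by probability)."""
    states = []
    for history, p in sorted(probs.items(), key=lambda kv: kv[1]):
        if states and abs(p - states[-1][0][1]) <= tol:
            states[-1].append((history, p))
        else:
            states.append([(history, p)])
    return [[h for h, _ in state] for state in states]

# Hypothetical user who tweets in every third interval.
series = [0, 0, 1] * 20
probs = next_symbol_probs(series)
states = group_histories(probs)
```

For this periodic series the histories (0, 1) and (1, 0) are always followed by 0 and end up in one group, while (0, 0) is always followed by 1 and forms its own group.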
Classification:
• Distance Metric:
• Let be the probability of 0000 occurring in model a
• Distance between two models a and b
User Type          Number of Users
Genuine Users      3,474
E-commerce Bots    464
Figure 2: Sample daily time series for 50 genuine users and 50 e-commerce bots
Panels: Sample of Genuine Users, Sample of Bots
Figure 4: Example of how the CSMs are made for the Individual Models
• Group Distance Metric: Create an aggregated model for every category, then classify the user by the closest CSM
• Individual k-NN: Build a CSM for every user in the testing set, then find the k closest users' CSMs
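The two classification schemes can be sketched once a distance between models is fixed. The exact form of the distance is not given here, so the sketch below assumes, purely for illustration, the total variation distance between distributions over length-4 binary words (such as 0000). It also estimates those word probabilities directly from a binary series rather than from a fitted CSM. All function names and the toy series are hypothetical.

```python
from collections import Counter
from itertools import product

WORD_LEN = 4
WORDS = list(product((0, 1), repeat=WORD_LEN))

def word_probs(series):
    """Empirical probability of each length-4 word; needs at least 4 symbols."""
    counts = Counter(tuple(series[i:i + WORD_LEN])
                     for i in range(len(series) - WORD_LEN + 1))
    total = sum(counts.values())
    return {w: counts[w] / total for w in WORDS}

def distance(p_a, p_b):
    """Total variation distance between two word distributions (assumed metric)."""
    return 0.5 * sum(abs(p_a[w] - p_b[w]) for w in WORDS)

def classify_by_group(user, group_models):
    """Group distance: label of the closest aggregated model."""
    return min(group_models, key=lambda label: distance(user, group_models[label]))

def classify_knn(user, labeled_models, k=3):
    """Individual k-NN: majority label among the k closest labeled users."""
    nearest = sorted(labeled_models, key=lambda lm: distance(user, lm[1]))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]

# Toy stand-ins for models: a quiet user pattern and a strictly alternating one.
quiet = word_probs([0] * 20 + [1] + [0] * 20)
busy = word_probs([0, 1] * 20)
new_user = word_probs([0] * 30 + [1, 0, 0, 0, 1] + [0] * 5)

group_label = classify_by_group(new_user, {"genuine": quiet, "bot": busy})
knn_label = classify_knn(
    new_user,
    [("genuine", quiet), ("genuine", word_probs([0] * 40)), ("bot", busy)],
    k=3,
)
```

The group scheme needs one distance per category, while k-NN needs one distance per labeled user, which is slower but lets individual differences within a category influence the decision.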
Bot Combat
• Identified 72 active bots from the dataset
• Collected information on the American users who have interacted with these bots
• 307 users have interacted with the bots
• We reached out to some of these users to inform them of the bot interaction
• It is currently too early to determine whether reaching out and informing users about the bots alters their interaction with the bots
CSM: Results
Functional Data Analysis
• Functional Principal Component Analysis is used to extract the functions and scores for various types of users
• Calculate scores for new users and classify each user based on the scores from the training set
• $\mu_{ij}(t)$ is the behavior of user i, on day j, at time t
• Can only work with users who have a time zone listed
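The score-then-classify idea can be illustrated with a much reduced sketch. It assumes one discretized curve per user (for example, a daily activity profile averaged over days), extracts only the leading principal component by power iteration, and assigns a new user to the group with the nearest mean score. The multilevel structure over users and days is not modeled, and every name and number below is hypothetical.

```python
def mean_curve(curves):
    n = len(curves)
    return [sum(c[t] for c in curves) / n for t in range(len(curves[0]))]

def first_component(curves, n_iter=200):
    """Return (mean curve, leading principal component) via power iteration.
    Assumes the curves are not all identical."""
    mu = mean_curve(curves)
    centered = [[x - m for x, m in zip(c, mu)] for c in curves]
    phi = [1.0 / (t + 1) for t in range(len(mu))]  # arbitrary non-zero start
    for _ in range(n_iter):
        # Apply the (unnormalized) sample covariance without forming it.
        proj = [sum(x * p for x, p in zip(row, phi)) for row in centered]
        new = [sum(proj[i] * centered[i][t] for i in range(len(centered)))
               for t in range(len(mu))]
        norm = sum(v * v for v in new) ** 0.5
        phi = [v / norm for v in new]
    return mu, phi

def score(curve, mu, phi):
    """Projection of a centered curve onto the component."""
    return sum((x - m) * p for x, m, p in zip(curve, mu, phi))

def nearest_group(new_score, group_scores):
    means = {g: sum(v) / len(v) for g, v in group_scores.items()}
    return min(means, key=lambda g: abs(new_score - means[g]))

# Synthetic training curves: a flat baseline plus a multiple of one shape.
shape = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
make = lambda s: [0.1 + s * v for v in shape]
genuine = [make(s) for s in (-2.0, -1.5, -1.0)]
bots = [make(s) for s in (1.0, 1.5, 2.0)]

mu, phi = first_component(genuine + bots)
group_scores = {
    "genuine": [score(c, mu, phi) for c in genuine],
    "bot": [score(c, mu, phi) for c in bots],
}
label = nearest_group(score(make(1.2), mu, phi), group_scores)
```

In this synthetic setup all variation lies along a single shape, so one component suffices; real activity curves would generally need several components and a proper FPCA implementation.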
Figure 6: Number of States in Individual CSMs (panels: Bots, Genuine Users)
Classification Results   Precision   Recall
Aggregated Models        0.349       1.0
Individual Models        0.982       0.947
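As a reminder of how the two columns are computed, here is a small sketch. It assumes that bots are treated as the positive class, which is not stated above, and the labels are toy data rather than the project's results.

```python
def precision_recall(true_labels, predicted_labels, positive="bot"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy case: a classifier that labels every user a bot.
truth = ["bot"] * 2 + ["genuine"] * 4
predicted = ["bot"] * 6
precision, recall = precision_recall(truth, predicted)
```

Flagging everyone catches every bot, so recall is 1.0, but most flagged users are genuine, so precision is low. A recall of 1.0 paired with low precision, as in the Aggregated Models row, is consistent with a classifier that flags many genuine users as bots.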
Figure 5: Aggregated CSMs for the two types of users
Detection Method: Causal State Models
• The aggregate CSMs show that the bots' behavior requires more states, and is consequently more complicated, than that of the typical genuine user.
• Similar conclusions can be drawn from the individual CSMs
Results & Discussion
• Individual Models proved much better at classification than Aggregated Models: precision rose from 0.349 to 0.982, while recall fell only from 1.0 to 0.947
• The FDA model requires high computation time but proved to be the most accurate classification method. FDA results were excluded because the model has multiple restrictions (for example, it only works with users who have a time zone listed)
• Results from the bot combat will be used to determine how agents in the Agent Based Model will react.
• Agents – Various Twitter users
• Actions – Follow/unfollow a user, tweet, retweet, or no tweet
• Both the FDA and the CSMs can be used to model the agents' behavior
• The framework provided in the project can create
an ABM platform to explore both the effects of
influence campaigns on social media as well as
the results of counter-narrative campaigns
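A skeleton of such an agent-based model might look like the following. It is only a sketch under stated assumptions: the Agent class, the fixed action weights, and the simulate function are hypothetical placeholders. In the intended platform, a fitted CSM or FDA model would presumably drive each agent's behavior in place of the fixed weights.

```python
import random

ACTIONS = ("tweet", "retweet", "follow", "unfollow", "none")

class Agent:
    """A Twitter user that takes one action per time step."""

    def __init__(self, name, weights):
        self.name = name
        self.weights = weights  # one hypothetical weight per entry in ACTIONS
        self.following = set()
        self.log = []

    def step(self, others, rng):
        action = rng.choices(ACTIONS, weights=self.weights)[0]
        if action == "follow":
            candidates = [o.name for o in others if o.name not in self.following]
            if candidates:
                self.following.add(rng.choice(candidates))
            else:
                action = "none"  # nobody left to follow
        elif action == "unfollow":
            if self.following:
                self.following.remove(rng.choice(sorted(self.following)))
            else:
                action = "none"  # nobody to unfollow
        self.log.append(action)
        return action

def simulate(agents, n_steps, seed=0):
    """Run every agent once per time step with a seeded random generator."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        for agent in agents:
            agent.step([o for o in agents if o is not agent], rng)
    return agents

# Two hypothetical agent types: an occasional poster and a high-volume account.
agents = simulate(
    [Agent("genuine_1", (1, 1, 1, 1, 16)), Agent("bot_1", (8, 8, 2, 1, 1))],
    n_steps=48,
)
```

Seeding the generator makes runs repeatable, which helps when comparing an influence campaign against a counter-narrative campaign under otherwise identical conditions.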