A Tale of Two Tools: Reliability and Feasibility of Examining Twitter Mentions about E-Cigarettes...


A Tale of Two Tools: Reliability and Feasibility of Examining Twitter Mentions

Presentation at Society of Behavioral Medicine 2016

Amelia Burke-Garcia, MA
Cassandra Stanton, PhD
Nicole Soufi

Westat Center for Digital Strategy & Research

April 1, 2016

“47.6% of current cigarette smokers & 55.4% of recent former cigarette smokers have tried an e-cigarette.”

~CDC, 2015

“Various terms are used to refer to e-cigarettes, e.g. ‘e-hookahs’ and ‘vaporizers’.”

~New York Times, 2014

“The failure to equate vaping products generally with e-cigarettes underscores how successful the tobacco industry has been in reinventing a popular ‘smoking’ trend.”

~Gostin & Glasner, 2014

“Understanding…human interaction on the web is a valuable source of sensing health trends.”

~Achrekar, Gandhe, Lazarus, Yu & Liu, 2011


“89% of all Americans are online.”

~International Telecommunication Union (ITU), United Nations Population Division, Internet & Mobile Association of India (IAMAI), World Bank, 2016

“With hundreds of millions of people spending countless hours on social media to share, communicate, connect, interact, and create user-generated data at an unprecedented rate, social media has become one unique source of big data.”

~Zafarani, Abbasi & Liu, 2014

Options for Mining Social Data

“Social media data is noisy, free-format, of varying length, and multimedia.”

~Zafarani, Abbasi & Liu, 2014

More Issues
• There is a lack of documentation about how the data are identified and sampled (Morstatter et al., 2013; Valkanas et al., 2014).
• Twitter’s free sample provides less representative data (Morstatter et al., 2013; Valkanas et al., 2014).
  – This may hold true for samples drawn from other data mining tools.
• Accessing, storing, and analyzing the data all carry costs (Morstatter et al., 2013; Valkanas et al., 2014).

Research Question

How does Twitter coverage of e-cigarette-related conversations differ by data source (e.g., Radian6 vs. GNIP)?

Methods
• Compared tweets from two tools:
  – Twitter’s GNIP “Firehose” service
  – Salesforce’s Radian6 tool
• Keywords included:
  – “e-cigarettes OR vaping” OR “e-cigarettes health” OR “vaping health”
• A total of 1,000 mentions were collected
  – 500 mentions were collected from each tool over a 30-second period (12:57 pm EST on August 7, 2015)
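The keyword list above can be read as a simple text filter. The sketch below is only an illustration under stated assumptions: the slides do not describe how Radian6 or GNIP evaluate the query, so the case-insensitive substring match, the helper name `matches_query`, and the sample tweets are all hypothetical. Because the first clause already matches any mention of "e-cigarettes" or "vaping", the two "health" clauses do not add further matches in this reading.

```python
# Terms taken from the keyword list on the Methods slide.
QUERY_TERMS = ("e-cigarettes", "vaping")


def matches_query(text):
    """Return True if the text contains any query term (case-insensitive).

    Assumption: a plain substring match; the real tools may tokenize differently.
    """
    lowered = text.lower()
    return any(term in lowered for term in QUERY_TERMS)


# Invented example tweets, not data from the study.
sample_tweets = [
    "New e-cigarettes flavors in stock!",
    "Is vaping bad for your health?",
    "Great weather today",
]

matched = [tweet for tweet in sample_tweets if matches_query(tweet)]
print(f"{len(matched)} of {len(sample_tweets)} sample tweets match the query")
```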

Methods
• Six measures were proposed to be used in this analysis:
  – Tools
    • Cost, Feasibility & Ease of Use
  – Themes
    • Poster (individual/organization)
    • Context (12 themes, combined to 9)
    • Valence (positive/negative)
  – Interrater reliability was 94%
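The slides report interrater reliability as a single percentage but do not say which statistic was used. One common reading is simple percent agreement between two coders, sketched below. The function name `percent_agreement` and the example codes are hypothetical, and a chance-corrected statistic such as Cohen's kappa would give a different number.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items that two coders labeled identically."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must rate the same number of items")
    if not codes_a:
        raise ValueError("At least one coded item is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Invented valence codes for five tweets, not data from the study.
coder_1 = ["positive", "neutral", "neutral", "negative", "neutral"]
coder_2 = ["positive", "neutral", "neutral", "neutral", "neutral"]

agreement = percent_agreement(coder_1, coder_2)
print(f"Percent agreement: {agreement:.0f}%")
```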

FINDINGS

Tool Comparison

Cost
• Radian6: Tiered pricing; cost based on number of mentions
• GNIP: Tiered pricing based on sources and amount of content

Ease of Use
• Radian6: Offers a visual dashboard; easy to pull content and analyze it
• GNIP: Requires storage capacity to store data; requires programming knowledge to access the data; requires computing power to analyze the data

Feasibility ?? ???? ??

Poster Type

Radian6 GNIP

Individual 55% 50%

Marketing/Promotion 44% 50%

Non-profit/Gov’t 1% 0%
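As a rough check on what the Marketing/Promotion shares mean in absolute terms, the percentages can be converted to approximate tweet counts. This sketch assumes each percentage applies to all 500 mentions collected per tool, which the slides imply but do not state; the variable names are illustrative.

```python
MENTIONS_PER_TOOL = 500  # from the Methods slide

# Marketing/Promotion share of posters, in percent, from the Poster Type table.
marketing_share_pct = {"Radian6": 44, "GNIP": 50}

# Approximate number of marketing tweets per tool.
marketing_counts = {
    tool: round(pct * MENTIONS_PER_TOOL / 100)
    for tool, pct in marketing_share_pct.items()
}
total_marketing = sum(marketing_counts.values())

for tool, count in marketing_counts.items():
    print(f"{tool}: about {count} marketing tweets")
print(f"Both tools combined: about {total_marketing} of {2 * MENTIONS_PER_TOOL}")
```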

Themes

Radian6 GNIP

Health/Consequence 6% 3%

Cessation 1% 0%

Product Characteristics 12% 21%

Marketing/Sales 18% 23%

Consumer Purchases 1% 1%

Utilization Patterns 12% 4%

Policy 4% 4%

Endorsement 29% 24%

Other 17% 16%
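To see where the two tools diverge most, the theme percentages can be compared directly. The sketch below ranks themes by the absolute gap in percentage points, using only the values from the Themes table. It is a descriptive comparison, not a significance test, and the variable names are illustrative.

```python
# Theme shares in percent, from the Themes table: theme -> (Radian6, GNIP).
THEME_SHARES = {
    "Health/Consequence": (6, 3),
    "Cessation": (1, 0),
    "Product Characteristics": (12, 21),
    "Marketing/Sales": (18, 23),
    "Consumer Purchases": (1, 1),
    "Utilization Patterns": (12, 4),
    "Policy": (4, 4),
    "Endorsement": (29, 24),
    "Other": (17, 16),
}

# Difference in percentage points (GNIP minus Radian6) for each theme.
differences = {
    theme: gnip - radian6 for theme, (radian6, gnip) in THEME_SHARES.items()
}

# Largest absolute gaps first.
ranked = sorted(differences.items(), key=lambda item: abs(item[1]), reverse=True)

for theme, diff in ranked:
    print(f"{theme}: {diff:+d} percentage points (GNIP minus Radian6)")
```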

Valence

Radian6 GNIP

Positive 6% 6%

Negative 4% 5%

Neutral 90% 88%

Word Clouds (Radian6 and GNIP)

Feasibility
• Across most measures, these tools delivered similar results.
  – Specifically, both demonstrated the overwhelming presence of marketing content and individual conversations about e-cigarettes.
• A key difference was in the level of sales and marketing content that GNIP pulled.
• Based on this analysis, either tool may be a viable option for researchers seeking to analyze Twitter data.
  – Radian6 may be a better option from a cost and ease-of-use standpoint.

Conclusions
• Researchers seeking to understand social media conversations have a number of options for data mining.
• Given similarity in content collected across both tools, cost and ease-of-use should be primary considerations when selecting a data mining tool.
  – GNIP offers quality data (and is well-referenced in literature) but requires resources to work with its data.
  – Radian6 provides an alternative when resources and computing power are limited.

Conclusions
• In terms of content, results demonstrated a gap in conversations around health consequences of vaping.
• Moreover, this study revealed that industry and marketing are using this medium far more than the public health community.

~500 e-cigarette marketing tweets in 30 seconds.

Future Directions
• Analyze these data in greater detail, e.g. which flavors and which brands.
• Compare data collected using other tools.
• Examine other forms of tobacco use (e.g., hookah, cigars, snus).
• Further examine characteristics of the posters.

Amelia Burke-Garcia
ameliaburke-garcia@westat.com

Cassandra A. Stanton
cassandrastanton@westat.com

Nicole Soufi
nicolesoufi@westat.com

Thank you!
