How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter (Slides)

Han-Teng Liao
http://people.oii.ox.ac.uk/hanteng/
@hanteng

King-wa Fu
https://sites.google.com/site/fukingwa/
kwfu@hku.hk

Scott A. Hale
http://www.scotthale.net/
@computermacgyve

1 July 2015

Han-Teng Liao, King-wa Fu, & Scott A. Hale How much is said in a microblog? A multilingual inquiry. . .

Background, Motivations

Length limits and user experience

Early length restrictions for technical reasons (e.g., 140 bytes for SMS)

Continuing enforcement is often more for user experience:

lower time and thought required to create new content, and allow for faster and more timely exchange of information (Ebner & Schiefner, 2008; Java, Song, Finin, & Tseng, 2007)

“creativity loves constraints and simplicity is at our core” (Twitter, 2013)

Most studies on length limits and user experience only consider English

Popular interest

Lots of informal/anecdotal work on how much can fit in an x-language tweet

BBC: 140 Chinese characters ≈ 70 to 80 English words

Atlantic: Japanese tweets could contain information that would take up to 260 English characters (Rosen, 2011; Summers, 2010)

Economist/Ai Weiwei: “In the Chinese language, 140 characters is a novella” (Economist, 2011)

Bloggers:

140 Chinese characters could contain five times more content than the same number of English characters (Ruby, 2012)

“140 Chinese characters is more like 500 characters on Twitter.com” (Dugan, 2011)

Practical/marketing interest

What is the “ideal” message length for strong engagement?

Ideal length is 100 characters (Lee, 2014)

Ideal length is 71–100 characters (Track Social, 2012)

Related work

Neubig and Duh (2013)

information-theoretic approach (entropy)

Chinese and Japanese are the most expressive languages per character

Not based on parallel corpora (i.e., not the exact same information in multiple languages)
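As a rough illustration of what an entropy-based measure of information per character can look like, the sketch below computes the Shannon entropy of the character distribution in a text. The function name `char_entropy_bits` is hypothetical, and this unigram estimate is an assumption made for illustration; it is not necessarily the estimator used by Neubig and Duh (2013).

```python
import math
from collections import Counter


def char_entropy_bits(text: str) -> float:
    """Shannon entropy of the character distribution in `text`, in bits per character.

    Unigram estimate only: each character is treated as independent of its
    neighbours (an illustrative simplification).
    """
    counts = Counter(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A writing system with a larger and more evenly used character inventory tends to score higher on this measure, which is the intuition behind comparing languages per character.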

Motivations

What are the effects of character limits on communication and how do these effects differ across languages?

Research questions

How do length/space restrictions affect. . .

. . . how much can be said. . .

. . . how much space is used. . .

. . . how much is actually said. . .

. . . across different languages?

Data

(Multilingual) Parallel corpora

Universal Declaration of Human Rights (UDHR)

“World’s most translated text” (United Nations, 2007)

Semi-legal in nature

Subtitles of TED talks

Crowd-sourced with professional oversight

More informal

Microblog data

Twitter and Sina Weibo posts from 54 news and embassy organizations in English, Japanese, and Chinese

Information per character

[Figure: bar charts in two panels, UDHR and TED Talks, with one bar per language (eng, jpn, cmn_hant, cmn_hans). Annotations on the TED Talks panel: ratio(jpn,cmn_hans)=1.30; ratio(eng,cmn_hans)=3.21]

Relative ratio of characters in English (eng), Japanese (jpn), and Chinese using simplified characters (cmn_hans) required to express the same content, compared to Chinese using traditional characters (cmn_hant) as the baseline
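The ratio described in this caption can be sketched as a character count over aligned translations, divided by the count for a baseline language. The function name `relative_char_ratios`, the decision to count every Unicode code point (including spaces and punctuation), and the toy segments are all assumptions for illustration; the slides do not specify the exact counting rules.

```python
from typing import Dict, List


def relative_char_ratios(parallel: Dict[str, List[str]], baseline: str) -> Dict[str, float]:
    """Return, per language, total characters divided by the baseline's total.

    `parallel` maps a language code to its list of aligned segments; every
    list is assumed to hold translations of the same content.
    """
    totals = {lang: sum(len(seg) for seg in segments)
              for lang, segments in parallel.items()}
    if totals.get(baseline, 0) == 0:
        raise ValueError("baseline language is missing or empty")
    return {lang: total / totals[baseline] for lang, total in totals.items()}


# Toy, made-up segments (not taken from the UDHR or TED data).
toy = {
    "eng": ["Good morning.", "Thank you very much."],
    "cmn_hans": ["早上好。", "非常感谢。"],
}
ratios = relative_char_ratios(toy, baseline="cmn_hans")
```

A ratio above 1 means the language needs more characters than the baseline to express the same content.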

Character lengths of microblog posts

[Figure: average length (y-axis, 0 to 140) by language (eng, jpn, cmn_hans), in panels for platform (Twitter, Weibo) and account type (embassy, news)]

Length of microblog posts in characters (excluding URLs) in English (eng), Japanese (jpn) and Simplified Chinese (cmn_hans)
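Measuring post length in characters while excluding URLs could be done along the following lines. The regular expression, the whitespace normalization, and the name `length_without_urls` are assumptions for illustration; the slides do not state how URLs were detected or how leftover spacing was handled.

```python
import re

# Assumed URL pattern: anything starting with http:// or https:// up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def length_without_urls(post: str) -> int:
    """Length of `post` in Unicode code points after removing URLs.

    Runs of whitespace left behind by a removed URL are collapsed to a
    single space, and leading/trailing whitespace is dropped.
    """
    stripped = URL_PATTERN.sub("", post)
    normalized = " ".join(stripped.split())
    return len(normalized)
```

Counting code points rather than bytes matters here, since a Chinese or Japanese character occupies several bytes in UTF-8 but counts as one character.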

Relative information content of microblog posts

[Figure: RIC_cmn_hans (y-axis, 0 to 140) by language (eng, jpn, cmn_hans), in panels for platform (Twitter, Weibo) and account type (embassy, news)]

Relative information content (RIC) of microblog posts in English (eng), Japanese (jpn) and Simplified Chinese (cmn_hans). RIC is shown here as the equivalent number of Simplified Chinese characters (RIC_cmn_hans)
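One plausible reading of RIC is a post's character length divided by the number of characters its language needs per Simplified Chinese character. The sketch below assumes that reading and reuses the TED Talks ratios shown earlier in the slides (3.21 for English, 1.30 for Japanese); the name `ric_cmn_hans` and the choice of the TED ratios for microblog text are assumptions, not something the slides confirm.

```python
# Characters needed per Simplified Chinese character, from the TED Talks panel.
# Using these ratios for microblog posts is an assumption made for illustration.
RATIO_TO_CMN_HANS = {"eng": 3.21, "jpn": 1.30, "cmn_hans": 1.0}


def ric_cmn_hans(length_chars: float, lang: str) -> float:
    """Convert a length in `lang` characters to equivalent Simplified Chinese characters."""
    return length_chars / RATIO_TO_CMN_HANS[lang]
```

Under these assumptions, a full 140-character English post corresponds to about 43.6 Simplified Chinese characters, and a 140-character Japanese post to about 107.7.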

Concluding thoughts

User experience

140-character limit is superficially similar across languages

but the very definition of what is “micro” varies with language on microblogging platforms

which may result in very different user experiences in different languages

Sina Weibo’s limit is not 140 characters!

Cross language research

Different parallel corpora give different information per character ratios.

Important to use a corpus similar to the text being studied.

Importance of multilingual studies

We would like to thank our colleagues and the anonymous reviewers for their helpful comments.

Dugan, L. (2011, July). 140 Characters On Chinese Twitter Is More Like 500 Characters On Twitter.com. AllTwitter. (http://www.mediabistro.com/alltwitter/140-characters-on-chinese-twitter-is-more-like-500-characters-on-twitter-com_b11951)

Ebner, M., & Schiefner, M. (2008). Microblogging - more than fun. In Proceedings of the IADIS International Conference on Mobile Learning (pp. 155–159). IADIS.

Economist. (2011). Ai Weiwei’s blog: A digital rallying cry. (http://www.economist.com/blogs/prospero/2011/04/ai_weiweis_blog)

Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we Twitter: Understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis (pp. 56–65). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1348549.1348556

Lee, K. (2014, April). The proven ideal length of every tweet, Facebook post, and headline online. Fast Company. (http://www.fastcompany.com/3028656/work-smart/the-proven-ideal-length-of-every-tweet-facebook-post-and-headline-online)

Neubig, G., & Duh, K. (2013). How much is said in a tweet? A multilingual, information-theoretic perspective. In AAAI Spring Symposium on Analyzing Microtext. Available from http://www.aaai.org/ocs/index.php/SSS/SSS13/paper/view/5698

Rosen, R. J. (2011, September). How much can you say in 140 characters? A lot, if you speak Japanese. The Atlantic. (http://www.theatlantic.com/technology/archive/2011/09/how-much-can-you-say-in-140-characters-a-lot-if-you-speak-japanese/245199/)

Ruby, B. (2012, November). Twitter versus Weibo: What You Need To Know. The Fearless Group. (http://thefearlessgroup.com/twitter-versus-weibo-what-you-need-to-know/)

Summers, B. (2010, February). What’s the equivalent of Twitter’s 140 character limit for non-Latin character sets? [Information management]. (http://bens.me.uk/2010/twitter-charset-experiment)

Track Social. (2012, October). Track social blog: Optimizing Twitter engagement – part 3: Tweet length [Web log post]. (http://tracksocial.com/blog/2012/10/optimizing-twitter-engagement-part-3-tweet-length/)

Twitter. (2013). Best practices: Be the best at what, when and how you tweet. (https://web.archive.org/web/20131020033342/https://business.twitter.com/best-practices)

United Nations. (2007, December). The World’s Most Translated Document. Human Rights Day. (http://www.un.org/en/events/humanrightsday/2007/worldtransdoc.shtml)
