How much is said in a microblog? A multilingual inquiry based on Weibo and Twitter
Han-Teng Liao http://people.oii.ox.ac.uk/hanteng/
@hanteng
King-wa Fu https://sites.google.com/site/fukingwa/
kwfu@hku.hk
Scott A. Hale http://www.scotthale.net/
@computermacgyve
1 July 2015
Han-Teng Liao, King-wa Fu, & Scott A. Hale How much is said in a microblog? A multilingual inquiry. . .
Background, Motivations
Length limits and user experience
Early length restrictions for technical reasons (e.g., 140 bytes for SMS)
Continuing enforcement is often more for user experience:
less time and thought required to create new content
and faster, more timely exchange of information (Ebner & Schiefner, 2008; Java, Song, Finin, & Tseng, 2007)
“creativity loves constraints and simplicity is at our core” (Twitter, 2013)
Most studies on length limits and user experience only consider English
Popular interest
Lots of informal/anecdotal work on how much can fit in a tweet in a given language
BBC: 140 Chinese characters ≈ 70 to 80 English words
Atlantic: Japanese tweets could contain information that would take up to 260 English characters (Rosen, 2011; Summers, 2010)
Economist/Ai Weiwei: “In the Chinese language, 140 characters is a novella” (Economist, 2011)
Bloggers:
140 Chinese characters could contain five times more content than the same number of English characters (Ruby, 2012)
“140 Chinese characters is more like 500 characters on Twitter.com” (Dugan, 2011)
Practical/marketing interest
What is the “ideal” message length for strong engagement?
Ideal length is 100 characters (Lee, 2014)
Ideal length is 71–100 characters (Track Social, 2012)
Related work
Neubig and Duh (2013)
information-theoretic approach (entropy)
Chinese and Japanese are the most expressive languages per character
Not based on parallel corpora (i.e., not the exact same information in multiple languages)
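The idea of "information per character" can be illustrated, very loosely, with the simplest entropy estimator. The sketch below computes the empirical unigram character entropy of a text in bits per character. This is a swapped-in simplification, not the estimator Neubig and Duh (2013) used; the function name `unigram_entropy_bits_per_char` is hypothetical, and because it ignores context between characters it tends to overstate how much information each character carries.

```python
import math
from collections import Counter


def unigram_entropy_bits_per_char(text: str) -> float:
    """Empirical unigram character entropy of `text`, in bits per character.

    Simplified illustration only: each character is treated as independent
    of its neighbours, so no context is taken into account.
    """
    if not text:
        return 0.0
    counts = Counter(text)  # frequency of each Unicode code point
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(unigram_entropy_bits_per_char("abab"))  # 1.0
```

Scripts with large character inventories, such as Chinese, spread probability over many more symbols, which is one intuition for why they score higher per character under any entropy measure.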
Motivations
What are the effects of character limits on communication, and how do these effects differ across languages?
Research questions
How do length/space restrictions affect. . .
. . . how much can be said. . .
. . . how much space is used. . .
. . . how much is actually said. . .
. . . across different languages?
Data
(Multilingual) Parallel corpora
Universal Declaration of Human Rights (UDHR)
“World’s most translated text” (United Nations, 2007)
Semi-legal in nature
Subtitles of TED talks
Crowd-sourced with professional oversight
More informal
Microblog data
Twitter and Sina Weibo posts from 54 news and embassy organizations in English, Japanese, and Chinese
Information per character
[Figure: two bar-chart panels, “UDHR” (y-axis 0–6) and “TED Talks” (y-axis 0.0–4.0); x-axis: eng, jpn, cmn_hant, cmn_hans]
Relative ratio of characters in English (eng), Japanese (jpn), and Chinese using simplified characters (cmn_hans) required to express the same content, compared to Chinese using traditional characters (cmn_hant) as the baseline
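A ratio like the ones plotted above can be sketched from any sentence-aligned parallel corpus. The sketch below assumes the ratio is total characters in the target language divided by total characters in the baseline, with characters counted as Unicode code points; the slides do not state the exact counting or aggregation rules, and the function `char_ratio` and the toy data are hypothetical.

```python
from typing import Dict, List


def char_ratio(corpus: Dict[str, List[str]], lang: str, baseline: str) -> float:
    """Characters needed in `lang` per character in `baseline` for the same content.

    `corpus` maps a language code (e.g. "eng", "cmn_hant") to a list of aligned
    segments: segment i in every language translates the same source text.
    Characters are counted as Unicode code points via len().
    """
    if len(corpus[lang]) != len(corpus[baseline]):
        raise ValueError("aligned corpora must have the same number of segments")
    lang_chars = sum(len(segment) for segment in corpus[lang])
    baseline_chars = sum(len(segment) for segment in corpus[baseline])
    return lang_chars / baseline_chars


# Toy one-segment "corpus" for illustration only; not UDHR or TED data.
toy = {"eng": ["Hello world"], "cmn_hans": ["你好世界"]}
print(char_ratio(toy, "eng", "cmn_hans"))  # 2.75
```

Pooling character totals, rather than averaging per-segment ratios, weights long segments more heavily and avoids extreme ratios from very short segments; either choice is defensible, and which one the authors used is not stated here.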
Information per character
[Figure: “TED Talks” bar-chart panel (y-axis 0.0–4.0); x-axis: eng, jpn, cmn_hant, cmn_hans]
ratio(jpn,cmn_hans)=1.30
ratio(eng,cmn_hans)=3.21
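As a rough worked example, assuming the TED Talks ratios above carry over to microblog text (an assumption; the concluding slide notes that ratios depend on the corpus), a full 140-character post converts as follows:

```latex
\begin{aligned}
\text{eng: } & \frac{140}{\mathrm{ratio}(\mathrm{eng}, \mathrm{cmn\_hans})} = \frac{140}{3.21} \approx 43.6 \text{ cmn\_hans characters} \\
\text{jpn: } & \frac{140}{\mathrm{ratio}(\mathrm{jpn}, \mathrm{cmn\_hans})} = \frac{140}{1.30} \approx 107.7 \text{ cmn\_hans characters} \\
\text{cmn\_hans: } & 140 \times 3.21 \approx 449 \text{ eng characters}, \qquad 140 \times 1.30 = 182 \text{ jpn characters}
\end{aligned}
```

These figures can be set beside the informal estimates on the opening slide, such as the “500 characters” claim (Dugan, 2011).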
Character lengths of microblog posts
[Figure: bar charts of average length (y-axis 0–140) by language (eng, jpn, cmn_hans); panels: platform = Twitter, platform = Weibo; type = embassy, type = news]
Length of microblog posts in characters (excluding URLs) in English (eng), Japanese (jpn), and Simplified Chinese (cmn_hans)
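Measuring post length "excluding URLs" can be sketched as below. The URL pattern (anything beginning with `http://` or `https://` up to the next whitespace) and the whitespace normalisation are assumptions, not the authors' stated preprocessing; `length_without_urls` and `average_length` are hypothetical names.

```python
import re
from statistics import mean
from typing import Iterable

# Assumed URL pattern: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def length_without_urls(post: str) -> int:
    """Length of `post` in Unicode code points after removing URLs."""
    without_urls = URL_PATTERN.sub("", post)
    # Normalise whitespace, including the gap left where a URL was removed.
    return len(" ".join(without_urls.split()))


def average_length(posts: Iterable[str]) -> float:
    """Mean URL-free length over a collection of posts."""
    return mean(length_without_urls(p) for p in posts)


print(length_without_urls("Read more: https://t.co/abc123 now"))  # 14
```

Counting code points with `len()` treats each Chinese or Japanese character as one unit, matching how the 140-character limit is usually described; user-perceived characters (grapheme clusters) can differ for emoji and combining marks.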
Relative information content of microblog posts
[Figure: bar charts of RIC_cmn_hans (y-axis 0–140) by language (eng, jpn, cmn_hans); panels: platform = Twitter, platform = Weibo; type = embassy, type = news]
Relative information content (RIC) of microblog posts in English (eng), Japanese (jpn), and Simplified Chinese (cmn_hans). RIC is shown here as the equivalent number of Simplified Chinese characters (RIC_cmn_hans)
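One plausible reading of RIC_cmn_hans is a post's character length divided by that language's character ratio relative to Simplified Chinese. The sketch below assumes that definition and plugs in the TED Talks ratios from the earlier slide; which corpus and ratios the authors actually used for this figure is not stated, and `ric_cmn_hans` and `TED_RATIOS_TO_CMN_HANS` are hypothetical names.

```python
from typing import Mapping

# Characters needed per Simplified Chinese character, taken from the TED Talks slide.
# Whether the RIC figure used exactly these ratios is an assumption.
TED_RATIOS_TO_CMN_HANS = {"cmn_hans": 1.0, "jpn": 1.30, "eng": 3.21}


def ric_cmn_hans(length_chars: float, lang: str, ratios: Mapping[str, float]) -> float:
    """Express a post of `length_chars` characters in `lang` as equivalent cmn_hans characters."""
    return length_chars / ratios[lang]


print(round(ric_cmn_hans(100, "jpn", TED_RATIOS_TO_CMN_HANS), 1))  # 76.9
```

Under this sketch the conversion is linear, so the mean RIC for a language equals its mean character length divided by a single ratio; the RIC panels would then be a per-language rescaling of the length panels.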
Concluding thoughts
User experience
140-character limit is superficially similar across languages
but the very definition of what is “micro” varies with language on microblogging platforms
which may result in very different user experiences in different languages
Sina Weibo’s limit is not 140 characters!
Cross language research
Different parallel corpora give different information-per-character ratios.
It is important to use a corpus similar to the text being studied.
Importance of multilingual studies
We would like to thank our colleagues and the anonymous reviewers for their helpful comments.
Dugan, L. (2011, July). 140 Characters On Chinese Twitter Is More Like 500 Characters On Twitter.com. AllTwitter. (http://www.mediabistro.com/alltwitter/140-characters-on-chinese-twitter-is-more-like-500-characters-on-twitter-com_b11951)
Ebner, M., & Schiefner, M. (2008). Microblogging - more than fun. In Proceedings of the IADIS International Conference on Mobile Learning (pp. 155–159). IADIS.
Economist. (2011). Ai Weiwei’s blog: A digital rallying cry. (http://www.economist.com/blogs/prospero/2011/04/ai_weiweis_blog)
Java, A., Song, X., Finin, T., & Tseng, B. (2007). Why we Twitter: Understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis (pp. 56–65). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1348549.1348556
Lee, K. (2014, April). The proven ideal length of every tweet, Facebook post, and headline online. Fast Company. (http://www.fastcompany.com/3028656/work-smart/the-proven-ideal-length-of-every-tweet-facebook-post-and-headline-online)
Neubig, G., & Duh, K. (2013). How much is said in a tweet? A multilingual, information-theoretic perspective. In AAAI Spring Symposium on Analyzing Microtext. Available from http://www.aaai.org/ocs/index.php/SSS/SSS13/paper/view/5698
Rosen, R. J. (2011, September). How much can you say in 140 characters? A lot, if you speak Japanese. The Atlantic. (http://www.theatlantic.com/technology/archive/2011/09/how-much-can-you-say-in-140-characters-a-lot-if-you-speak-japanese/245199/)
Ruby, B. (2012, November). Twitter versus Weibo: What You Need To Know. The Fearless Group. (http://thefearlessgroup.com/twitter-versus-weibo-what-you-need-to-know/)
Summers, B. (2010, February). What’s the equivalent of Twitter’s 140 character limit for non-Latin character sets? [Information management]. (http://bens.me.uk/2010/twitter-charset-experiment)
Track Social. (2012, October). Track Social blog: Optimizing Twitter engagement – part 3: Tweet length [Web log post]. (http://tracksocial.com/blog/2012/10/optimizing-twitter-engagement-part-3-tweet-length/)
Twitter. (2013). Best practices: Be the best at what, when and how you tweet. (https://web.archive.org/web/20131020033342/https://business.twitter.com/best-practices)
United Nations. (2007, December). The World’s Most Translated Document. Human Rights Day. (http://www.un.org/en/events/humanrightsday/2007/worldtransdoc.shtml)