Logs & Visualizations at Twitter

Krist Wongsuphasawat / @kristw

Logs & visualization at Twitter

Krist Wongsuphasawat / @kristw

Computer Engineer, Bangkok, Thailand (Chulalongkorn University)

Programming + Soccer

M.S. and PhD in Computer Science, Univ. of Maryland (Information Visualization)

IBM, Microsoft

Data Visualization Scientist, Twitter

internal tools

public-facing: interactive.twitter.com

Using visualizations to monitor changes and harvest insights from log data at Twitter

Krist Wongsuphasawat (@kristw) & Jimmy Lin (@lintool)

IEEE VAST 2014

Logging user activities & data analysis

Users: use Twitter

Product Managers: curious

Engineers: instrument Twitter

Twitter: writes log data in Hadoop

What is being logged?

activities

• tweet (tweet from home timeline on twitter.com, tweet from search page on iPhone)

• sign up

• log in

• retweet

• etc.

Organize?

log event a.k.a. “client event” [Lee et al. 2012]

client : page : section : component : element : action

web : home : timeline : tweet_box : button : tweet

1) User ID 2) Timestamp 3) Event name 4) Event detail
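A minimal sketch of how a six-level client event name could be split into its levels. The `ClientEventName` type and `parse_event_name` function are hypothetical names used for illustration only; they are not part of Twitter's logging library.

```python
from typing import NamedTuple


class ClientEventName(NamedTuple):
    """The six levels of a client event name, from broadest to most specific."""
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name: str) -> ClientEventName:
    """Split 'client : page : section : component : element : action' into fields."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 levels, got {len(parts)}: {name!r}")
    return ClientEventName(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
print(event.page, event.action)  # home tweet
```

A full log record would carry this name alongside the user ID, timestamp and event detail.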

Log data

Log data in Hadoop: bigger than Tweet data

Users: use Twitter

Engineers: instrument Twitter

Twitter: writes log data in Hadoop

Product Managers: curious, ask Data Scientists

Engineers & Data Scientists: find, clean, analyze and monitor log data in Hadoop

Part I Find & Monitor Client Events

Motivation

Log data in Hadoop: billions of rows

Aggregate => Client event collection: 10,000+ event types

date      client  page  section  comp.  elem.  action      count
20141011  web     home  home     -      -      impression  100
20141011  web     home  wtf      -      -      click       20

(wtf = Who-to-Follow)

Engineers & Data Scientists find events by searching the client event collection

Search: client, page, section, component, element, action

section? component? element?

Query: web : home : * : * : * : impression

Results:

web : home : home : - : - : impression

web : home : wtf : - : - : impression

search can be better

• 10,000+ event types

• not everybody knows: What are all sections under web:home?

• one graph / event, x 10,000
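A wildcard query like the one above can be sketched as a level-by-level match. This is only an illustration of the idea, not the actual search implementation; the `search` helper and the colon-separated query syntax are assumptions.

```python
from fnmatch import fnmatchcase


def split_levels(name):
    """Split a colon-separated event name or query into its levels."""
    return [level.strip() for level in name.split(":")]


def search(events, query):
    """Return the events whose levels all match the query; '*' matches any value."""
    patterns = split_levels(query)
    results = []
    for event in events:
        levels = split_levels(event)
        if len(levels) == len(patterns) and all(
            fnmatchcase(level, pattern) for level, pattern in zip(levels, patterns)
        ):
            results.append(event)
    return results


collection = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
]
print(search(collection, "web:home:*:*:*:impression"))
```

The user still has to know which levels to leave open, which is the weakness the slides point out.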

Goals

• Search for client events

• Explore client event collection

• Monitor changes

Related work

• Session analysis [Lam et al. 2007, Shen et al. 2013]

• Monitor network logs, not user activity logs [Ghoniem et al. 2013]

Design

Engineers & Data Scientists see the client event collection

Interactions: search box => filter (narrow down)

How to visualize?

client : page : section : component : element : action

Client event hierarchy

iphone:home:-:-:-:impression

iphone:home:-:tweet:tweet:click

iphone > home > - > - > - > impression

iphone > home > - > tweet > tweet > click

Detect changes

TODAY compared to 7 DAYS AGO

Calculate changes

DIFF per node in the hierarchy, e.g. +5%, +10%, -5%

Display changes

Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
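The change calculation can be sketched as follows: roll the daily counts up to every node of the hierarchy, then compare today with 7 days ago. The counts below are made-up numbers, and `rollup` and `percent_change` are hypothetical helpers, not the code behind the tool.

```python
from collections import defaultdict


def rollup(counts):
    """Sum leaf counts into every prefix (node) of the six-level hierarchy."""
    totals = defaultdict(int)
    for name, count in counts.items():
        levels = tuple(level.strip() for level in name.split(":"))
        for depth in range(1, len(levels) + 1):
            totals[levels[:depth]] += count
    return totals


def percent_change(today, week_ago):
    """Percent change per node; nodes with no count 7 days ago are skipped."""
    now, before = rollup(today), rollup(week_ago)
    return {
        node: 100 * (now[node] - before[node]) / before[node]
        for node in now
        if before.get(node)
    }


# Illustrative counts only.
today = {"iphone:home:-:-:-:impression": 110, "iphone:home:-:tweet:tweet:click": 19}
week_ago = {"iphone:home:-:-:-:impression": 100, "iphone:home:-:tweet:tweet:click": 20}

diff = percent_change(today, week_ago)
print(diff[("iphone",)])  # 7.5
```

Each node's percentage is what a treemap-style display would encode as color.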

Demo Scribe Radar

Twitter for Banana

Deployment

• Since Dec 2013

• 500 unique users, 10 users / day

• No training

Users: PMs, Data Scientists, Engineers

Use cases

• Search

• Monitor

• See effects after major product launch

more information in the paper

Part II Analysis

Count page visits

home page = banana : home : - : - : - : impression

Funnel analysis

home page > profile page: 1 job, 1 hour

banana : home : - : - : - : impression

banana : profile : - : - : - : impression

home page > profile page and home page > search page: 2 jobs, 2 hours

banana : search : - : - : - : impression

Specify all funnels manually! n jobs, n hours

Goal

1 job => all funnels, visualized
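A single manually specified funnel can be sketched as below. It assumes a funnel step counts when it occurs anywhere later in the same session (not necessarily immediately after the previous step); the session data and the `funnel_counts` helper are illustrative only.

```python
HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"


def funnel_counts(sessions, steps):
    """Count how many sessions reach each funnel step, in order."""
    reached = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next step this session still has to reach
        for event in events:
            if position < len(steps) and event == steps[position]:
                reached[position] += 1
                position += 1
    return reached


# Made-up sessions for illustration.
sessions = [
    [HOME, PROFILE],
    [HOME, SEARCH],
    [PROFILE],
    [HOME, SEARCH, PROFILE],
]
print(funnel_counts(sessions, [HOME, PROFILE]))  # [3, 2]
```

Every additional funnel needs another pass like this one, which is why specifying them all by hand does not scale.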

Related work

• Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]

• Big data? eBay checkout sequences [Shen et al. 2013]

One funnel at a time: Checkout > Payment > Confirm > Success

LifeFlow [CHI2011] (simplified)

User sessions

Session #1: start > A > B > end

Session #2: start > A > B > end

Session #3: start > A > C > end

Session #4: start > A > end

Aggregate (4 sessions)

start > A, then B > end (2 sessions), C > end (1 session) or end (1 session)

Aggregate (4,000,000 sessions)

Same overview: start > A, then B > end, C > end or end
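The aggregation step can be sketched as counting how many sessions share each prefix of events, which is the tree that a LifeFlow-style overview draws. The four sessions below mirror the small A/B/C example; `prefix_counts` is a hypothetical helper, not code from the paper.

```python
from collections import Counter


def prefix_counts(sessions):
    """Count the sessions passing through each prefix of events (a prefix tree)."""
    counts = Counter()
    for events in sessions:
        path = ("start",) + tuple(events) + ("end",)
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts


sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
tree = prefix_counts(sessions)
print(tree[("start", "A")])       # 4
print(tree[("start", "A", "B")])  # 2
```

The number of distinct prefixes is what grows out of control with millions of sessions and thousands of event types.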

original paper (100,000 sessions, ~10 event types)

try with “sample” data (millions of sessions, 10,000+ event types)

not meaningful!

small slice of data but huge file

How to make it work?

Reduce # of unique sequences

1. Reduce event types

10,000 types => select (tweet, sign up, log out) and merge (tweet from home timeline, tweet from search page, tweet … = tweet)

2. Reduce sequence length

session of 1000 events => 10 events after (window size & direction) visit home page (alignment)

Ask users for input for 1 and 2

3. More aggregation on Hadoop
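Steps 1 and 2 can be sketched together: keep only the selected events, merge variants under one label, then keep a window around the alignment event. The event names, the `merge_map` dictionary and the `reduce_session` function are assumptions for illustration; the real jobs ran on Hadoop.

```python
def reduce_session(events, merge_map, alignment, window=10, direction="after"):
    """Select and merge events, then keep a window around the alignment event.

    merge_map maps each raw event to keep onto its merged label; raw events
    that are not in merge_map are dropped. Returns None if the session never
    contains the alignment event.
    """
    selected = [merge_map[event] for event in events if event in merge_map]
    if alignment not in selected:
        return None
    index = selected.index(alignment)  # align on the first occurrence
    if direction == "after":
        return selected[index : index + window + 1]
    return selected[max(0, index - window) : index + 1]


# Hypothetical raw event labels.
merge_map = {
    "visit home page": "home",
    "tweet from home timeline": "tweet",
    "tweet from search page": "tweet",
    "log out": "log out",
}
session = [
    "log in",
    "visit home page",
    "scroll",
    "tweet from home timeline",
    "tweet from search page",
    "log out",
]
print(reduce_session(session, merge_map, alignment="home", window=2))
# ['home', 'tweet', 'tweet']
```

The set of events, the alignment, the direction and the window size are the inputs the user is asked for.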

Collapse events

e.g. tweet, tweet, tweet, … = tweet

Before: ABBBCCCC ABBCC ABC ABCCCC ABCD ABCCCD ABCCE ABCDF ABCDG ABCDH

After: ABC ABC ABC ABC ABCD ABCD ABCE ABCDF ABCDG ABCDH

Group & Count

Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDF     1
ABCDG     1
ABCDH     1
ABCDI     1
ABCDJK    1
ABCDJL    1

The last six are rare sequences (count < threshold)

Truncate

Replace last event with x (…): ABCDF, ABCDG, ABCDH, ABCDI => ABCDx; ABCDJK, ABCDJL => ABCDJx

Group & Count

Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     4
ABCDJx    2

Truncate more

ABCDJx => ABCDx

Group & Count

Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     6
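The collapse, group & count and truncate steps can be sketched in a few lines. This is a single-machine illustration under stated assumptions, not the Hadoop job: the threshold of 3 is chosen only because it reproduces the counts in the example, and `collapse`, `truncate` and `aggregate` are hypothetical names.

```python
from collections import Counter
from itertools import groupby

OTHER = "x"  # placeholder for a truncated tail, as in the example


def collapse(sequence):
    """Collapse runs of the same event: A B B B C C becomes A B C."""
    return tuple(event for event, _ in groupby(sequence))


def truncate(sequence):
    """Replace the last real event with the placeholder."""
    body = sequence[:-1] if sequence[-1] == OTHER else sequence
    return body[:-1] + (OTHER,)


def aggregate(counts, threshold):
    """Truncate rare sequences and regroup until none is below the threshold."""
    counts = Counter(counts)

    def is_rare(sequence, count):
        return count < threshold and sequence != (OTHER,)

    while any(is_rare(s, n) for s, n in counts.items()):
        merged = Counter()
        for sequence, count in counts.items():
            key = truncate(sequence) if is_rare(sequence, count) else sequence
            merged[key] += count
        counts = merged
    return counts


grouped = {
    tuple("ABC"): 2000, tuple("ABCD"): 80, tuple("ABCE"): 20,
    tuple("ABCDF"): 1, tuple("ABCDG"): 1, tuple("ABCDH"): 1,
    tuple("ABCDI"): 1, tuple("ABCDJK"): 1, tuple("ABCDJL"): 1,
}
for sequence, count in aggregate(grouped, threshold=3).items():
    print("".join(sequence), count)
```

With these inputs the first pass yields ABCDx (4) and ABCDJx (2), and the second pass merges them into ABCDx (6), matching the tables above.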

Final process

1. Define set of events

2. Pick alignment, direction and window size

3. Run Hadoop job (with more aggregation)

4. Wait for it… (2+ hrs)

5. Visualize

gazillion patterns (TBs) => ~100,000 patterns (10MB)

Demo Flying Sessions

Deployment

• Since Jan 2013

• Fewer users, but more in-depth ad-hoc analysis

• Initial meeting to provide support

Case studies

• What did users do when they visited Twitter? (in demo)

• Where did users give up in the sign-up process?

• more in the paper

Conclusions & Future work

• Large-scale User Activity Logs + Visual Analytics

• Find, Monitor & Explore + Anomaly detection & automatic alert

• Funnel Analysis + More interactivity & data / reduce wait time

• Used in day-to-day operations at Twitter

• Generalize to smaller systems

Challenge

big data => aggregate & sacrifice => small data => visualize & interact

Acknowledgement

• Data Scientists & Engineers @Twitter: Linus Lee, Chuang Liu

• Feedback from reviewers, Ben Shneiderman & Catherine Plaisant

Contact: @kristw

One more thing …

d3Kit

Making d3 charts just a little bit easier.

@kristw @trebor

explanations inside

How?

d3Kit helps you

• do boring setup tasks (margins, etc.)

• create responsive charts (handle resize)

• create reusable components

• manage layers

d3Kit.Skeleton

d3Kit.factory

d3Kit.Chartlet

d3Kit.LayerOrganizer

Margins: d3's margin convention

http://bl.ocks.org/kristw/…

<svg> <g transform=…> </g> </svg>

Usually you will have to create an <svg> and a <g> inside with some translation to add margins for the axes. d3 has this page that explains the convention. However, there are several steps that you have to do every time.

It also does not let you easily change the margin later.

Let d3Kit.Skeleton handle that

Bar chart

http://bl.ocks.org/kristw/…

Let's compare how to implement this simple bar chart: using d3Kit vs. the original d3 example.

Using d3Kit, you can create a skeleton, passing in the container (e.g. body), and let it create the <svg> and <g> and calculate the margins. Then you can obtain the <g> using skeleton.getRootG().

Always use skeleton.getInnerWidth() to get the content area. If you change the margin via skeleton.margin() later, you do not have to worry about updating the width calculation at all: skeleton.getInnerWidth() will return the updated inner width.

Responsive chart

http://bl.ocks.org/treboresque/…

d3Kit.Skeleton also helps you catch resize events and resize the skeleton according to your need (full width, keep aspect ratio).

In this example, the Japan flag will grow/shrink when you resize the window, but always keep the same aspect ratio.

Skeleton dispatches a “resize” event, so you will know when to redraw your vis.

Reusable chart

http://bl.ocks.org/kristw/…

d3Kit also provides a lightweight factory to help you create reusable charts on top of a skeleton.

We are not trying to define a complex framework here, but we aim to set the stage and get out of the way.

Chartlet

http://bl.ocks.org/treboresque/…

Chartlet helps you create reusable components within a chart. For example, these faces below are implemented using Chartlet. You can interact with them individually. We can easily reuse this face vis in another chart.

Available today

Just open sourced. Fresh from the oven.

bower install d3kit

github.com/twitter/d3kit

Krist Wongsuphasawat @kristw

Robert Harris @trebor

Questions?

Thank you