Profiling visitors of vangoghletters.org [email protected] Investigating Usage and Users of Digital Resources





Content

Why analyze log data
Log data's possibilities and limitations
Tools for log analysis
Edition of Van Gogh letters
Findings
  Visits, pages, duration
  Where from
  Use of navigational facilities
  Characterizing users' interests
Conclusions


Why analyze log data

Study users
  Usability problems
  Check usage of site components
  Customise site to user
  Visitor characteristics and propensities

Collecting info
  Observation
  Questionnaire
  Log data


Log data available


95.114.173.255 - - [10/Jun/2011:11:37:22 +0200] "GET /vg/index.html HTTP/1.1" 200 11031 "http://es.wikipedia.org/wiki/Vincent_van_Gogh" "Mozilla/5.0 (Windows; U; Windows NT 6.1; es-ES; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13"

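IP address                 95.114.173.255
Timestamp                  [10/Jun/2011:11:37:22 +0200]
Requested page             GET /vg/index.html HTTP/1.1
Response code, page size   200, 11031
Referring page             http://es.wikipedia.org/wiki/Vincent_van_Gogh
User agent                 Mozilla/5.0 (Windows; U; Windows NT 6.1; es-ES; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13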
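
The fields labelled on this slide can be pulled out of a log line with a regular expression. The following is a minimal sketch, assuming the server writes the Apache "combined" log format (which the sample line matches); `LOG_PATTERN` and `parse_line` are illustrative names and not part of the original analysis.

```python
import re
from datetime import datetime

# Pattern for the Apache "combined" log format (an assumption based on
# the sample line shown on the slide).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the textual fields that are more useful as typed values.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# The sample line from the slide.
SAMPLE = (
    '95.114.173.255 - - [10/Jun/2011:11:37:22 +0200] '
    '"GET /vg/index.html HTTP/1.1" 200 11031 '
    '"http://es.wikipedia.org/wiki/Vincent_van_Gogh" '
    '"Mozilla/5.0 (Windows; U; Windows NT 6.1; es-ES; rv:1.9.2.13) '
    'Gecko/20101203 Firefox/3.6.13"'
)

record = parse_line(SAMPLE)
print(record["ip"], record["path"], record["status"], record["referrer"])
```

Lines that do not match the pattern come back as `None`, so malformed entries can be counted or skipped rather than silently misread.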


Caveats

Every object request is recorded in the log: large amount of data
No request, no data recorded
No identification of sessions
Lots of traffic from robots
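
Two of these caveats (every object is logged, and robots generate much of the traffic) are usually handled by a cleaning pass before any counting. The sketch below is one possible heuristic, not the procedure used for this project: the extension list, the user-agent markers and the sample records are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical lists; a real cleaning step would tune these to the site.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_page_request(path):
    """True if the request is for a page rather than an embedded object."""
    return not urlparse(path).path.lower().endswith(STATIC_EXTENSIONS)


def looks_like_robot(user_agent):
    """Crude check: the user agent string announces itself as a robot."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def clean(records):
    """Keep only page requests that do not appear to come from robots."""
    return [
        r for r in records
        if is_page_request(r["path"]) and not looks_like_robot(r["user_agent"])
    ]


# Made-up records for illustration only.
records = [
    {"path": "/vg/index.html", "user_agent": "Mozilla/5.0 Firefox/3.6.13"},
    {"path": "/vg/css/style.css", "user_agent": "Mozilla/5.0 Firefox/3.6.13"},
    {"path": "/vg/index.html", "user_agent": "Googlebot/2.1"},
]

cleaned = clean(records)
print(len(records), "requests,", len(cleaned), "kept")
```

Robots that do not identify themselves in the user agent pass this filter, so it gives a lower bound on automated traffic.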


Log analyzers (AWStats)


Google Analytics


Taming log data

Period                                February 6 to September 5, 2011
Number of log lines                   ca. 8,053,000
Number of pages (after cleaning)      946,000

IP numbers                            51473
'Computers'                           58491
'Sessions'                            85409
Sessions >1 page                      55218
Return visits                         26918

Pages per session                     11.1
Pages per session (> 1 page)          16.6
Time per session (sec.)               292
Time per session (> 1 page) (sec.)    452
Returning visitors                    7714
Avg. sessions of returning visitors   4.5
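
The derived figures on this slide can be cross-checked against the raw counts. The slides do not define the measures, so the relations below are inferences that happen to reproduce the reported values: that a return visit is any session beyond a computer's first, and that a single-page session contributes one page and no measurable duration.

```python
# Figures copied from the slide.
pages = 946_000              # pages after cleaning (rounded on the slide)
computers = 58_491
sessions = 85_409
multi_page_sessions = 55_218
return_visits = 26_918
returning_visitors = 7_714
time_per_session = 292       # seconds, average over all sessions

# Assumption: every session beyond a computer's first is a return visit.
inferred_return_visits = sessions - computers

# Assumption: a single-page session has exactly one page and zero duration.
single_page_sessions = sessions - multi_page_sessions
pages_per_session = pages / sessions
pages_per_multi = (pages - single_page_sessions) / multi_page_sessions
time_per_multi = time_per_session * sessions / multi_page_sessions

# Returning visitors account for their first session plus all return visits.
sessions_per_returning = (returning_visitors + return_visits) / returning_visitors

print(inferred_return_visits)
print(round(pages_per_session, 1), round(pages_per_multi, 1))
print(round(time_per_multi), round(sessions_per_returning, 1))
```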


Returning visitors and number of sessions

Sessions   Returning visitors
2          4356
3-5        2341
6-10       587
11-20      268
21-50      106
51-100     35
101-200    14
201+       7


Logfile turned into spreadsheet

Timestamp            To page                                      From page
5-apr-10 15:01:38    entry                                        extern
5-apr-10 15:04:27    search simple "many a flower is trampled"    entry
5-apr-10 15:04:28    letter ? let574                              search simple
5-apr-10 15:04:29    letter translation let574                    letter top let574
5-apr-10 15:04:29    letter notes let574                          letter top let574
5-apr-10 15:04:29    letter original_text let574                  letter top let574
5-apr-10 15:08:30    toc period                                   search simple
5-apr-10 15:08:50    letter ? let588                              toc period
5-apr-10 15:08:51    letter translation let588                    letter top let588
5-apr-10 15:08:51    letter notes let588                          letter top let588
5-apr-10 15:08:51    letter original_text let588                  letter top let588
5-apr-10 15:09:16    letter ? let589                              toc period
5-apr-10 15:09:18    letter translation let589                    letter top let589
5-apr-10 15:09:18    letter notes let589                          letter top let589
5-apr-10 15:09:18    letter original_text let589                  letter top let589
5-apr-10 15:16:38    letter ? let683                              extern
5-apr-10 15:16:41    letter notes let683                          letter top let683
5-apr-10 16:02:52    toc complete                                 letter top let751
5-apr-10 16:15:50    entry                                        extern
5-apr-10 16:16:18    essay about 1                                entry
6-apr-10 12:04:46    entry                                        extern
6-apr-10 12:12:52    essay correspondents 1                       entry
6-apr-10 12:12:54    toc correspondent                            essay correspondents 1
6-apr-10 13:48:56    toc period                                   entry
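
Because the log itself does not identify sessions, a run of requests like this has to be cut into sessions by some rule. A common convention is an inactivity timeout; the sketch below assumes 30 minutes, which the slides do not confirm as the rule used here. The timestamps are a subset of the rows in this spreadsheet, with the dates read as 2010.

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Group one visitor's request times into sessions by inactivity gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # continues the current session
        else:
            sessions.append([ts])     # gap too long: start a new session
    return sessions


# A few of the timestamps from the spreadsheet.
hits = [
    datetime(2010, 4, 5, 15, 1, 38),
    datetime(2010, 4, 5, 15, 16, 41),
    datetime(2010, 4, 5, 16, 2, 52),
    datetime(2010, 4, 5, 16, 15, 50),
    datetime(2010, 4, 6, 12, 4, 46),
    datetime(2010, 4, 6, 13, 48, 56),
]

sessions = split_sessions(hits)
for number, session in enumerate(sessions, start=1):
    duration = session[-1] - session[0]
    print(number, len(session), "requests", duration)
```

With this rule a single-request session has a duration of zero, which is one reason time-per-session figures are often reported separately for sessions of more than one page.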


Where do people come from?

Sites                             Number
www.google.com                    25989
www.vangoghmuseum.nl              4359
any.wikipedia.org                 2870
www.googleartproject.com          2106
translate.googleusercontent.com   1042
www.facebook.com                  906
www.vggallery.com                 839
www.bing.com                      735
www.vangoghreproductions.com      463
search.yahoo.com                  395
www.vangoghsblog.com              353
webcache.googleusercontent.com    262
www.stumbleupon.com               247
painting.about.com                236
twitter.com                       215
...                               ...
Total                             49271

External sessions

Yes     41161
No      44244
Total   85405
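
A tally like this can be produced from the referrer field of each request. The following is a rough sketch under stated assumptions: the set of the site's own host names is a guess, the sample referrers are made up apart from the one on the log-data slide, and folding every Wikipedia language edition into `any.wikipedia.org` is inferred from that row label rather than documented.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: host names under which the edition itself is served.
OWN_HOSTS = {"vangoghletters.org", "www.vangoghletters.org"}


def referrer_site(referrer):
    """Return the external referring site, or None for internal/no referrer."""
    host = urlparse(referrer).hostname
    if host is None or host in OWN_HOSTS:
        return None
    if host.endswith(".wikipedia.org"):
        return "any.wikipedia.org"   # fold all language editions together
    return host


# Illustrative referrers; "-" is what the log holds when none was sent.
referrers = [
    "http://es.wikipedia.org/wiki/Vincent_van_Gogh",
    "http://en.wikipedia.org/wiki/Vincent_van_Gogh",
    "http://www.google.com/search?q=van+gogh+letters",
    "http://vangoghletters.org/vg/index.html",
    "-",
]

site_counts = Counter(
    site for site in map(referrer_site, referrers) if site is not None
)
for site, count in site_counts.most_common():
    print(f"{site:25s}{count}")
```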


Landing pages of external visitors

Landing page   Number
border         386
essay          2078
home           19355
illustration   594
letter         16453
list           1171
search         6242
toc            2540


Access to letters

Accessed from   Number
border          408
essay           686
extern          16451
home            1346
letnext         51815
letother        4277
letref          13292
list            1073
search          30377
toc             58990
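
Turning these counts into shares makes the relative weight of each route easier to compare. The snippet below only does the arithmetic on the figures from this slide; reading `letnext` as the next/previous-letter links is an interpretation of the label, not something the slide states.

```python
# Counts copied from the slide: how a letter page was reached.
access_counts = {
    "border": 408,
    "essay": 686,
    "extern": 16451,
    "home": 1346,
    "letnext": 51815,    # presumably the next/previous letter links
    "letother": 4277,
    "letref": 13292,
    "list": 1073,
    "search": 30377,
    "toc": 58990,
}

total = sum(access_counts.values())
shares = {route: 100 * n / total for route, n in access_counts.items()}

# Print routes from most to least used.
for route, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{route:10s}{share:5.1f}%")
```

On these figures the table of contents and the sequential links together account for well over half of all letter views.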


Searches

Search term            Searches
keyword or number(s)   2384
paris                  510
sunflowers             219
starry night           202
?                      144
japanese               110
japan                  108
ear                    103
gauguin                93
zola                   87
theo                   85
arles                  75
bedroom                69
sower                  67
colour                 66


Searches for works of art

Total   Ref.   Adv.   Index   Artwork
126     107    19     0       Vincent van Gogh - The bedroom (F 482 / JH 1608)
90      76     12     2       Vincent van Gogh - Starry night (F 612 / JH 1731)
72      69     3      0       Vincent van Gogh - The night café (F 463 / JH 1575)
59      49     10     0       Vincent van Gogh - Starry night over the Rhône (F 474 / JH 1592)
42      40     2      0       Vincent van Gogh - The Tarascon diligence (F 478a / JH 1605)
41      37     3      1       Vincent van Gogh - Self-portrait (F 476 / JH 1581)
39      33     6      0       Vincent van Gogh - The potato eaters (F 82 / JH 764)
34      20     14     0       Vincent van Gogh - The potato eaters (F 1661 / JH 737)
32      32     0      0       Jean-François Millet - The two diggers
31      4      25     2       Vincent van Gogh - Starry night (F 1540 / JH 1732)
31      30     1      0       Vincent van Gogh - Sunflowers in a vase (F 454 / JH 1562)
30      30     0      0       Emile Bernard - Christ in the Garden of Olives
29      18     11     0       Vincent van Gogh - Pink peach trees ('Souvenir de Mauve') (F 394 / JH 1379)
28      26     2      0       Ferdinand Victor Eugène Delacroix - Christ asleep during the tempest
26      25     0      1       (Eugène Henri) Paul Gauguin - Human miseries


To conclude

For this project
  Fair amount of serious usage
  Lots of automated access
  No fixed point of entry
  Sequential access ('reading') is very important

In general
  We can get a reasonable idea of what users do
  Could build a nice application for studying user behaviour
  Access patterns are highly irregular
  In-depth understanding needs more than log data

Thank you!