Geo-Temporal-(Social?) Data? Social Events in Social Media

Trondheim bigdata Talk

Talk at Trondheim Big Data meetup - "Geotemporal Social Data and Events in Multimedia"

Page 1: Trondheim bigdata Talk

Geo-Temporal-(Social?) Data?
Social Events in Social Media

Page 2: Trondheim bigdata Talk

Massimiliano Ruocco

@ruoccoma
ruoccoma dot gmail dot com
Telenor Digital (SWEng), NTNU (PhD)

Who am I?

Page 3: Trondheim bigdata Talk

Digital Footprint

Social

Geographical

Temporal

Page 4: Trondheim bigdata Talk

Scenario 1 User visiting a touristic spot. Takes a picture of it. Posts it (+ comments) on FB/Flickr/Twitter.

Page 5: Trondheim bigdata Talk

Scenario 2 User watching a football match at the stadium. Takes a picture of the match (+ comments). Posts it on FB/Flickr/Twitter

Page 6: Trondheim bigdata Talk

Scenario 3 User reading newspaper. Comments on some trending facts (e.g., crisis in the Middle East). Posts it on Twitter.

Page 7: Trondheim bigdata Talk

Scenario 1 User visiting a touristic spot. Takes a picture of it. Posts it (+ comments) on FB/Flickr/Twitter.

Scenario 2 User watching a football match at the stadium. Takes a picture of the match (+ comments). Posts it on FB/Flickr/Twitter

Scenario 3 User reading newspaper. Comments on some trending facts (e.g., crisis in the Middle East). Posts it on Twitter.

Event! <<my trip in Naples>>

Event! <<semifinal CL>>

Event! <<crisis in middle east>>

Page 8: Trondheim bigdata Talk

Events in Social Media From raw data to events  

Page 9: Trondheim bigdata Talk
Page 10: Trondheim bigdata Talk
Page 11: Trondheim bigdata Talk

Flickr as data source
- 250M+ geotagged pics
- 3.5M pics uploaded/day
- 87M users
- 6,000M pics

Page 12: Trondheim bigdata Talk

POI-related Tag Extraction

Page 13: Trondheim bigdata Talk

POI-related Tag Extraction
Tag Point Pattern: geographical distribution of the pictures tagged with a given term

Point Process Theory: extensive, rigorous statistics

Page 14: Trondheim bigdata Talk

POI-related Tag Extraction

Point Pattern Analysis
Objective: determine whether a given set of spatial points (a spatial point pattern) exhibits clustering or regularity, or is randomly distributed, within an area A

Page 15: Trondheim bigdata Talk

POI-related Tag Extraction
Ripley’s K-function: summarizes a spatial point pattern over a scale h

CSR (Complete Spatial Randomness) Test
- K(h) > πh²: clustering at scale h
- K(h) < πh²: dispersion at scale h
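The CSR test above can be sketched in a few lines. This is a minimal, naive estimator without edge correction, not the implementation used in the talk; the function names `ripley_k` and `csr_verdict` and the example coordinates are illustrative assumptions.

```python
import math

def ripley_k(points, h, area):
    """Naive estimate of Ripley's K at scale h (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, that lie within distance h.
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                close_pairs += 1
    # K(h) = area / (n * (n - 1)) * number of close ordered pairs
    return area * close_pairs / (n * (n - 1))

def csr_verdict(points, h, area):
    """Compare the estimate with pi * h^2, the value expected under CSR."""
    k = ripley_k(points, h, area)
    expected = math.pi * h * h
    if k > expected:
        return "clustering"
    if k < expected:
        return "dispersion"
    return "random"

# Hypothetical patterns in a unit square (area = 1.0).
clustered = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(csr_verdict(clustered, 0.05, 1.0))
print(csr_verdict(corners, 0.5, 1.0))
```

The double loop is quadratic in the number of points, which is one reason the size of real data becomes a challenge.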

Page 16: Trondheim bigdata Talk

POI-related Tag Extraction
Ripley’s K-function: summarizes a spatial point pattern over a scale h

CSR Test
- D(h) > h: clustering at scale h
- D(h) < h: dispersion at scale h
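The slide does not define D(h). Assuming it denotes the usual square-root rescaling of the K-function (often written L(h) in the literature), the comparison against h follows directly from the CSR value of K:

```latex
D(h) = \sqrt{\frac{K(h)}{\pi}}, \qquad
K(h) = \pi h^{2} \;\Longrightarrow\; D(h) = h .
```

Under that assumption, D(h) > h is equivalent to K(h) > πh², so both forms of the test lead to the same conclusion; the rescaled form is simply easier to read on a plot because the CSR reference is a straight line.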

Page 17: Trondheim bigdata Talk

POI-related Tag Extraction
Ripley’s Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h

Fig – Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich at two different zoom levels

Page 18: Trondheim bigdata Talk

Page 19: Trondheim bigdata Talk

POI-related Tag Extraction
Ripley’s Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h

CSR Test
- L12(h) > 0: attraction at scale h
- L12(h) < 0: repulsion at scale h

Fig – Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich

 

Page 20: Trondheim bigdata Talk

POI-related Tag Extraction
Ripley’s Cross-K-function: summarizes the spatial correlation between two tag point patterns over a scale h

CSR Test
- K12(h) > πh²: attraction at scale h
- K12(h) < πh²: repulsion at scale h

Fig – Spatial distribution of the tag point patterns for the tags Old Naval College and University of Greenwich
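The attraction/repulsion test on K12(h) can be sketched as follows. This is a naive estimate without edge correction, offered as an illustration rather than the talk's implementation; `cross_k`, `interaction`, and the coordinates are hypothetical.

```python
import math

def cross_k(points_a, points_b, h, area):
    """Naive cross-K estimate between two point patterns (no edge correction)."""
    if not points_a or not points_b:
        raise ValueError("both patterns must be non-empty")
    # Count pairs (a, b) with a from the first pattern and b from the second.
    pairs = sum(
        1
        for (xa, ya) in points_a
        for (xb, yb) in points_b
        if math.hypot(xa - xb, ya - yb) <= h
    )
    return area * pairs / (len(points_a) * len(points_b))

def interaction(points_a, points_b, h, area):
    """Compare K12(h) with pi * h^2, its value when the patterns are unrelated."""
    k12 = cross_k(points_a, points_b, h, area)
    expected = math.pi * h * h
    if k12 > expected:
        return "attraction"
    if k12 < expected:
        return "repulsion"
    return "independent"

# Hypothetical tag point patterns in a unit square (area = 1.0).
tag_a = [(0.50, 0.50), (0.52, 0.50)]
tag_b = [(0.50, 0.51), (0.51, 0.50)]
far_a = [(0.1, 0.1)]
far_b = [(0.9, 0.9)]
print(interaction(tag_a, tag_b, 0.05, 1.0))
print(interaction(far_a, far_b, 0.1, 1.0))
```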

 

Page 21: Trondheim bigdata Talk

POI-related Tag Extraction

Objective

Derive indicators estimating the clustering tendency of a tag point pattern

Applications

1 - Extracting/ranking social tags that indicate geographical POIs
2 - Enhancing query expansion in combination with other metadata

Page 22: Trondheim bigdata Talk

POI-related Tag Extraction
Real Data: Challenges

Size

Inhomogeneity

Page 23: Trondheim bigdata Talk

Data Inhomogeneity

Example of point pattern of the tag night

Page 24: Trondheim bigdata Talk

Data Inhomogeneity

Related underlying Picture Point Pattern

Page 25: Trondheim bigdata Talk

Data Inhomogeneity

Example of point pattern of the tag night over the underlying distribution

Page 26: Trondheim bigdata Talk

POI-related Tag Extraction
Real Data: Challenges

Size
1 - Subsampling
2 - bigmatrix* and biganalytics** (R)

Inhomogeneity
Case-Control Analysis

*Kane M., Emerson J., “The R Package bigmemory: Supporting Efficient Computation and Concurrent Programming with Large Data Sets” (2010). Journal of Statistical Software.

**Kane M. et al., “Scalable Strategies for Computing with Massive Data” (2013). Journal of Statistical Software.
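The slides do not give the formula behind the case-control analysis. One common formulation, stated here as an assumption, treats the pictures carrying the tag as cases and the underlying picture point pattern as controls, and looks at the difference of their K-functions. The symbols ΔK, K_tag and K_all are notation introduced for this sketch.

```latex
\Delta K(h) = K_{\mathrm{tag}}(h) - K_{\mathrm{all}}(h)
```

A positive ΔK(h) would suggest that the tag is more clustered at scale h than the picture distribution it is drawn from, which separates genuine hot spots from places where people simply take many pictures.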

Page 27: Trondheim bigdata Talk

POI-related Tag Extraction

Derived Geo-Features

1 - Area under K(h) over the considered scale range
2 - Maximum value of K(h) over the scale range

Fig – Series shown: Set 1, Set 2

Page 28: Trondheim bigdata Talk

wwt, grdstreeteatportlan, britishlibrary, astoria

POI-related Tag Extraction

Table - Top-5 tags extracted ranked by MaxValue and Area
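The two ranking criteria named in the caption, MaxValue and Area, can be computed from a K(h) curve sampled at a few scales. The sketch below assumes trapezoidal integration for the area; the function names, tag names and curve values are hypothetical and not taken from the talk's data.

```python
def geo_features(scales, k_values):
    """Return (area, max_value) for a K(h) curve sampled at increasing scales."""
    if len(scales) != len(k_values) or len(scales) < 2:
        raise ValueError("need matching sequences with at least two samples")
    area = 0.0
    for i in range(1, len(scales)):
        width = scales[i] - scales[i - 1]
        # Trapezoid between two consecutive samples of the curve.
        area += width * (k_values[i] + k_values[i - 1]) / 2.0
    return area, max(k_values)

def rank_tags(curves, scales, by="area"):
    """Rank tags by one of the two features, highest first."""
    index = {"area": 0, "max": 1}[by]
    scored = {tag: geo_features(scales, ks)[index] for tag, ks in curves.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical K(h) curves for two tags, sampled at three scales.
scales = [0.0, 1.0, 2.0]
curves = {"tag_a": [0.0, 3.0, 3.0], "tag_b": [0.0, 4.0, 0.0]}
print(rank_tags(curves, scales, by="area"))
print(rank_tags(curves, scales, by="max"))
```

The two criteria can disagree: a curve with a single sharp peak wins on maximum value, while a curve that stays high over many scales wins on area.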

Page 29: Trondheim bigdata Talk

Event-related Image Search Geo(Temporal)-tagged resources supporting IR

Page 30: Trondheim bigdata Talk

Event-related Image Search Geo(Temporal)-tagged resources supporting IR

Page 31: Trondheim bigdata Talk

Event-related Image Search
Expansion terms selection over three dimensions

Text Features (baseline)
- TF, IDF, DF

Time Features
- Kurtosis: peakedness
- Autocorrelation: randomness
- Cross-correlation

Geo Features
- Good expansion = spatially correlated with qi
- Calculated for each tile T_{qi,e}
- Derived from the q-point pattern, the e-point pattern, and the (q+e)-point pattern
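Two of the time features listed above can be illustrated on a term's temporal distribution. The sketch assumes the input is a series of per-day usage counts for a term, which the slides do not specify; `kurtosis` here is the plain fourth standardized moment and `autocorrelation` the sample autocorrelation at a given lag.

```python
def kurtosis(series):
    """Fourth standardized moment: high values indicate a peaked series."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    if m2 == 0:
        raise ValueError("constant series has undefined kurtosis")
    return m4 / (m2 * m2)

def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag; near 0 for a random series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("constant series has undefined autocorrelation")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

# Hypothetical daily counts: an event-like burst versus a steady alternation.
burst = [0, 0, 0, 0, 10, 0, 0, 0, 0, 0]
steady = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(kurtosis(burst), kurtosis(steady))
print(autocorrelation(steady, lag=1))
```

A term tied to a single event tends to look like the burst series, so a high kurtosis is a plausible signal that the term is a useful event-related expansion.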

Page 32: Trondheim bigdata Talk

Event-related Image Search Scalability?  

Page 33: Trondheim bigdata Talk

Event-related Image Search Scalability?  

Which Tile?
1 - Best tile + calculate confidence value
2 - Confidence values combination from different tiles

Map Reduce fashion + Solr Search engine  
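The tile-based, map-reduce style of processing can be sketched in plain Python, without Hadoop or Solr. Everything below is an assumption made for illustration: the 1-degree grid, the `tile_key` scheme, the photo records, and the use of a simple photo count as the per-tile confidence value.

```python
import math
from collections import defaultdict

def tile_key(lat, lon, tile_deg=1.0):
    """Map a coordinate to the (row, col) cell of a regular lat/lon grid."""
    row = math.floor((lat + 90.0) / tile_deg)
    col = math.floor((lon + 180.0) / tile_deg)
    return (row, col)

def map_phase(photos, tile_deg=1.0):
    """Emit one (tile, photo) pair per geotagged photo."""
    for photo in photos:
        yield tile_key(photo["lat"], photo["lon"], tile_deg), photo

def reduce_phase(pairs, score):
    """Group photos by tile and compute one confidence value per tile."""
    groups = defaultdict(list)
    for key, photo in pairs:
        groups[key].append(photo)
    return {key: score(group) for key, group in groups.items()}

def best_tile(confidences):
    """Pick the tile with the highest confidence value."""
    return max(confidences.items(), key=lambda item: item[1])

# Hypothetical photos; the score is just the number of photos in the tile.
photos = [
    {"lat": 63.43, "lon": 10.39},
    {"lat": 63.44, "lon": 10.40},
    {"lat": 40.85, "lon": 14.27},
]
confidences = reduce_phase(map_phase(photos), score=len)
print(best_tile(confidences))
```

Because each tile is scored independently, the reduce step can run in parallel across tiles, which is what makes the tiling attractive for large collections.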

Page 34: Trondheim bigdata Talk

Event-related Image Search Scalability?  

Page 35: Trondheim bigdata Talk

Event-related Image Search Results  

Table – Comparison of classification performance. The best score in each column is set in boldface.

Page 36: Trondheim bigdata Talk

Event-related Image Search Results  

Fig – Comparison of MAP improvements as a function of the number of feedback docs

Page 37: Trondheim bigdata Talk

Yes! But…BigData?
•  Bigmatrix + Biganalytics in R
•  Subsampling
•  World map divided into tiles
•  Map-Reduce fashion algorithm

Page 38: Trondheim bigdata Talk

Cool stuff!
Increasing volume of Geo-Temporal Data from Social Media ++

Amazing things!
–  Visualization
–  Location-based recommendation
–  Discovering trends!

Page 39: Trondheim bigdata Talk

Thanks! Questions?

M. Ruocco and H. Ramampiaro (2014), "Geo-Temporal Distribution of Tag Terms for Event-Related Image Retrieval". In Information Processing & Management Journal (IPM). Elsevier.

M. Ruocco and H. Ramampiaro (2014), "A Scalable Algorithm for Extraction and Clustering of Event-Related Pictures". In Multimedia Tools and Applications Journal (MTAP). Springer.

M. Ruocco and H. Ramampiaro (2013), "Exploring Temporal Proximity and Spatial Distribution of Terms in Web-based Search of Event-Related Images". In Proc. of the 24th ACM Conference on Hypertext and Social Media (HT 2013). ACM Press.

M. Ruocco and H. Ramampiaro (2012), "Exploratory Analysis on Heterogeneous Tag-Point Patterns for Ranking and Extracting Hot-Spot Related Tags". In Proc. of the 5th ACM SIGSPATIAL International Workshop on Location-Based Social Networks (LBSN 2012). ACM Press.

Page 40: Trondheim bigdata Talk

POI-related Tag Extraction

Evaluation of top-100 extracted tags: P@n
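P@n (precision at n) is the fraction of the top n ranked items that are judged relevant. A minimal sketch follows, with a hypothetical ranking and hypothetical relevance judgments rather than the talk's evaluation data.

```python
def precision_at_n(ranked_tags, relevant, n):
    """Fraction of the top-n ranked tags that appear in the relevant set."""
    if n <= 0:
        raise ValueError("n must be positive")
    hits = sum(1 for tag in ranked_tags[:n] if tag in relevant)
    return hits / n

# Hypothetical ranking and judgments: which tags really denote a POI.
ranked = ["britishlibrary", "night", "astoria", "party"]
poi_tags = {"britishlibrary", "astoria"}
for n in (1, 2, 4):
    print(n, precision_at_n(ranked, poi_tags, n))
```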