Data Visualisation: Other data visualisation tools & Twitter data 08.12.2016 Dr. Elena Demidova Data Visualisation (COMP6234)

Data Visualisation: Other data visualisation tools & Twitter data · 2017-06-06 · • LO1: Identify research questions that can be answered with visualisations at the example of


Data Visualisation: Other data visualisation tools & Twitter data

08.12.2016

Dr. Elena Demidova

Data Visualisation (COMP6234)


Session Aims and Learning Outcomes

• The aim of the session is to demonstrate how visualisation techniques can be applied to data analysis, using Twitter data as an example, and to develop the practical skills needed to create such visualisations.

• By the end of this session, the students should be able to:

• LO1: Identify research questions that can be answered with visualisations, using Twitter datasets as an example.

• LO2: Identify relevant attributes and operators and select suitable visualisations and tools for each research question.

• LO3: Create and discuss visualisations for the identified research questions using Tableau and other tools.


Case Study: Analysing Brexit in Twitter

• 3 datasets: tweets containing the tags #brexit, #leave and #stay, collected over a few days around the Brexit referendum in June 2016.

[Figure: example tweet with the labels “User name” and “Tweet time”]


Exploring Datasets using Visualisations

1. Identify the question(s) to be answered by the visualisation

2. Identify attributes, relations and operators required to create the visualisation; if needed perform pre-processing.

3. Select suitable visualisation

4. Visualise the data

5. Adjust visualisation if required (e.g. handle dominating values, outliers, etc.)

6. Discuss the results (i.e. what can be observed)


Case Study: Analysing Brexit in Twitter

• Datasets: #brexit, #leave, #stay

– https://drive.google.com/open?id=0B6dJh1dMpPRSVWMtdVYwNE15Mzg

– https://secure.ecs.soton.ac.uk/noteswiki/w/COMP6234/1617

– http://www.edshare.soton.ac.uk/17977/

• Data and schema examples


Identifying Research Questions

• Datasets: #brexit, #leave, #stay

• Which research questions can be answered using these data?

– Look at the relations between the attributes in one dataset

– Compare data across different datasets

• Discussion in groups (10 minutes):

– Formulate 5 research questions to be answered using these datasets. Specify how (with which attributes/operators) each question can be answered and which visualisation would be most suitable.


Example Research Questions

• Q1: How many different tweets and users does the dataset contain?

• Q2: What is the distribution of “likes” in the dataset?

• Q3: From which locations were the tweets posted?

• Q4: When was the majority of tweets posted?

• Q5: Who posts most frequently with both the #brexit and #stay tags?

• Q6: A question from your group discussion


Next Steps

1. Identify relevant attributes and operators.

2. Select suitable visualisation, e.g.

• Maps

• Pie charts

• Bar charts

• Box plots

• Treemaps

• Other (?)


Q1: How many different tweets and users does the Brexit dataset contain?

• Distinct count of Tweetid

• Distinct count of Userid

• Attention: we use count distinct, not just count! Why?

• Select visualisation(s).
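The difference between a plain count and a distinct count can be illustrated with a minimal Python sketch. The sample rows are invented for illustration; only the column names Tweetid and Userid come from the slides. One likely answer to the "Why?" above: each user appears once per tweet, and the same tweet may be stored more than once, so a plain count measures rows rather than tweets or users.

```python
import csv
import io

# Invented sample rows. Only the column names Tweetid and Userid come from the slides.
# The second row repeats the first, as can happen when a tweet is collected twice.
SAMPLE_CSV = """Tweetid,Userid,Name
101,1,alice
101,1,alice
102,1,alice
103,2,bob
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

row_count = len(rows)                                # plain count: every row
distinct_tweets = len({r["Tweetid"] for r in rows})  # count distinct: unique tweets
distinct_users = len({r["Userid"] for r in rows})    # count distinct: unique users

print("rows:", row_count)
print("distinct tweets:", distinct_tweets)
print("distinct users:", distinct_users)
```

In this sample the plain count (4) overstates both the number of tweets (3) and the number of users (2).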


Q1: How many different tweets and users does the Brexit dataset contain? (Practical)

• Open Tableau. Connect -> Text file -> brexit.csv

• Open a new worksheet

– Add a title: e.g. “Number of tweets per user in the Brexit dataset.” (Pay attention to the descriptive titles!)

– Columns: CNTD(Tweetid), Rows: Name

– Sort: Vertical axis by descending Tweetid Count(Distinct)

– What can be observed in the visualisation?
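Outside Tableau, the same aggregation (distinct tweets per user name, sorted in descending order) could be sketched in Python as follows. The records are invented, and the text bars are only a rough stand-in for the horizontal bar chart.

```python
from collections import defaultdict

# Invented (Name, Tweetid) pairs. The repeated pair mimics a tweet stored twice.
records = [
    ("alice", "101"), ("alice", "101"), ("alice", "102"),
    ("bob", "103"),
    ("carol", "104"), ("carol", "105"), ("carol", "106"),
]

# Collect the set of distinct tweet ids per user (the analogue of CNTD(Tweetid) per Name).
tweets_by_user = defaultdict(set)
for name, tweet_id in records:
    tweets_by_user[name].add(tweet_id)

# Sort users by their distinct tweet count, largest first.
tweets_per_user = sorted(
    ((name, len(ids)) for name, ids in tweets_by_user.items()),
    key=lambda item: item[1],
    reverse=True,
)

# Print a crude horizontal bar per user.
for name, n in tweets_per_user:
    print(f"{name:<6} {'#' * n} {n}")
```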


Q1: Example visualisation with horizontal bars

What can be observed in the visualisation?

Q2: What is the distribution of “likes” in the Brexit dataset?

• For each number of likes, count the tweets in the Brexit dataset that received that many likes.

– Visualisation?


Q2: What is the distribution of “likes” in the Brexit dataset? (Practical)

• Open a new worksheet

– Add a title: e.g. “Number of tweets with likes in the Brexit dataset.”

– Columns: dimension “Likestr” (independent variable)

– Rows: measure “CNTD(Tweetid)” (dependent variable)

– Sort rows by descending CNTD(Tweetid)

– What can be observed in the visualisation?

– Handle dominating values in bar charts using log scale axis: “Edit axis”, “Logarithmic”.
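To see why a logarithmic axis helps with dominating values, here is a small Python sketch. The like counts are invented and deliberately skewed; the assumption is only that most tweets receive no likes, which is typical for such data.

```python
import math
from collections import Counter

# Invented like counts, one entry per distinct tweet. Most tweets have zero likes.
likes_per_tweet = [0] * 1000 + [1] * 100 + [2] * 10 + [5] * 1

# Number of likes -> number of tweets with that many likes.
likes_distribution = Counter(likes_per_tweet)

# Bar heights on a base-10 logarithmic axis.
log_heights = {likes: math.log10(n) for likes, n in likes_distribution.items()}

# Print the distribution sorted by descending tweet count.
for likes, n in likes_distribution.most_common():
    print(f"likes={likes}: tweets={n}, log10={log_heights[likes]:.1f}")
```

On a linear axis the bar for zero likes would be 1000 times taller than the smallest bar; on the logarithmic axis the heights differ by only 3 units. Note that a count of 1 maps to height 0 on a log axis, which is worth keeping in mind when reading such charts.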


Q2: Example visualisation with horizontal bars

What can be observed in the visualisation?

Q3: From which locations were the tweets in the Brexit dataset posted?

• Group tweets by geostring (and count)

– Visualisation?


Q3: From which locations were the tweets in the Brexit dataset posted? (Practical)

• Open a new worksheet, add a title: e.g. “Locations of Tweets in the Brexit dataset”

• Adjust type of geostring: Geographic role “Country/Region”

• Columns: Geostring, sort descending

• Rows: CNTD(Tweetid)

• Exclude null values, adjust description “Tweets with known locations”

• Treemaps, filled maps, bar charts

• What can be observed?
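The grouping step behind this practical (count distinct tweets per geostring, excluding missing locations) might look like this in Python. The records are invented, and treating both an empty string and None as "null" is an assumption about how missing locations appear in the data.

```python
from collections import defaultdict

# Invented (Tweetid, Geostring) pairs. Empty string and None both stand for "no location".
records = [
    ("101", "United Kingdom"), ("102", "United Kingdom"), ("103", ""),
    ("104", "Germany"), ("105", None), ("106", "United Kingdom"),
    ("107", "Germany"), ("108", "France"),
]

tweets_by_location = defaultdict(set)
unknown_location = set()
for tweet_id, geo in records:
    if geo:  # skips both None and ""
        tweets_by_location[geo].add(tweet_id)
    else:
        unknown_location.add(tweet_id)

# Distinct tweets per location, largest first (ties broken alphabetically).
location_counts = sorted(
    ((geo, len(ids)) for geo, ids in tweets_by_location.items()),
    key=lambda item: (-item[1], item[0]),
)
known_total = sum(n for _, n in location_counts)

print(f"Tweets with known locations: {known_total} (excluded: {len(unknown_location)})")
for geo, n in location_counts:
    print(f"{geo:<15} {n}")
```

Reporting how many tweets were excluded makes clear that the chart describes only the tweets with known locations, not the whole dataset.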


Q3: Example visualisation with TreeMap

What can be observed in the visualisation?

Q4: When was the majority of tweets in the Brexit collection posted?

• Group tweets by date or datetime (granularity!), count.

– Visualisation?


Q4: When was the majority of tweets in the Brexit collection posted? (Practical)

• Open a new worksheet, add a title, e.g. “#Tweets per day”.

– Columns: DAY (Datetime)

– Rows: CNTD(Tweetid)

– What does this visualisation show?

• Change granularity of “Datetime” to hour (also: adjust the title).

– What does this visualisation show?
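The effect of granularity can be sketched in Python by counting the same timestamps per day and per hour. The timestamps are invented, and the "YYYY-MM-DD HH:MM:SS" format is an assumption; the actual format of the Datetime column may differ.

```python
from collections import Counter
from datetime import datetime

# Invented timestamps. The format is an assumption, not taken from the dataset.
timestamps = [
    "2016-06-23 08:15:00", "2016-06-23 08:40:00", "2016-06-23 21:05:00",
    "2016-06-24 06:10:00", "2016-06-24 06:30:00", "2016-06-24 06:55:00",
    "2016-06-24 07:20:00",
]
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]

# Coarse granularity: tweets per day.
per_day = Counter(dt.date().isoformat() for dt in parsed)

# Finer granularity: tweets per hour (minutes and seconds are truncated).
per_hour = Counter(dt.strftime("%Y-%m-%d %H:00") for dt in parsed)

for day, n in sorted(per_day.items()):
    print(day, n)
for hour, n in sorted(per_hour.items()):
    print(hour, n)
```

The daily view shows which day was busiest; the hourly view additionally reveals when within a day the activity peaked.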


Q4: Example visualisation with TreeMap

What can be observed in the visualisation?

Q5: Who posts most frequently with both the #brexit and #stay tags?

• Join the #brexit and #stay datasets on Userid, then count and sort the users.

– [Background knowledge]: What is the difference between the inner/left/right/full outer join operators?

– Visualisation?
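As background for the join operators, here is a simplified Python sketch. It assumes each dataset has already been reduced to one value per Userid (an invented tweet count), so that the join types differ only in which users they keep. The function `join` is a hypothetical helper written for this illustration.

```python
def join(left, right, how="inner"):
    """Join two {userid: value} dicts on the key. A missing side becomes None."""
    if how == "inner":
        keys = left.keys() & right.keys()   # users present in both
    elif how == "left":
        keys = left.keys()                  # all users from the left dataset
    elif how == "right":
        keys = right.keys()                 # all users from the right dataset
    elif how == "full":
        keys = left.keys() | right.keys()   # users present in either
    else:
        raise ValueError(f"unknown join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in sorted(keys)}


# Invented data: Userid -> number of tweets in each dataset.
brexit = {1: 5, 2: 3, 3: 1}
stay = {2: 4, 3: 2, 4: 7}

for how in ("inner", "left", "right", "full"):
    print(how, join(brexit, stay, how))
```

For a question about users who post with both tags, only the inner join keeps exactly the users that appear in both datasets.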


Q5: Who posts most frequently with both the #brexit and #stay tags? (Practical)

• Open a new workbook. Connect -> Text file: brexit.csv, stay.csv

• Inner join, specify the attribute to join on (Userid)

• Open a worksheet, add a title, e.g. “Users posting with #Brexit and #Stay hashtags”. Columns: Name, Rows: CNTD(Tweetid), CNTD(Tweetid) (Stay.csv)

• Horizontal bars, sort user name by field “Tweetid, count distinct”

• What does this visualisation show?
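A row-level inner join on Userid pairs every #brexit row of a user with every #stay row of the same user, so the joined table can contain many more rows than either input. The following Python sketch, using invented rows, shows why distinct counts are still correct after such a join. Sorting by the #brexit count is a choice made for this illustration.

```python
from collections import defaultdict

# Invented rows of the form (Userid, Name, Tweetid).
brexit_rows = [(1, "alice", "b1"), (1, "alice", "b2"), (2, "bob", "b3"), (3, "carol", "b4")]
stay_rows = [(1, "alice", "s1"), (1, "alice", "s2"), (1, "alice", "s3"),
             (2, "bob", "s4"), (4, "dave", "s5")]

# Row-level inner join on Userid: every matching pair of rows is kept.
joined = [
    (name, b_tweet, s_tweet)
    for (b_uid, name, b_tweet) in brexit_rows
    for (s_uid, _, s_tweet) in stay_rows
    if b_uid == s_uid
]

# Distinct tweet ids per user and per side, which undoes the row multiplication.
brexit_ids = defaultdict(set)
stay_ids = defaultdict(set)
for name, b_tweet, s_tweet in joined:
    brexit_ids[name].add(b_tweet)
    stay_ids[name].add(s_tweet)

# (Name, distinct #brexit tweets, distinct #stay tweets), sorted by the #brexit count.
both_tags = sorted(
    ((name, len(brexit_ids[name]), len(stay_ids[name])) for name in brexit_ids),
    key=lambda row: row[1],
    reverse=True,
)

print("joined rows:", len(joined))
for name, n_brexit, n_stay in both_tags:
    print(f"{name:<6} brexit={n_brexit} stay={n_stay}")
```

Here alice has 2 #brexit and 3 #stay tweets, which produces 6 joined rows; a plain row count would report 6 for both measures, while the distinct counts recover 2 and 3. Users who appear in only one dataset (carol, dave) are dropped by the inner join.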


Q5: Example visualisation with horizontal bars

What can be observed in the visualisation?

Tableau Practice

• Use Tableau to create visualisations for another Twitter dataset, answering 2-3 questions from the list above or from your group discussion. Up to 15 minutes.

• Load Twitter dataset(s).

– http://www.edshare.soton.ac.uk/17977/

• Check datatypes of the attributes.

• Apply visualisation. Check if post-processing is required.

• Discuss what the visualisation shows.


Summary Part I

• Starting from a dataset description, we identified several research questions.

• For each of these questions, we identified relevant attributes and operators.

• Then, we selected and created a visualisation using Tableau.

• We discussed the results.

• We will continue looking at data exploration with other visualisation tools (Raw and TagCrowd).

• Break (10 minutes)


Data Exploration with Raw

• Raw is an open web app with a simple interface.

• Open, customizable, and free to download and modify.

• Raw lets users create vector-based data visualizations.

• Data can be uploaded, and the resulting visualisation can be exported as SVG or PNG and embedded in your web page.

• http://app.raw.densitydesign.org/


Raw Practice

• Create a visualisation of a Twitter dataset using Raw.

– Copy a subset of data to the web interface.

– Select a visualisation

– Configure the visualisation

• http://app.raw.densitydesign.org/


Raw Practice


• Example


Raw Practice


• Example

– 17 rows of the #brexit dataset (brexit_small_example_raw.txt)

– Clusters, colours and labels are mapped to the geostring

– What does this visualisation show?


Raw Practice

• #brexit subset, 17 rows with non-zero geostrings and likes (“brexit_with_locations_and_likes.txt”)

– Clusters and colours are mapped to the geostring, size is the number of likes, labels are tweets.

– What does this visualisation show?

“Omg I still don't know which way to vote tomorrow #vote #VoteLeave #VoteStay #EUreferendum #EUFacts #LabourInForBritain #LeaveCampaign”
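A small subset like the one used on this slide could be prepared with a short Python script before pasting it into Raw. This is only a sketch under several assumptions: the sample rows are invented, the column name Text is hypothetical, Likestr is assumed to hold the number of likes as text, and tab-separated output is assumed to be acceptable as pasted input.

```python
import csv
import io

# Invented rows. Geostring and Likestr are named in the slides; "Text" is hypothetical.
SAMPLE_CSV = """Tweetid,Geostring,Likestr,Text
101,United Kingdom,3,Voting today #brexit
102,,5,No location here #brexit
103,Germany,0,No likes here #brexit
104,Germany,1,Watching from Berlin #brexit
"""

MAX_ROWS = 17  # size of the subset used on the slide


def select_rows(csv_text, max_rows=MAX_ROWS):
    """Keep rows with a non-empty geostring and at least one like, up to max_rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [
        row for row in reader
        if row["Geostring"].strip() and int(row["Likestr"] or 0) > 0
    ]
    return kept[:max_rows]


def to_tsv(rows):
    """Serialise the rows as tab-separated text with a header line."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


subset = select_rows(SAMPLE_CSV)
tsv_text = to_tsv(subset)
print(tsv_text)
```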


Raw Practice

• X-axis: date

• Y-axis: number of likes


Raw Practice

• Create your own visualisation(s) of a (subset of a) Twitter dataset with Raw: http://app.raw.densitydesign.org/

• Try other visualisations available in Raw:

– To answer a research question

– To explore the dataset

• If needed for the visualisation, adjust the input data.

• 10 minutes.

• Discuss the resulting visualisation(s) with your neighbour. What does the visualisation show? Compare to traditional charts.

• 5 minutes.


Visualising Text with TagCrowd

• Tag clouds can provide a quick view on the frequent terms in the text collection

• Visualise tweet text with different tags using TagCrowd (http://tagcrowd.com/). Vary the configurations.

• Compare and discuss visualisations of different datasets.

• 10 minutes in groups.
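The term frequencies that a tag cloud displays can be approximated with a few lines of Python. This is a rough sketch and not how TagCrowd itself is implemented: the tweets, the tiny stop-word list and the `min_count` setting are all invented for illustration.

```python
import re
from collections import Counter

# Invented tweets. The stop-word list is deliberately tiny and only illustrative.
tweets = [
    "Vote today! #brexit #EUreferendum",
    "I still don't know which way to vote #brexit",
    "The vote is tomorrow, the polls are close #EUreferendum",
]
STOP_WORDS = {"the", "is", "are", "to", "i", "a", "which", "don't"}


def term_frequencies(texts, min_count=1):
    """Count lower-cased words and hashtags, dropping stop words and rare terms."""
    counts = Counter()
    for text in texts:
        # A token is an optional '#' followed by letters (apostrophes allowed inside).
        tokens = re.findall(r"#?[a-z][a-z']*", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return Counter({term: n for term, n in counts.items() if n >= min_count})


frequencies = term_frequencies(tweets, min_count=2)
for term, n in frequencies.most_common():
    print(term, n)
```

In a tag cloud, the font size of each term would grow with its count. Changing the stop-word list or the minimum count changes which terms remain visible, which is one reason to compare several configurations.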


Summary: Designing a Visualisation

• Identify example research questions to be answered with visualisations, or the goal of the data exploration.

• Identify relevant attributes and operators and select suitable visualisation(s) for each research question.

• Select suitable tools. Depending on the tool's functionality, specific data preparation may be required.

• Create visualisations for the identified research questions using selected tools.

• Critically analyse the result. What can be observed? Could the research question be answered?


Questions, Comments, Feedback?

Thank you for your attention

Contact:

Elena Demidova

[email protected]
