39
DATA VISUALIZATION Dr.Woraphon Yamaka

Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

DATA VISUALIZATIONDr.Woraphon Yamaka

Page 2: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

OVERVIEW

• When you manage multiple content assets, such as social media or a blog, with multiple sources of data, it can get overwhelming. What should you be tracking? What actually matters? How do you visualize and analyze the data so you can extract insights and actionable information?

• More importantly, how can you make reporting more efficient when you're busy working on multiple projects at once?

Page 3: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BASIC PLOT •Column Chart•Bar Graph•Line Graph•Dual Axis Chart•Area Chart•Stacked Bar Graph•Mekko Chart•Pie Chart•Scatter Plot Chart•Bubble Chart•Waterfall Chart•Funnel Chart•Bullet Chart•Heat Map

Page 4: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

COLUMN CHART

• A column chart is used to show a comparison among different items, or it can show a comparison of items over time. You could use this format to see the revenue per landing page or customers by close date.

0e+00

1e+05

2e+05

3e+05

1960 1970 1980 1990 2000

year

GD

P

Page 5: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BAR GRAPH

1960

1970

1980

1990

2000

0e+00 1e+05 2e+05 3e+05

GDP

year

1960

1970

1980

1990

2000

year

A bar graph, basically a horizontal column chart, should be used to avoid clutter when one data label is long or if you have more than 10 items to compare. This type of visualization can also be used to display negative numbers.

Page 6: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

LINE GRAPH

2000

4000

6000

1960 1970 1980 1990 2000

year

GD

P

http://www.sthda.com/english/wiki/ggplot2-line-plot-quick-start-guide-r-

software-and-data-visualization

A line graph reveals trends or progress over time and can be used to show many different categories of data. You should use it when you chart a continuous data set.

Page 7: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

DUAL AXIS CHART

time

X

0 20 40 60 80 100

-2

-1

0

1

2

0

2

4

6

Y

A dual axis chart allows you to plot data using two y-axes and a shared x-axis. It's used with three data sets, one of which is based on a continuous set of data and another which is better suited to being grouped by category. This should be used to visualize a correlation or the lack thereof between these three data sets.

https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html

Page 8: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

AREA CHART

An area chart is basically a line chart, but the space between the x-axis and the line is filled with a color or pattern. It is useful for showing part-to-whole relations, such as showing individual sales reps' contribution to total sales for a year. It helps you analyze both overall and individual trend information.

Page 9: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

STACKED BAR CHART

• A stacked area chart is the extension of a basic area chart. It displays the evolution of the value of several groups on the same graphic. The values of each group are displayed on top of each other, what allows to check on the same figure the evolution of both the total of a numeric variable, and the importance of each group.

http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-

stacked-bar-plots.html

0

10000

20000

30000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

country

Japan

Thailand

Page 10: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BASIC PLOT • Mekko Chart

• Also known as a marimekko chart, this type of graph can compare values, measure each one's composition, and show how your data is distributed across each one.

• It's similar to a stacked bar, except the mekko's x-axis is used to capture another dimension of your values -- rather than time progression, like column charts often do. In the graphic below, the x-axis compares each city to one another.

Page 11: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

PIE CHART

• A pie chart shows a static number and how categories represent part of a whole -- the composition of something. A pie chart represents numbers in percentages, and the total sum of all segments needs to equal 100%.

25

50

75

0/100

value

x

group

Child

Female

Male

Page 12: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

SCATTER PLOT CHART

• A scatter plot or scattergramchart will show the relationship between two different variables or it can reveal the distribution trends. It should be used when there are many different data points, and you want to highlight similarities in the data set. This is useful when looking for outliers or for understanding the distribution of your data.

http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-

start-guide-r-software-and-data-visualization

thai

thai

thai

thai

thai

thai

thai

thai

thai thai

2000

4000

6000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

Page 13: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BUBBLE CHART

• A bubble chart is similar to a scatter plot in that it can show distribution or relationship. There is a third data set, which is indicated by the size of the bubble or circle.

https://www.r-graph-gallery.com/320-the-

basis-of-bubble-plot.html

Page 14: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BASIC PLOT

https://www.data-to-viz.com/graph/stackedarea.html

A waterfall chart should be used to show how an initial value is affected by intermediate values -- either positive or negative -- and resulted in a final value. This should be used to reveal the composition of a number. An example of this would be to showcase how overall company revenue is influenced by different departments and leads to a specific profit number.

Page 15: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

FUNNEL CHART

• A funnel chart shows a series of steps and the completion rate for each step. This can be used to track the sales process or the conversion rate across a series of pages or steps.

Page 16: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

HEAT MAP

GDP

pop

Life

GDP pop Life

Var1

Var2

0.00

0.25

0.50

0.75

1.00value

A heat map shows the relationship between two items and provides rating information, such as high to low or poor to excellent. The rating information is displayed using varying colors or saturation.

Page 17: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

LET PRACTICE

Page 18: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

REQUIRED PACKAGES:

# ggplot2install.packages("")

library(ggplot2)

library(dplyr)

library(gapminder)

library(latticeExtra)

Page 19: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

COLUMN CHART

# Use the example data from gapminder

data = gapminder

# Prepare data

newdata=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

# Run plot

ggplot(data=newdata, aes(x=year, y=gdpPercap)) +

geom_bar(stat="identity", fill="steelblue")

0e+00

1e+05

2e+05

3e+05

1960 1970 1980 1990 2000

year

GD

P

Page 20: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BAR GRAPH

1960

1970

1980

1990

2000

0e+00 1e+05 2e+05 3e+05

GDP

year

1960

1970

1980

1990

2000

year

ggplot(data=newdata, aes(x=year, y=gdpPercap,fill=year)) +geom_bar(stat="identity",color="blue")+theme_minimal()+ coord_flip()

Page 21: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

LINE GRAPH

ggplot(data=newdata, aes(x=year, y=gdpPercap)) +geom_line(linetype = "dashed",color="red", size=3)+geom_point(color="red", size=3)+labs(title="Thailand GDP",x ="year", y = "Value")

2000

4000

6000

1960 1970 1980 1990 2000

year

Valu

e

Thailand GDP

Page 22: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

DUAL AXIS CHART

library(latticeExtra)

obj1 =xyplot(gdpPercap~year,newdata1, type = "l", col=c("steelblue") , lwd=2)obj2 =xyplot(gdpPercap~year,newdata2, type = "l", col=c( "#69b3a2") , lwd=2)# Make the plot with second y axis:doubleYScale(obj1, obj2, add.ylab2 = TRUE, use.style=FALSE )

year

gd

pP

erc

ap

1960 1970 1980 1990 2000

1000

2000

3000

4000

5000

6000

5000

10000

15000

20000

25000

gd

pP

erc

ap

Page 23: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

AREA CHART

ggplot(newdata, aes(x=year, y=gdpPercap)) +geom_area( fill="blue", alpha=0.4) +geom_line(color="red", size=2) +geom_point(size=3, color="green") +ggtitle("Thailand GDP")

Page 24: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

STACKED BAR CHART

newdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))

alldata=data.frame(rbind(newdata1,newdata2))

ggplot(aes(y = gdpPercap, x = year), data = alldata)+

geom_col(aes(fill = country ), width = 0.7) 0

10000

20000

30000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

country

Japan

Thailand

Page 25: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

PIE CHART

df <- data.frame(

group = c("Male", "Female", "Child"),

value = c(25, 25, 50)

)

ggplot(df, aes(x="", y=value, fill=group))+

geom_bar(width = 1, stat = "identity")+

coord_polar("y", start=0)+

scale_fill_manual(values=c("red", "blue", "green"))

25

50

75

0/100

value

x

group

Child

Female

Male

Page 26: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

SCATTER PLOT CHART

ggplot(newdata, aes(x= year, y=gdpPercap)) +

geom_point(size=2, shape=23, color="red")+

geom_text(label="thai")

thai

thai

thai

thai

thai

thai

thai

thai

thai thai

2000

4000

6000

1960 1970 1980 1990 2000

year

gdpP

erc

ap

Page 27: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BUBBLE CHART

ggplot() +

geom_point(data=data.frame(newdata), aes(x=gdpPercap, y=lifeExp, size=pop, color=continent), alpha=0.5) +

scale_size(range = c(.1, 24), name="Population (M)")

Page 28: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

BUBBLE CHARTnewdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))

newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))

alldata=data.frame(rbind(newdata1,newdata2))

ggplot() +

geom_point(data=data.frame(alldata), aes(x=gdpPercap, y=lifeExp, size=pop, color= country), alpha=0.5) +

scale_size(range = c(.1, 24), name="Population (M)")

Page 29: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

HEAT MAP

GDP

pop

Life

GDP pop Life

Var1

Var2

0.00

0.25

0.50

0.75

1.00value

library(reshape2)Life=newdata$lifeExpGDP =newdata$gdpPercappop =newdata$pop

mydata=cbind(GDP,pop,Life)cormat <- round(cor(mydata),2)head(cormat)melted_cormat <- melt(cormat)

ggplot() + geom_tile(data = melted_cormat, aes(x=Var1, y=Var2,

fill=value))+scale_fill_gradient(low="white", high="red")

Page 30: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

SPECIAL PLOT: MAP PLOT

Page 31: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

REQUIRED PACKAGES:

# ggplot2install.packages("")

library(ggplot2)

library(ggmap)

library(maps)

library("mapdata")

library("maptools")

library("rworldmap")

library("raster")

library(sp)

Page 32: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

PLOT MAP

map('world', 'thai', fill = TRUE, col = "blue")

Page 33: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

PLOT MAP

Th <- map_data('world', 'thai')

ggplot() + geom_polygon(data = Th , aes(x=long, y = lat, group = group)) + coord_fixed(1.3)

5

10

15

20

98 100 102 104 106

long

lat

Page 34: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

DESIGN MAP# Adding point #https://geocode.xyz/

labs <- data.frame(

long = c(98.979263, 100.523186,99.83594,99.51453 ),

lat = c(18.796143, 13.736717, 19.90728,18.28471),

names = c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),

stringsAsFactors = FALSE

)

### Adding bubble

gg1 +

geom_point(data = labs, aes(x = long, y = lat), color = "black", size = 5) +

geom_point(data = labs, aes(x = long, y = lat), color = "yellow", size = 4)

5

10

15

20

98 100 102 104 106

long

lat

Page 35: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

COMBINE MAP AND ECONOMIC DATA## Prepare DATA

mapdf <- data.frame(region= c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),

value=c(1,2,0.5,1),

stringsAsFactors=FALSE)

# Building Thailand geocode

long = c(98.979263, 100.523186,99.83594,99.51453 )

lat = c(18.796143, 13.736717, 19.90728,18.28471)

Country=c("Thailand")

region=c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang")

GDP=c(2,1,0,1.5)

Thaidata=data.frame(long ,lat,Country,region)

# # Building Thailand geocode + GDP

Thaidata1=data.frame(long ,lat,Country,region,GDP)

Page 36: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

COMBINE MAP AND ECONOMIC DATA

## Plot GDP Bankok=2 million Baht , Chiang Mai 1 million Baht

gg1 +

geom_polygon(data = Thaidata, aes(x=long, y = lat, group = Country), fill="grey", alpha=0.3) +

geom_point( data=Thaidata1, aes(x=long, y=lat, size=GDP, color=GDP)) +

scale_size_continuous(range=c(1,12))+

scale_color_viridis(trans="log")

5

10

15

20

98 100 102 104 106

long

lat

GDP

0.0

0.5

1.0

1.5

2.0

1.00

1.25

1.50

1.75

GDP

Page 37: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

WORLD MAP

worldMap <- getMap()

world.points <- fortify(worldMap)

world.points$region <- world.points$id

world.df <- world.points[,c("long","lat","group", "region")]

worldmap <- ggplot() +

geom_polygon(data = world.df, aes(x = long, y = lat, group = group)) +

scale_y_continuous(breaks = (-2:2) * 30) +

scale_x_continuous(breaks = (-4:4) * 45)

worldmap

-60

-30

0

30

60

-180 -135 -90 -45 0 45 90 135 180

long

lat

Page 38: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

QUIZ

Page 39: Data visualization - wyamaka.files.wordpress.com · software-and-data-visualization A line graph reveals trends or progress over time and can be used to show many different categories

REFERENCES

• https://blog.hubspot.com/marketing/types-of-graphs-for-data-visualization

• Teutonico, D. (2015). ggplot2 Essentials. Packt Publishing Ltd.