Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
DATA VISUALIZATIONDr.Woraphon Yamaka
OVERVIEW
• When you manage multiple content assets, such as social media or a blog, with multiple sources of data, it can get overwhelming. What should you be tracking? What actually matters? How do you visualize and analyze the data so you can extract insights and actionable information?
• More importantly, how can you make reporting more efficient when you're busy working on multiple projects at once?
BASIC PLOT •Column Chart•Bar Graph•Line Graph•Dual Axis Chart•Area Chart•Stacked Bar Graph•Mekko Chart•Pie Chart•Scatter Plot Chart•Bubble Chart•Waterfall Chart•Funnel Chart•Bullet Chart•Heat Map
COLUMN CHART
• A column chart is used to show a comparison among different items, or it can show a comparison of items over time. You could use this format to see the revenue per landing page or customers by close date.
0e+00
1e+05
2e+05
3e+05
1960 1970 1980 1990 2000
year
GD
P
BAR GRAPH
1960
1970
1980
1990
2000
0e+00 1e+05 2e+05 3e+05
GDP
year
1960
1970
1980
1990
2000
year
A bar graph, basically a horizontal column chart, should be used to avoid clutter when one data label is long or if you have more than 10 items to compare. This type of visualization can also be used to display negative numbers.
LINE GRAPH
2000
4000
6000
1960 1970 1980 1990 2000
year
GD
P
http://www.sthda.com/english/wiki/ggplot2-line-plot-quick-start-guide-r-
software-and-data-visualization
A line graph reveals trends or progress over time and can be used to show many different categories of data. You should use it when you chart a continuous data set.
DUAL AXIS CHART
time
X
0 20 40 60 80 100
-2
-1
0
1
2
0
2
4
6
Y
A dual axis chart allows you to plot data using two y-axes and a shared x-axis. It's used with three data sets, one of which is based on a continuous set of data and another which is better suited to being grouped by category. This should be used to visualize a correlation or the lack thereof between these three data sets.
https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html
AREA CHART
An area chart is basically a line chart, but the space between the x-axis and the line is filled with a color or pattern. It is useful for showing part-to-whole relations, such as showing individual sales reps' contribution to total sales for a year. It helps you analyze both overall and individual trend information.
STACKED BAR CHART
• A stacked area chart is the extension of a basic area chart. It displays the evolution of the value of several groups on the same graphic. The values of each group are displayed on top of each other, what allows to check on the same figure the evolution of both the total of a numeric variable, and the importance of each group.
http://t-redactyl.io/blog/2016/01/creating-plots-in-r-using-ggplot2-part-4-
stacked-bar-plots.html
0
10000
20000
30000
1960 1970 1980 1990 2000
year
gdpP
erc
ap
country
Japan
Thailand
BASIC PLOT • Mekko Chart
• Also known as a marimekko chart, this type of graph can compare values, measure each one's composition, and show how your data is distributed across each one.
• It's similar to a stacked bar, except the mekko's x-axis is used to capture another dimension of your values -- rather than time progression, like column charts often do. In the graphic below, the x-axis compares each city to one another.
PIE CHART
• A pie chart shows a static number and how categories represent part of a whole -- the composition of something. A pie chart represents numbers in percentages, and the total sum of all segments needs to equal 100%.
25
50
75
0/100
value
x
group
Child
Female
Male
SCATTER PLOT CHART
• A scatter plot or scattergramchart will show the relationship between two different variables or it can reveal the distribution trends. It should be used when there are many different data points, and you want to highlight similarities in the data set. This is useful when looking for outliers or for understanding the distribution of your data.
http://www.sthda.com/english/wiki/ggplot2-scatter-plots-quick-
start-guide-r-software-and-data-visualization
thai
thai
thai
thai
thai
thai
thai
thai
thai thai
2000
4000
6000
1960 1970 1980 1990 2000
year
gdpP
erc
ap
BUBBLE CHART
• A bubble chart is similar to a scatter plot in that it can show distribution or relationship. There is a third data set, which is indicated by the size of the bubble or circle.
https://www.r-graph-gallery.com/320-the-
basis-of-bubble-plot.html
BASIC PLOT
https://www.data-to-viz.com/graph/stackedarea.html
A waterfall chart should be used to show how an initial value is affected by intermediate values -- either positive or negative -- and resulted in a final value. This should be used to reveal the composition of a number. An example of this would be to showcase how overall company revenue is influenced by different departments and leads to a specific profit number.
FUNNEL CHART
• A funnel chart shows a series of steps and the completion rate for each step. This can be used to track the sales process or the conversion rate across a series of pages or steps.
HEAT MAP
GDP
pop
Life
GDP pop Life
Var1
Var2
0.00
0.25
0.50
0.75
1.00value
A heat map shows the relationship between two items and provides rating information, such as high to low or poor to excellent. The rating information is displayed using varying colors or saturation.
LET PRACTICE
REQUIRED PACKAGES:
# ggplot2install.packages("")
library(ggplot2)
library(dplyr)
library(gapminder)
library(latticeExtra)
COLUMN CHART
# Use the example data from gapminder
data = gapminder
# Prepare data
newdata=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))
# Run plot
ggplot(data=newdata, aes(x=year, y=gdpPercap)) +
geom_bar(stat="identity", fill="steelblue")
0e+00
1e+05
2e+05
3e+05
1960 1970 1980 1990 2000
year
GD
P
BAR GRAPH
1960
1970
1980
1990
2000
0e+00 1e+05 2e+05 3e+05
GDP
year
1960
1970
1980
1990
2000
year
ggplot(data=newdata, aes(x=year, y=gdpPercap,fill=year)) +geom_bar(stat="identity",color="blue")+theme_minimal()+ coord_flip()
LINE GRAPH
ggplot(data=newdata, aes(x=year, y=gdpPercap)) +geom_line(linetype = "dashed",color="red", size=3)+geom_point(color="red", size=3)+labs(title="Thailand GDP",x ="year", y = "Value")
2000
4000
6000
1960 1970 1980 1990 2000
year
Valu
e
Thailand GDP
DUAL AXIS CHART
library(latticeExtra)
obj1 =xyplot(gdpPercap~year,newdata1, type = "l", col=c("steelblue") , lwd=2)obj2 =xyplot(gdpPercap~year,newdata2, type = "l", col=c( "#69b3a2") , lwd=2)# Make the plot with second y axis:doubleYScale(obj1, obj2, add.ylab2 = TRUE, use.style=FALSE )
year
gd
pP
erc
ap
1960 1970 1980 1990 2000
1000
2000
3000
4000
5000
6000
5000
10000
15000
20000
25000
gd
pP
erc
ap
AREA CHART
ggplot(newdata, aes(x=year, y=gdpPercap)) +geom_area( fill="blue", alpha=0.4) +geom_line(color="red", size=2) +geom_point(size=3, color="green") +ggtitle("Thailand GDP")
STACKED BAR CHART
newdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))
newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))
alldata=data.frame(rbind(newdata1,newdata2))
ggplot(aes(y = gdpPercap, x = year), data = alldata)+
geom_col(aes(fill = country ), width = 0.7) 0
10000
20000
30000
1960 1970 1980 1990 2000
year
gdpP
erc
ap
country
Japan
Thailand
PIE CHART
df <- data.frame(
group = c("Male", "Female", "Child"),
value = c(25, 25, 50)
)
ggplot(df, aes(x="", y=value, fill=group))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0)+
scale_fill_manual(values=c("red", "blue", "green"))
25
50
75
0/100
value
x
group
Child
Female
Male
SCATTER PLOT CHART
ggplot(newdata, aes(x= year, y=gdpPercap)) +
geom_point(size=2, shape=23, color="red")+
geom_text(label="thai")
thai
thai
thai
thai
thai
thai
thai
thai
thai thai
2000
4000
6000
1960 1970 1980 1990 2000
year
gdpP
erc
ap
BUBBLE CHART
ggplot() +
geom_point(data=data.frame(newdata), aes(x=gdpPercap, y=lifeExp, size=pop, color=continent), alpha=0.5) +
scale_size(range = c(.1, 24), name="Population (M)")
BUBBLE CHARTnewdata1=data.frame(subset(data, country=="Thailand"&year>1952&year<2007))
newdata2=data.frame(subset(data, country=="Japan"&year>1952&year<2007))
alldata=data.frame(rbind(newdata1,newdata2))
ggplot() +
geom_point(data=data.frame(alldata), aes(x=gdpPercap, y=lifeExp, size=pop, color= country), alpha=0.5) +
scale_size(range = c(.1, 24), name="Population (M)")
HEAT MAP
GDP
pop
Life
GDP pop Life
Var1
Var2
0.00
0.25
0.50
0.75
1.00value
library(reshape2)Life=newdata$lifeExpGDP =newdata$gdpPercappop =newdata$pop
mydata=cbind(GDP,pop,Life)cormat <- round(cor(mydata),2)head(cormat)melted_cormat <- melt(cormat)
ggplot() + geom_tile(data = melted_cormat, aes(x=Var1, y=Var2,
fill=value))+scale_fill_gradient(low="white", high="red")
SPECIAL PLOT: MAP PLOT
REQUIRED PACKAGES:
# ggplot2install.packages("")
library(ggplot2)
library(ggmap)
library(maps)
library("mapdata")
library("maptools")
library("rworldmap")
library("raster")
library(sp)
PLOT MAP
map('world', 'thai', fill = TRUE, col = "blue")
PLOT MAP
Th <- map_data('world', 'thai')
ggplot() + geom_polygon(data = Th , aes(x=long, y = lat, group = group)) + coord_fixed(1.3)
5
10
15
20
98 100 102 104 106
long
lat
DESIGN MAP# Adding point #https://geocode.xyz/
labs <- data.frame(
long = c(98.979263, 100.523186,99.83594,99.51453 ),
lat = c(18.796143, 13.736717, 19.90728,18.28471),
names = c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),
stringsAsFactors = FALSE
)
### Adding bubble
gg1 +
geom_point(data = labs, aes(x = long, y = lat), color = "black", size = 5) +
geom_point(data = labs, aes(x = long, y = lat), color = "yellow", size = 4)
5
10
15
20
98 100 102 104 106
long
lat
COMBINE MAP AND ECONOMIC DATA## Prepare DATA
mapdf <- data.frame(region= c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang"),
value=c(1,2,0.5,1),
stringsAsFactors=FALSE)
# Building Thailand geocode
long = c(98.979263, 100.523186,99.83594,99.51453 )
lat = c(18.796143, 13.736717, 19.90728,18.28471)
Country=c("Thailand")
region=c("Chiang Mai", "Bangkok", "Chiang rai", "Lampang")
GDP=c(2,1,0,1.5)
Thaidata=data.frame(long ,lat,Country,region)
# # Building Thailand geocode + GDP
Thaidata1=data.frame(long ,lat,Country,region,GDP)
COMBINE MAP AND ECONOMIC DATA
## Plot GDP Bankok=2 million Baht , Chiang Mai 1 million Baht
gg1 +
geom_polygon(data = Thaidata, aes(x=long, y = lat, group = Country), fill="grey", alpha=0.3) +
geom_point( data=Thaidata1, aes(x=long, y=lat, size=GDP, color=GDP)) +
scale_size_continuous(range=c(1,12))+
scale_color_viridis(trans="log")
5
10
15
20
98 100 102 104 106
long
lat
GDP
0.0
0.5
1.0
1.5
2.0
1.00
1.25
1.50
1.75
GDP
WORLD MAP
worldMap <- getMap()
world.points <- fortify(worldMap)
world.points$region <- world.points$id
world.df <- world.points[,c("long","lat","group", "region")]
worldmap <- ggplot() +
geom_polygon(data = world.df, aes(x = long, y = lat, group = group)) +
scale_y_continuous(breaks = (-2:2) * 30) +
scale_x_continuous(breaks = (-4:4) * 45)
worldmap
-60
-30
0
30
60
-180 -135 -90 -45 0 45 90 135 180
long
lat
QUIZ
REFERENCES
• https://blog.hubspot.com/marketing/types-of-graphs-for-data-visualization
• Teutonico, D. (2015). ggplot2 Essentials. Packt Publishing Ltd.