ggplot2 Project Title: Correlations, Trends, and Outliers in ggplot2

This project explains how ggplot2 can serve as an adequate instrument to visualize data; how, in a fantastic world, a graph may construct its own identity outside of the rigid roles imposed upon it by raw data. It discusses how, through apt use of simple geometric measures, ggplot2 exposes the limits of data-driven products like Microsoft Excel and constructs a new world in which data may be free from these restrictions. My goal in this project was to quickly show correlations, trends, and outliers in time-based, transactional data.

Two examples used for this discussion appear in Figure 1 and Figure 2, which are actual images provided to me for direction. You can see that ‘time’ is on the x-axis and ‘value’ is on the y-axis; therefore, I started with the concept of time series analysis in the R programming language. A time series plot is a suitable way to display trends over time, particularly when fitted with a linear model (the red line in my plot) that highlights a positive or negative association.
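As a rough illustration of this idea, here is a minimal sketch of a time series plot with a fitted linear trend drawn in red. The data frame `daily`, its columns `txn_date` and `avg_amount`, and the simulated values are all assumptions made up for the example; they are not the project's data.

```r
library(ggplot2)

# Hypothetical data: one month of daily mean transaction amounts (simulated).
set.seed(42)
daily <- data.frame(
  txn_date = seq(as.Date("2024-01-01"), by = "day", length.out = 31)
)
daily$avg_amount <- 50 + 0.8 * seq_len(31) + rnorm(31, sd = 5)

# Time on the x-axis, value on the y-axis, plus a linear fit shown as a red line.
p_trend <- ggplot(daily, aes(x = txn_date, y = avg_amount)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "red")

# The sign of the fitted slope indicates a positive or negative association.
fit <- lm(avg_amount ~ as.numeric(txn_date), data = daily)
slope <- unname(coef(fit)[2])

print(p_trend)
```

Fitting the same model with `lm()` alongside the plot is optional; it simply makes the direction of the trend available as a number rather than only as a line on the chart.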

First, I recognized the need for a discrete time variable because I was only looking at one month’s worth of data. ggplot2 handled the transaction date very well using aesthetic mappings, which describe how variables in the data are mapped to visual properties; here, the date is mapped to position on the x-axis.
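A small sketch of such a mapping follows, assuming the dates arrive as text and need converting first. The column names and values are hypothetical, and the `scale_x_date()` settings are just one reasonable choice for a short date range.

```r
library(ggplot2)

# Hypothetical raw pull: dates arrive as character strings.
daily <- data.frame(
  txn_date   = c("2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"),
  avg_amount = c(52.1, 48.7, 55.3, 50.9)
)

# Convert to Date so ggplot2 treats the x-axis as time rather than as text labels.
daily$txn_date <- as.Date(daily$txn_date)

# aes() maps variables (columns) to visual properties: here, x and y position.
p_dates <- ggplot(daily, aes(x = txn_date, y = avg_amount)) +
  geom_point() +
  scale_x_date(date_breaks = "1 day", date_labels = "%b %d")
```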

Second, I needed to show daily transaction amounts on a per-customer basis. Since there were ~850K unique transactions, I decided it wasn’t feasible to plot nearly one million data points, so I used the SQL AVG() function when I pulled the data to look at just the mean per day. After I formatted the transaction amount in RStudio, the quantitative variable fit nicely on the y-axis.
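The aggregation described here was done in SQL. For readers who already have transaction-level rows in R, an equivalent daily mean can be sketched with base R's `aggregate()`. The table `txns` and its columns are hypothetical stand-ins, not the project's schema.

```r
# Hypothetical transaction-level data (names and values are illustrative).
txns <- data.frame(
  txn_date = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02",
                       "2024-01-02", "2024-01-02")),
  amount   = c(10, 30, 5, 15, 10)
)

# Mean amount per day, analogous to AVG(amount) with GROUP BY on the date.
daily <- aggregate(amount ~ txn_date, data = txns, FUN = mean)
names(daily)[2] <- "avg_amount"

print(daily)
```

Doing the averaging in the database, as in the original workflow, avoids pulling every transaction into R; the version above is only convenient when the raw rows are already in memory.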

Figure 1

Figure 2

Finally, ggplot2 has many geoms, functions that each add a layer to the plot and let you control your data visualization in great detail with minimal effort. For instance, I used geom_point to highlight outliers, transaction amounts that fell below zero, with larger red data points. And I used geom_line to connect all of the data points together.
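The layering described above might look roughly like the following sketch. The data values, the `outliers` subset, and the point size are assumptions chosen for illustration; only the use of geom_line and geom_point and the below-zero rule come from the text.

```r
library(ggplot2)

# Hypothetical daily means, including two values below zero.
daily <- data.frame(
  txn_date   = seq(as.Date("2024-01-01"), by = "day", length.out = 7),
  avg_amount = c(42, 38, -12, 45, 51, -3, 47)
)

# Rows treated as outliers here: daily means that fall below zero.
outliers <- subset(daily, avg_amount < 0)

p_outliers <- ggplot(daily, aes(x = txn_date, y = avg_amount)) +
  geom_line() +                                            # connect all points
  geom_point() +                                           # every daily value
  geom_point(data = outliers, colour = "red", size = 4)    # larger red outliers

print(p_outliers)
```

Passing a filtered data frame to a second geom_point layer keeps the outlier rule explicit and separate from the base plot, which makes the threshold easy to change later.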

Overall, I would say ggplot2 is a succinct, comprehensive tool for data visualization when you are dealing with univariate or bivariate analysis, and it can show correlations, trends, and outliers with relative ease.

See the final outcome below!
