We aren’t computer scientists and that’s okay!
We make lots of mistakes. Mistakes are funny. You can laugh with us.
Let’s go, Simba, Pumbaa, and Timon!
Factors store categorical variables. They can be created from either numeric or character values.
They can be ordered, and they are very useful for statistical analysis and for plotting.
You use factor() to create a factor object.
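Here is a small sketch of factor() in action. The vectors below (months, ozone_level) are made-up examples for illustration, not part of the airquality data.

```r
# A factor built from numbers: each unique value becomes a level
months <- factor(c(5, 6, 5, 9))
levels(months)      # "5" "6" "9"
as.integer(months)  # 1 2 1 3 (the position of each value in the levels)

# A factor built from characters, with an explicit order
# (hypothetical labels, just for this example)
ozone_level <- factor(c("low", "high", "medium", "low"),
                      levels = c("low", "medium", "high"),
                      ordered = TRUE)
ozone_level < "high"  # TRUE FALSE TRUE TRUE
```

Setting levels yourself matters for plotting, because ggplot2 draws factor categories in level order rather than in the order they appear in the data.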
In data analysis, it’s super important to visualize your data early and often.
ggplot2 is a plotting package that makes it simple to create complex plots from data in a data frame.
A ggplot has 3 ingredients: base, aesthetics, and layers.
We load the ggplot2 package, but the function that makes the plot is just ggplot(). No 2.
The aesthetics assign the columns from the data that you want to use in the chart. This is where you set the X-Y variables that determine the dimensions of the plot.
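To see the three ingredients one at a time, here is a minimal sketch that builds a plot in steps. The object names base and with_aes are only for illustration; most people write it all in one call.

```r
library(ggplot2)

# 1. Base: tell ggplot which data frame to use (draws an empty canvas)
base <- ggplot(data = airquality)

# 2. Aesthetics: map columns to the x and y axes (adds axes, still no data shown)
with_aes <- base + aes(x = Day, y = Ozone)

# 3. Layer: choose how to draw the data, here as points
with_aes + geom_point()
```

Expect a warning about removed rows, since the Ozone column in airquality contains missing values.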
library(ggplot2)
ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  geom_boxplot()
library(dplyr) # provides %>% and filter()
airquality %>%
  filter(Month == 8) %>%
  ggplot(aes(x = Day, y = Ozone)) +
  geom_point()
ggplot2 cheat sheet:
https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf
ggplot2 documentation:
https://cran.r-project.org/web/packages/ggplot2/index.html
# Change labels
ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  geom_boxplot() +
  labs(title = "Boxplot of Ozone in NY, May - Sep 1973",
       x = "Month",
       y = "Ozone(ppb)")
# Add color
ggplot(airquality, aes(x = Day, y = Ozone, color = factor(Month))) +
  geom_line() +
  geom_point()
# Create subplots (rows)
ggplot(airquality, aes(x = Day, y = Ozone, color = factor(Month))) +
  geom_line() +
  facet_grid(factor(Month) ~ .) # facet into rows based on month
# Create subplots (columns)
ggplot(airquality, aes(x = Day, y = Ozone, color = factor(Month))) +
  geom_line() +
  facet_grid(. ~ factor(Month)) # facet into columns based on month
# Add lines
airquality %>%
  group_by(Month) %>%
  summarise(ozone_avg = mean(Ozone, na.rm = TRUE)) %>%
  ggplot(aes(x = factor(Month), y = ozone_avg)) +
  geom_col(fill = "steelblue") +
  # add horizontal line (linewidth needs ggplot2 >= 3.4.0; older versions use size)
  geom_hline(yintercept = 50, color = "orange", linewidth = 2) +
  scale_y_continuous(breaks = seq(0, 75, by = 10)) +
  labs(title = "Monthly ozone plot", x = "Month", y = "Ozone(ppb)")
# Themes
airquality %>%
  group_by(Month) %>%
  summarise(ozone_avg = mean(Ozone, na.rm = TRUE)) %>%
  ggplot(aes(x = factor(Month), y = ozone_avg)) +
  geom_col(fill = "steelblue") +
  # add horizontal line (linewidth needs ggplot2 >= 3.4.0; older versions use size)
  geom_hline(yintercept = 50, color = "orange", linewidth = 2) +
  scale_y_continuous(breaks = seq(0, 75, by = 10)) +
  labs(title = "Monthly ozone plot", x = "Month", y = "Ozone(ppb)") +
  theme_classic()
You’ve made some plots you can be proud of, so let’s learn to save them so we can cherish them forever. There’s a function called ggsave to do just that. How do we ggsave our plots?
# It defaults to saving the last plot that you displayed.
ggsave("./output/plot_name.png")
# You can also save the named plot
my_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  geom_boxplot() +
  labs(title = "Boxplot of Ozone in NY, May - Sep 1973",
       x = "Month",
       y = "Ozone(ppb)")
ggsave("./output/my_boxplot.png", plot = my_boxplot)
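If you want more control over the saved file, ggsave() also takes width, height, and dpi arguments. Here is a rough sketch, assuming you want an 8 x 5 inch image; the file name my_boxplot_wide.png and the sizes are made up for illustration. It also creates the output folder first, since saving can fail when the folder does not exist yet.

```r
library(ggplot2)

# Make sure the output folder exists before saving into it
dir.create("output", showWarnings = FALSE)

my_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Ozone)) +
  geom_boxplot()

# width and height are in inches by default; dpi sets the resolution
# (file name and sizes here are example values, not a recommendation)
ggsave("output/my_boxplot_wide.png", plot = my_boxplot,
       width = 8, height = 5, dpi = 300)
```

The file extension decides the format, so changing .png to .pdf saves a PDF instead.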
Exercise
Try to make a plot showing the daily ozone in August 2016 at sites 27-137-7001 and 27-137-7554 using the “ozone_samples_demo.csv” file.
factor
ggplot()