Custom `ggplot2` themes

Custom theme

This post is inspired by Albert Rapp’s blog post (https://alberts-newsletter.beehiiv.com/p/ggplot-theme) on the same topic, which goes into more detail about how to build a plot using ggplot2.

Creating a customised theme helps ensure that all plots in a document share identical formatting. All you need to do is write a function that returns a base theme combined with your own theme() specifications.

First, let’s see what a basic plot looks like without any formatting:

# install.packages("pacman") # if needed
pacman::p_load(tidyverse,
               janitor,
               languageR)
df_lexdec <- lexdec |> 
  clean_names()

fig_lexdec <-
  df_lexdec |>
  arrange(correct) |> 
  ggplot() +
  aes(x = frequency, y = rt,
      fill = correct) +
  facet_grid(.~native_language) +
  labs(
    title = "Log reaction time by log word frequency for native (English) and non-native (Other) speakers",
    x = "Frequency (log)",
    y = "Reaction time (log)"
  ) +
  geom_point(shape = 21, position = position_jitter(.1), alpha = .8, size = 2, colour = "grey10")

fig_lexdec

Create theme

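Below I create a custom theme called my_theme(), which:

- starts from theme_bw()
- sets the axis text size to 8
- sets the plot title text size to 10
- formats the legend so that it:
  - sits below the plot
  - has no title
  - uses text size 8
  - has minimal padding around it
  - can optionally use .4 cm keys for points and 1 cm wide keys for lines (uncomment the two legend.key lines)

my_theme <- function() {
  theme_bw() +
    theme(
      # axis text: size 8
      axis.text = element_text(size = 8),
      # plot title: size 10
      plot.title = element_text(size = 10),
      # legend: below the plot, no title, size-8 text
      legend.position = "bottom",
      legend.title = element_blank(),
      # optional: .4 cm keys (points) and 1 cm wide keys (lines)
      # legend.key.size = unit(.4, "cm"),
      # legend.key.width = unit(1, "cm"),
      legend.text = element_text(size = 8),
      legend.box = "vertical",
      # minimise the padding around the legend
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-5, 0, -5, -5)
    )
}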
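If you want the same look at different sizes (say, slides versus a manuscript), a theme function can also take arguments. The sketch below is a hypothetical variant, not part of the original my_theme(): I've assumed two arguments, base_size (passed on to theme_bw()) and legend_position (passed on to legend.position). It uses the lexdec data with its original column names so that it runs on its own.

```r
library(ggplot2)
library(languageR)

# Hypothetical variant of my_theme() that takes arguments:
# - base_size is passed on to theme_bw() and scales all text
# - legend_position is passed on to legend.position
my_theme_args <- function(base_size = 11, legend_position = "bottom") {
  theme_bw(base_size = base_size) +
    theme(
      # title slightly larger than the base text size
      plot.title = element_text(size = base_size + 2),
      legend.position = legend_position,
      legend.title = element_blank()
    )
}

# Larger text and a right-hand legend, e.g. for slides
p_args <- ggplot(lexdec, aes(x = Frequency, y = RT, fill = Correct)) +
  geom_point(shape = 21, alpha = .8) +
  my_theme_args(base_size = 14, legend_position = "right")

p_args
```

Defaults mean that my_theme_args() with no arguments still gives a usable theme, so it can be dropped into theme_set() just like my_theme().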

Now we can add my_theme() to plots:

fig_lexdec + my_theme()

Set global theme

We can also set our custom theme (or any built-in theme, such as theme_minimal()) as the default theme for all plots, so that it doesn’t need to be added to every plot:

theme_set(my_theme())

Now any plot we generate will use my_theme() by default.

fig_lexdec
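One thing worth knowing: theme_set() returns the previously active theme (invisibly), so you can store it and switch back later, for example if only one figure should use a different look. A minimal sketch, using theme_minimal() as the temporary theme and the lexdec data directly so that it runs on its own:

```r
library(ggplot2)
library(languageR)

# theme_set() returns the theme that was active before the call,
# so we keep it in old_theme
old_theme <- theme_set(theme_minimal())

# This plot is drawn with theme_minimal()
ggplot(lexdec, aes(x = Frequency, y = RT)) +
  geom_point(alpha = .3)

# Switch back to whichever theme was active before (e.g. my_theme())
theme_set(old_theme)
```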

Function for multiple plots

In reading research, we often want to look at reading times across sentence regions. Typically I would do this using facet_wrap() or facet_grid() from ggplot2. However, we might instead want to produce separate plots per region, item, participant, etc., for example during exploratory data analysis. Doing this by hand means repeating the same code for every region. Alternatively, we can write a function that produces multiple plots at once.

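The code below is adapted for ggplot2 from Vivian Peng’s tutorial (https://towardsdatascience.com/how-to-write-a-custom-function-to-generate-multiple-plots-in-r-7ad24637e0dd), which uses plotly and the penguins data set from palmerpenguins. All code not specific to ggplot2 is therefore identical to that in her tutorial.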

Generate plot

First, we generate a plot for a single neologism from the selfPacedReadingHeid dataset from the languageR package.

# install.packages("pacman") # if needed
pacman::p_load(tidyverse,
               languageR,
               janitor)

# store selfPacedReadingHeid as 'df', to keep it short and sweet (you'll want to use a more meaningful name in real analyses)
# I also use clean_names() from the janitor package to keep variable names tidy
df <- selfPacedReadingHeid |> clean_names()

# generate a scatterplot for the neologism 'blusbaarheid'
df |>
  filter(word == "blusbaarheid") |> 
  arrange(condition) |> 
  ggplot() +
  aes(x = rt4words_back, y = rt,
      fill = condition) +
  labs(
    title = "Log reaction time by log i-4 reaction time for the neologism 'blusbaarheid'",
    x = "i-4 Reaction time (log)",
    y = "Reaction time (log)"
  ) +
  geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")

Create function

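This function uses the same code that generated our plot above. However, instead of filtering for word == "blusbaarheid", it filters for word == word_label, where word_label is the function’s argument; the plot title is also built from word_label with paste0().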

plot_fx <- function(word_label) {
  df |>
    # keep only the rows for the requested neologism
    filter(word == word_label) |>
    arrange(condition) |>
    ggplot() +
    aes(x = rt4words_back, y = rt,
        fill = condition) +
    labs(
      # the title changes with the word being plotted
      title = paste0("Neologism '", word_label, "'"),
      x = "i-4 Reaction time (log)",
      y = "Reaction time (log)"
    ) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}

Now we can specify the word_label directly as an argument of the function we’ve created:

# Run function for word "blusbaarheid"
plot_fx("blusbaarheid")
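Since the motivation for this section was plotting per region, item or participant, you may not want a separate function for each grouping variable. Below is a hypothetical generalisation (my own sketch, not from Vivian Peng’s tutorial): plot_by() takes the data, the name of the grouping column as a string, and the value to keep, and uses dplyr’s .data pronoun to look up a column whose name is stored in a string. It re-creates df so that it runs on its own.

```r
library(dplyr)
library(ggplot2)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# Hypothetical generalisation of plot_fx(): the data, the grouping column
# (as a string) and the value to keep are all arguments
plot_by <- function(data, group_col, group_value) {
  data |>
    # .data[[group_col]] refers to the column named in group_col
    filter(.data[[group_col]] == group_value) |>
    arrange(condition) |>
    ggplot() +
    aes(x = rt4words_back, y = rt, fill = condition) +
    labs(
      title = paste0(group_col, ": '", group_value, "'"),
      x = "i-4 Reaction time (log)",
      y = "Reaction time (log)"
    ) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}

# The same selection as plot_fx("blusbaarheid")
plot_by(df, "word", "blusbaarheid")

# One participant instead of one word
plot_by(df, "subject", as.character(df$subject[1]))
```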

Or, even better, we can create a list of our plots.

# Create an empty list for all your plots
plot_list <- list()

# Run the plotting function for every neologism,
# storing each plot under the word's name
for (i in unique(df$word)) {
  plot_list[[i]] <- plot_fx(i)
}
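The for loop works fine, but the same named list can also be built with purrr (loaded as part of the tidyverse) without creating an empty list first. In this sketch, set_names() names each word after itself and map() applies plot_fx() to each one; df and plot_fx() are re-created so that the block runs on its own.

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# Same plotting function as plot_fx() above
plot_fx <- function(word_label) {
  df |>
    filter(word == word_label) |>
    arrange(condition) |>
    ggplot() +
    aes(x = rt4words_back, y = rt, fill = condition) +
    labs(
      title = paste0("Neologism '", word_label, "'"),
      x = "i-4 Reaction time (log)",
      y = "Reaction time (log)"
    ) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}

# Named list of plots, one per neologism:
# as.character() turns the factor into plain strings,
# set_names() uses each string as its own name,
# map() calls plot_fx() on each string
plot_list <- df$word |>
  unique() |>
  as.character() |>
  set_names() |>
  map(plot_fx)
```

The result is indexed exactly like the loop version, e.g. plot_list[["blusbaarheid"]].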

# Now you have a list of plots, one for each neologism.
# Plots can be accessed by position or by name (e.g. plot_list[["blusbaarheid"]]).
# Let's look at the 31st plot:
plot_list[[31]]
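A named list also makes it easy to write every plot to its own file, for instance to flip through them outside R during exploratory analysis. A sketch using purrr::iwalk(), which passes each plot together with its name to ggsave(); to keep it short and self-contained, it builds a two-plot list and writes PDFs to a temporary folder (the folder name neologism_plots is just an example).

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# A small named list of plots, standing in for the full plot_list
plot_list <- c("blusbaarheid", "tilbaarheid") |>
  set_names() |>
  map(\(w) {
    df |>
      filter(word == w) |>
      ggplot() +
      aes(x = rt4words_back, y = rt, fill = condition) +
      geom_point(shape = 21) +
      labs(title = paste0("Neologism '", w, "'"))
  })

# Example output folder inside R's temporary directory
out_dir <- file.path(tempdir(), "neologism_plots")
dir.create(out_dir, showWarnings = FALSE)

# iwalk() calls the function with each plot (p) and its list name (name)
iwalk(plot_list, \(p, name) {
  ggsave(file.path(out_dir, paste0(name, ".pdf")), plot = p,
         width = 5, height = 4)
})
```

Changing the extension to ".png" should make ggsave() write PNG files instead.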

And now, let’s plot two neologisms of our choosing side by side using the patchwork package.

pacman::p_load(patchwork)

(plot_list[["tilbaarheid"]] +
    plot_list[["blusbaarheid"]]) +
  plot_annotation(title = "Log reaction time by log i-4 reaction time for two neologisms") +
  plot_layout(guides = "collect")
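When you want more than two plots, or the selection is stored in a vector, patchwork’s wrap_plots() also accepts a list of plots, so you don’t have to add them together one by one. A sketch, assuming the chosen words are held in a character vector called chosen (a name I’ve made up); it re-creates df and a compact version of plot_fx() so that it runs on its own. With the full plot_list from above, plot_list[chosen] would work in place of map(chosen, plot_fx).

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(patchwork)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# Compact version of plot_fx() from above
plot_fx <- function(word_label) {
  df |>
    filter(word == word_label) |>
    ggplot() +
    aes(x = rt4words_back, y = rt, fill = condition) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10") +
    labs(title = paste0("Neologism '", word_label, "'"),
         x = "i-4 Reaction time (log)", y = "Reaction time (log)")
}

# Hypothetical selection of neologisms to show side by side
chosen <- c("tilbaarheid", "blusbaarheid")

# wrap_plots() lays out a list of ggplots in a grid
combined <- wrap_plots(map(chosen, plot_fx), ncol = length(chosen)) +
  plot_annotation(title = "Log reaction time by log i-4 reaction time") +
  plot_layout(guides = "collect")

combined
```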

Of course, if we had known ahead of time that we only wanted these two plots, we could have first filtered the data set to these two words and then produced both panels with facet_wrap(). Which approach is more straightforward depends on what exactly you want to do.
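For comparison, the facet_wrap() route might look like the sketch below: filter to the two words first, then let facet_wrap(~ word) draw one panel per word. Because facet_wrap() drops unused factor levels by default, only the two selected words get panels.

```r
library(dplyr)
library(ggplot2)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

p_facet <- df |>
  # keep only the two neologisms of interest
  filter(word %in% c("tilbaarheid", "blusbaarheid")) |>
  arrange(condition) |>
  ggplot() +
  aes(x = rt4words_back, y = rt, fill = condition) +
  # one panel per remaining word
  facet_wrap(~ word) +
  labs(
    title = "Log reaction time by log i-4 reaction time for two neologisms",
    x = "i-4 Reaction time (log)",
    y = "Reaction time (log)"
  ) +
  geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")

p_facet
```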
