---
format:
  html:
    code-fold: true
---
# Custom `ggplot2` themes {#sec-plots}
## Custom theme
This post is inspired by Albert Rapp's [blogpost](https://alberts-newsletter.beehiiv.com/p/ggplot-theme) on the same topic, which goes into more detail about how to build a plot using `ggplot2`.
Creating a customised theme can help ensure all plots in a document have identical formatting. All you need to do is write a function that returns a base theme with your own `theme()` specifications added to it.
First, let's see what a basic plot looks like without any formatting:
```{r}
#| echo: true
#| code-fold: true
# install.packages("pacman") # if needed
pacman::p_load(tidyverse,
janitor,
languageR)
df_lexdec <- lexdec |>
clean_names()
```
```{r}
#| echo: true
#| code-fold: true
fig_lexdec <-
df_lexdec |>
arrange(correct) |>
ggplot() +
aes(x = frequency, y = rt,
fill = correct) +
facet_grid(.~native_language) +
labs(
title = "Log reaction time by log word frequency for native (English) and non-native (Other) speakers",
x = "Frequency (log)",
y = "Reaction time (log)"
) +
geom_point(shape = 21, position = position_jitter(.1), alpha = .8, size = 2, colour = "grey10")
```
```{r}
#| echo: true
fig_lexdec
```
### Create theme
Below I create a custom theme called `my_theme`, which:
+ uses `theme_bw()` as its base
+ sets the axis tick-label text size to 8
+ sets the plot title text size to 10
+ formats the legend so that it:
    + sits below the plot
    + has no title
    + uses text size 8
    + has minimal padding around it
+ includes commented-out options to remove the x-axis title and to display legend points as .4 cm and lines as 1 cm long
```{r}
#| echo: true
my_theme <- function() {
  theme_bw() +
    theme(
      # axis tick labels: size 8
      axis.text = element_text(size = 8),
      # optional: remove the x-axis title
      # axis.title.x = element_blank(),
      # plot title: size 10
      plot.title = element_text(size = 10),
      # legend: below the plot, no title, text size 8
      legend.position = "bottom",
      legend.title = element_blank(),
      legend.text = element_text(size = 8),
      # optional: legend keys .4 cm (points) and 1 cm wide (lines)
      # legend.key.size = unit(.4, "cm"),
      # legend.key.width = unit(1, "cm"),
      # stack multiple legends vertically and minimise padding around them
      legend.box = "vertical",
      legend.margin = margin(0, 0, 0, 0),
      legend.box.margin = margin(-5, 0, -5, -5)
    )
}
```
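Because a theme is just a function, it can also take arguments. The sketch below is one possible variant; `my_theme_flex()` and its argument names are hypothetical, not part of `ggplot2`. The text sizes become arguments (defaulting to the sizes used in `my_theme()`), and any extra settings passed through `...` are added in a final `theme()` call, so they take precedence over the defaults.

```{r}
#| echo: true
library(ggplot2)

# hypothetical variant of my_theme(): text sizes are arguments, and any
# other theme() settings can be passed through ...
my_theme_flex <- function(title_size = 10, text_size = 8, ...) {
  theme_bw() +
    theme(
      axis.text = element_text(size = text_size),
      plot.title = element_text(size = title_size),
      legend.position = "bottom",
      legend.title = element_blank(),
      legend.text = element_text(size = text_size)
    ) +
    # added last, so these settings override the defaults above
    theme(...)
}

# e.g. larger text and a legend on the right, for slides
slide_theme <- my_theme_flex(title_size = 14, text_size = 12,
                             legend.position = "right")
```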
Now we can add `my_theme()` to plots:
```{r}
#| echo: true
fig_lexdec + my_theme()
```
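Theme components are applied in the order they are added, which matters when combining `my_theme()` with further tweaks. A `theme()` call added after `my_theme()` overrides it, whereas adding a complete theme such as `theme_bw()` (which `my_theme()` starts from) discards any theme settings made before it. The chunk below sketches this with a cut-down stand-in, `demo_theme()` (a hypothetical name), so that it runs on its own without redefining `my_theme()`.

```{r}
#| echo: true
library(ggplot2)
library(languageR)

# hypothetical cut-down stand-in for my_theme()
demo_theme <- function() {
  theme_bw() +
    theme(legend.position = "bottom", legend.title = element_blank())
}

p <- ggplot(lexdec, aes(x = Frequency, y = RT, fill = Correct)) +
  geom_point(shape = 21)

# theme() added after demo_theme() wins: legend on the right
p_right <- p + demo_theme() + theme(legend.position = "right")

# demo_theme() contains a complete theme, so adding it last discards
# the earlier tweak: legend at the bottom
p_bottom <- p + theme(legend.position = "right") + demo_theme()
```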
### Set global theme
We can also set our custom theme (or any other theme, such as the built-in `theme_minimal()`) as the default theme for all plots, so that it doesn't need to be added to every plot.
```{r}
#| echo: true
theme_set(my_theme())
```
Now when we generate any plot, it will use `my_theme()` by default.
```{r}
#| echo: true
fig_lexdec
```
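`theme_set()` also invisibly returns the theme that was active before the call, which makes it easy to switch the default temporarily and then restore it. A minimal sketch, using the built-in `theme_minimal()` as the temporary default:

```{r}
#| echo: true
library(ggplot2)

# switch the default to theme_minimal(), keeping the previous default
old_theme <- theme_set(theme_minimal())

# ... plots created here use theme_minimal() ...

# restore whatever default was active before (e.g. my_theme())
theme_set(old_theme)
```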
## Function for multiple plots
In reading research, we often want to look at reading times across sentence regions. Typically I would do this using `facet_wrap()` or `facet_grid()` from `ggplot2`. However, we might instead want to produce separate plots per region, item, participant, etc., for example during exploratory data analysis. Doing this by hand means repeating the same code for every region. Alternatively, we can create a function that produces multiple plots at once.
The code below is adapted for `ggplot2` from Vivian Peng's [tutorial](https://towardsdatascience.com/how-to-write-a-custom-function-to-generate-multiple-plots-in-r-7ad24637e0dd), which uses `plotly` and the `penguins` data set from `palmerpenguins`. All code not specific to `ggplot2` is therefore identical to that in her tutorial.
### Generate plot
First, we generate a plot for a single neologism from the `selfPacedReadingHeid` dataset from the `languageR` package.
```{r}
#| echo: true
#| code-fold: true
# install.packages("pacman") # if needed
pacman::p_load(tidyverse,
languageR,
janitor)
```
```{r}
#| echo: true
#| code-fold: true
# store selfPacedReadingHeid as 'df', to keep it short and sweet (you'll want to use a more meaningful name in real analyses)
# I also use clean_names() from the janitor package to keep variable names tidy
df <- selfPacedReadingHeid |> clean_names()
```
```{r}
#| echo: true
#| code-fold: true
# generate a scatterplot for the neologism 'blusbaarheid'
df |>
filter(word == "blusbaarheid") |>
arrange(condition) |>
ggplot() +
aes(x = rt4words_back, y = rt,
fill = condition) +
labs(
title = "Log reaction time by i-4 log reaction time for the neologism 'blusbaarheid'",
x = "i-4 Reaction time (log)",
y = "Reaction time (log)"
) +
geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
```
### Create function
This function uses the same code that generated our plot above, except that instead of filtering for `word == "blusbaarheid"`, it filters for `word == word_label`.
```{r}
#| echo: true
plot_fx <- function(word_label){
df |>
filter(word == word_label) |>
arrange(condition) |>
ggplot() +
aes(x = rt4words_back, y = rt,
fill = condition) +
labs(
title = paste0("Neologism '", word_label, "'"),
x = "i-4 Reaction time (log)",
y = "Reaction time (log)"
) +
geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}
```
Now we can specify the `word_label` directly as an argument of the function we've created:
```{r}
#| echo: true
# Run function for word "blusbaarheid"
plot_fx("blusbaarheid")
```
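The same pattern generalises beyond words. The sketch below is a hypothetical extension (`plot_by()` and its arguments are illustrative names, not from the original tutorial) that takes the data, the name of the column to filter on, and the value to keep, so one function could produce plots per item, participant, etc. It uses the `.data` pronoun, which `dplyr` supports, to look up a column from a string.

```{r}
#| echo: true
library(tidyverse)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# hypothetical generalisation of plot_fx(): the grouping column is
# passed as a string and looked up with the .data pronoun
plot_by <- function(data, group_var, value) {
  data |>
    filter(.data[[group_var]] == value) |>
    arrange(condition) |>
    ggplot() +
    aes(x = rt4words_back, y = rt, fill = condition) +
    labs(
      title = paste0(group_var, ": ", value),
      x = "i-4 Reaction time (log)",
      y = "Reaction time (log)"
    ) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}

# same data as plot_fx("blusbaarheid"), with a different title
plot_by(df, "word", "blusbaarheid")
```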
Or, even better, we can create a list containing one plot per neologism.
```{r}
#| echo: true
# Create an empty list for all your plots
plot_list <- list()
```
```{r}
#| echo: true
# Run the plotting function for each neologism
for (i in unique(df$word)){
plot_list[[i]] = plot_fx(i)
}
```
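The `for` loop can also be replaced with `purrr::map()` (loaded with the `tidyverse`), which returns a list directly. In this sketch, `set_names()` names each element after its word, so plots can be retrieved by name as well as by position, as with `plot_list`. `plot_fx()` is repeated so the chunk runs on its own, and the result is stored under a new, hypothetical name, `plot_list_map`, so it doesn't overwrite `plot_list`.

```{r}
#| echo: true
library(tidyverse)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# plot_fx() as defined above
plot_fx <- function(word_label){
  df |>
    filter(word == word_label) |>
    arrange(condition) |>
    ggplot() +
    aes(x = rt4words_back, y = rt, fill = condition) +
    labs(
      title = paste0("Neologism '", word_label, "'"),
      x = "i-4 Reaction time (log)",
      y = "Reaction time (log)"
    ) +
    geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")
}

# one named plot per neologism, in the same order as unique(df$word)
words <- as.character(unique(df$word))
plot_list_map <- map(set_names(words), plot_fx)

plot_list_map[["blusbaarheid"]]
```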
```{r}
#| echo: true
# Now you have a list of plots - one for each neologism
# You can view any plot by changing the value within the square brackets. Let's look at the 31st plot:
plot_list[[31]]
```
And now, let's plot 2 neologisms of our choosing side-by-side using the `patchwork` package.
```{r}
#| echo: true
#| code-fold: true
pacman::p_load(patchwork)
(plot_list[["tilbaarheid"]] +
plot_list[["blusbaarheid"]]) +
plot_annotation(title = "Log reaction time by i-4 log reaction time for two neologisms") +
plot_layout(guides = "collect")
```
Of course, if we already knew ahead of time that we wanted to print only these 2 plots, we could have first filtered the dataset to contain only these 2 words, and then produced both panels in a single plot using `facet_wrap()`. Which approach is more straightforward depends on what exactly you want to do.
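For comparison, a minimal sketch of the `facet_wrap()` approach for the same two words: filter first, then let `facet_wrap()` draw one panel per word. Unused levels of the `word` factor are dropped from the facets by default.

```{r}
#| echo: true
library(tidyverse)
library(languageR)
library(janitor)

df <- selfPacedReadingHeid |> clean_names()

# keep only the two neologisms of interest, one panel per word
fig_two_words <- df |>
  filter(word %in% c("tilbaarheid", "blusbaarheid")) |>
  arrange(condition) |>
  ggplot() +
  aes(x = rt4words_back, y = rt, fill = condition) +
  facet_wrap(~ word) +
  labs(
    title = "Log reaction time by i-4 log reaction time for two neologisms",
    x = "i-4 Reaction time (log)",
    y = "Reaction time (log)"
  ) +
  geom_point(shape = 21, alpha = .8, size = 2, colour = "grey10")

fig_two_words
```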