Daniela Palleschi

R for Reproducibility

R Packages for a Reproducible Workflow

This is an overview of R packages that I use to maintain a reproducible workflow.

here package

The here package builds file paths relative to the project root rather than to whatever the current working directory happens to be (Müller 2020). If you’re working within an R Project (please do so!), this root is the folder containing the .Rproj file.

# install here package
install.packages("here")

# read in a csv file
readr::read_csv(here::here("folder", "subfolder", "subsubfolder", "file.csv"))
# or
readr::read_csv(here::here("folder/subfolder/subsubfolder/file.csv"))
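
To make the two calling styles above concrete, here is a minimal sketch. The folder and file names are hypothetical, and the here call only runs if the package is installed; it is meant as an illustration, not as a description of how here works internally.

```r
# Hypothetical folder and file names, used only for illustration
path_parts <- c("folder", "subfolder", "file.csv")

# Base R: join the components into a relative path with "/" separators
relative_path <- do.call(file.path, as.list(path_parts))
print(relative_path)

# With here (only if installed): the same components, anchored at the project root
if (requireNamespace("here", quietly = TRUE)) {
  full_path <- do.call(here::here, as.list(path_parts))
  print(full_path)
}
```

The path printed by here should stay the same wherever a script sits inside the project, which is the main reason to prefer it over paths that depend on setwd().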

pacman package

The pacman package was designed to facilitate loading and installing packages (Rinker and Kurkiewicz 2018). I like to use the p_load() function, which checks if listed packages are already installed. If so, it loads them (as with library()). If not, it installs and then loads them (as long as they’re CRAN packages!). This facilitates sharing your code across machines or package libraries.

# install pacman
install.packages("pacman")

# install and/or load packages
pacman::p_load(here,
               renv,
               kableExtra)
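
The check-install-load logic described above can be sketched in base R. This is only an approximation for illustration, not pacman's actual implementation; load_or_install() is a hypothetical helper name.

```r
# Hypothetical helper that mimics the behaviour described for p_load():
# install a package only if it is missing, then attach it
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)  # only reached for packages that are not installed
    }
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Example with packages that ship with R, so nothing is downloaded
load_or_install(c("stats", "utils"))
```

In practice pacman::p_load() does this in one call and accepts unquoted package names, which is why I prefer it.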

rbbt package

The rbbt package contains an RStudio Addin for better integration of RStudio, Better BibTeX, and Zotero (Dunnington and Wiernik 2023). It can be installed via GitHub:

# install rbbt; remotes must first be installed
remotes::install_github("paleolimbot/rbbt")

You can follow this blog post (https://gsverhoeven.github.io/post/zotero-rmarkdown-csl/) from Gertjan Verhoeven (May 2021) on integrating RStudio and Zotero for writing in R Markdown, assuming only that you have RStudio installed.

Caveat: currently some rbbt functions conflict with Quarto’s cross-references, which also begin with @. For Quarto documents, I mainly use the rbbt Addin with a keyboard shortcut to insert Zotero citations while writing (see the blog post for tips on how to set this up).

Developer packages and PATs

I was having trouble installing rbbt in a new R Project that had renv installed (see below). The error I was receiving was:

Using GitHub PAT from the git credential store.
Error: Failed to install 'unknown package' from GitHub:
  HTTP error 401.
  Bad credentials

See https://daniela-palleschi.github.io/workflow/git.html#expired-personal-acces-token for how I solved the issue.

bbt_update_bib() issues

To be able to create .bib files from an .Rmd or .qmd file, I currently (May 24, 2024) need:

  • Better BibTeX (BBT) version 6.7.140
  • Zotero version 6.0.37 (the newest pre-release, 7.0.0-beta 81, doesn’t work with BBT)
  • rbbt version 0.0.0.9000 (currently the only version):

remotes::install_github("paleolimbot/rbbt")

Then, make sure the bibliography file set in the YAML (e.g., bibliography: references.bib) is given relative to the folder containing the current script (i.e., not the project folder), and run one of the following lines manually in the Console:

  • if you don’t already have a bib file: rbbt::bbt_write_bib("filepath/this_script.qmd")
  • if you already have a bib file: rbbt::bbt_update_bib("filepath/this_script.qmd")

For example, I have an R project with a subfolder teaching_notes, which contains the files teaching_notes.qmd and references.bib. The latter should contain the BibTeX-formatted references for the citations listed in the .qmd file. The beginning of the script looks like:

---
title: "Teaching notes"
subtitle: "For future planning"
bibliography: "references.bib"
---

```{r}
#| echo: false
#| eval: false
rbbt::bbt_update_bib(here::here("teaching_notes","teaching_notes.qmd"))
```

If I would rather keep the .bib file one folder higher, I would write ../references.bib (and add an additional ../ for every further parent folder).
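
To check where a relative bibliography path ends up, the resolution can be sketched in base R. The project layout below is hypothetical and is created in a temporary folder purely for illustration.

```r
# Hypothetical project layout, created in a temporary folder for illustration
project <- file.path(tempdir(), "demo_project")
doc_dir <- file.path(project, "teaching_notes")
dir.create(doc_dir, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "references.bib"))

# The value as it would be written in the YAML of teaching_notes.qmd
bib_in_yaml <- "../references.bib"

# Resolve it relative to the folder containing the script, not the project root
resolved <- normalizePath(file.path(doc_dir, bib_in_yaml))
expected <- normalizePath(file.path(project, "references.bib"))

print(identical(resolved, expected))
```

If the two paths differ, the YAML entry is probably being read relative to a different folder than intended.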

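Resources I used to figure out the problem with bbt_update_bib():

  • Zotero forum thread about Zotero 7 Beta 72 problems (https://forums.zotero.org/discussion/113570/betterbibtex-not-working-after-the-recent-update-to-zotero-7-beta-72)
  • Zotero forum thread about Zotero 7 Beta 69 problems (https://forums.zotero.org/discussion/113507/upgrade-to-zotero-7-beta-69-56c363521-and-plugin-error)
  • rbbt GitHub issue about the original error I was receiving (https://github.com/paleolimbot/rbbt/issues/47)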

renv package

The renv package stores your project-local environment (Ushey 2023). That is, it creates a time capsule of the R and package versions that you use within an R Project.

The benefits:

  • your output is not dependent on whichever package versions you currently have installed on your machine
    • e.g., if you come back to an analysis after a long time, the output should still be the same even if a certain package version has deprecated a function your analysis uses.
  • makes the project reproducible for collaborators as well, who will likely have some different package versions (or will have missing packages) on their machine

The hard-to-get-your-head-around (imo):

  • with each new R Project with a renv.lock file, you need to install all the necessary packages again (even if they’re already on your machine)
    • this is because each renv project library doesn’t look beyond your project folder!
  • remembering to update the renv.lock file frequently!

Running renv functions

Remember to always run renv functions in the console! You do not want, e.g., renv initialising a new renv.lock file every time you render your documents.
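
If an renv call does end up inside a script, one possible safeguard is to wrap it so that it only runs in an interactive session. This is a sketch of that idea, not an renv feature; console_only() is a hypothetical helper.

```r
# Hypothetical helper: run `action` only when R is used interactively
# (e.g., in the Console), and skip it in non-interactive runs such as rendering
console_only <- function(action) {
  if (interactive()) {
    action()
    return(invisible(TRUE))
  }
  message("Non-interactive session: skipped.")
  invisible(FALSE)
}

# The renv call is wrapped in a function, so it is never evaluated during a render
ran <- console_only(function() renv::snapshot())
```

Typing the renv functions directly into the Console remains the simpler habit; the wrapper is only a fallback.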

Activate or init

  • to initialise a new project-local environment:

renv::init()

Snapshot

  • update the renv.lock file:

renv::snapshot()

Version control

After updating the renv.lock file (i.e., running renv::snapshot()), remember to commit/push these changes to git (if you’re using git)! Otherwise, your renv.lock file will be outdated compared to your output.

Restore

  • restores your project library to the package versions recorded in the most recent renv.lock file
    • this step follows snapshot(), which updates the renv.lock file
  • very useful after updating R!

renv::restore()

Upgrade

  • to update the renv package:

# upgrade renv version
renv::upgrade()

Update and Hydrate

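# copy the packages used in the project from your user library into the project library,
# refreshing those that are already installed
renv::hydrate(update = "all")

# update packages to the latest versions available from their sources;
# this may have little or no effect after hydrating, but doesn't hurt to check
renv::update()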

# now take a snapshot with the updated packages
renv::snapshot()

Updating R

After updating R, you may also want to update your renv.lock file. When you open an R Project that has a lockfile for a previous R version, you will get a message like this in the Console:

# Bootstrapping renv 0.17.3 --------------------------------------------------
* Downloading renv 0.17.3 ... OK
* Installing renv 0.17.3 ... Done!
* Successfully installed and loaded renv 0.17.3.
ℹ Using R 4.4.0 (lockfile was generated with R 4.3.0)
* Project '~/Documents/Academic DP/website/workflow' loaded. [renv 0.17.3]
* This project contains a lockfile, but none of the recorded packages are installed.
* Use `renv::restore()` to restore the project library.

I then reinstall renv and the project's packages under the new R version:

install.packages("renv")
renv::install()
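
To check for such a mismatch yourself, the R version recorded in a lockfile can be compared with the running one. The sketch below assumes the lockfile is JSON with the R version stored in the first "Version" field, and it uses a small hypothetical lockfile written to a temporary file rather than a real renv.lock.

```r
# Hypothetical, heavily shortened lockfile, written to a temporary file
lock_lines <- c(
  '{',
  '  "R": {',
  '    "Version": "4.3.0"',
  '  },',
  '  "Packages": {}',
  '}'
)
lockfile <- tempfile(fileext = ".lock")
writeLines(lock_lines, lockfile)

# Assumption: the first "Version" entry in the file belongs to R itself
txt <- readLines(lockfile)
version_line <- grep('"Version"', txt, value = TRUE)[1]
lock_r <- sub('.*"Version":[[:space:]]*"([^"]+)".*', "\\1", version_line)

# TRUE when the lockfile was generated with a different R version
r_version_differs <- package_version(lock_r) != getRversion()
print(lock_r)
print(r_version_differs)
```

For a real project, a JSON parser would be more robust than the regular expression used here.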

usethis package

The usethis package automates project set-up and development, and is also useful when creating a package (Wickham et al. 2023). See the package documentation (https://usethis.r-lib.org/index.html) for more info.

Table 1: Summary of useful commands from the usethis package

| command | function |
|---|---|
| gh_token_help() | print credential situation |
| git_sitrep() | print more verbose credential information |
| git_vaccinate() | add appropriate files to .gitignore (remember to stage, commit, and push) |
| browse_github() | view project GitHub |
| browse_project() / browse_package() | visit project-related pages |
| browse_github_actions() | view overview of GitHub actions |
| git_credentials() | print credentials |
| create_github_token() | create a Personal Access Token (PAT) |
| gitcreds_set() | add your PAT (this function comes from the gitcreds package) |
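
Several of these commands are typically used together when setting up or replacing a PAT. The wrapper below is a hypothetical sketch of one such sequence; it is only defined here, not called, because every step is interactive and needs usethis and gitcreds to be installed.

```r
# Hypothetical wrapper around one possible PAT set-up sequence;
# call it yourself in the Console, since each step needs user input
setup_github_pat <- function() {
  usethis::create_github_token()  # opens GitHub in the browser to generate a PAT
  gitcreds::gitcreds_set()        # prompts you to paste the new PAT
  usethis::git_sitrep()           # prints the git/GitHub set-up to confirm it worked
}
```

Running the three lines one at a time in the Console works just as well; the function only documents the order.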

gitcreds package

The gitcreds package has some functionality similar to usethis with regard to using git/GitHub in an R project (Csárdi 2022). See the package GitHub page (https://github.com/r-lib/gitcreds) for more information.

Table 2: Summary of useful commands from the gitcreds package

| command | function |
|---|---|
| gitcreds_get() | print git/GitHub credentials |
| gitcreds_set() | set/replace credentials |
| gitcreds_delete() | delete credentials |

References

Csárdi, Gábor. 2022. gitcreds: Query 'git' Credentials from 'R'. https://CRAN.R-project.org/package=gitcreds.
Dunnington, Dewey, and Brenton M. Wiernik. 2023. rbbt: R Interface to the Better BibTeX Zotero Connector. https://github.com/paleolimbot/rbbt.
Müller, Kirill. 2020. here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Rinker, Tyler W., and Dason Kurkiewicz. 2018. pacman: Package Management for R. Buffalo, New York. http://github.com/trinker/pacman.
Ushey, Kevin. 2023. renv: Project Environments. https://CRAN.R-project.org/package=renv.
Wickham, Hadley, Jennifer Bryan, Malcolm Barrett, and Andy Teucher. 2023. usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.