R-Projects

Creating a project-oriented workflow in R

Author: Daniela Palleschi

Affiliation: Leibniz-Zentrum Allgemeine Sprachwissenschaft

Workshop Day 1: Tue Oct 8, 2024

Last Modified: Mon Oct 7, 2024

Topics

Project-oriented workflows
creating an R-Project
project-relative filepaths with the here package

1 Installation requirements

required installations/recent versions of:
- R
  - at least version 4.4.0, “Puppy Cup”
  - check current version with R.version
  - download/update: https://cran.r-project.org/bin/macosx/ (macOS; for other operating systems, start from https://cran.r-project.org/)
- RStudio
  - at least version 2023.12.1.402, “Ocean Storm”
  - Help > Check for updates
  - new install: https://posit.co/download/rstudio-desktop/
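
A quick way to check the R requirement from the Console is sketched below. The object names `r_min` and `r_ok` are made up for this example; `getRversion()` and `R.version.string` are part of base R.

```r
# full version string of the running R session
R.version.string

# minimum R version required for the workshop (see the list above)
r_min <- "4.4.0"

# TRUE if the running R version is at least r_min
r_ok <- getRversion() >= r_min
r_ok
```

If `r_ok` is `FALSE`, update R before continuing.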

2 Project-oriented workflow

1. Folder structure:
- keeping everything related to a project in one place
- i.e., contained in a single folder, with subfolders as needed
2. Project-relative working directory
- the project folder should act as your working directory
- all file paths should be relative to this folder

2.1 Folder structure

a core computer literacy skill
- keep your Desktop as empty as possible
- have a sensible folder structure
- avoid mixing subfolders and files
  - i.e., if a folder contains subfolders, ideally it should not contain files

3 R-Projects

in data analysis, using an IDE is beneficial
- e.g., RStudio
most IDEs have their own implementation of a Project
in RStudio, this is the R-Project
- creates a .Rproj file in a project folder
- stores project settings
you can have several R-Projects open simultaneously
- and run several scripts across projects simultaneously
most importantly, R-Projects (can) centralise a specific project’s workflow and file path
to read more about R-Projects, check out Section 6.2: Projects from Wickham et al. (2023; or Ch. 8 - Workflow: Projects in Wickham & Grolemund, 2016)

3.1 Creating a new Project

when?
- whenever you’re starting a new course or project which will use R
why?
- to keep all the relevant materials in one place
where?
- somewhere that makes sense, e.g., a folder called SoSe2024 or Mastersarbeit
how?
- File > New Project > New Directory > New Project > [Directory name] > Create Project

New R-Project

Create a new R-Project for this workshop

File > New Project > New Directory > New Project > [Directory name] > Create Project
make sure you choose a sensible location

3.2 Opening a Project

to open a project, locate its .Rproj file and double-click
or if you’re already in RStudio, you can use the Project (None) drop-down (top right)

3.3 Adding a README file

File > New File > Markdown File (not R Markdown!)
- add some text describing the purpose of this project
- include your name, the date
- use Markdown formatting (e.g., # for headings, *italics*, **bold**)
save as README.md in your project directory
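
If you prefer to create the file from the Console, the steps above could be sketched as follows. This is only one possible approach: the title, name, date and description are placeholders, and the code assumes the working directory is your project folder.

```r
# lines of a minimal README; all content here is placeholder text
readme_lines <- c(
  "# Project title",
  "",
  "**Author:** Your Name",
  "",
  "**Date:** 2024-10-08",
  "",
  "A short description of the *purpose* of this project."
)

# write README.md to the working directory (assumed: the project folder),
# but only if it does not exist yet, so an existing README is never overwritten
if (!file.exists("README.md")) {
  writeLines(readme_lines, "README.md")
}
```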

3.4 Global RStudio options

Figure 3: RStudio settings for reproducibility

Tools > Global Options
- Workspace: Restore .RData into workspace at startup: NO
- Save workspace to .RData on exit: Never
this will ensure that you are always starting with a clean slate
- and that your code is not dependent on some package or object you created in another session
this is also how RMarkdown and Quarto scripts run
- they start with an empty environment and run the script linearly

Global settings

Change your Global Options so that

Workspace: Restore .RData into workspace at startup: NO
Save workspace to .RData on exit: Never

3.5 Identifying your R-Project

there are a few ways to check which (if any) R-Project you’re in
- there are 6 differences between Figure 4 and Figure 5
- which is in an R-Project session?

Spot the differences
Show the differences

Figure 6: How to tell if you’re in a project
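
Besides the visual cues in RStudio, you can also check from the Console. The sketch below uses only base R; the object names `files_rproj` and `in_project_folder` are made up for this example. It only checks whether the working directory contains an `.Rproj` file, which is a reasonable hint but not proof that the project is actually open.

```r
# current working directory; when a project is opened in RStudio,
# this is by default the project folder
getwd()

# .Rproj files in the working directory (character(0) if there are none)
files_rproj <- list.files(pattern = "\\.Rproj$")
files_rproj

# TRUE if the working directory contains at least one .Rproj file
in_project_folder <- length(files_rproj) > 0
in_project_folder
```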

3.6 Folder structure

some folders you’ll typically want to have:
- data: containing your dataset(s)
- scripts (or analyses, etc.): containing any analysis scripts
- manuscript: containing any write-ups of your results
- materials: containing relevant experiment materials (e.g., stimuli)
let’s just create the first 2 (data and scripts)
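
You can create these folders in the RStudio Files pane, or from the Console. A minimal sketch, assuming the working directory is your project folder (`dirs_project` and `dir_i` are names chosen for this example):

```r
# folders to create, relative to the working directory
# (assumed: the project folder)
dirs_project <- c("data", "scripts")

# create each folder only if it does not exist yet
for (dir_i in dirs_project) {
  if (!dir.exists(dir_i)) {
    dir.create(dir_i)
  }
}

# check that both folders now exist
dir.exists(dirs_project)
```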

`data/`

do you have “raw”, i.e., not yet processed, data?
- if so, you might want to create a raw sub-folder
- and any other relevant sub-folders (e.g., processed or tidy)
download the online_cleaned.csv dataset from the GitHub or OSF repo from Ćwiek et al. (2021)
- or, move a dataset of your own to this folder
save the file as cwiek_2021-online_cleaned.csv

description of data collection:

In an online experiment with listeners of 25 different languages (from nine language families), participants listened to the 90 vocalizations (three for each of the 30 meanings), and for each, guessed its intended meaning from six written alternatives

– Ćwiek et al. (2021)

you could also download the data directly from GitHub in R:

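# download the dataset and save it in the data/ folder
# (assumes the working directory is the project folder and data/ exists)
write.csv(
  x = read.csv("https://raw.githubusercontent.com/bodowinter/iconicity_challenge/refs/heads/master/data/online_cleaned.csv"),
  file = "data/cwiek_2021-online_cleaned.csv",
  row.names = FALSE # do not add a column of row numbers
  )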

`scripts/`

try to create a single script for each “product”
- e.g., anonymised data, ‘cleaned’ data, data exploration, visualisation, analyses, etc.
you can create sub-folders as the project develops and move scripts around
- for now, let’s create a new script to take a look at our data

New script

Create a new script:

File > New File > Choose your preferred script type
Save it in your scripts/ folder: File > Save as...

Load in the data

load in the data however you normally would
- e.g., read.csv(), readr::read_csv(), …

Exercise: mini-Code Review

R-Project template

Download the R-Project template at https://osf.io/ctmwj/
Open (or switch to) rproject-template.Rproj
Inspect the folder structure and the files.
Look at the scripts/ folder. Is it clear which scripts should be run first?
Try running 02-visualisation.R first. Do you encounter any problems?

4 `here`-package

the here package (Müller, 2020) enables project-relative file referencing
- avoids the use of setwd()

4.1 The problem with `setwd()`

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE🔥.

— Jenny Bryan

setwd() depends on your entire machine’s folder structure
setwd() breaks when you
- send your project folder to a collaborator
- make your analyses open
- change the location of your project folder
the path separator also depends on your operating system (backslashes on Windows, forward slashes on macOS and Linux)

trying to use somebody else’s (or your former) folder path will result in an error message like:

Error in setwd("/Users/danielapalleschi/Documents/R/rproject-template") : cannot change working directory
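
To see this behaviour without breaking your session, you could try the sketch below. The folder path is hypothetical and assumed not to exist on your machine; `tryCatch()` captures the error message so the script keeps running.

```r
# a folder path that is assumed not to exist on this machine (hypothetical)
path_missing <- file.path(tempdir(), "folder", "that", "does", "not", "exist")

# setwd() throws an error when the folder cannot be found;
# tryCatch() returns the error message instead of stopping the script
msg_setwd <- tryCatch(
  {
    setwd(path_missing)
    "working directory changed"
  },
  error = function(e) conditionMessage(e)
)

# the captured error message
msg_setwd
```

Note that `file.path()` inserts the separators for you, so the path itself is written without any slashes.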

4.2 The benefit of `here()`

builds file paths from the top-level directory of your Project, regardless of the current working directory
- meaning we never need to specify the path to our project folder relative to our current higher-level folder structure
can separate folder names with a comma
- meaning it doesn’t matter if the original code was written on a Mac or a Windows machine
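
To get a feel for what `here()` returns, the sketch below builds a path without reading any file. The object name `path_data` is made up for this example, and the `file.path()` fallback is only there so that the code also runs when `here` is not installed.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # folder and file names are separate arguments; here() adds the separators
  # and prepends the top-level directory of the project
  path_data <- here::here("data", "cwiek_2021-online_cleaned.csv")
} else {
  # fallback without here: a path relative to the working directory
  path_data <- file.path("data", "cwiek_2021-online_cleaned.csv")
}

# the full path as a character string
path_data

# the file name at the end of the path
basename(path_data)
```

With `here` installed, the printed path starts at your project folder, so it will look different on every machine while the code stays the same.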

here

In your R Project, load the cwiek_2021-online_cleaned.csv data using here

Install here (if needed; e.g., install.packages("here"))
Load here at the beginning of your script
- or use here:: before calling a function
Use the here() function to load in your data
Inspect the dataset however you usually would (e.g., summary(), names(), etc.)
Save your script

4.3 `here::here()`

install package

In the Console

install.packages("here")

load package and call the here function

# load package
library(here)

# read in data
df_icon <- read.csv(here("data", "cwiek_2021-online_cleaned.csv"))

or directly call the here function without loading the package

# read in data without loading here
df_icon <- read.csv(here::here("data", "cwiek_2021-online_cleaned.csv"))

note that I stored the data with the prefix df_
- df stands for dataframe
I recommend using object-type defining prefixes for all objects in your Environment
- e.g., fit_ for models, fig_ for figures, sum_ for summaries, tbl_ for tables, etc.
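
As an illustration of the prefix convention, here is a small sketch using R's built-in `mtcars` dataset. The dataset and all object names are stand-ins chosen for this example, not part of the workshop materials.

```r
# df_: dataframes
df_cars <- mtcars

# sum_: summaries
sum_mpg <- summary(df_cars$mpg)

# fit_: models
fit_mpg <- lm(mpg ~ wt, data = df_cars)

# tbl_: tables
tbl_cyl <- table(df_cars$cyl)

# with prefixes, objects of the same type sort together in the Environment pane
ls(pattern = "^(df|sum|fit|tbl)_")
```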

Reproduce your analysis

Perform some data exploration (e.g., with names(), summary(), dplyr::glimpse(), whatever you typically do)
Save your script, then close RStudio/your R-Project.
Re-open the project. Can you re-run the script?

Topics 🏁

Project-oriented workflows ✅
creating an R-Project ✅
project-relative filepaths with the here package ✅

References

Bryan, J., Hester, J., Pileggi, S., & Aja, D. E. (n.d.). What They Forgot to Teach You About R. Retrieved May 6, 2024, from https://rstats.wtf/

Bryan, J., & The STAT 545 TAs. (n.d.). R Basics and workflows. In STAT 545 Course materials. Retrieved May 6, 2024, from https://stat545.com/

Ćwiek, A., Fuchs, S., Draxler, C., Asu, E. L., Dediu, D., Hiovain, K., Kawahara, S., Koutalidis, S., Krifka, M., Lippus, P., Lupyan, G., Oh, G. E., Paul, J., Petrone, C., Ridouane, R., Reiter, S., Schümchen, N., Szalontai, Á., Ünal-Logacev, Ö., … Perlman, M. (2021). Novel vocalizations are understood across cultures. Scientific Reports, 11(1), 10108. https://doi.org/10.1038/s41598-021-89445-4

Müller, K. (2020). here: A Simpler Way to Find Your Files (Version 1.0.1). https://CRAN.R-project.org/package=here

Using RStudio Projects. (2024, April 16). Posit Support. https://support.posit.co/hc/en-us/articles/200526207-Using-RStudio-Projects

Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). https://r4ds.hadley.nz/

Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.

