---
title: "R-Projects"
subtitle: "Creating a project-oriented workflow in R"
author: "Daniela Palleschi"
institute: Leibniz-Zentrum Allgemeine Sprachwissenschaft
lang: en
date: 2024-10-08
date-format: "ddd MMM D, YYYY"
date-modified: last-modified
language:
  title-block-published: "Workshop Day 1"
  title-block-modified: "Last Modified"
format:
  html:
    output-file: R-Projects.html
    number-sections: true
    number-depth: 2
    toc: true
    code-overflow: wrap
    code-tools: true
    embed-resources: false
  pdf:
    output-file: R-Projects.pdf
    toc: true
    number-sections: false
    colorlinks: true
    code-overflow: wrap
  revealjs:
    footer: "R-Projects and {here}"
    output-file: R-Projects-slides.html
editor_options:
  chunk_output_type: console
bibliography: ../bibs/RProjects.bib
execute:
  echo: false
---
```{r}
#| echo: false
#| eval: false
rbbt::bbt_update_bib(here::here("pages", "RProjects.qmd"))
```
# Topics {.unnumbered .unlisted}
- Project-oriented workflows
- creating an R-Project
- project-relative filepaths with the `here` package
# Installation requirements
- required installations/recent versions of:
  - R
    - at least version `4.4.0`, "Puppy Cup"
    - check current version with `R.version`
    - download/update: <https://cran.r-project.org/bin/macosx/>
  - RStudio
    - at least version `2023.12.1.402`, "Ocean Storm"
    - Help \> Check for updates
    - new install: <https://posit.co/download/rstudio-desktop/>
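
The version checks above can also be run from the Console. This is a minimal sketch: `getRversion()` is base R, while the `rstudioapi` call assumes that package is installed and that the code runs inside RStudio, so it is guarded. The object names `r_version` and `r_ok` are only illustrative.

```{r}
#| eval: false
#| echo: true
# installed R version as a comparable version object
r_version <- getRversion()
r_version

# TRUE if the installed version meets the workshop minimum
r_ok <- r_version >= "4.4.0"
r_ok

# RStudio version: only available inside RStudio (assumes rstudioapi is installed)
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rstudioapi::versionInfo()$version
}
```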
# Project-oriented workflow {data-stack-name="Project-oriented workflow"}
1. Folder structure:
+ keeping everything related to a project in one place
+ i.e., contained in a single folder, with subfolders as needed
2. Project-relative working directory
+ the project folder should act as your working directory
+ all file paths should be relative to this folder
## Folder structure
- a core computer literacy skill
+ keep your Desktop as empty as possible
+ have a sensible folder structure
+ avoid mixing subfolders and files
+ i.e., if a folder contains subfolders, ideally it should not contain files
# R-Projects {data-stack-name="R-Projects"}
- in data analysis, using an IDE is beneficial
+ e.g., RStudio
- most IDEs have their own implementation of a Project
- in RStudio, this is the R-Project
+ creates a `.Rproj` file in a project folder
+ stores project settings
- you can have several R-Projects open simultaneously
+ and run several scripts across projects simultaneously
- most importantly, R-Projects (can) centralise a specific project's workflow and file path
- to read more about R-Projects, check out [Section 6.2: Projects](https://r4ds.hadley.nz/workflow-scripts.html#projects) from @wickham_r_2023 [or [Ch. 8 - Workflow: Projects](https://r4ds.had.co.nz/workflow-projects.html) in @wickham_r_2016]
## Creating a new Project
- when?
+ whenever you're starting a new course or project that will use R
- why?
+ to keep all the relevant materials in one place
- where?
+ somewhere that makes sense, e.g., a folder called `SoSe2024` or `Mastersarbeit`
- how?
+ `File > New Project > New Directory > New Project > [Directory name] > Create Project`
### {.unnumbered .unlisted}
::: {.callout-tip}
# New R-Project
Create a new R-Project for this workshop
+ `File > New Project > New Directory > New Project > [Directory name] > Create Project`
+ make sure you choose a sensible location
:::
## Opening a Project
- to open a project, locate its `.Rproj` file and double-click
- or if you're already in RStudio, you can use the `Project (None)` drop-down (top right)
:::: {.columns}
::: {.column width="50%"}
```{r}
#| label: fig-click-open
#| fig-cap: Double-click `.Rproj`
#| out-width: "80%"
magick::image_read(here::here("media", "rstudio_click_open.png"))
```
:::
::: {.column width="50%"}
```{r}
#| label: fig-project-open
#| fig-cap: Open from RStudio
#| out-width: "80%"
magick::image_read(here::here("media", "rstudio_project_open.png"))
```
:::
::::
## Adding a README file
- `File > New File > Markdown File` (*not* R Markdown!)
+ add some text describing the purpose of this project
+ include your name, the date
+ use Markdown formatting (e.g., `#` for headings, `*italics*`, `**bold**`)
- save as `README.md` in your project directory
## Global RStudio options
:::: {.columns}
::: {.column width="50%"}
```{r}
#| label: fig-rstudio-settings
#| fig-cap: RStudio settings for reproducibility
#| out-width: "80%"
magick::image_read(here::here("media", "RStudio_global-options.png"))
```
:::
::: {.column width="50%"}
- `Tools > Global Options`
+ **Workspace**: Restore .RData into workspace at startup: NO
+ Save workspace to .RData on exit: Never
- this will ensure that you are always starting with a clean slate
+ and that your code is not dependent on some package or object you created in another session
- this is also how RMarkdown and Quarto scripts run
+ they start with an empty environment and run the script linearly
:::
::::
## {.unnumbered .unlisted}
::: {.callout-tip}
## Global settings
Change your Global Options so that
+ **Workspace**: Restore .RData into workspace at startup: NO
+ Save workspace to .RData on exit: Never
:::
## Identifying your R-Project {.smaller}
- there are a few ways to check which (if any) R-Project you're in
+ there are 6 differences between @fig-noproject and @fig-project
+ which is in an R-Project session?
::: {.panel-tabset}
### Spot the differences
:::: {.columns}
::: {.column width="45%"}
```{r}
#| label: fig-noproject
#| fig-cap: RStudio Session A
#| out-width: "100%"
magick::image_read(here::here("media", "rstudio_noproject.png"))
```
:::
::: {.column width="5%"}
:::
::: {.column width="45%"}
```{r}
#| label: fig-project
#| fig-cap: RStudio Session B
#| out-width: "100%"
magick::image_read(here::here("media", "rstudio_project.png"))
```
:::
::::
### Show the differences
```{r}
#| label: fig-project-diffs
#| fig-cap: How to tell if you're in a project
#| out-width: "80%"
magick::image_read(here::here("media", "RProject_spot-the-diffs.png"))
```
:::
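
Besides the visual cues, you can check from the Console. This is a minimal sketch: `getwd()` and `list.files()` are base R, while `rstudioapi::getActiveProject()` assumes `rstudioapi` is installed and the code runs inside RStudio, so it is guarded. The object names are only illustrative.

```{r}
#| eval: false
#| echo: true
# current working directory; by default this is the project folder in an R-Project session
current_wd <- getwd()
current_wd

# any .Rproj files in the working directory?
rproj_files <- list.files(current_wd, pattern = "\\.Rproj$")
rproj_files

# path of the active R-Project, or NULL if no project is open (RStudio only)
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rstudioapi::getActiveProject()
}
```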
## Folder structure
- some folders you'll typically want to have:
+ `data`: containing your dataset(s)
+ `scripts` (or `analyses`, etc.): containing any analysis scripts
+ `manuscript`: containing any write-ups of your results
+ `materials`: containing relevant experiment materials (e.g., stimuli)
- let's just create the first 2 (`data` and `scripts`)
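
The folders can be created in your file browser or from R. This is a minimal sketch using base R; it assumes the working directory is the project folder (as it is by default in an R-Project session), and the name `project_dirs` is only illustrative.

```{r}
#| eval: false
#| echo: true
# folders to create, relative to the working directory (assumed to be the project folder)
project_dirs <- c("data", "scripts")

for (dir_name in project_dirs) {
  # showWarnings = FALSE: stay silent if the folder already exists
  dir.create(dir_name, showWarnings = FALSE)
}

# confirm that both folders now exist
dir.exists(project_dirs)
```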
### `data/`
- do you have "raw", i.e., not yet processed, data?
+ if so, you might want to create a `raw` sub-folder
+ and any other relevant sub-folders (e.g., `processed` or `tidy`)
- download the [online_cleaned.csv](https://raw.githubusercontent.com/bodowinter/iconicity_challenge/refs/heads/master/data/online_cleaned.csv) dataset from the [GitHub](https://github.com/bodowinter/iconicity_challenge/tree/master) or [OSF](https://osf.io/4na58/) repo from @cwiek_novel_2021
+ *or*, move a dataset of your own to this folder
- save the file as `cwiek_2021-online_cleaned.csv`
::: {.content-visible when-format="revealjs"}
### {.unlisted .unnumbered}
:::
- description of data collection:
::: {.fragment}
> In an online experiment with listeners of 25 different languages (from nine language families), participants listened to the 90 vocalizations (three for each of the 30 meanings), and for each, guessed its intended meaning from six written alternatives
>
-- @cwiek_novel_2021
:::
- you could also download the data directly from GitHub in R:
::: {.fragment}
```{r}
#| eval: false
#| echo: true
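# read the CSV from GitHub and save a local copy in data/
write.csv(
  x = read.csv("https://raw.githubusercontent.com/bodowinter/iconicity_challenge/refs/heads/master/data/online_cleaned.csv"),
  file = "data/cwiek_2021-online_cleaned.csv",
  row.names = FALSE # don't add a column of row numbers
)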
```
:::
```{r}
#| echo: false
#| eval: false
# same download with readr, namespaced so it runs without library(readr)
readr::write_csv(
  x = readr::read_csv("https://raw.githubusercontent.com/bodowinter/iconicity_challenge/refs/heads/master/data/online_cleaned.csv"),
  file = here::here("data", "cwiek_2021-online_cleaned.csv")
)
```
### `scripts/`
- try to create a single script for each "product"
+ e.g., anonymised data, 'cleaned' data, data exploration, visualisation, analyses, etc.
- you can create sub-folders as the project develops and move scripts around
+ for now, let's create a new script to take a look at our data
### {.unnumbered .unlisted}
::: {.callout-tip}
## New script
Create a new script:
1. `File > New File >` Choose your preferred script type
2. Save it in your `scripts/` folder: `File > Save as...`
:::
### Load in the data
- load in the data however you normally would
+ e.g., `read.csv()`, `readr::read_csv()`, ...
::: {.content-hidden when-format="revealjs"}
### Exercise: mini-Code Review
:::
::: {.callout-tip}
#### R-Project template
::: nonincremental
1. Download the R-Project template at [https://osf.io/ctmwj/](https://osf.io/ctmwj/)
2. Open (or switch to) `rproject-template.Rproj`
3. Inspect the folder structure and the files.
4. Look at the `scripts/` folder. Is it clear which scripts should be run first?
5. Try running `02-visualisation.R` first. Do you encounter any problems?
:::
:::
# `here`-package {data-stack-name="{here}"}
- the `here` package [@here-package] enables project-relative file referencing
+ avoids the use of `setwd()`
::: {.content-visible when-format="revealjs"}
## {.unnumbered .unlisted}
:::
```{r}
#| label: fig-here
#| fig-cap: Illustration by [Allison Horst](https://github.com/allisonhorst)
magick::image_read(here::here("media", "Horst_here.png"))
```
## The problem with `setwd()`
::: {.fragment}
> If the first line of your R script is
>
> `setwd("C:\Users\jenny\path\that\only\I\have")`
>
> I will come into your office and SET YOUR COMPUTER ON FIRE🔥.
--- [Jenny Bryan](https://x.com/hadleywickham/status/940021008764846080)
:::
- `setwd()` depends on your entire machine's folder structure
- `setwd()` breaks when you
+ send your project folder to a collaborator
+ make your analyses open
+ change the location of your project folder
- path separators also differ between operating systems (`\` on Windows, `/` on macOS and Linux)
::: {.content-visible when-format="revealjs"}
### {.unnumbered}
:::
- trying to use somebody else's (or your former) folder path will result in an error message like:
::: {.fragment}
`Error in setwd("/Users/danielapalleschi/Documents/R/rproject-template") : `
` cannot change working directory`
:::
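
You can reproduce this failure without stopping your script. This is a minimal sketch in base R: the path `path_fake` is a made-up location that is assumed not to exist, and `tryCatch()` captures the error message instead of halting. The object names are only illustrative.

```{r}
#| eval: false
#| echo: true
# a made-up path that is assumed not to exist; purely illustrative
path_fake <- file.path(tempdir(), "path", "that", "only", "I", "have")

# catch the error instead of stopping the script
msg_setwd <- tryCatch(
  {
    setwd(path_fake)
    "working directory changed"
  },
  error = function(e) conditionMessage(e)
)

# the captured error message
msg_setwd
```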
## The benefit of `here()`
- builds file paths relative to the top-level directory of your project
+ meaning we never need to spell out where the project folder sits within our machine's folder structure
- folder and file names are passed as separate arguments, separated by commas
+ meaning it doesn't matter whether the code was written on a Mac or a Windows machine
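
To see what `here()` returns, it helps to print the paths it builds. This is a minimal sketch that assumes the `here` package is installed; outside of a project, `here` falls back to the working directory as its root. The name `path_data` is only illustrative.

```{r}
#| eval: false
#| echo: true
# project root as detected by here
here::here()

# build a path from its parts; no slashes or backslashes needed
path_data <- here::here("data", "cwiek_2021-online_cleaned.csv")
path_data

# does a file exist at that location?
file.exists(path_data)
```

Because the path is assembled from the project root each time, the same call should work unchanged on a collaborator's machine.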
## {.unlisted .unnumbered}
::: {.callout-tip}
# `here`
In your R Project, load the `cwiek_2021-online_cleaned.csv` data using `here`
1. Install `here` (if needed; e.g., `install.packages("here")`)
2. Load `here` at the beginning of your script
+ or use `here::` before calling a function
3. Use the `here()` function to load in your data
4. Inspect the dataset however you usually would (e.g., `summary()`, `names()`, etc.)
5. Save your script
:::
## `here::here()`
- install package
```{r filename = "In the Console"}
#| eval: false
#| echo: true
install.packages("here")
```
- load package and call the `here` function
```{r}
#| eval: false
#| echo: true
# load package
library(here)
# read in data
df_icon <- read.csv(here("data", "cwiek_2021-online_cleaned.csv"))
```
- or directly call the `here` function without loading the package
```{r}
#| eval: false
#| echo: true
# read in data without loading here
df_icon <- read.csv(here::here("data", "cwiek_2021-online_cleaned.csv"))
```
::: {.content-visible when-format="revealjs"}
### {.unlisted .uncounted .unnumbered}
:::
- note that I stored the data with the prefix `df_`
+ `df` stands for dataframe
- I recommend using object-type defining prefixes for all objects in your Environment
+ e.g., `fit_` for models, `fig_` for figures, `sum_` for summaries, `tbl_` for tables, etc.
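
As an illustration of the prefix convention, here is a minimal sketch using the built-in `mtcars` data as a stand-in for your own dataset. The object names (`df_cars`, `sum_mpg`, `fit_mpg`) are hypothetical examples, not part of the workshop materials.

```{r}
#| eval: false
#| echo: true
# dataframe: built-in example data (stand-in for your own dataset)
df_cars <- mtcars

# summary of one variable
sum_mpg <- summary(df_cars$mpg)

# model: fuel consumption as a function of weight
fit_mpg <- lm(mpg ~ wt, data = df_cars)

# list objects in the Environment by type, using their prefixes
ls(pattern = "^df_")
ls(pattern = "^(fit|sum)_")
```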
## {.unlisted .unnumbered}
::: {.callout-tip}
# Reproduce your analysis
1. Perform some data exploration (e.g., with `names()`, `summary()`, `dplyr::glimpse()`, whatever you typically do)
2. Save your script, then close RStudio/your R-Project.
3. Re-open the project. Can you re-run the script?
:::
# Topics 🏁 {.unnumbered .unlisted .nonincremental}
- Project-oriented workflows ✅
- creating an R-Project ✅
- project-relative filepaths with the `here` package ✅
# References {.unlisted .unnumbered visibility="uncounted"}
---
nocite: |
  @bryan_what_nodate
  @bryan_chapter_nodate
  @noauthor_using_2024
---
::: {#refs custom-style="Bibliography"}
:::