From wide-to-long and long-to-wide with tidyr
Humboldt-Universität zu Berlin
2024-05-14
Today we will…
- learn about the tidyr package
- use pivot_longer() to make data longer
- use pivot_wider() to make data wider

Load the tidyverse package
Load a subset of the tidy_data_lifetime_pilot.csv data. For demonstration purposes, we’ll only look at two trials from a single participant.
# A tibble: 10 × 7
   px    trial region    ff    fp   rpd    tt
   <chr> <dbl> <chr>  <dbl> <dbl> <dbl> <dbl>
 1 px5       3 verb-1   190   190   190   190
 2 px5       3 verb     175   175   175   321
 3 px5       3 verb+1   154   154  1258  1723
 4 px5       3 verb+2   160   283   283   672
 5 px5       3 verb+3   156   575  1940   575
 6 px5       8 verb-1   246   246   246   246
 7 px5       8 verb     228   960   960  1892
 8 px5       8 verb+1   176   573   573   967
 9 px5       8 verb+2   151   151   151   450
10 px5       8 verb+3   216   981  2852   981
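The loading code itself is not shown here, and the location of tidy_data_lifetime_pilot.csv is not given. As a minimal sketch, the ten rows printed above can be recreated by hand with tribble(), so that the later pivoting steps can be followed without the file. The object name df_lifetime is the one used later in these slides; typing the values in by hand is an assumption made for illustration, not the original loading step.

```r
library(tidyverse)

# Hand-typed copy of the ten rows printed above (two trials, one participant).
# In practice the data would be read from tidy_data_lifetime_pilot.csv and
# filtered; that file's path is not given here.
df_lifetime <- tribble(
  ~px,   ~trial, ~region,  ~ff, ~fp, ~rpd, ~tt,
  "px5", 3,      "verb-1", 190, 190,  190,  190,
  "px5", 3,      "verb",   175, 175,  175,  321,
  "px5", 3,      "verb+1", 154, 154, 1258, 1723,
  "px5", 3,      "verb+2", 160, 283,  283,  672,
  "px5", 3,      "verb+3", 156, 575, 1940,  575,
  "px5", 8,      "verb-1", 246, 246,  246,  246,
  "px5", 8,      "verb",   228, 960,  960, 1892,
  "px5", 8,      "verb+1", 176, 573,  573,  967,
  "px5", 8,      "verb+2", 151, 151,  151,  450,
  "px5", 8,      "verb+3", 216, 981, 2852,  981
)

df_lifetime
```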
- region: contains info on which sentence region the row’s reading times correspond to
- ff: first fixation time, an eye-tracking reading measure
- fp: first-pass reading time, an eye-tracking reading measure
- rpd: regression path duration, an eye-tracking reading measure
- tt: total reading time, an eye-tracking reading measure

this is the major step of data tidying
what variable and observation mean will depend on what you want to do, and will change at different steps of your analyses
you typically want long data
the tidyr package from the tidyverse has some useful functions to facilitate this: pivot_longer() and pivot_wider()
tidyr
to pivot (verb): to turn or rotate on a point, like a hinge. Or a basketball player pivoting back and forth on one foot to protect the ball. (vocabulary.com)
a pivot (noun): a fixed point supporting something that turns or balances (dictionary.cambridge.org)
Figure 1: A memorable scene (to millennials) from Friends where the word ‘pivot’ is repeatedly used (YouTube clip)
pivot_longer()
pivot_longer() takes wide data and makes it longer
- cols: which columns do we want to combine into a single column?
- names_to: what should we call the new column containing the previous column names?
- values_to: what should we call the new column containing the values from the previous columns?

pivot_longer()

Take the four reading time measures and list them in a single variable that we’ll call measure, and put their values in a second variable called time
# A tibble: 40 × 5
   px    trial region measure  time
   <chr> <dbl> <chr>  <chr>   <dbl>
 1 px5       3 verb-1 ff        190
 2 px5       3 verb-1 fp        190
 3 px5       3 verb-1 rpd       190
 4 px5       3 verb-1 tt        190
 5 px5       3 verb   ff        175
 6 px5       3 verb   fp        175
 7 px5       3 verb   rpd       175
 8 px5       3 verb   tt        321
 9 px5       3 verb+1 ff        154
10 px5       3 verb+1 fp        154
# ℹ 30 more rows
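The call that produced the tibble above is not shown, so the following is a sketch of one pivot_longer() call that would give the same shape, using the three arguments described earlier. The data frame is rebuilt by hand from the printed rows so the block runs on its own; the name df_longer is taken from later in these slides.

```r
library(tidyverse)

# Hand-typed copy of the ten rows of df_lifetime shown earlier
df_lifetime <- tribble(
  ~px,   ~trial, ~region,  ~ff, ~fp, ~rpd, ~tt,
  "px5", 3,      "verb-1", 190, 190,  190,  190,
  "px5", 3,      "verb",   175, 175,  175,  321,
  "px5", 3,      "verb+1", 154, 154, 1258, 1723,
  "px5", 3,      "verb+2", 160, 283,  283,  672,
  "px5", 3,      "verb+3", 156, 575, 1940,  575,
  "px5", 8,      "verb-1", 246, 246,  246,  246,
  "px5", 8,      "verb",   228, 960,  960, 1892,
  "px5", 8,      "verb+1", 176, 573,  573,  967,
  "px5", 8,      "verb+2", 151, 151,  151,  450,
  "px5", 8,      "verb+3", 216, 981, 2852,  981
)

df_longer <- df_lifetime |>
  pivot_longer(
    cols = c(ff, fp, rpd, tt), # the four columns to combine
    names_to = "measure",      # new column holding the old column names
    values_to = "time"         # new column holding the reading times
  )

df_longer
```

Each of the 10 original rows contributes 4 rows (one per measure), which is where the 40 rows come from.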
Instead of the four columns ff, fp, rpd, and tt, we have two columns (measure and time) which contain the reading time measure names and corresponding reading times

pivot_wider()
pivot_wider() takes long data and makes it wider
- id_cols: identifying columns
- names_from: which column contains the values that should become the new column names?
- names_prefix: a prefix to add to the start of the new column names
- values_from: which column the new column values come from

pivot_wider()

Take the region column in df_longer and widen it
tt (total reading time)
the result
four reading time measures and list them in a single variable that we’ll call measure, and put their values in a second variable called time
# A tibble: 8 × 8
  px    trial measure `reg_verb-1` reg_verb `reg_verb+1` `reg_verb+2`
  <chr> <dbl> <chr>          <dbl>    <dbl>        <dbl>        <dbl>
1 px5       3 ff               190      175          154          160
2 px5       3 fp               190      175          154          283
3 px5       3 rpd              190      175         1258          283
4 px5       3 tt               190      321         1723          672
5 px5       8 ff               246      228          176          151
6 px5       8 fp               246      960          573          151
7 px5       8 rpd              246      960          573          151
8 px5       8 tt               246     1892          967          450
# ℹ 1 more variable: `reg_verb+3` <dbl>
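The pivot_wider() call behind the tibble above is not shown either. A sketch of a call that would give the same columns follows; the prefix "reg_" and the choice of px, trial and measure as identifying columns are inferred from the printed output rather than taken from the original code. The data are again rebuilt by hand so the block is self-contained.

```r
library(tidyverse)

# Hand-typed copy of df_lifetime, then lengthened as in the pivot_longer() step
df_longer <- tribble(
  ~px,   ~trial, ~region,  ~ff, ~fp, ~rpd, ~tt,
  "px5", 3,      "verb-1", 190, 190,  190,  190,
  "px5", 3,      "verb",   175, 175,  175,  321,
  "px5", 3,      "verb+1", 154, 154, 1258, 1723,
  "px5", 3,      "verb+2", 160, 283,  283,  672,
  "px5", 3,      "verb+3", 156, 575, 1940,  575,
  "px5", 8,      "verb-1", 246, 246,  246,  246,
  "px5", 8,      "verb",   228, 960,  960, 1892,
  "px5", 8,      "verb+1", 176, 573,  573,  967,
  "px5", 8,      "verb+2", 151, 151,  151,  450,
  "px5", 8,      "verb+3", 216, 981, 2852,  981
) |>
  pivot_longer(cols = c(ff, fp, rpd, tt), names_to = "measure", values_to = "time")

df_longer_wider <- df_longer |>
  pivot_wider(
    id_cols = c(px, trial, measure), # columns that identify each row of the result
    names_from = region,             # values of region become new column names
    names_prefix = "reg_",           # assumed prefix, inferred from the output above
    values_from = time               # reading times fill the new columns
  )

df_longer_wider
```

Because region contains names such as verb-1 and verb+1, most of the new column names are not syntactic in R, which is why they are printed in backticks.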
In df_lifetime, df_longer, and df_longer_wider, we have 40 reading time values
# A tibble: 40 × 5
   px    trial region measure  time
   <chr> <dbl> <chr>  <chr>   <dbl>
 1 px5       3 verb-1 ff        190
 2 px5       3 verb-1 fp        190
 3 px5       3 verb-1 rpd       190
 4 px5       3 verb-1 tt        190
 5 px5       3 verb   ff        175
 6 px5       3 verb   fp        175
 7 px5       3 verb   rpd       175
 8 px5       3 verb   tt        321
 9 px5       3 verb+1 ff        154
10 px5       3 verb+1 fp        154
# ℹ 30 more rows
# A tibble: 15 × 5
   px    trial region measure  time
   <chr> <dbl> <chr>  <chr>   <dbl>
 1 px5       3 verb-1 ff        190
 2 px5       3 verb-1 fp        190
 3 px5       3 verb-1 rpd       190
 4 px5       3 verb-1 tt        190
 5 px5       3 verb   ff        175
 6 px5       3 verb   fp        175
 7 px5       3 verb   rpd       175
 8 px5       3 verb   tt        321
 9 px5       3 verb+1 ff        154
10 px5       3 verb+1 fp        154
11 px5       3 verb+1 rpd      1258
12 px5       3 verb+1 tt       1723
13 px5       3 verb+2 ff        160
14 px5       3 verb+2 fp        283
15 px5       3 verb+2 rpd       283
# A tibble: 8 × 8
  px    trial measure `reg_verb-1` reg_verb `reg_verb+1` `reg_verb+2`
  <chr> <dbl> <chr>          <dbl>    <dbl>        <dbl>        <dbl>
1 px5       3 ff               190      175          154          160
2 px5       3 fp               190      175          154          283
3 px5       3 rpd              190      175         1258          283
4 px5       3 tt               190      321         1723          672
5 px5       8 ff               246      228          176          151
6 px5       8 fp               246      960          573          151
7 px5       8 rpd              246      960          573          151
8 px5       8 tt               246     1892          967          450
# ℹ 1 more variable: `reg_verb+3` <dbl>
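The claim that all three data frames hold the same 40 reading time values can be checked by counting. The sketch below rebuilds the three objects by hand from the printed rows (the "reg_" prefix is inferred from the output, as before) and counts the reading times in each shape; the helper names n_wide, n_long and n_wider are made up for this example.

```r
library(tidyverse)

df_lifetime <- tribble(
  ~px,   ~trial, ~region,  ~ff, ~fp, ~rpd, ~tt,
  "px5", 3,      "verb-1", 190, 190,  190,  190,
  "px5", 3,      "verb",   175, 175,  175,  321,
  "px5", 3,      "verb+1", 154, 154, 1258, 1723,
  "px5", 3,      "verb+2", 160, 283,  283,  672,
  "px5", 3,      "verb+3", 156, 575, 1940,  575,
  "px5", 8,      "verb-1", 246, 246,  246,  246,
  "px5", 8,      "verb",   228, 960,  960, 1892,
  "px5", 8,      "verb+1", 176, 573,  573,  967,
  "px5", 8,      "verb+2", 151, 151,  151,  450,
  "px5", 8,      "verb+3", 216, 981, 2852,  981
)

df_longer <- df_lifetime |>
  pivot_longer(cols = c(ff, fp, rpd, tt), names_to = "measure", values_to = "time")

df_longer_wider <- df_longer |>
  pivot_wider(id_cols = c(px, trial, measure), names_from = region,
              names_prefix = "reg_", values_from = time)

# 10 rows x 4 measure columns
n_wide <- df_lifetime |> select(ff, fp, rpd, tt) |> unlist() |> length()
# one row per reading time
n_long <- nrow(df_longer)
# 8 rows x 5 region columns
n_wider <- df_longer_wider |> select(starts_with("reg_")) |> unlist() |> length()

c(n_wide, n_long, n_wider)

# The values themselves are also the same, only arranged differently
same_values <- isTRUE(all.equal(
  sort(unname(unlist(select(df_longer_wider, starts_with("reg_"))))),
  sort(df_longer$time)
))
same_values
```

Pivoting changes how the values are laid out across rows and columns, not how many there are.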
More reading: PsyTeachR
Today we…
- learned about the tidyr package ✅
- used pivot_longer() to make data longer ✅
- used pivot_wider() to make data wider ✅