From wide-to-long and long-to-wide with tidyr
Humboldt-Universität zu Berlin
2024-05-14
Today we will…
use the tidyr package
use pivot_longer() to make data longer
use pivot_wider() to make data wider
Load the tidyverse package
Load a subset of the tidy_data_lifetime_pilot.csv data. For demonstration purposes, we’ll only look at two trials from a single participant.
# A tibble: 10 × 7
px trial region ff fp rpd tt
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 px5 3 verb-1 190 190 190 190
2 px5 3 verb 175 175 175 321
3 px5 3 verb+1 154 154 1258 1723
4 px5 3 verb+2 160 283 283 672
5 px5 3 verb+3 156 575 1940 575
6 px5 8 verb-1 246 246 246 246
7 px5 8 verb 228 960 960 1892
8 px5 8 verb+1 176 573 573 967
9 px5 8 verb+2 151 151 151 450
10 px5 8 verb+3 216 981 2852 981
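The code that loads and filters the CSV is not shown here, so the following is only a sketch: it rebuilds the printed subset by hand so that the later examples can be run without the file. The object name df_lifetime is taken from later in these slides; typing the values in manually is an assumption made for illustration, not the original loading step.

```r
library(tidyverse)

# Hand-typed copy of the subset printed above (participant px5, trials 3 and 8).
# Assumption: in the original, these rows come from tidy_data_lifetime_pilot.csv.
df_lifetime <- tibble(
  px     = "px5",
  trial  = rep(c(3, 8), each = 5),
  region = rep(c("verb-1", "verb", "verb+1", "verb+2", "verb+3"), times = 2),
  ff     = c(190, 175, 154, 160, 156, 246, 228, 176, 151, 216),
  fp     = c(190, 175, 154, 283, 575, 246, 960, 573, 151, 981),
  rpd    = c(190, 175, 1258, 283, 1940, 246, 960, 573, 151, 2852),
  tt     = c(190, 321, 1723, 672, 575, 246, 1892, 967, 450, 981)
)

df_lifetime
```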
region: contains info on which sentence region the row’s reading times correspond to
ff: first fixation time, an eye-tracking reading measure
fp: first-pass reading time, an eye-tracking reading measure
rpd: regression path duration, an eye-tracking reading measure
tt: total reading time, an eye-tracking reading measure
reshaping data between wide and long formats is the major step of data tidying
what "variable" and "observation" mean will depend on what you want to do, and will change at different steps of your analyses
you typically want long data
the tidyr package from the tidyverse has some useful functions to facilitate this: pivot_longer() and pivot_wider()
tidyr
to pivot (verb): to turn or rotate on a point, like a hinge. Or a basketball player pivoting back and forth on one foot to protect the ball. (vocabulary.com)
a pivot (noun): a fixed point supporting something that turns or balances (dictionary.cambridge.org)
pivot_longer()
pivot_longer() takes wide data and makes it longer
cols: which columns do we want to combine into a single column?
names_to: what should we call the new column containing the previous column names?
values_to: what should we call the new column containing the values from the previous columns?
pivot_longer()
take the four reading time measures and list them in a single variable that we’ll call measure, and put their values in a second variable called time
# A tibble: 40 × 5
px trial region measure time
<chr> <dbl> <chr> <chr> <dbl>
1 px5 3 verb-1 ff 190
2 px5 3 verb-1 fp 190
3 px5 3 verb-1 rpd 190
4 px5 3 verb-1 tt 190
5 px5 3 verb ff 175
6 px5 3 verb fp 175
7 px5 3 verb rpd 175
8 px5 3 verb tt 321
9 px5 3 verb+1 ff 154
10 px5 3 verb+1 fp 154
# ℹ 30 more rows
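A call along the following lines would produce output like the table above. This is a sketch rather than the original code: df_lifetime is rebuilt by hand from the printed subset (an assumption, since the loading step is not shown), and the name df_longer is borrowed from later in these slides.

```r
library(tidyverse)

# Hand-typed copy of the printed subset (assumption: stands in for the CSV).
df_lifetime <- tibble(
  px     = "px5",
  trial  = rep(c(3, 8), each = 5),
  region = rep(c("verb-1", "verb", "verb+1", "verb+2", "verb+3"), times = 2),
  ff     = c(190, 175, 154, 160, 156, 246, 228, 176, 151, 216),
  fp     = c(190, 175, 154, 283, 575, 246, 960, 573, 151, 981),
  rpd    = c(190, 175, 1258, 283, 1940, 246, 960, 573, 151, 2852),
  tt     = c(190, 321, 1723, 672, 575, 246, 1892, 967, 450, 981)
)

# Stack the four reading time columns into a name column and a value column.
df_longer <- df_lifetime |>
  pivot_longer(
    cols      = c(ff, fp, rpd, tt), # columns to combine
    names_to  = "measure",          # new column holding the old column names
    values_to = "time"              # new column holding the old cell values
  )

df_longer
```

Because the four measure columns sit next to each other in the data, cols = ff:tt should select the same columns.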
instead of the four columns ff, fp, rpd, and tt, we have two columns (measure and time) which contain the reading time measure names and corresponding reading times
pivot_wider()
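pivot_wider() takes long data and makes it wider
id_cols: which columns uniquely identify each row of the wider data?
names_from: which column contains the values that should become the new column names?
names_prefix: a string to add to the start of each new column name
values_from: which column contains the values that should fill the new columns?
pivot_wider()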
take the region column in df_longer and widen it, so that each sentence region gets its own column
for each measure, such as tt (total reading time), the result is one row per trial with one column per region
# A tibble: 8 × 8
px trial measure `reg_verb-1` reg_verb `reg_verb+1` `reg_verb+2`
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 px5 3 ff 190 175 154 160
2 px5 3 fp 190 175 154 283
3 px5 3 rpd 190 175 1258 283
4 px5 3 tt 190 321 1723 672
5 px5 8 ff 246 228 176 151
6 px5 8 fp 246 960 573 151
7 px5 8 rpd 246 960 573 151
8 px5 8 tt 246 1892 967 450
# ℹ 1 more variable: `reg_verb+3` <dbl>
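The widened table above could be produced roughly as follows. This is a sketch under assumptions: df_lifetime is rebuilt by hand from the printed subset, and the prefix "reg_" is inferred from the column names in the output rather than taken from the original code.

```r
library(tidyverse)

# Hand-typed copy of the printed subset (assumption: stands in for the CSV).
df_lifetime <- tibble(
  px     = "px5",
  trial  = rep(c(3, 8), each = 5),
  region = rep(c("verb-1", "verb", "verb+1", "verb+2", "verb+3"), times = 2),
  ff     = c(190, 175, 154, 160, 156, 246, 228, 176, 151, 216),
  fp     = c(190, 175, 154, 283, 575, 246, 960, 573, 151, 981),
  rpd    = c(190, 175, 1258, 283, 1940, 246, 960, 573, 151, 2852),
  tt     = c(190, 321, 1723, 672, 575, 246, 1892, 967, 450, 981)
)

# Long format: one row per participant, trial, region, and measure.
df_longer <- df_lifetime |>
  pivot_longer(cols = c(ff, fp, rpd, tt), names_to = "measure", values_to = "time")

# Wide again, but this time with one column per sentence region.
df_longer_wider <- df_longer |>
  pivot_wider(
    id_cols      = c(px, trial, measure), # columns that identify each new row
    names_from   = region,                # values here become column names
    names_prefix = "reg_",                # prepended to each new column name
    values_from  = time                   # values here fill the new columns
  )

df_longer_wider
```

By default id_cols covers every column not named in names_from or values_from, so spelling it out here is optional and mainly serves readability.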
in all three data frames (df_lifetime, df_longer, and df_longer_wider), we have the same 40 reading time values
# A tibble: 15 × 5
px trial region measure time
<chr> <dbl> <chr> <chr> <dbl>
1 px5 3 verb-1 ff 190
2 px5 3 verb-1 fp 190
3 px5 3 verb-1 rpd 190
4 px5 3 verb-1 tt 190
5 px5 3 verb ff 175
6 px5 3 verb fp 175
7 px5 3 verb rpd 175
8 px5 3 verb tt 321
9 px5 3 verb+1 ff 154
10 px5 3 verb+1 fp 154
11 px5 3 verb+1 rpd 1258
12 px5 3 verb+1 tt 1723
13 px5 3 verb+2 ff 160
14 px5 3 verb+2 fp 283
15 px5 3 verb+2 rpd 283
# A tibble: 8 × 8
px trial measure `reg_verb-1` reg_verb `reg_verb+1` `reg_verb+2`
<chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
1 px5 3 ff 190 175 154 160
2 px5 3 fp 190 175 154 283
3 px5 3 rpd 190 175 1258 283
4 px5 3 tt 190 321 1723 672
5 px5 8 ff 246 228 176 151
6 px5 8 fp 246 960 573 151
7 px5 8 rpd 246 960 573 151
8 px5 8 tt 246 1892 967 450
# ℹ 1 more variable: `reg_verb+3` <dbl>
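One way to check that the three shapes hold the same number of reading time values is to count the non-missing cells in each. The sketch below rebuilds all three data frames from a hand-typed copy of the printed subset (an assumption), and the counter names n_wide, n_long, and n_long_wide are hypothetical.

```r
library(tidyverse)

# Hand-typed copy of the printed subset (assumption: stands in for the CSV).
df_lifetime <- tibble(
  px     = "px5",
  trial  = rep(c(3, 8), each = 5),
  region = rep(c("verb-1", "verb", "verb+1", "verb+2", "verb+3"), times = 2),
  ff     = c(190, 175, 154, 160, 156, 246, 228, 176, 151, 216),
  fp     = c(190, 175, 154, 283, 575, 246, 960, 573, 151, 981),
  rpd    = c(190, 175, 1258, 283, 1940, 246, 960, 573, 151, 2852),
  tt     = c(190, 321, 1723, 672, 575, 246, 1892, 967, 450, 981)
)

df_longer <- df_lifetime |>
  pivot_longer(cols = c(ff, fp, rpd, tt), names_to = "measure", values_to = "time")

df_longer_wider <- df_longer |>
  pivot_wider(names_from = region, names_prefix = "reg_", values_from = time)

# Count the reading time cells in each shape.
n_wide      <- sum(!is.na(select(df_lifetime, ff, fp, rpd, tt)))          # 10 rows x 4 measures
n_long      <- sum(!is.na(df_longer$time))                                # 40 rows x 1 column
n_long_wide <- sum(!is.na(select(df_longer_wider, starts_with("reg_")))) # 8 rows x 5 regions

c(wide = n_wide, long = n_long, long_then_wide = n_long_wide)
```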
More reading: PsyTeachR
Today we…
used the tidyr package ✅
used pivot_longer() to make data longer ✅
used pivot_wider() to make data wider ✅