With the advent of more powerful statistical methods for assessing time series data, it is now becoming more common to compare whole vowel formant trajectories rather then just using average values.
In this post I will show how to tidy a formant measurements dataset and plot formants using the tidyverse (Wickham 2017).
From wide to long
To illustrate the process, I will use formant data that was kindly provided by Stephen Nichols.
# A tibble: 399 × 6
Time Vowel Word Duration formant_int value
<dbl> <chr> <chr> <dbl> <chr> <dbl>
1 194. u tulemupepelako 43.3 F1_05 406.
2 194. u tulemupepelako 43.3 F1_10 439.
3 194. u tulemupepelako 43.3 F1_15 453.
4 194. u tulemupepelako 43.3 F1_20 456.
5 194. u tulemupepelako 43.3 F1_25 430.
6 194. u tulemupepelako 43.3 F1_30 357.
7 194. u tulemupepelako 43.3 F1_35 314.
8 194. u tulemupepelako 43.3 F1_40 288.
9 194. u tulemupepelako 43.3 F1_45 286.
10 194. u tulemupepelako 43.3 F1_50 327.
# ℹ 389 more rows
The next step is to separate the formant_int column into two columns: one for formant, and one for interval. The argument convert = TRUE ensures that the interval column is converted into a numeric column.
# A tibble: 399 × 7
Time Vowel Word Duration formant interval value
<dbl> <chr> <chr> <dbl> <chr> <int> <dbl>
1 194. u tulemupepelako 43.3 F1 5 406.
2 194. u tulemupepelako 43.3 F1 10 439.
3 194. u tulemupepelako 43.3 F1 15 453.
4 194. u tulemupepelako 43.3 F1 20 456.
5 194. u tulemupepelako 43.3 F1 25 430.
6 194. u tulemupepelako 43.3 F1 30 357.
7 194. u tulemupepelako 43.3 F1 35 314.
8 194. u tulemupepelako 43.3 F1 40 288.
9 194. u tulemupepelako 43.3 F1 45 286.
10 194. u tulemupepelako 43.3 F1 50 327.
# ℹ 389 more rows
Now we can put back the formant labels into one column each. We can achieve this with pivot_wider(), the counterpart of pivo_longer().
# A tibble: 133 × 8
Time Vowel Word Duration interval F1 F2 F3
<dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 194. u tulemupepelako 43.3 5 406. 1205. 2626.
2 194. u tulemupepelako 43.3 10 439. 1226. 2556.
3 194. u tulemupepelako 43.3 15 453. 1246. 2507.
4 194. u tulemupepelako 43.3 20 456. 1291. 2451.
5 194. u tulemupepelako 43.3 25 430. 1418. 2331.
6 194. u tulemupepelako 43.3 30 357. 1555. 2268.
7 194. u tulemupepelako 43.3 35 314. 1603. 2335.
8 194. u tulemupepelako 43.3 40 288. 1709. 2437.
9 194. u tulemupepelako 43.3 45 286. 1712. 2461.
10 194. u tulemupepelako 43.3 50 327. 1747. 2470.
# ℹ 123 more rows
We now have a dataset with separate columns for each formant and individual rows for each vowel interval.
The pipe
All the individual steps above can be chained by using the pipe %>%. What the pipe does is simply “transferring” the output of the function before it as the input of the function after it.