Plotting tongue contours with ggplot2

Linguistics
Author

Stefano Coretta

Published

August 23, 2018

When plotting tongue contours data obtained from ultrasound tongue imaging in R using ggplot2, a common option to smooth over the individual contours and show the general pattern is to use geom_smooth(methood = "loess"). However, as I will show in this post, in certain cases this method leads to very disorted contours. Such distortion is more or less always present, although at a lower degree in less extreme cases.

To show the shortcomings of using geom_smooth() and present a viable alternative, we’ll be using ultrasound tongue imaging data from one speaker (me). This dataset includes tongue contours from within the closure of the conosonants /t, d/ preceeded by /a, o, u/. The dataset looks like this (some columns dropped):

select(tongue_data, rec_date, fan_line, X, Y, word, vowel, c2)
# A tibble: 1,239 × 7
   rec_date            fan_line     X     Y word  vowel c2   
   <chr>                  <int> <dbl> <dbl> <chr> <chr> <chr>
 1 29/11/2016 15:10:52        6  37.4  9.25 pada  a     d    
 2 29/11/2016 15:21:30        6  38.6 13.1  pada  a     d    
 3 29/11/2016 15:10:52        7  34.4 10.3  pada  a     d    
 4 29/11/2016 15:11:03        7  34.3  9.81 pata  a     t    
 5 29/11/2016 15:11:14        7  34.6 11.0  podo  o     d    
 6 29/11/2016 15:13:39        7  34.3  9.65 pada  a     d    
 7 29/11/2016 15:16:05        7  34.8 11.5  pada  a     d    
 8 29/11/2016 15:17:07        7  34.5 10.5  putu  u     t    
 9 29/11/2016 15:19:45        7  34.3  9.64 putu  u     t    
10 29/11/2016 15:21:30        7  35.4 13.8  pada  a     d    
# ℹ 1,229 more rows

rec_date is the date and time of recording. Each observed tongue contour has a unique rec_date (this will come in handy later). fan_line is the number of the line in the fan coordinate system used by Articulate Assistant Advanced (which I used to record the data). X and Y are the horizontal and vertical position of each point on the contour. The unit is millimeters. word, vowel and c2 are self-explanatory.

Let’s start by plotting the smoothed contours by vowel and consonant.

tongue_data %>%
  ggplot(aes(X, Y)) +
  geom_smooth(aes(colour = vowel), method = "loess") +
  coord_fixed() +
  facet_grid(c2 ~ vowel) +
  theme(legend.position = "none")

We can immediately notice that with /u/ there is something odd going on. That does not look like a tongue surface (maybe that of a chameleon! Definitely not one of a ‘hooman’.) The smooths for /a/ and /o/ seem quite standard.

To see what is going on, let’s plot now also the individual points as recorded in the data, whith a superimoposed smooth, for comparison.

tongue_data %>%
  ggplot(aes(X, Y)) +
  geom_point(alpha = 0.1) +
  geom_smooth(aes(colour = vowel), method = "loess") +
  coord_fixed() +
  facet_grid(c2 ~ vowel) +
  theme(legend.position = "none")

While the smooths with /a/ and /o/ more or less have a good fit when compared to the points, with /u/ the smooths are really off.

This happpens because the tongue root (in this particular case) developpes vertically rather than slanted. The smooth isagnostic about the fact that points lying on the same X value but with different Y values belong to different portion of the tongue contour. The result is that smoothing happens across tongue parts.

An alternative (if you don’t like points) is to use geom_path() to plot the individual tongue contours as lines. geom_path() connects points with a line, following the order in which they appear in the dataset. So, before using this geometry, we need to arrange the dataframe such that the points are in the right order (now they are in the wrong order).

To do so, we can use rec_date (which identifies the individual contours) and fan_line which indicates the orders of points (for each contour, there a maximum 42 points/fan lines; NAs have been excluded).

tongue_data <- tongue_data %>%
  arrange(rec_date, fan_line)

tongue_data
# A tibble: 1,239 × 30
   speaker seconds rec_date          prompt label TT_displacement_sm TT_velocity
   <chr>     <dbl> <chr>             <chr>  <chr>              <dbl>       <dbl>
 1 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 2 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 3 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 4 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 5 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 6 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 7 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 8 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
 9 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
10 it01       1.11 29/11/2016 15:10… Dico … max_…               77.4       -7.18
# ℹ 1,229 more rows
# ℹ 23 more variables: TT_velocity_abs <dbl>, TD_displacement_sm <dbl>,
#   TD_velocity <dbl>, TD_velocity_abs <dbl>, TR_displacement_sm <dbl>,
#   TR_velocity <dbl>, TR_velocity_abs <dbl>, fan_line <int>, X <dbl>, Y <dbl>,
#   word <chr>, language <chr>, sex <chr>, item <int>, ipa <chr>, c1 <chr>,
#   c1_phonation <chr>, vowel <chr>, anteropost <chr>, height <chr>, c2 <chr>,
#   c2_phonation <chr>, c2_place <chr>

We can now use geom_path(). The argument group = rec_date ensures that individual lines are plotted (without it, the last point of one contour is connected with the first of the contour following in the dataset).

tongue_data %>%
  ggplot(aes(X, Y)) +
  geom_path(aes(group = rec_date, colour = vowel), alpha = 0.5) +
  coord_fixed() +
  facet_grid(c2 ~ vowel) +
  theme(legend.position = "none")

The tongue root in /u/ is now properly rendered.

But what fif you want to plot a single contour (possibly with confidence intervals) for each of the 6 panels in the previous figure, rather than all the contours?

An option is to plot an average contour (litterally, the aveages of X and Y). We can easily do that by grouping the data by fan_line and then summarise() it. Plotting can then be done with geom_path() and geom_polygon(). All together, the code looks like this.

xy_mean <- tongue_data %>%
  group_by(fan_line, vowel, c2) %>%
  summarise(
    X_mean = mean(X, na.rm = TRUE),
    Y_mean = mean(Y, na.rm = TRUE)
  )

xy_ci <- tongue_data %>%
  group_by(fan_line, vowel, c2) %>%
  summarise(
    X_CI_low = t.test(X)$conf.int[1],
    X_CI_up = t.test(X)$conf.int[2],
    Y_CI_low = t.test(Y)$conf.int[1],
    Y_CI_up = t.test(Y)$conf.int[2]
  )

ci_upper <- xy_ci %>%
  dplyr::select(-X_CI_low, -Y_CI_low) %>%
  dplyr::rename(
    CI_X = X_CI_up,
    CI_Y = Y_CI_up
  )

ci_lower <- xy_ci %>%
  dplyr::select(-X_CI_up, -Y_CI_up) %>%
  dplyr::arrange(dplyr::desc(fan_line)) %>%
  dplyr::rename(
    CI_X = X_CI_low,
    CI_Y = Y_CI_low
  )

ci <- rbind(ci_upper, ci_lower)

ggplot(xy_mean, aes(X_mean, Y_mean)) +
  geom_polygon(data = ci, aes(x = CI_X, y = CI_Y), alpha = 0.2) +
  geom_path(aes(X_mean, Y_mean, colour = vowel)) +
  facet_grid(c2 ~ vowel) +
  coord_fixed() +
  theme(legend.position = "none")