B Bayesian meta-analysis of the voicing effect in English

A Bayesian meta-analysis of the English voicing effect was run on the basis of 11 estimated posterior distributions extracted from 9 different publications, following the procedures discussed in Nicenboim, Roettger & Vasishth (2018). The studies were selected by scraping the first 100 results on Google Scholar with the keywords “vowel duration voicing English.” Other studies which were known to the author but not present among the Google Scholar results were also included. Since two publications (Sharf 1962 and Klatt 1973) tested both monosyllabic and disyllabic words, a separate posterior distribution was estimated for each word type in these publications. This leads to a total of 11 posterior distributions of the effect of consonant voicing on vowel duration in English (7 estimated posteriors from 7 publications plus 2 each from 2 publications).

The posterior distribution for each study was obtained by fitting a Bayesian linear model to the summary data (the mean vowel durations before voiceless and voiced stops) provided by the respective publication. These models had mean vowel duration as the outcome and consonant voicing (voiceless vs voiced) as the only predictor. Three studies, Luce & Charles-Luce (1985), Davis & Van Summers (1989), and Ko (2018), reported measures of dispersion along with the means. Measurement error models were used to obtain the posterior distributions for these studies. The measurement error term in such models makes it possible to include information about the dispersion of the mean vowel durations, and hence about the uncertainty that comes with them. All the models for estimating the posteriors of the individual studies were fitted with the following priors: a normal distribution with mean = 0 ms and SD = 300 ms for the intercept, and a normal distribution with mean = 0 ms and SD = 100 ms for the effect of consonant voicing. The simple models (without a measurement error term) also included a half-Cauchy prior with location = 0 ms and scale = 25 ms for the residual standard deviation.
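To get a feel for how wide the two normal priors are, the following minimal sketch computes the central 95% interval each of them implies. It uses only the Python standard library and is not part of the original analysis, which was run in R. The helper name `central_interval` is introduced here for illustration.

```python
from statistics import NormalDist

# Priors on the intercept and on the voicing effect, as stated in the text (ms).
priors = {
    "intercept": NormalDist(mu=0, sigma=300),
    "voicing effect": NormalDist(mu=0, sigma=100),
}


def central_interval(dist, mass=0.95):
    """Return the central interval that contains `mass` of the distribution."""
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


for name, dist in priors.items():
    lower, upper = central_interval(dist)
    print(f"{name}: 95% of the prior mass lies between {lower:.0f} and {upper:.0f} ms")
```

Under these priors, voicing effects of up to roughly 200 ms in either direction are still treated as plausible before seeing the data.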

A data set with the mean estimates and estimated standard errors from these 11 posterior distributions (Table B.1) was then used to fit a further Bayesian measurement error model. In this model, the mean estimates together with their estimated standard errors were included as the outcome, with syllable position (final vs non-final) as a population-level predictor and a by-study random intercept. The models were fitted in R with brms using Markov Chain Monte Carlo sampling, with 4 chains of 2000 iterations each, of which 1000 were used for warm-up.
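As a reading aid, the general structure of a measurement error meta-analysis of this kind can be written out as below. This is a sketch under assumptions rather than the exact specification of the fitted model: the symbols are introduced here for illustration, the normal distributions are parameterised by their standard deviation, and the syllable-position term is inferred from the model summary reported further down.

```latex
\begin{align*}
y_i      &\sim \mathrm{Normal}(\theta_i,\ \mathit{se}_i) \\
\theta_i &= \alpha + \beta\, x_i + u_{j[i]} \\
u_j      &\sim \mathrm{Normal}(0,\ \tau)
\end{align*}
```

Here $y_i$ is the estimated voicing effect from row $i$ of Table B.1 and $\mathit{se}_i$ is its estimated error, $x_i$ is 1 for non-final syllables and 0 otherwise, $\alpha$ is the population-level effect in final syllables, $\beta$ is the change in the effect in non-final syllables, and $u_j$ is the random intercept of study $j$ with between-study standard deviation $\tau$.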

Table B.1: Bayesian estimates of the voicing effect in individual studies.
| Study | Estimate | Est.Error | Q2.5 | Q97.5 | Syllable position | N. speakers |
|---|---|---|---|---|---|---|
| Heffner (1937) | 62.15 | 20.09 | 22.08 | 100.42 | final | 1 |
| House & Fairbanks (1953) | 81.43 | 14.32 | 52.96 | 109.12 | final | 10 |
| Zimmerman & Sapon (1958) | 86.77 | 29.34 | 24.84 | 142.44 | final | 2 |
| Peterson & Lehiste (1960) | 103.43 | 28.68 | 43.58 | 159.85 | final | 5 |
| Sharf (1962) | 24.25 | 13.36 | -3.89 | 50.70 | non-final | 1 |
| Sharf (1962) | 51.65 | 34.32 | -21.95 | 115.89 | final | 1 |
| Chen (1970) | 151.89 | 25.81 | 96.90 | 194.18 | final | 1 |
| Klatt (1973) | 22.30 | 38.91 | -71.78 | 99.85 | non-final | 3 |
| Klatt (1973) | 49.39 | 47.45 | -72.70 | 130.84 | final | 3 |
| Mack (1982) | 125.18 | 21.21 | 81.84 | 165.28 | final | 3 |
| Luce & Charles-Luce (1985) final | 77.51 | 9.87 | 58.61 | 96.92 | final | 3 |
| Luce & Charles-Luce (1985) medial | 40.87 | 8.51 | 24.12 | 57.24 | final | 3 |
| Davis & Van Summers (1989) | 18.52 | 4.46 | 10.16 | 27.46 | non-final | 3 |
| Laeufer (1992) | 74.07 | 41.90 | -15.48 | 150.30 | final | 5 |
| Ko (2018) | 36.43 | 35.66 | -32.02 | 104.79 | final | 7 |

The following is the summary of the meta-analytical model (as output by the summary() function). The population-level effects are the ones of interest. Figure B.1 is a visual aid to the summary, and shows a variety of credible intervals of the estimates from the model. The blue-coloured bars represent (from darker to lighter blue) the 50%, 80%, and 95% credible intervals (CIs). The black lines are the 66% (thick) and 98% (thin) CIs.

## Group-Level Effects: 
## ~study (Number of levels: 15) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    23.08      8.66     9.54    43.08 1.00     1465     2085
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           75.74      9.68    56.28    95.28 1.00     2088     2258
## syl_posnonMfinal   -49.22     18.67   -84.90    -8.45 1.00     1860     1682

Figure B.1: Credible intervals of the meta-analytical posterior distributions.

The 95% credible interval (CI) of the model intercept (which corresponds to the estimated voicing effect in word-final syllables) is [56.28, 95.28] ms. Given the model and the data, there is thus a 95% probability that the true effect lies between about 56 and 95 ms. The mean of the posterior distribution is 75.74 ms (SD = 9.68). From this interval it can be inferred that the true effect of voicing in word-final syllables in English is positive and lies roughly between 50 and 100 ms. Note, however, that the meta-analytical estimate might suffer from publication bias (see the discussion of the funnel plot in Figure B.4).

The posterior mean of the coefficient for non-final (penultimate) syllable position is -49.22 ms (SD = 18.67, 95% CI = [-84.90, -8.45]). Note that the estimated error is roughly twice that of the intercept, which means that there is greater uncertainty in this estimate than in the intercept. We can argue that, on average, the mean voicing effect in penultimate syllables is about 50 ms smaller than the mean effect in monosyllabic words in the surveyed studies. The mean of the voicing effect in disyllabic words can thus be estimated to be around 25 ms (75 - 50 ms).
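The arithmetic behind this estimate can be spelled out with the posterior means from the model summary. The sketch below combines point estimates only and the variable names are introduced here for illustration. A credible interval for the effect in non-final syllables would require adding the two coefficients draw by draw in the posterior samples, which is not shown.

```python
# Posterior means taken from the model summary above (ms).
intercept = 75.74          # voicing effect in final syllables
non_final_shift = -49.22   # change in the effect in non-final syllables

# Point estimate of the voicing effect in non-final syllables.
effect_non_final = intercept + non_final_shift
print(f"Estimated effect in non-final syllables: {effect_non_final:.2f} ms")
```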

A visual representation of the meta-analytical distributions is given in Figure B.2. The plot shows the full posterior distributions of the voicing effect in the word-final and penultimate contexts. Note how the posterior distribution in penultimate position is wider than the other.

Figure B.2: Meta-analytical posterior distributions of the voicing effect in syllable-final and penultimate position.

Figure B.3 shows the mean estimates (the points) of the voicing effect with 95% CIs (the horizontal segments) for each of the 11 studies. For each study, the plot gives both the original estimate (as obtained from the raw data summary of the study) and the estimate shrunk by the random effects in the meta-analytical model. The vertical lines indicate the meta-analytical 95% CI of the voicing effect in final (solid) and penultimate syllable position (dashed). Original estimates that lie further away from the meta-analytical mean effect and those with greater uncertainty (larger estimated errors) show greater shrinkage towards the meta-analytical mean.

Figure B.3: Estimated voicing effect from the original source and from the meta-analysis.
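The shrinkage pattern described above can be illustrated with the standard partial-pooling formula, in which each shrunken estimate is a precision-weighted average of the study's own estimate and the population mean. The sketch below is a plug-in approximation under stated assumptions: it fixes the population mean and the between-study SD at their posterior means from the model summary, whereas the fitted model averages over the uncertainty in both, so the values plotted in Figure B.3 will differ. The function name `shrunken_estimate` is introduced here for illustration.

```python
# Plug-in values from the model summary (posterior means, in ms).
MU_FINAL = 75.74   # population-level effect in final syllables
TAU = 23.08        # between-study standard deviation, sd(Intercept)


def shrunken_estimate(y, se, mu=MU_FINAL, tau=TAU):
    """Precision-weighted average of a study estimate `y` and the mean `mu`."""
    # Weight on the study's own estimate: small when its error `se` is large.
    weight = tau**2 / (tau**2 + se**2)
    return weight * y + (1 - weight) * mu


# Two final-syllable rows from Table B.1: (estimate, estimated error).
examples = {
    "House & Fairbanks (1953)": (81.43, 14.32),
    "Chen (1970)": (151.89, 25.81),
}

for study, (y, se) in examples.items():
    pooled = shrunken_estimate(y, se)
    print(f"{study}: original {y:.2f} ms, shrunken {pooled:.2f} ms "
          f"(moved by {y - pooled:.2f} ms)")
```

Chen (1970) is both further from the population mean and less precise than House & Fairbanks (1953), so under this approximation it is pulled towards the mean by a larger amount.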

Figure B.4 is a funnel plot, which can be used to visually check whether the sample suffers from publication bias. In this plot, the x-axis corresponds to the original estimated difference in vowel duration, while the y-axis is a measure of precision (calculated as 1 divided by the estimated error of the difference). The meta-analytical means are indicated by the thick and dashed vertical lines for syllable-final and penultimate position respectively. The shaded areas indicate the 95% CI of the meta-analytical posterior of the voicing effect in final (light blue) and penultimate (dark blue) position. When there is no bias, the points with lower precision should be more spread out and symmetrically placed around the meta-analytical mean, while points with higher precision should cluster around the mean. This ideal pattern does not hold for the final syllable context, where there seems to be a bias towards bigger effects (which also happen to have lower precision). This suggests that the estimate probably suffers from publication bias (i.e. a bias towards publishing positive and significant results) and that it may not be representative of the true effect. It is not possible to assess bias for the effect in penultimate syllable position, given the low number of studies.

Figure B.4: By-study funnel plot showing the estimate against the precision. The vertical thick and dashed lines are the meta-analytical means of the effect in final and penultimate position.
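The precision values on the y-axis of the funnel plot can be recomputed from Table B.1 as 1 divided by the estimated error. The following minimal sketch does this for every row of the table and lists the rows from most to least precise. It is not the code used to draw Figure B.4, and the variable names are introduced here for illustration.

```python
# (study label, estimate in ms, estimated error in ms, syllable position),
# copied from Table B.1.
rows = [
    ("Heffner (1937)", 62.15, 20.09, "final"),
    ("House & Fairbanks (1953)", 81.43, 14.32, "final"),
    ("Zimmerman & Sapon (1958)", 86.77, 29.34, "final"),
    ("Peterson & Lehiste (1960)", 103.43, 28.68, "final"),
    ("Sharf (1962)", 24.25, 13.36, "non-final"),
    ("Sharf (1962)", 51.65, 34.32, "final"),
    ("Chen (1970)", 151.89, 25.81, "final"),
    ("Klatt (1973)", 22.30, 38.91, "non-final"),
    ("Klatt (1973)", 49.39, 47.45, "final"),
    ("Mack (1982)", 125.18, 21.21, "final"),
    ("Luce & Charles-Luce (1985) final", 77.51, 9.87, "final"),
    ("Luce & Charles-Luce (1985) medial", 40.87, 8.51, "final"),
    ("Davis & Van Summers (1989)", 18.52, 4.46, "non-final"),
    ("Laeufer (1992)", 74.07, 41.90, "final"),
    ("Ko (2018)", 36.43, 35.66, "final"),
]

# Precision is the reciprocal of the estimated error of the difference.
funnel = sorted(
    ((label, est, 1.0 / se, pos) for label, est, se, pos in rows),
    key=lambda row: row[2],
    reverse=True,
)

for label, est, precision, pos in funnel:
    print(f"{label:36s} {pos:10s} estimate = {est:7.2f} ms   precision = {precision:.3f}")
```

Plotting the estimate against the precision for these rows gives the points of a funnel plot. The most precise rows appear at the top of the listing and the least precise at the bottom.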