Bayesian CrI-width power analysis

Linear models
Bayesian
brms
Author

Stefano Coretta

Published

April 5, 2022

This post shows how to do a quick and dirty Bayesian power analysis.

“Power” analysis might not be a very appropriate term when Bayesian statistics is concerned. A good alternative could be “precision” analysis, since the method presented in this post is about estimate precision.

More specifically, the aim of this type of analysis is to find the approximate minimum sample size required to reach a given level of precision. By precision, I mean the width of the posterior probability of the effect of interest and, by corollary, the widths of its Credible Intervals (CrI). The smaller the width of posterior/CrI, the greater the precision of the estimate.

To calculate the minimum sample size for a particular precision value, we can take advantage of the formula of the standard error of the mean:

se=σn

where se is the standard error of the mean, σ is the sample standard deviation and n is the sample size.

Note that in Bayesian statistics, the se is the standard deviation of the posterior probability of the effect of interest.

In the scenario of a power analysis, we want to know which ne we need to get a specific see. We can calculate the expected sample size ne to get a specific see by plugging in the observed seo, the known sample size no and the chosen see, in the following formula (which is derived from the standard error formula above).

ne=(no×seosee)2

Let’s see how to calculate ne.

Imagine you have fitted a Bayesian linear model to f0 values with formality as a predictor (formal vs informal speech). Let’s say the posterior probability of the effect of formality has a 95% CrI = [-10, +30] Hz, meaning that the standard deviation is 10 Hz. In total, there were 150 data points.

So, the observed standard error (i.e. the standard deviation of the posterior) is seo=10, the sample size is no=150.

Then we need to choose the standard error see we would want to be able to get if we ran a second iteration of the study. Let’s say we want an see of 5 Hz. Now, which is the ne we need to get see=5?

Easy! We just plug in the numbers to get ne.

n_o <- 150
se_o <- 10
se_e <- 5

n_e <- ( ( sqrt(n_o) * se_o ) / se_e )^2
n_e
[1] 600

The sample size required to go from an seo of 10 Hz to an see of 4 Hz is 600. 😱

So to get an se that is half of the observed one we would need 4 times the number of data points.

Functions ex machina

Let’s spice our coding up a bit and write a few functions we can use to calculate samples sizes for different &se_e$s.

Also, so far we’ve been talking about see or, in other words, the standard deviation of the of the posterior probability of the effect of interest. If you remember your probability 101, you should know that the width of the 95% CrI of a posterior probability is 4 times the standard deviation.

So let’s write a function to calculate ne based on the width of the 95% CrI, rather than on the see.

estimate_n <- function(obs_se, obs_n, width, divide = 1) {
  sigma <- sqrt(obs_n) * obs_se
  se <- width / 4
  new_n <- (sigma / se)^2
  return(new_n / divide)
}

Let’s take it for a spin, using the numbers from the example above. The seo we observed was 10 Hz, that’s corresponds to a width of 104=40 Hz. (Note that the first argument is the seo, not the width, but you can change that for consistency if you wish). The sample size no was 150. The see we want was 5 Hz, so the width is 54=20 Hz.

Let’s plug in these numbers.

estimate_n(obs_se = 10, obs_n = 150, width = 20)
[1] 600

As we have already seen above, to get a 95% CrI width of 20 Hz, we would need at least 600 observations.

Now let’s define a few functions that calculate the required sample size at different widths and create a plot.

get_obs_range <- function(obs_se, obs_n, min_width = 1, max_width, by = 1, divide = 1) {
  obs_range <- NULL
  for (i in seq(min_width, max_width, by = by)) {
    obs_range <- c(obs_range, estimate_n(obs_se, obs_n, i, divide))
  }
  return(obs_range)
}

plot_obs_range <- function(obs_se, obs_n, min_width = 1, max_width, by = 1, divide = 1) {
  obs_range <- get_obs_range(obs_se, obs_n, min_width, max_width, by, divide)

  ggplot2::ggplot() +
    aes(x = seq(min_width, max_width, by = by), y = obs_range) +
    geom_point() +
    geom_path() +
    labs(
      caption = paste0("SE = ", obs_se, ", ", obs_n, " obs"),
      x = "CI width",
      y = paste0("Estimated N ", "(N/", divide, ")")
    )
}

Let’s now plot the required sample sizes to reach a 95% CrI width in the range from 10 to 50, by 5.

plot_obs_range(10, 150, min_width = 10, max_width = 50, by = 5) +
  scale_y_continuous(breaks = seq(0, 3000, 250))