<- 150
n_o <- 10
se_o <- 5
se_e
<- ( ( sqrt(n_o) * se_o ) / se_e )^2
n_e n_e
[1] 600
Stefano Coretta
April 5, 2022
This post shows how to do a quick and dirty Bayesian power analysis.
“Power” analysis might not be a very appropriate term when Bayesian statistics is concerned. A good alternative could be “precision” analysis, since the method presented in this post is about estimate precision.
More specifically, the aim of this type of analysis is to find the approximate minimum sample size required to reach a given level of precision. By precision, I mean the width of the posterior probability of the effect of interest and, by corollary, the widths of its Credible Intervals (CrI). The smaller the width of posterior/CrI, the greater the precision of the estimate.
To calculate the minimum sample size for a particular precision value, we can take advantage of the formula of the standard error of the mean:
where
Note that in Bayesian statistics, the
In the scenario of a power analysis, we want to know which
Let’s see how to calculate
Imagine you have fitted a Bayesian linear model to f0 values with formality as a predictor (formal vs informal speech). Let’s say the posterior probability of the effect of formality has a 95% CrI = [-10, +30] Hz, meaning that the standard deviation is 10 Hz. In total, there were 150 data points.
So, the observed standard error (i.e. the standard deviation of the posterior) is
Then we need to choose the standard error
Easy! We just plug in the numbers to get
The sample size required to go from an
So to get an
Let’s spice our coding up a bit and write a few functions we can use to calculate samples sizes for different &se_e$s.
Also, so far we’ve been talking about
So let’s write a function to calculate
Let’s take it for a spin, using the numbers from the example above. The
Let’s plug in these numbers.
As we have already seen above, to get a 95% CrI width of 20 Hz, we would need at least 600 observations.
Now let’s define a few functions that calculate the required sample size at different widths and create a plot.
get_obs_range <- function(obs_se, obs_n, min_width = 1, max_width, by = 1, divide = 1) {
obs_range <- NULL
for (i in seq(min_width, max_width, by = by)) {
obs_range <- c(obs_range, estimate_n(obs_se, obs_n, i, divide))
}
return(obs_range)
}
plot_obs_range <- function(obs_se, obs_n, min_width = 1, max_width, by = 1, divide = 1) {
obs_range <- get_obs_range(obs_se, obs_n, min_width, max_width, by, divide)
ggplot2::ggplot() +
aes(x = seq(min_width, max_width, by = by), y = obs_range) +
geom_point() +
geom_path() +
labs(
caption = paste0("SE = ", obs_se, ", ", obs_n, " obs"),
x = "CI width",
y = paste0("Estimated N ", "(N/", divide, ")")
)
}
Let’s now plot the required sample sizes to reach a 95% CrI width in the range from 10 to 50, by 5.