library(tidyverse)
25 Wrangling MCMC draws
25.1 MCMC what?
Bayesian regression models fitted with brms/Stan use the Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate the probability distributions of the model’s coefficients. You first encountered MCMC in Chapter 22. The MCMC algorithm does two things: first, it finds a high-probability area that is most compatible with the prior probabilities and the likelihood of the data given the parameter values; then, it samples values (called draws) from this high-probability area to reconstruct the posterior distribution. Since the high-probability area is assumed to be representative of the posterior distribution (that’s why we sample from it), the MCMC draws are also called posterior draws. Elizabeth Pankratz has written a nice short introduction to MCMC algorithms here: Finding posteriors through sampling. I strongly recommend that you read it before proceeding (if you want to further your understanding of MCMC, the resources linked on that page are an excellent starting point). Note that to be a proficient user of brms and Bayesian regression models, you don’t need to fully understand the mathematics behind MCMC algorithms, as long as you understand them conceptually.
When you run a model with brms, the draws (i.e. the sampled values) are stored in the model object. All operations on a model, like obtaining the summary(), are actually operations on those draws. We normally fit regression models with four MCMC chains. The sampling algorithm within each chain runs for 2000 iterations by default. The first half (1000 iterations) is used to “warm up” the algorithm (i.e. find the high-probability area), while the second half (1000 iterations) samples from the high-probability area to reconstruct the posterior distribution. For four chains run with 2000 iterations each, of which 1000 are for warm-up, we end up with 4000 post-warmup draws we can use to learn details about the posterior.
The rest of the chapter will teach you how to extract and manipulate the model’s draws. We will first revisit the model fitted in Chapter 27.
25.2 Extract MCMC posterior draws
There are different ways to extract the MCMC posterior draws from a fitted model. In this book, we will use the as_draws_df()
function from the posterior package. The function extracts the draws from a Bayesian regression model and outputs them as a data frame. Let’s take the rt_bm_1
model from Chapter 27 and extract the draws. Let’s first read the tucker2019/mald_1_1.rds
data again.
mald <- readRDS("data/tucker2019/mald_1_1.rds")
Now we reload the Bayesian regression model from Chapter 27. We simply use the same code we used in that chapter. If you run the code now, the file with the cached model will be loaded into R, so the model will not be fitted from scratch. This is both convenient and it ensures the code is reproducible.
library(brms)
rt_bm_1 <- brm(
  RT ~ IsWord,
  family = gaussian,
  data = mald,
  seed = 6725,
  file = "cache/ch-regression-cat-rt_bm_1"
)
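Incidentally, the four chains and 2000 iterations described in Section 25.1 are simply the defaults of brm()’s chains, iter and warmup arguments. Here is a minimal sketch making them explicit (the object name and cache file are hypothetical, so the cached model above is left untouched; if you run it, the model will be fitted from scratch):

# Same model as above, with the default MCMC settings spelled out.
# `rt_bm_1_explicit` and its cache file are hypothetical names.
rt_bm_1_explicit <- brm(
  RT ~ IsWord,
  family = gaussian,
  data = mald,
  chains = 4,    # number of MCMC chains
  iter = 2000,   # iterations per chain, warm-up included
  warmup = 1000, # warm-up iterations per chain, discarded from the posterior
  seed = 6725,
  file = "cache/ch-regression-cat-rt_bm_1_explicit"
)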
Let’s review the model summary.
summary(rt_bm_1)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RT ~ IsWord
Data: mald (Number of observations: 5000)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 953.14 6.12 941.17 965.17 1.00 4590 2897
IsWordFALSE 116.34 8.80 98.74 134.14 1.00 4815 3235
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 312.47 3.11 306.44 318.60 1.00 4720 3456
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The Regression Coefficients
table includes Intercept
and IsWordFALSE
which is what we expect from the model formula and data. We are good to go! Now we can extract the MCMC draws from the model using the as_draws_df() function. This function, from the posterior package, returns a tibble with the values obtained in each draw of the MCMC algorithm. Since we fitted the model with 4 MCMC chains and 1000 sampling draws per chain, there are a total of 4000 drawn values (i.e. 4000 rows in the data frame).
rt_bm_1_draws <- as_draws_df(rt_bm_1)
rt_bm_1_draws
Don’t worry about the Intercept, lprior and lp__ columns. Open the data frame in the RStudio viewer. You will see three extra columns: .chain, .iteration and .draw. They indicate:

- .chain: The MCMC chain number (1 to 4).
- .iteration: The iteration number within the chain (1 to 1000).
- .draw: The draw number across all chains (1 to 4000).
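If you want to double-check this structure, here is a quick sketch using dplyr’s count():

# Each of the 4 chains contributes 1000 post-warmup iterations,
# for a total of 4 * 1000 = 4000 draws.
rt_bm_1_draws |>
  count(.chain)

range(rt_bm_1_draws$.draw)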
Make sure that you understand these columns in light of the MCMC algorithm. The following columns contain the drawn values at each draw for three parameters of the model: b_Intercept
, b_IsWordFALSE
and sigma
. To remind yourself what these mean, let’s have a look at the mathematical formula of the model.
\[
\begin{align}
RT_i & \sim Gaussian(\mu_i, \sigma) \\
\mu_i & = \beta_0 + \beta_1 \cdot w_i
\end{align}
\]
So:
- b_Intercept is \(\beta_0\). This is the mean RT when the word is a real word.
- b_IsWordFALSE is \(\beta_1\). This is the difference in RT between nonce words and real words.
- sigma is \(\sigma\). This is the overall standard deviation of the RTs.
Any inference made on the basis of the model is an inference derived from the draws. One could say that the model “results” are, to put it simply, these draws, and that the draws can be used to make inferences about the population one is investigating (if one is interested in inference in the first place).
25.3 Summary measures of the posterior draws
The Regression Coefficients
table from the summary()
of the model reports summary measures calculated from the drawn values of b_Intercept
and b_IsWordFALSE
. These summary measures are the mean (Estimate
), the standard deviation (Est.Error) and the lower and upper limits of the 95% Credible Interval (CrI). Since we now have the draws, we can of course obtain those same measures ourselves from the data frame with the draws! We will use the quantile2() function from the posterior package to obtain 95% and 80% CrIs.
library(posterior)
This is posterior version 1.6.1
Attaching package: 'posterior'
The following objects are masked from 'package:stats':
mad, sd, var
The following objects are masked from 'package:base':
%in%, match
rt_b1_summaries <- rt_bm_1_draws |>
  summarise(
    intercept_mean = mean(b_Intercept), intercept_sd = sd(b_Intercept),
    intercept_lo_95 = quantile2(b_Intercept, 0.025), intercept_up_95 = quantile2(b_Intercept, 0.975),
    IsWordFALSE_mean = mean(b_IsWordFALSE), IsWordFALSE_sd = sd(b_IsWordFALSE),
    IsWordFALSE_lo_95 = quantile2(b_IsWordFALSE, 0.025), IsWordFALSE_up_95 = quantile2(b_IsWordFALSE, 0.975)
  ) |>
  # fancy way of mutating multiple columns
  mutate(
    across(everything(), ~round(., 2))
  )

rt_b1_summaries
Compare the values obtained now with the values in the model summary above. They are the same, because the summary measures in the model summary are simply summary measures of the draws. Hopefully it is now clear where the values in the Regression Coefficients table come from and how they relate to the MCMC draws.
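To obtain the 80% CrIs mentioned above, you simply change the quantiles to 0.1 and 0.9 (which leave out 10% of the draws on each side). A minimal sketch for b_IsWordFALSE:

# 80% CrI of the nonce-word effect: 10% and 90% quantiles of the draws.
rt_bm_1_draws |>
  summarise(
    IsWordFALSE_lo_80 = quantile2(b_IsWordFALSE, 0.1),
    IsWordFALSE_up_80 = quantile2(b_IsWordFALSE, 0.9)
  )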
25.4 Plotting posterior draws
Plotting posterior draws is as straightforward as plotting any data. You already have all of the tools to understand plotting draws with ggplot2. To plot the reconstructed posterior probability distribution of any parameter, we plot the probability density (with geom_density()
) of the draws of that parameter. Let’s plot b_IsWordFALSE
. This will be the posterior probability density of the difference in RTs between nonce (IsWord
is FALSE
) and real words (IsWord
is TRUE
).
rt_bm_1_draws |>
  ggplot(aes(b_IsWordFALSE)) +
  geom_density() +
  geom_rug(alpha = 0.2)
25.5 Calculate posterior predictions from the draws
In Chapter 28, you learned how to use indexing of categorical predictors to estimate the mean of the outcome variable for each level of the categorical predictor. You might be wondering: with contrasts you get the mean of the reference level and the difference between the second level and the reference level; with indexing you get the mean at each level; however, all of these estimands (the means of the levels and the differences between them) are important for inference. So, should you fit the model twice to obtain all the estimands? Of course not: regardless of how you code categorical predictors, the model has all the information needed to calculate all of the estimands of interest. This information is in the MCMC draws. In this section, you will learn how to obtain the estimated mean RT when the word is a nonce word from the model that used contrasts.
Let’s revisit the formula of the mean \(\mu\) from the model formula above: \(\mu_i = \beta_0 + \beta_1 \cdot w_i\). The mean depends on the value of \(w\) (that’s what the subscript \(i\) is for). We can substitute \(w_i\) with 0 for IsWord
= TRUE
and with 1 for IsWord
= FALSE
. So we get \(\mu_T = \beta_0\) for TRUE
and \(\mu_F = \beta_0 + \beta_1\) for FALSE. Look again at rt_bm_1_draws
. Recall that b_Intercept
is \(\beta_0\) and b_IsWordFALSE
is \(\beta_1\). Can you figure out what you need to do to get the posterior probability distribution of \(\mu_F\) (i.e. the mean RT when IsWord
is FALSE
)? You sum the columns b_Intercept
and b_IsWordFALSE
! It goes without saying that the posterior probability distribution of the mean RT when IsWord
is TRUE
is the probability density of b_Intercept
.
rt_bm_1_draws <- rt_bm_1_draws |>
  mutate(
    # predicted RT for real words
    real = b_Intercept,
    # predicted RT for nonce words
    nonce = b_Intercept + b_IsWordFALSE
  )

rt_bm_1_draws |> select(real, nonce)
Warning: Dropping 'draws_df' class as required metadata was removed.
The sum operation for nonce
is applied row-wise, to each draw (remember, each row in the draw tibble is the draw from one iteration of the MCMC chains). So, for nonce
, the value of b_Intercept
at draw 1 is added to the value of b_IsWordFALSE
at draw 1, and so on. You end up with a column of sums that has the same length as the initial draws (here, 4000, i.e. 1000 per chain times 4 chains). Then you can summarise and plot this new column as you did with the b_
coefficients earlier. To make further processing of the draws easier, we can reshape the tibble so that instead of having two columns real
and nonce
with the drawn values in them, we end up with a tibble with one column IsWord
(where we store the value real
or nonce
) and a column named value
with the drawn value. Reshaping tibbles is called pivoting in tidyverse parlance. We won’t go into the details of all pivoting operations, so I strongly recommend that you look through the vignette on pivoting (a vignette is a short tutorial that some R packages provide for users to learn basic operations with the functions in the package). The operation we need to get the shape we want is pivoting from a “wider” format to a “longer” format: this is achieved with pivot_longer()
from tidyr. Make sure you read the section on pivot_longer()
in the vignette to understand what the following code does.
rt_bm_1_draws_long <- rt_bm_1_draws |>
  select(.chain, .iteration, .draw, real, nonce) |>
  pivot_longer(real:nonce, names_to = "IsWord", values_to = "value")
Warning: Dropping 'draws_df' class as required metadata was removed.
rt_bm_1_draws_long
Check the resulting rt_bm_1_draws_long
tibble. The first two rows are draw 1 of chain 1: one row is for the case when the word is a real word, and the other row is for the case when the word is a nonce word. There are a total of 8000 rows: 4000 draws (1000 draws times 4 chains) times 2 (two levels of IsWord
). Now let’s make a violin plot of the predicted RTs for real and nonce words!
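A minimal sketch using geom_violin() from ggplot2:

# Violin plot of the posterior predicted RTs by word type.
rt_bm_1_draws_long |>
  ggplot(aes(IsWord, value)) +
  geom_violin() +
  labs(x = "Word type", y = "Predicted RT (ms)")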

We can use the ggdist package to plot something a bit fancier, for example a “half eye” geometry (which can be obtained with the “halfeye” statistic, stat_halfeye()
). Check the ?stat_halfeye
documentation to learn about it.
library(ggdist)
Attaching package: 'ggdist'
The following objects are masked from 'package:brms':
dstudent_t, pstudent_t, qstudent_t, rstudent_t
|>
rt_bm_1_draws_long ggplot(aes(value, IsWord)) +
stat_halfeye() +
labs(x = "Predicted RT (ms)", y = "Word type")

Another useful statistic, also from the ggdist package, is stat_interval().
rt_bm_1_draws_long |>
  ggplot(aes(IsWord, value)) +
  stat_interval() +
  labs(
    x = "Word type", y = "Predicted RT (ms)"
  )

When reporting the results, it is recommended to report the CrIs of the predicted outcome for each level of the categorical predictor. It is useful to include tables like Table 25.1. To make the generation of the table more straightforward, we can use a couple of tricks when creating the rt_bm_1_cris tibble below. The glue()
function from the tidyverse package glue allows you to specify strings based on the output of R code computation. Code is included between curly braces {}
.
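For example, a minimal sketch (any R code between the braces is evaluated and its result is pasted into the string):

library(glue)

# The code in the curly braces is run and the result inserted into the string.
glue("Two plus two is {2 + 2}.")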
library(glue)
rt_bm_1_cris <- rt_bm_1_draws_long |>
  mutate(IsWord = factor(IsWord, levels = c("real", "nonce"))) |>
  group_by(IsWord) |>
  summarise(
    mean = mean(value), sd = sd(value),
    ci_90 = glue("[{round(quantile2(value, 0.05))}, {round(quantile2(value, 0.95))}]"),
    ci_80 = glue("[{round(quantile2(value, 0.1))}, {round(quantile2(value, 0.9))}]"),
    ci_60 = glue("[{round(quantile2(value, 0.2))}, {round(quantile2(value, 0.8))}]")
  )

knitr::kable(rt_bm_1_cris, col.names = c("Type", "Mean RT", "SD", "90% CrI", "80% CrI", "60% CrI"), digits = 0)
Table 25.1: Summary measures of the posterior predicted RTs for real and nonce words.

| Type  | Mean RT | SD | 90% CrI      | 80% CrI      | 60% CrI      |
|-------|---------|----|--------------|--------------|--------------|
| real  | 953     | 6  | [943, 963]   | [945, 961]   | [948, 958]   |
| nonce | 1069    | 6  | [1059, 1080] | [1062, 1077] | [1064, 1075] |
In the main text, you could report the results like so:
According to the model, the predicted RT when the word is a real word is between 943 and 963 ms at 90% confidence (\(\beta\) = 953, SD = 6). When the word is a nonce word, the predicted RT is between 1059 and 1080 ms (\(\beta\) = 1069, SD = 6).