Introduction to Bayesian Linear Models

05 - More on priors

Stefano Coretta

University of Edinburgh

Group-level effects

Group-level effects are the Bayesian equivalent of frequentist random effects.

As with the population-level effects, the interpretation of the group-level effects is the same as that of the frequentist random effects, but you have a full (posterior) probability distribution instead of a point estimate.

Group-level priors

get_prior(
  RT ~ 0 + IsWord + IsWord:PhonLev_c + (0 + IsWord + IsWord:PhonLev_c | Subject),
  family = lognormal,
  data = mald
)
                prior class                  coef   group resp dpar nlpar lb ub
               (flat)     b                                                    
               (flat)     b           IsWordFALSE                              
               (flat)     b IsWordFALSE:PhonLev_c                              
               (flat)     b            IsWordTRUE                              
               (flat)     b  IsWordTRUE:PhonLev_c                              
               lkj(1)   cor                                                    
               lkj(1)   cor                       Subject                      
 student_t(3, 0, 2.5)    sd                                                0   
 student_t(3, 0, 2.5)    sd                       Subject                  0   
 student_t(3, 0, 2.5)    sd           IsWordFALSE Subject                  0   
 student_t(3, 0, 2.5)    sd IsWordFALSE:PhonLev_c Subject                  0   
 student_t(3, 0, 2.5)    sd            IsWordTRUE Subject                  0   
 student_t(3, 0, 2.5)    sd  IsWordTRUE:PhonLev_c Subject                  0   
 student_t(3, 0, 2.5) sigma                                                0   
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
      default
 (vectorized)
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
      default
brm_5_priors <- c(
  prior(normal(7, 0.5), class = b, coef = IsWordFALSE),
  prior(normal(7, 0.5), class = b, coef = IsWordTRUE),
  prior(normal(0, 0.1), class = b, coef = `IsWordFALSE:PhonLev_c`),
  prior(normal(0, 0.1), class = b, coef = `IsWordTRUE:PhonLev_c`),
  prior(cauchy(0, 0.02), class = sigma),
  prior(cauchy(0, 0.01), class = sd),
  prior(lkj(2), class = cor)
)

Prior predictive checks

brm_5_priorpp <- brm(
  RT ~ 0 + IsWord + IsWord:PhonLev_c + (0 + IsWord + IsWord:PhonLev_c | Subject),
  family = lognormal,
  prior = brm_5_priors,
  data = mald,
  sample_prior = "only",
  cores = 4,
  file = "data/cache/brm_5_priorpp",
  seed = my_seed
)

Prior predictive checks: plot

conditional_effects(brm_5_priorpp, "PhonLev_c:IsWord")

Run the model

brm_5 <- brm(
  RT ~ 0 + IsWord + IsWord:PhonLev_c + (0 + IsWord + IsWord:PhonLev_c | Subject),
  family = lognormal,
  prior = brm_5_priors,
  data = mald,
  cores = 4,
  file = "data/cache/brm_5",
  seed = my_seed
)

Model summary

brm_5
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: RT ~ 0 + IsWord + IsWord:PhonLev_c + (0 + IsWord + IsWord:PhonLev_c | Subject) 
   Data: mald (Number of observations: 3000) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~Subject (Number of levels: 30) 
                                                Estimate Est.Error l-95% CI
sd(IsWordTRUE)                                      0.07      0.01     0.05
sd(IsWordFALSE)                                     0.11      0.02     0.08
sd(IsWordTRUE:PhonLev_c)                            0.01      0.00     0.00
sd(IsWordFALSE:PhonLev_c)                           0.01      0.01     0.00
cor(IsWordTRUE,IsWordFALSE)                         0.56      0.14     0.25
cor(IsWordTRUE,IsWordTRUE:PhonLev_c)               -0.04      0.37    -0.72
cor(IsWordFALSE,IsWordTRUE:PhonLev_c)              -0.06      0.37    -0.73
cor(IsWordTRUE,IsWordFALSE:PhonLev_c)              -0.12      0.34    -0.73
cor(IsWordFALSE,IsWordFALSE:PhonLev_c)             -0.29      0.35    -0.83
cor(IsWordTRUE:PhonLev_c,IsWordFALSE:PhonLev_c)     0.04      0.38    -0.68
                                                u-95% CI Rhat Bulk_ESS Tail_ESS
sd(IsWordTRUE)                                      0.09 1.00     1672     2460
sd(IsWordFALSE)                                     0.14 1.00     1453     2297
sd(IsWordTRUE:PhonLev_c)                            0.02 1.00     2433     1442
sd(IsWordFALSE:PhonLev_c)                           0.02 1.00     1595     1446
cor(IsWordTRUE,IsWordFALSE)                         0.80 1.00     1199     1989
cor(IsWordTRUE,IsWordTRUE:PhonLev_c)                0.69 1.00     4944     2715
cor(IsWordFALSE,IsWordTRUE:PhonLev_c)               0.65 1.00     4708     3065
cor(IsWordTRUE,IsWordFALSE:PhonLev_c)               0.59 1.00     4670     2622
cor(IsWordFALSE,IsWordFALSE:PhonLev_c)              0.50 1.00     3757     2670
cor(IsWordTRUE:PhonLev_c,IsWordFALSE:PhonLev_c)     0.73 1.00     3205     2757

Regression Coefficients:
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
IsWordTRUE                6.85      0.01     6.82     6.88 1.00     1246
IsWordFALSE               6.97      0.02     6.93     7.01 1.00     1552
IsWordTRUE:PhonLev_c      0.03      0.01     0.02     0.04 1.00     5067
IsWordFALSE:PhonLev_c     0.02      0.01     0.01     0.03 1.00     6071
                      Tail_ESS
IsWordTRUE                1997
IsWordFALSE               2216
IsWordTRUE:PhonLev_c      2707
IsWordFALSE:PhonLev_c     2723

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.26      0.00     0.26     0.27 1.00     5289     3176

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Conditional posterior probability distributions

conditional_effects(brm_5, "PhonLev_c:IsWord")

Getting group-level adjustments

brm_5_shrunk <- brm_5 |>
  spread_draws(r_Subject[Subject,term]) |>
  mean_qi() |>
  mutate(source = "shrunk")

gmean_RT <- mean(mald$RT_log, na.rm = TRUE)

original <- mald |>
  group_by(Subject) |>
  summarise(
    mean_RT = mean(RT_log) ,
    sd_RT = sd(RT_log),
    lower = mean_RT - 1.96 * sd_RT,
    upper = mean_RT + 1.96 * sd_RT
  )

Plotting group-level adjustments for IsWordTRUE.

brm_5_shrunk |>
  filter(term %in% c("IsWordTRUE")) |>
  ggplot(aes(r_Subject, reorder(as.character(Subject), r_Subject))) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point(size = 2) +
  geom_errorbarh(aes(xmin = .lower, xmax = .upper)) +
  geom_point(data = original, aes(mean_RT - gmean_RT, Subject), colour = "red", size = 2, alpha = 0.6) +
  facet_grid(cols = vars(term)) +
  labs(
    title = "By-subject conditional mode for Intercept",
    x = "Conditional mode",
    y = "Subject",
    caption = str_wrap("The red dots mark the by-subject mean RT from the raw data.")
  ) +
  xlim(-.4, .4)

Plotting group-level adjustments for IsWordTRUE.

brm_5_shrunk |>
  filter(term %in% c("IsWordFALSE")) |>
  ggplot(aes(r_Subject, reorder(as.character(Subject), r_Subject))) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point(size = 2) +
  geom_errorbarh(aes(xmin = .lower, xmax = .upper)) +
  geom_point(data = original, aes(mean_RT - gmean_RT, Subject), colour = "red", size = 2, alpha = 0.6) +
  facet_grid(cols = vars(term)) +
  labs(
    title = "By-subject conditional mode for Intercept",
    x = "Conditional mode",
    y = "Subject",
    caption = str_wrap("The red dots mark the by-subject mean RT from the raw data.")
  ) +
  xlim(-.4, .4)