Introduction to Bayesian Linear Models

05 - More on priors

Stefano Coretta

University of Edinburgh

Interactions: priors

get_prior(
  RT ~ IsWord * PhonLev,
  family = lognormal,
  data = mald
)
                  prior     class                coef group resp dpar nlpar lb
                 (flat)         b                                             
                 (flat)         b         IsWordFALSE                         
                 (flat)         b IsWordFALSE:PhonLev                         
                 (flat)         b             PhonLev                         
 student_t(3, 6.9, 2.5) Intercept                                             
   student_t(3, 0, 2.5)     sigma                                            0
 ub       source
         default
    (vectorized)
    (vectorized)
    (vectorized)
         default
         default

Interactions: priors with no intercept

get_prior(
  RT ~ 0 + IsWord + IsWord:PhonLev,
  family = lognormal,
  data = mald
)
                prior class                coef group resp dpar nlpar lb ub
               (flat)     b                                                
               (flat)     b         IsWordFALSE                            
               (flat)     b IsWordFALSE:PhonLev                            
               (flat)     b          IsWordTRUE                            
               (flat)     b  IsWordTRUE:PhonLev                            
 student_t(3, 0, 2.5) sigma                                            0   
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
 (vectorized)
      default

The model and the priors

\[ \begin{align} \text{RT}_i & \sim LN(\mu_i, \sigma) \\ \log(\mu_i) & = \beta_{1_{IW[i]}} + \beta_{2_{IW[i]}} \cdot \text{PhonLev}_i \\ \beta_{1_{IW[T]}} & \sim Gaussian(\mu_1, \sigma_1) \\ \beta_{1_{IW[F]}} & \sim Gaussian(\mu_2, \sigma_2) \\ \beta_{2_{IW[T]}} & \sim Gaussian(\mu_3, \sigma_3) \\ \beta_{2_{IW[F]}} & \sim Gaussian(\mu_4, \sigma_4) \\ \sigma & \sim Cauchy_{+}(0, \sigma_5) \end{align} \]

Centring numeric predictors

  • PhonLev is numeric, so for the \(\beta_{1_{IW[i]}}\) priors, we need to think about the mean log-RT when PhonLev is 0.

  • Let’s centre PhonLev, so that instead we need to think about the mean log-RT when PhonLev is at its mean.

mean(mald$PhonLev)
[1] 7.092255
mald <- mald |> 
  mutate(PhonLev_c = PhonLev - mean(PhonLev))
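As a quick sanity check (a minimal sketch reusing the mald data from above), the centred predictor should now have a mean of (numerically) zero.

# The centred predictor should average to 0, up to floating-point error
round(mean(mald$PhonLev_c), 10)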

The model (revised)

\[ \begin{align} \text{RT}_i & \sim LN(\mu_i, \sigma) \\ \log(\mu_i) & = \beta_{1_{IW[i]}} + \beta_{2_{IW[i]}} \cdot (\text{PhonLev}_i - \text{mean}(\text{PhonLev})) \end{align} \]

Prior for \(\beta_1\)

  • Let’s say again that we are 95% confident that the mean RT is between 500 and 2500 ms. Now let’s take the log of these bounds.

  • In logs: log(500) ≈ 6.2 and log(2500) ≈ 7.8

  • Get \(\mu_1\) and \(\mu_2\)

    • mean(c(6.2, 7.8)) = 7
  • Get \(\sigma_1\) and \(\sigma_2\)

    • (7.8 - 7) / 2 = 0.4, i.e. half the distance from the mean to the upper bound, because 95% of a Gaussian lies within roughly 2 SDs of its mean (let’s round up to 0.5; see the short R sketch after this list)
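The same derivation in R (a minimal sketch; the 500 to 2500 ms interval is the assumption stated above):

# 95% interval for the mean RT, in ms, then on the log scale
lo <- log(500)   # ~6.2
hi <- log(2500)  # ~7.8

# Mean of the Gaussian prior: midpoint of the log-scale interval
mu_b1 <- mean(c(lo, hi))    # ~7

# SD of the prior: half the distance from the mean to a bound (95% ~ +/- 2 SD)
sd_b1 <- (hi - mu_b1) / 2   # ~0.4, rounded up to 0.5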

\[ \begin{align} \text{RT}_i & \sim LN(\mu_i, \sigma) \\ \log(\mu_i) & = \beta_{1_{IW[i]}} + \beta_{2_{IW[i]}} \cdot (\text{PhonLev}_i - \text{mean}(\text{PhonLev})) \\ \beta_{1_{IW[T]}} & \sim Gaussian(7, 0.5) \\ \beta_{1_{IW[F]}} & \sim Gaussian(7, 0.5) \end{align} \]

Prior for \(\beta_2\)

  • Let’s assume no expectation about the effect of PhonLev, apart from that it can be negative, positive or null, and that it is not very large.

  • \(Gaussian(0, 0.1)\).

    • This means we are 95% “confident” that the effect of PhonLev on log-RTs is between -0.2 and +0.2 for each unit increase of PhonLev.

    • For example, starting from 800 ms: 800 * exp(0.2) ≈ 800 * 1.22 = 976, i.e. a (976 - 800 =) 176 ms increase per unit increase of PhonLev (equivalently, 800 * (1.22 - 1) = 176). See the short R sketch below.
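The same back-transformation in R (a minimal sketch; the 800 ms baseline is only an illustrative value, and the exact results differ slightly from the rounded numbers above):

baseline_ms <- 800
effect_log  <- 0.2  # upper bound of the 95% prior interval for beta_2

# Multiplicative effect on the ms scale, and the corresponding increase in ms
baseline_ms * exp(effect_log)        # ~977 ms
baseline_ms * (exp(effect_log) - 1)  # ~177 ms per unit increase of PhonLev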

\[ \begin{align} \text{RT}_i & \sim LN(\mu_i, \sigma) \\ \log(\mu_i) & = \beta_{1_{IW[i]}} + \beta_{2_{IW[i]}} \cdot (\text{PhonLev}_i - \text{mean}(\text{PhonLev})) \\ \beta_{1_{IW[T]}} & \sim Gaussian(7, 0.5) \\ \beta_{1_{IW[F]}} & \sim Gaussian(7, 0.5) \\ \beta_{2_{IW[T]}} & \sim Gaussian(0, 0.1) \\ \beta_{2_{IW[F]}} & \sim Gaussian(0, 0.1) \end{align} \]

Prior predictive checks

brm_4b_priors <- c(
  prior(normal(7, 0.5), class = b, coef = IsWordFALSE),
  prior(normal(7, 0.5), class = b, coef = IsWordTRUE),
  prior(normal(0, 0.1), class = b, coef = `IsWordFALSE:PhonLev_c`),
  prior(normal(0, 0.1), class = b, coef = `IsWordTRUE:PhonLev_c`),
  prior(cauchy(0, 0.02), class = sigma)
)

brm_4b_priorpp <- brm(
  RT ~ 0 + IsWord + IsWord:PhonLev_c,
  family = lognormal,
  prior = brm_4b_priors,
  data = mald,
  sample_prior = "only",
  cores = 4,
  file = "data/cache/brm_4b_priorpp",
  seed = my_seed
)
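To confirm which priors brms actually used, we can ask the fitted (prior-only) model to report them back (a minimal sketch, assuming the fit above):

# Priors as stored in the prior-only model
prior_summary(brm_4b_priorpp)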

Prior predictive checks plot

conditional_effects(brm_4b_priorpp, "PhonLev_c:IsWord")
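Another way to inspect the prior predictive distribution is to overlay densities of RTs simulated from the priors on the observed RTs (a sketch; ndraws = 50 is an arbitrary choice):

# Density overlay of prior-simulated RTs vs observed RTs
pp_check(brm_4b_priorpp, ndraws = 50)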

Run the model

brm_4b <- brm(
  RT ~ 0 + IsWord + IsWord:PhonLev_c,
  family = lognormal,
  prior = brm_4b_priors,
  data = mald,
  cores = 4,
  file = "data/cache/brm_4b",
  seed = my_seed
)

Model summary

brm_4b
 Family: lognormal 
  Links: mu = identity; sigma = identity 
Formula: RT ~ 0 + IsWord + IsWord:PhonLev_c 
   Data: mald (Number of observations: 3000) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
IsWordTRUE                6.85      0.01     6.84     6.86 1.00     4894
IsWordFALSE               6.97      0.01     6.95     6.98 1.00     5171
IsWordTRUE:PhonLev_c      0.03      0.01     0.02     0.04 1.00     5139
IsWordFALSE:PhonLev_c     0.02      0.01     0.01     0.03 1.00     5870
                      Tail_ESS
IsWordTRUE                3055
IsWordFALSE               3022
IsWordTRUE:PhonLev_c      3388
IsWordFALSE:PhonLev_c     3001

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.28      0.00     0.27     0.29 1.00     4232     3160

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
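Since the model is fitted on the log scale, the estimates are easier to read back-transformed to milliseconds (a minimal sketch using the rounded point estimates above; exp() of an intercept gives the predicted median RT at the mean PhonLev):

exp(6.85)  # ~944 ms: median RT for real words, at mean PhonLev
exp(6.97)  # ~1064 ms: median RT for non-words, at mean PhonLev
exp(0.03)  # ~1.03: for real words, RTs increase by about 3% per unit of PhonLev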

Posterior predictive checks

pp_check(brm_4b, ndraws = 50)

Conditional posterior probabilities

conditional_effects(brm_4b, "PhonLev_c:IsWord")
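To quantify the difference between non-words and real words directly, one option is brms’ hypothesis() (a sketch; the coefficient names follow the model summary above):

# Posterior probability that non-words have longer log-RTs than real words
hypothesis(brm_4b, "IsWordFALSE - IsWordTRUE > 0")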