
Correlation is causation
University of Edinburgh
2026-05-20
“Correlation is not causation”
🎽 shorter clothing ↔︎️ more ice cream 🍨
🌡️ higher temperature ➡️ shorter clothing 🎽
🌡️ higher temperature ➡️ more ice cream 🍨
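A minimal simulation sketch of this example, using base R only. The effect size of 0.8, the sample size, and the standardised scales are illustrative assumptions, not values from the slides. Temperature drives both variables, so they correlate with each other, but the correlation largely disappears once temperature is accounted for.

```r
set.seed(1)
n <- 1000

# Assumed data-generating process: temperature is a common cause.
temperature    <- rnorm(n)
short_clothing <- 0.8 * temperature + rnorm(n)
ice_cream      <- 0.8 * temperature + rnorm(n)

# Marginal correlation between the two effects of temperature.
cor_marginal <- cor(short_clothing, ice_cream)

# Partial correlation: correlate what is left of each variable
# after regressing out temperature.
res_clothing <- resid(lm(short_clothing ~ temperature))
res_ice      <- resid(lm(ice_cream ~ temperature))
cor_partial  <- cor(res_clothing, res_ice)

round(c(marginal = cor_marginal, partial = cor_partial), 2)
```

Under these assumptions the marginal correlation should be clearly positive and the partial correlation close to zero.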
Graphical causal theory.
Nodes represent variables.
Edges (arrows) represent causal effects (directed).
Causality flows “linearly”: there is no circular causality (acyclic).












# Adapted from https://solomon.quarto.pub/sr2/06.html#overthinking-simulated-science-distortion.
# Example from McElreath *Statistical Rethinking* (2nd ed).
library(tidyverse) # provides tibble(), mutate() and the pipeline helpers

set.seed(1914)
n <- 200 # Number of grant proposals
p <- 0.1 # Proportion to select

# Uncorrelated newsworthiness and trustworthiness
coll <- tibble(
  newsworthiness = rnorm(n, mean = 0, sd = 1),
  trustworthiness = rnorm(n, mean = 0, sd = 1)
) |>
  # Total score is the sum of the two qualities
  mutate(total_score = newsworthiness + trustworthiness) |>
  # Select top 10% of combined scores (the comparison already returns TRUE/FALSE)
  mutate(selected = total_score >= quantile(total_score, 1 - p))
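The selection effect in this simulation can be checked by comparing the correlation among all proposals with the correlation among the selected ones. The sketch below re-creates the same draws in base R so that it runs on its own; the names `cor_all` and `cor_selected` are introduced here for illustration.

```r
set.seed(1914)
n <- 200 # Number of grant proposals
p <- 0.1 # Proportion to select

# Independent by construction.
newsworthiness  <- rnorm(n)
trustworthiness <- rnorm(n)

# Selection depends on both variables, so `selected` is a collider.
total_score <- newsworthiness + trustworthiness
selected    <- total_score >= quantile(total_score, 1 - p)

# Correlation in the full sample versus the selected subset.
cor_all      <- cor(newsworthiness, trustworthiness)
cor_selected <- cor(newsworthiness[selected], trustworthiness[selected])

round(c(all = cor_all, selected = cor_selected), 2)
```

Conditioning on the collider (keeping only selected proposals) is expected to induce a negative correlation between two variables that were generated independently.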

Regression for causation










\(P \not\!\perp\!\!\!\perp C\)
\(S \not\!\perp\!\!\!\perp C\)
\(S \not\!\perp\!\!\!\perp P\)
\(\perp\!\!\!\perp\) = “independent”
\(\not\!\perp\!\!\!\perp\) = “not independent”

\(P \not\!\perp\!\!\!\perp C\)
\(S \not\!\perp\!\!\!\perp C\)
\(S \perp\!\!\!\perp P | C\)
P is not independent of C
S is not independent of C
S is independent of P, conditional on C

\(P \rightarrow S\)
\(P \leftarrow C \rightarrow S\)
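For the fork \(P \leftarrow C \rightarrow S\), the conditional independence \(S \perp\!\!\!\perp P | C\) can be read off the way the joint distribution factorises. A short sketch of the reasoning, writing \(\Pr\) for a probability (or density) and assuming the fork is the whole graph:

\[
\Pr(P, S, C) = \Pr(C)\,\Pr(P \mid C)\,\Pr(S \mid C)
\]

\[
\Pr(S, P \mid C) = \frac{\Pr(P, S, C)}{\Pr(C)} = \Pr(S \mid C)\,\Pr(P \mid C)
\]

The second line is the definition of \(S \perp\!\!\!\perp P | C\). Without conditioning, \(\Pr(S, P) = \sum_{C} \Pr(C)\,\Pr(P \mid C)\,\Pr(S \mid C)\), which in general does not factorise into \(\Pr(S)\,\Pr(P)\), so \(S \not\!\perp\!\!\!\perp P\).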

\(S \leftarrow C \rightarrow P\)
Backdoor recipe
List all of the paths connecting X (the potential cause of interest) and Y (the outcome).
Classify each path by whether it is open or closed. A path is open unless it contains a collider.
Classify each path by whether it is a backdoor path. A backdoor path has an arrow entering X.
If there are any open backdoor paths, decide which variable(s) to condition on to close them (if possible).

| Path | Open | Backdoor |
|---|---|---|
| \(W \rightarrow P\) | yes | no |
| \(W \leftarrow S \rightarrow A \rightarrow E \rightarrow P\) | yes | yes |
| \(W \leftarrow S \rightarrow A \rightarrow P\) | yes | yes |
| \(W \leftarrow S \rightarrow E \rightarrow P\) | yes | yes |
| \(W \leftarrow S \rightarrow E \leftarrow A \rightarrow P\) | no | yes |
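The first three steps of the recipe can be sketched in base R for the DAG implied by the table. The edge list is read off the paths above; the helper names `find_paths`, `has_edge`, `is_open`, and `is_backdoor` are hypothetical and written for this illustration. The open/closed check assumes that nothing has been conditioned on yet.

```r
# Directed edges of the DAG, as implied by the paths in the table.
edges <- data.frame(
  from = c("S", "S", "S", "A", "A", "E", "W"),
  to   = c("W", "A", "E", "E", "P", "P", "P"),
  stringsAsFactors = FALSE
)

# TRUE if there is an arrow a -> b.
has_edge <- function(a, b) any(edges$from == a & edges$to == b)

# Step 1: list all paths between x and y, ignoring arrow direction.
find_paths <- function(x, y) {
  paths <- list()
  walk <- function(path) {
    node <- path[length(path)]
    if (node == y) {
      paths[[length(paths) + 1]] <<- path
      return(invisible(NULL))
    }
    neighbours <- unique(c(edges$to[edges$from == node],
                           edges$from[edges$to == node]))
    for (nb in setdiff(neighbours, path)) walk(c(path, nb))
  }
  walk(x)
  paths
}

# Step 2: a path is open unless an interior node is a collider (-> node <-).
is_open <- function(path) {
  interior <- seq_along(path)[-c(1, length(path))]
  is_collider <- vapply(interior, function(i) {
    has_edge(path[i - 1], path[i]) && has_edge(path[i + 1], path[i])
  }, logical(1))
  !any(is_collider)
}

# Step 3: a backdoor path starts with an arrow entering x.
is_backdoor <- function(path) has_edge(path[2], path[1])

paths <- find_paths("W", "P")
result <- data.frame(
  path     = vapply(paths, paste, character(1), collapse = " - "),
  open     = vapply(paths, is_open, logical(1)),
  backdoor = vapply(paths, is_backdoor, logical(1))
)
result
```

The path column shows only the sequence of nodes; arrow directions are held in `edges`.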

Either \(\{A, E\}\) or \(\{S\}\) closes every open backdoor path.
lm(P ~ W + S)
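A simulation sketch of why the adjustment matters. Everything numeric here is an assumption made for illustration: linear effects, Gaussian noise, a true effect of W on P of 0.5, and all other coefficients set to 1. Only the arrows follow the DAG above.

```r
set.seed(2026)
n <- 10000
true_effect <- 0.5 # assumed causal effect of W on P

# Simulate the DAG: S -> W, S -> A, S -> E, A -> E, A -> P, E -> P, W -> P.
S <- rnorm(n)
W <- S + rnorm(n)
A <- S + rnorm(n)
E <- S + A + rnorm(n)
P <- true_effect * W + A + E + rnorm(n)

# No adjustment: the backdoor paths through S stay open.
naive <- unname(coef(lm(P ~ W))["W"])

# Adjusting for S closes every backdoor path.
adjust_S <- unname(coef(lm(P ~ W + S))["W"])

# Adjusting for A and E also closes every backdoor path.
adjust_AE <- unname(coef(lm(P ~ W + A + E))["W"])

round(c(naive = naive, adjust_S = adjust_S, adjust_AE = adjust_AE), 2)
```

Under these assumptions the unadjusted coefficient should sit well above 0.5, while both adjusted models should recover a value close to it.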
It assumes that the DAG is correct.
Adjustment variables must be observable.
Complex systems require dynamic systems modelling.

Correlation is causation (if you use causal inference).

Directed Acyclic Graphs (DAGs).

Choose your covariates carefully.