Example of collider bias

Let’s simulate some data. Each row represents an academic paper with two attributes, rigor and innovativeness. Both are normally distributed with mean 0 and standard deviation 1, and they are independent of each other.

Reviewers have a threshold for what they will accept for publication. As we all know, whether a given paper is accepted is due in part to luck. In this world, papers are judged only on their rigor and innovativeness: we add the two scores together, and if the sum is greater than a threshold drawn uniformly at random between 0 and 2, the paper is published.
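Because the threshold $T$ is drawn uniformly from $[0, 2]$, the chance that a paper with combined score $s = \text{rigor} + \text{innovativeness}$ gets published can be written out directly. This is a worked restatement of the acceptance rule above, not a formula from the original:

$$
P(\text{published} \mid s) = P(T < s) =
\begin{cases}
0 & s \le 0 \\
s/2 & 0 < s < 2 \\
1 & s \ge 2
\end{cases}
$$

So the chance of acceptance rises linearly with the combined score, and a weak score on one dimension can be offset by a strong score on the other.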


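library(tidyverse)

set.seed(42)  # arbitrary seed so the random draws are reproducible
N = 1000

df = tibble(
  rigor = rnorm(N, 0, 1),          # rigor ~ Normal(0, 1)
  innovativeness = rnorm(N, 0, 1), # innovativeness ~ Normal(0, 1), independent of rigor
  threshold = runif(N, 0, 2),      # the "luck" component: Uniform(0, 2)
  published = (rigor + innovativeness) > threshold
)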

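Let’s visualize the relationship between rigor and innovativeness across all papers. As expected, since the two were drawn independently, there is no relationship between them: the regression line fit to all papers is essentially flat.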

df |>
  ggplot(aes(x = rigor, y = innovativeness)) +
  geom_point(aes(color = published)) +
  geom_smooth(method = "lm", formula = y ~ x)  # one regression line fit to all papers
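The heart of collider bias is what happens once we look only at published papers. Below is a minimal sketch that regenerates data with the same recipe (the seed of 42 is our own arbitrary choice) and compares the correlation between rigor and innovativeness in the full sample with the correlation among published papers, i.e. after conditioning on the collider `published`. It uses base R only, so it runs without the tidyverse.

```r
# Regenerate the simulated papers with the same recipe as above (seed chosen arbitrarily)
set.seed(42)
N <- 1000
rigor <- rnorm(N, 0, 1)
innovativeness <- rnorm(N, 0, 1)
threshold <- runif(N, 0, 2)
published <- (rigor + innovativeness) > threshold

# All papers: the traits were drawn independently, so this should be close to 0
r_all <- cor(rigor, innovativeness)

# Published papers only: selecting on the collider induces a negative correlation,
# because a paper low on one trait needed to be high on the other to get accepted
r_pub <- cor(rigor[published], innovativeness[published])

cat(sprintf("all papers:     r = %.2f (n = %d)\n", r_all, N))
cat(sprintf("published only: r = %.2f (n = %d)\n", r_pub, sum(published)))
```

Nothing about rigor causes innovativeness or vice versa; the negative correlation among published papers comes entirely from the selection step.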