An agent-based model of online community joining

Jeremy Foote

Benjamin Mako Hill
U. of Washington

Nate TeBlunthuis
U. of Washington

Aaron Shaw

Research Agenda

How does the technical, social, and communication context influence collective action decisions (and vice versa)?

Foote and Contractor, “The behavior and network position of peer production community founders,” iConference 2018

Starting Online Communities: Motivations and Goals of Wiki Founders, CHI 2017

Communication structures and performance of early-stage online volunteer communities, In preparation

How do we decide which groups to join?

Group size distributions vary

University sizes are log-normally distributed

Data from Wikidata

Online group sizes have long-tail distributions

The number of groups each user belongs to also follows a long-tail distribution


What decision-making processes could produce these outcomes?

A simple model that produces long-tail distributions

Preferential Attachment (Barabási and Albert, 1999)
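To make the mechanism concrete, here is a minimal sketch of preferential attachment applied to group joining. It is a Yule-Simon style variant rather than the network model itself: each joiner founds a new group with probability `p_new`, and otherwise joins an existing group with probability proportional to its size. The function name, `p_new`, and the starting state are illustrative assumptions, not taken from the slides.

```python
import random

def simulate_joins(n_joins, p_new=0.05, seed=0):
    """Return a list of group sizes after n_joins joining events."""
    rng = random.Random(seed)
    sizes = [1]        # assumption: start with a single one-member group
    memberships = [0]  # one entry per membership, storing the group index
    for _ in range(n_joins):
        if rng.random() < p_new:
            # Growth: the joiner founds a new group.
            sizes.append(1)
            memberships.append(len(sizes) - 1)
        else:
            # Preferential attachment: a uniformly random membership
            # points at a group with probability proportional to its size.
            group = rng.choice(memberships)
            sizes[group] += 1
            memberships.append(group)
    return sizes

sizes = simulate_joins(n_joins=5000)
ranked = sorted(sizes, reverse=True)
print("groups:", len(ranked))
print("five largest:", ranked[:5])
print("median size:", ranked[len(ranked) // 2])
```

Comparing the largest groups with the median gives a quick sense of how skewed the resulting size distribution is.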

One small problem

A (more) believable model

Three processes

  • Consideration
  • Exposure
  • Decision

Agent-based modeling

  • Simplified, simulated system of interacting agents
  • Allows us to:
    • Explicitly model micro processes
    • Test the macro-level implications of theories
    • Move from equations to agents

Our agent-based model

  • Start with \(N\) potential contributors and \(X\) potential communities
  • Every month:
    • \(n\) users are chosen (consideration)
    • Each user is presented with an “exposure set” of \(x\) communities (exposure)
    • The user decides which communities to join (decision)
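The monthly loop described above can be sketched as follows. This is a skeleton under stated assumptions, not the authors' implementation: exposure is uniform random and each exposed community is joined with probability `p_join`, both placeholders for the exposure and decision processes. All default parameter values are hypothetical.

```python
import random

def run_model(N=200, X=50, n=20, x=5, months=24, p_join=0.5, seed=0):
    """Simulate N potential contributors and X potential communities."""
    rng = random.Random(seed)
    memberships = {user: set() for user in range(N)}
    for _ in range(months):
        # Consideration: n users are chosen this month.
        for user in rng.sample(range(N), n):
            # Exposure (placeholder assumption): x communities drawn uniformly.
            exposure_set = rng.sample(range(X), x)
            # Decision (placeholder assumption): join each with probability p_join.
            for community in exposure_set:
                if rng.random() < p_join:
                    memberships[user].add(community)
    return memberships

def community_sizes(memberships, X):
    """Count members per community."""
    sizes = [0] * X
    for joined in memberships.values():
        for community in joined:
            sizes[community] += 1
    return sizes

memberships = run_model()
sizes = community_sizes(memberships, X=50)
print("largest community:", max(sizes), "smallest:", min(sizes))
```

Swapping the two placeholder steps for other exposure and decision rules is what distinguishes one model variant from another.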

Simulation Results

Random processes aren’t consistent with empirical data

Random exposure processes cannot produce long-tail distributions

Word of mouth exposure through communication networks

Word of mouth exposure models look similar to Reddit
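One way word of mouth exposure could work is sketched below: a user's exposure set is drawn from the communities their network contacts already belong to. The data structures (`neighbors`, `memberships`) and the rule that communities shared by several contacts are more likely to be drawn are assumptions made for illustration.

```python
import random

def word_of_mouth_exposure(user, neighbors, memberships, x, rng):
    """Return up to x distinct communities that the user's contacts belong to."""
    candidates = []
    for contact in neighbors[user]:
        # A community is listed once per contact who belongs to it, so
        # communities common among contacts tend to be drawn earlier.
        candidates.extend(sorted(memberships[contact] - memberships[user]))
    rng.shuffle(candidates)
    exposure = []
    for community in candidates:
        if community not in exposure:
            exposure.append(community)
        if len(exposure) == x:
            break
    return exposure

# Hypothetical three-person network.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
memberships = {0: {"a"}, 1: {"a", "b"}, 2: {"b", "c"}}
exposure = word_of_mouth_exposure(0, neighbors, memberships, x=2,
                                  rng=random.Random(1))
print(sorted(exposure))
```

A user with no contacts, or whose contacts belong to nothing new, gets an empty exposure set in this sketch.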

We can make the decision algorithm more realistic

  • Based on expected value of joining (Resnick et al., 2011)
  • People use the communities they joined in the past to fit a model that predicts how large the communities in their current choice set will become
  • They prefer to:
    • Join communities early
    • Join large communities
  • But there are costs to join and to participate

A word of mouth exposure model plus a “learning” model looks roughly like Reddit


Agent-based modeling and theory-building

  • Simulation is a great tool for modeling organizational processes
  • Exposure processes are important but understudied
  • Communication networks provide a plausible mechanism

Future work can validate our models

  • Surveys and experiments
    • How are people exposed to new communities?
    • How do they decide which to join/leave?
  • Explaining other patterns of participation (e.g., plateau of group membership over time)

The End




Decision algorithms

  • Random
  • Biggest
  • Linear growth
  • Recent growth
  • Learning model
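Three of the simpler rules in this list can be sketched as interchangeable functions. The slides do not define them precisely, so the readings here are assumptions: each rule picks a single community from the exposure set, "biggest" means largest current size, and "recent growth" means members gained over the last recorded month. The `history` structure (community to list of monthly sizes) is hypothetical.

```python
import random

def choose_random(exposure_set, history, rng):
    """Pick any exposed community with equal probability."""
    return rng.choice(exposure_set)

def choose_biggest(exposure_set, history, rng):
    """Pick the exposed community with the largest current size."""
    return max(exposure_set, key=lambda c: history[c][-1])

def choose_recent_growth(exposure_set, history, rng):
    """Pick the exposed community that gained the most members last month."""
    def gain(community):
        sizes = history[community]
        # With a single observation, fall back to the current size.
        return sizes[-1] - sizes[-2] if len(sizes) > 1 else sizes[-1]
    return max(exposure_set, key=gain)

# Hypothetical monthly size histories.
history = {"a": [10, 12, 13], "b": [1, 4, 9], "c": [20, 20, 20]}
exposure_set = ["a", "b", "c"]
rng = random.Random(0)
print(choose_biggest(exposure_set, history, rng))
print(choose_recent_growth(exposure_set, history, rng))
```

Giving every rule the same signature lets a simulation switch decision algorithms without changing the surrounding loop.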

Learning model

  • Agents fit a regression model on previously joined communities
  • They then predict the growth of their current choice set



  • Participation benefits (PB)
    • \(PB = \log(S + 1)\)
  • Early adopter benefits (EA)
    • \(EA = \log(S + 1)/\log(i + 1)\)
  • Cost (C)
    • \(C = X\)
  • Utility: \(U = PB + EA - C\)
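The utility calculation above can be written out directly. This sketch assumes natural logarithms, reads \(S\) as community size, and reads \(i\) as the joiner's position in the order of arrival (so \(i \geq 1\)); the slides do not state the log base or define \(S\) and \(i\), so these readings are assumptions.

```python
import math

def utility(S, i, cost):
    """U = PB + EA - C, following the formulas on the slide."""
    if i < 1:
        raise ValueError("i must be at least 1 so that log(i + 1) > 0")
    participation = math.log(S + 1)                     # PB
    early_adopter = math.log(S + 1) / math.log(i + 1)   # EA
    return participation + early_adopter - cost

# Hypothetical values: joining the same community earlier yields more utility.
early = utility(S=99, i=1, cost=1.0)
late = utility(S=99, i=50, cost=1.0)
print(round(early, 3), round(late, 3))
```

Under these assumptions, utility rises with community size and falls with later arrival, which matches the stated preference for joining large communities early.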

Learning model:

\[\begin{align} \log(\text{growth}) &= \beta_0 + \beta_1 \times S_j + \beta_2 \times A_j \\ &+ \beta_3 \times (A_c - A_j) + \beta_4 \times S_c \end{align}\]
  • \(S_j\) = size when joined, \(S_c\) = current size, \(A_j\) = age when joined, \(A_c\) = current age
  • Predicted size after 6 months
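The regression above can be fit with ordinary least squares. The sketch below solves the normal equations in pure Python; the slides do not say which estimator the agents use, so OLS is an assumption, and the data are synthetic values generated from made-up coefficients purely to show that the fit recovers them. How a candidate community maps onto the four predictors is also left open here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular (no collinear predictors)."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, b2, b3, b4] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    Xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)

def predict_log_growth(beta, row):
    """Apply the fitted model to one (S_j, A_j, A_c - A_j, S_c) row."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Synthetic history of previously joined communities (hypothetical values):
# each row is (S_j, A_j, A_c - A_j, S_c).
rows = [(5, 1, 6, 20), (50, 10, 3, 60), (12, 4, 8, 40), (100, 24, 2, 110),
        (3, 2, 12, 9), (30, 6, 5, 80), (8, 12, 4, 10)]
true_beta = [0.5, 0.01, -0.02, 0.03, 0.005]  # made-up coefficients
y = [predict_log_growth(true_beta, r) for r in rows]

beta = fit_ols(rows, y)
print([round(b, 4) for b in beta])
```

An agent with only a few previously joined communities would have too little data for five coefficients, so a real implementation would need a fallback for that case.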

All models - community sizes

All models - communities per person

A few problems

  • Doesn’t explain how people estimate future growth
  • Not derived from empirical data
  • Only applies to a single decision

Adding communities

  • Adding communities every month is a start

But it’s not enough


Adding popularity

  • The choices that people consider are weighted by popularity
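Popularity weighting of the choice set can be sketched as weighted sampling without replacement. The weight of `size + 1` (so that empty communities can still be drawn) and the function name are assumptions; the slides say only that choices are weighted by popularity.

```python
import random

def popularity_weighted_exposure(sizes, x, rng):
    """Draw up to x distinct communities, favoring larger ones."""
    remaining = dict(sizes)  # community -> current size
    chosen = []
    while remaining and len(chosen) < x:
        names = list(remaining)
        # Assumption: weight is size + 1 so empty communities stay reachable.
        weights = [remaining[name] + 1 for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen

# Hypothetical community sizes.
sizes = {"big": 99, "mid": 9, "small": 0}
rng = random.Random(0)
first_picks = [popularity_weighted_exposure(sizes, 1, rng)[0]
               for _ in range(1000)]
print({name: first_picks.count(name) for name in sizes})
```

Because popular communities are seen more often, they are joined more often, which feeds back into their popularity.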

Limitations and challenges