An agent-based model of online community joining

Jeremy Foote
Northwestern

Benjamin Mako Hill
U. of Washington

Nate TeBlunthuis
U. of Washington

Aaron Shaw
Northwestern

Research Agenda

Foote and Contractor, “The behavior and network position of peer production community founders,” iConference 2018

Starting Online Communities: Motivations and Goals of Wiki Founders, CHI 2017

Communication structures and performance of early-stage online volunteer communities, In preparation

How do we decide which groups to join?

Group size distributions vary

University sizes are log-normally distributed

Data from Wikidata

Online group sizes have long-tail distributions

Data from Stuck_in_the_Matrix on BigQuery

The number of groups each user belongs to also has a long-tail distribution

Data from Stuck_in_the_Matrix on BigQuery

Why?

What decision-making processes could produce these outcomes?

A simple model that produces long-tail distributions

Preferential Attachment (Barabási and Albert, 1999)
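
To illustrate how a rich-get-richer rule can yield long-tail group sizes, here is a minimal sketch. It is a Simon/Yule-style variant applied to groups, not the network model from the cited paper: each new membership either founds a new group or joins an existing one with probability proportional to its size. The function name, `p_new_group`, and all parameter values are assumptions chosen for illustration.

```python
import random

def simulate_preferential_attachment(n_joins, p_new_group=0.05, seed=0):
    """Grow groups one membership at a time with a rich-get-richer rule."""
    rng = random.Random(seed)
    sizes = [1]        # sizes[g] is the size of group g; start with one group
    memberships = [0]  # one entry per membership, holding that member's group id
    for _ in range(n_joins):
        if rng.random() < p_new_group:
            # Occasionally a newcomer founds a brand-new group.
            sizes.append(1)
            memberships.append(len(sizes) - 1)
        else:
            # Picking a random existing membership selects a group with
            # probability proportional to its current size.
            group = rng.choice(memberships)
            sizes[group] += 1
            memberships.append(group)
    return sizes

sizes = simulate_preferential_attachment(n_joins=20000)
ranked = sorted(sizes, reverse=True)
print("number of groups:", len(ranked))
print("largest five:", ranked[:5])
print("median size:", ranked[len(ranked) // 2])
```

A few very large groups alongside many tiny ones is the long-tail signature; plotting `ranked` on log-log axes makes the shape easier to see.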

One small problem

A (more) believable model

Three processes

Consideration
Exposure
Decision

Agent-based modeling

Simplified, simulated system of interacting agents
Allows us to:
- Explicitly model micro processes
- Test the macro-level implications of theories
- Move from equations to agents

Our agent-based model

Start with \(N\) potential contributors and \(X\) potential communities
Every month:
- \(n\) users are chosen (consideration)
- Each user is presented with an “exposure set” of \(x\) communities (exposure)
  - The user decides which communities to join (decision)
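
The monthly loop above can be sketched as follows. This is a baseline under stated assumptions, not the authors' implementation: exposure is uniformly random, the decision is a coin flip with a hypothetical probability `p_join`, and the default values for \(N\), \(X\), \(n\), \(x\), and the number of months are arbitrary.

```python
import random

def run_model(N=1000, X=100, n=50, x=5, months=24, p_join=0.2, seed=0):
    """Simulate community joining with random exposure and random decisions."""
    rng = random.Random(seed)
    members = [set() for _ in range(X)]  # members[c] holds the user ids in community c
    for _ in range(months):
        # Consideration: n of the N potential contributors look for communities.
        for user in rng.sample(range(N), n):
            # Exposure: the user sees x communities (uniformly at random here).
            exposure_set = rng.sample(range(X), x)
            # Decision: join each exposed community with probability p_join.
            for community in exposure_set:
                if rng.random() < p_join:
                    members[community].add(user)
    return members

members = run_model()
community_sizes = sorted((len(m) for m in members), reverse=True)
print("largest communities:", community_sizes[:5])
print("smallest communities:", community_sizes[-5:])
```

Swapping in a different exposure or decision rule only requires changing the two commented steps inside the loop.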

Simulation Results

Random processes aren’t consistent with empirical data

Random exposure processes cannot produce long-tail distributions

Word of mouth exposure through Communication Networks
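
One way to read word-of-mouth exposure is that a user only hears about communities their contacts have already joined. The sketch below assumes exactly that; the function name, the data structures, and the tiny example network are hypothetical, and the uniform sampling among candidates is a simplification.

```python
import random

def word_of_mouth_exposure(user, neighbors, joined, x, rng):
    """Return up to x distinct communities that the user's contacts have joined."""
    candidates = set()
    for friend in neighbors[user]:
        candidates |= joined[friend]
    candidates -= joined[user]   # skip communities the user already belongs to
    pool = sorted(candidates)    # sorted so results are reproducible for a given seed
    return rng.sample(pool, min(x, len(pool)))

# Toy communication network: user id -> list of contacts.
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
# Current memberships: user id -> set of community names.
joined = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}, 3: {"d"}}

rng = random.Random(1)
print(word_of_mouth_exposure(0, neighbors, joined, x=5, rng=rng))
```

A variant could weight each candidate by the number of contacts who belong to it, so that locally popular communities are seen more often.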

Word of mouth exposure models look similar to Reddit

We can make the decision algorithm more realistic

Based on expected value of joining (Resnick et al., 2011)
People use the communities they joined in the past to fit a model of how large the communities in their current exposure set will become
They prefer to:
- Join communities early
- Join large communities
But there are costs to join and to participate

A word of mouth exposure model plus a “learning” model looks roughly like Reddit

Conclusion

Agent-based modeling and theory-building

Simulation is a great tool for modeling organizational processes
Exposure processes are important but understudied
Communication networks provide a plausible mechanism

Future work can validate our models

Surveys and experiments
- How are people exposed to new communities?
- How do they decide which to join/leave?
Explaining other patterns of participation (e.g., plateau of group membership over time)

The End

Thanks!

Jeremy Foote
@jdfoote

Benjamin Mako Hill
@makoshark

Nate TeBlunthuis
@groceryheist

Aaron Shaw
@aaronshaw

Appendix

Decision algorithms

Random
Biggest
Linear growth
Recent growth
Learning model

Learning model

Agents fit a regression model on previously joined communities
They then predict the growth of their current choice set

Utility

Participation benefits (PB)
- \(PB = \log(S + 1)\)
Early adopter benefits (EA)
- \(EA = \log(S + 1)/\log(i + 1)\)
Cost (C)
- \(C = X\)
\(U = PB + EA - C\)
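
The utility terms above can be computed directly. In this sketch, \(S\) is read as community size and \(i\) as the agent's position in the join order (1 for the first member); both readings are assumptions, as are the function name and the example values.

```python
import math

def utility(size, join_order, cost):
    """U = PB + EA - C, with PB = log(S + 1) and EA = log(S + 1) / log(i + 1)."""
    if join_order < 1:
        # log(i + 1) is zero when i = 0, so the EA term would be undefined.
        raise ValueError("join_order must be at least 1")
    participation_benefit = math.log(size + 1)
    early_adopter_benefit = math.log(size + 1) / math.log(join_order + 1)
    return participation_benefit + early_adopter_benefit - cost

# Same community size and cost, different join positions.
print("first to join:", utility(size=99, join_order=1, cost=1.0))
print("fiftieth to join:", utility(size=99, join_order=50, cost=1.0))
```

Under this reading, utility rises with community size and falls the later an agent joins, matching the stated preferences for large communities and early joining.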

Learning model:

\[\begin{aligned} \log(\text{growth}) &= \beta_0 + \beta_1 \times S_j + \beta_2 \times A_j \\ &+ \beta_3 \times (A_c - A_j) + \beta_4 \times S_c \end{aligned}\]
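
A minimal sketch of fitting the regression above by ordinary least squares. The variable meanings are assumptions, since the slides do not define them: \(S_j\) and \(A_j\) are taken as a community's size and age when the agent joined, and \(S_c\) and \(A_c\) as its current size and age. The data are synthetic and noiseless, generated from arbitrary coefficients, so the fit should simply recover them.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def features(s_j, a_j, a_c, s_c):
    """Predictors in the order of beta_0 .. beta_4 in the equation."""
    return [1.0, s_j, a_j, a_c - a_j, s_c]

def predict_growth(beta, s_j, a_j, a_c, s_c):
    """Predicted growth, undoing the log on the left-hand side."""
    return math.exp(sum(b * v for b, v in zip(beta, features(s_j, a_j, a_c, s_c))))

# Synthetic history of previously joined communities (made-up values).
rng = random.Random(0)
true_beta = [0.5, 0.01, -0.05, 0.02, 0.003]  # arbitrary, for the demo only
rows, y = [], []
for _ in range(50):
    s_j, a_j = rng.randint(1, 200), rng.randint(1, 24)
    a_c, s_c = a_j + rng.randint(1, 24), rng.randint(1, 500)
    row = features(s_j, a_j, a_c, s_c)
    rows.append(row)
    y.append(sum(b * v for b, v in zip(true_beta, row)))  # log(growth)

beta = fit_ols(rows, y)
print("estimated coefficients:", [round(b, 4) for b in beta])
print("predicted growth:", predict_growth(beta, s_j=20, a_j=2, a_c=3, s_c=35))
```

An agent would fit this on its own history and then call `predict_growth` for each community in its current choice set.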

All models - community sizes

All models - communities per person

A few problems

Adding communities

But it’s not enough

Results

Adding popularity

Limitations and challenges

Research Agenda

How does the technical, social, and communication context influence collective action decisions (and vice versa)?
