Jeremy Foote

Northwestern University

Benjamin Mako Hill

University of Washington

Nate TeBlunthuis

University of Washington

Data from Wikidata

While Reddit looks like this:

Data from Stuck_in_the_Matrix on BigQuery

- One proposed source of long-tail distributions is preferential attachment (Barabási and Albert, 1999)
- People probabilistically join groups based on their relative size
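As a rough illustration of the mechanism in the two bullets above, the sketch below has each joiner pick a group with probability proportional to its size. The function name, the parameters, and the `+ 1` added to each weight (so that empty groups can be chosen at all) are assumptions made for this example, not part of the cited model.

```python
import random

def preferential_attachment(n_joiners, n_groups, seed=0):
    """Assign joiners to groups with probability proportional to size + 1."""
    rng = random.Random(seed)
    sizes = [0] * n_groups
    for _ in range(n_joiners):
        # Assumption: +1 smoothing so that empty groups have nonzero weight.
        weights = [s + 1 for s in sizes]
        chosen = rng.choices(range(n_groups), weights=weights, k=1)[0]
        sizes[chosen] += 1
    return sizes

sizes = preferential_attachment(n_joiners=5000, n_groups=200)
# Show the five largest groups; a few groups typically hold far more than the mean.
print(sorted(sizes, reverse=True)[:5])
```

Sorting `sizes` and plotting rank against size on log axes is one way to compare the result with the observed distributions.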

- It’s not realistic
- Observed distributions often aren’t power-law (Broido and Clauset, 2018)
- People don’t make decisions in this way

- Resnick et al. (2011)
- People estimate the benefits of joining
- Participation benefits (information, friendship)
- Based on estimate of future growth
- Early adopter benefits (influence)
- Estimate the costs (new technology, learning new norms, etc.)
- Join if benefits exceed costs

- Only applies to a single decision
- Doesn’t explain how people estimate future growth
- Not derived from empirical data

- Basic structure
- Start with \(N\) potential contributors and \(X\) empty communities

- Every “month”:
- \(X\) new communities are created
- \(n\) users are chosen at random
- Each user is presented with a “choice set” of \(x\) communities
- Choice sets are randomly chosen, weighted by community size
- The user estimates the future size and joins the community with the greatest expected utility
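The monthly loop above could be sketched roughly as follows. This is a minimal sketch under several assumptions: the default parameter values are arbitrary, choice-set weights use size + 1 so empty communities can appear, the `estimate` callback stands in for whichever prediction strategy an agent uses, and expected utility is reduced to the log of predicted size. Users may also rejoin a community they already belong to, which a fuller model would prevent.

```python
import math
import random

def weighted_sample(rng, weights, k):
    """Draw up to k distinct indices, each draw weighted by `weights`."""
    remaining = list(range(len(weights)))
    picked = []
    for _ in range(min(k, len(remaining))):
        w = [weights[i] for i in remaining]
        idx = rng.choices(remaining, weights=w, k=1)[0]
        picked.append(idx)
        remaining.remove(idx)
    return picked

def simulate(N=1000, X=5, n=50, x=3, months=24, estimate=None, seed=0):
    """Return final community sizes after `months` simulated months."""
    rng = random.Random(seed)
    if estimate is None:
        # Placeholder strategy (assumption): predicted size = current size.
        estimate = lambda size: size
    sizes = [0] * X                          # start with X empty communities
    for _ in range(months):
        sizes.extend([0] * X)                # X new communities are created
        for _user in rng.sample(range(N), n):    # n users chosen at random
            weights = [s + 1 for s in sizes]     # weighted by size (+1 assumed)
            choice_set = weighted_sample(rng, weights, x)
            # Join the community with the greatest (simplified) expected utility.
            best = max(choice_set, key=lambda c: math.log(estimate(sizes[c]) + 1))
            sizes[best] += 1
    return sizes

final_sizes = simulate(months=12)
print(sorted(final_sizes, reverse=True)[:5])
```

Passing a different `estimate` function is the only change needed to compare prediction strategies under the same loop.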

- Random
- Biggest
- Linear growth
- Recent growth
- Learning model
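The slide names the strategies without defining them, so the sketch below is only one plausible reading of the four non-learning strategies. Each function takes a hypothetical `history` list of a community's size at each month so far and returns a predicted size six months ahead. All function names and the exact formulas are assumptions.

```python
import random

HORIZON = 6  # months ahead to predict

def predict_random(history, rng=random):
    """Random: ignore the history entirely (assumed reading)."""
    return rng.random()

def predict_biggest(history):
    """Biggest: assume the current size is the best guide (assumed reading)."""
    return history[-1]

def predict_linear(history):
    """Linear growth: extrapolate the average growth over the whole lifetime."""
    age = len(history) - 1
    if age == 0:
        return history[-1]
    rate = (history[-1] - history[0]) / age
    return history[-1] + rate * HORIZON

def predict_recent(history):
    """Recent growth: extrapolate only the most recent month's growth."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2]) * HORIZON

history = [0, 2, 4, 10]
print(predict_biggest(history), predict_linear(history), predict_recent(history))
```

For this example history, the recent-growth reading predicts a larger community than the linear one because the last month's growth is above the lifetime average.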

- Agents fit a regression model on previously joined communities
- They then predict the growth of their current choice set
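A minimal sketch of such a learning agent, assuming a single predictor (size at the time of joining) and ordinary least squares. The class name, method names, and the fallback used before the agent has enough experience are hypothetical; the model in the talk may use other predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # no variation in x: fall back to the mean outcome
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

class LearningAgent:
    def __init__(self):
        # Pairs of (size when joined, size observed later) from past joins.
        self.experience = []

    def observe(self, size_at_join, later_size):
        self.experience.append((size_at_join, later_size))

    def predict(self, current_size):
        # Assumption: with fewer than two observations, expect no change.
        if len(self.experience) < 2:
            return float(current_size)
        xs, ys = zip(*self.experience)
        slope, intercept = fit_line(xs, ys)
        return max(0.0, slope * current_size + intercept)

agent = LearningAgent()
for joined, later in [(10, 20), (20, 40), (30, 60)]:
    agent.observe(joined, later)
print(agent.predict(15))
```

Because each agent fits its own regression on its own history, two agents facing the same choice set can form different expectations.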

- Better for disproving than proving
- Other models could match just as well

- Choices about where and how to simplify

- Surveys and experiments
- Do past experiences influence decisions?
- What do actual “choice sets” look like?

- Modeling interests explicitly
- Explaining longitudinal patterns of participation

- We don’t need preferential attachment*
- A Resnick et al. model can produce realistic distributions
- But it’s very sensitive to how people make estimates
- Only learning models produced realistic outcomes
- Likely due to heterogeneity in expectations

Jeremy Foote

@jdfoote

Benjamin Mako Hill

@makoshark

Nate TeBlunthuis

@groceryheist

- Participation benefits (PB)
- \(PB = \log(S + 1)\)

- Early adopter benefits (EA)
- \(EA = \log(S + 1)/\log(i + 1)\)

- Cost: \(C = X\)
- \(U = PB + EA - C\)
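The utility formulas above translate directly into a small function. This is a sketch: the slide does not define \(i\), so the code only assumes it is a positive integer (for example, a join order), since \(i = 0\) would make \(\log(i + 1) = 0\) and the early adopter term undefined. The function name and the example values are illustrative.

```python
import math

def utility(S, i, cost):
    """U = PB + EA - C, following the formulas on the slide."""
    if i < 1:
        # Assumption: i must be positive, otherwise log(i + 1) is zero.
        raise ValueError("i must be at least 1")
    pb = math.log(S + 1)                      # participation benefits
    ea = math.log(S + 1) / math.log(i + 1)    # early adopter benefits
    return pb + ea - cost

# For the same predicted size, a smaller i gives a larger early adopter benefit.
print(utility(S=99, i=1, cost=1.0))
print(utility(S=99, i=9, cost=1.0))
```

Under the rule from Resnick et al. (2011), an agent would join only when this value is positive.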

- \(S_c\) = current size, \(A_j\) = age when joined
- Predicted size after 6 months
