The Formation and Growth of Collaborative Online Organizations

Jeremy Foote
Northwestern University / Purdue University

September 26, 2019

The Plan

  • Overall theoretical approach
  • Brief summary of three projects
  • Detailed summary of ABM project
  • Overall conclusions

The Big Question

Why do some collaborative online organizations succeed?

Collaborative Online Organizations (COO)


Why do some COO succeed?

Most COO do not succeed

  • COO size is highly skewed, with top organizations getting most of the contributions

There have been three main approaches

Approach 1: Why are Wikipedia and Linux so successful?

  • Why people contribute (Nov, von Krogh, Lakhani, Lampe)
  • Who contributes (Antin, Shaw and Hargittai)
  • How work is organized (Arazy, Butler, Crowston, Keegan, Matei, Zhu)

The weakness of Approach 1

  • Selecting on the dependent variable

Approach 2: Predicting COO outcomes based on membership, structure, and design

The weakness of Approach 2

  • Groups are not independent

Approach 3: Online Organizational Ecology predicts outcomes based on community-level relationships

The weakness of Approach 3

  • Organizational ecology treats organizations as agents and people as resources

Studying organizational outcomes using individual decisions

  • How do people allocate their efforts in a complex environment with lots of choices?

People are influenced by individual attributes, technology, and the state of the system

What is the “system”?

  • An earlier generation of communication scholars suggested “open systems” (Katz and Kahn, 1966; Rogers and Agarwala-Rogers, 1976; Farace, Monge, and Russell, 1977)
    • A system takes in inputs, processes them, and produces outputs
    • Systems are composed of subsystems and compose suprasystems (e.g., a firm composed of departments composed of work groups)

Digital trace data lets us study the relationship between people and systems

  • Earlier researchers had difficulty gathering the type of data needed for open systems approaches
  • COO data is:
    • Fine-grained data about behavior and interactions
    • Within and between groups
    • Unobtrusive

Four projects on individual decisions in new collaborative online organizations

Project 1

Early-stage communication networks and community outcomes

Integrative work groups are more productive and successful

  • Many theories suggest that integrative groups with low hierarchy should be successful:
    • Coordination
      • Information flow (Katz, 2005)
      • Transactive memory / Shared mental models (Wegner, 1985; Mathieu et al., 2000)
    • Social Integration
      • Legitimate peripheral participation (Lave and Wenger, 1991)
      • Group identity (Scott, 2007)

Integrative structures identified with social network analysis correlate with success

  • Low hierarchy, few people on the periphery (Cummings and Cross, 2003)
  • High density (Balkundi and Harrison, 2006)
  • Early-stage COO should benefit even more from integrative networks
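As a small illustration of the density measure mentioned above, here is a minimal sketch that assumes an undirected, unweighted network given as a list of node pairs. The function name `density` and the input format are hypothetical; the talk does not say how the talk-page networks were encoded.

```python
def density(n_nodes, edges):
    """Share of possible undirected ties that are actually present."""
    if n_nodes < 2:
        return 0.0
    # Assumption: ignore self-loops and count each pair of nodes once.
    unique_ties = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    possible_ties = n_nodes * (n_nodes - 1) / 2
    return len(unique_ties) / possible_ties


# A four-person chain has 3 of 6 possible ties.
chain = [(0, 1), (1, 2), (2, 3)]
print(density(4, chain))
```

A fully connected group has density 1, while a chain or a hub-and-spoke group of the same size has a much lower value.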

Edits taken from wiki talk pages on Wikia

Relationships between communication structures and productivity

Bootstrapped 95% confidence interval for β coefficients

There is basically no relationship between communication structures and survival

Bootstrapped 95% confidence interval for β coefficients

Project 2

Why do people start new communities?

Foote, Gergle, and Shaw. (2017). Starting Online Communities: Motivations and Goals of Wiki Founders, CHI 2017

Previous research typically treats small communities as failures

A puzzle

Why do people keep starting communities if they are so likely to fail?

Learning from founders

  • 300+ founders responded about their:
    • Motivations
    • Goals
    • Experience

Top goals

  • High-quality information
  • Long-lasting community
  • High-growth community

Most projects are on niche topics for small communities

Projected contributors after 30 days

Project 3

Who starts new communities?

Foote and Contractor. (2018). The behavior and network position of peer production founders. Lecture Notes in Computer Science.

Starting new organizations

  • Entrepreneurs:
    • Have more diverse experience than others (Backes-Gellner & Moog, 2013)
    • Are more likely to have worked with entrepreneurs (Nanda & Sørensen, 2010)
  • Successful entrepreneurs:
    • Have more experience (Cassar, 2014)
    • Have large, diverse social networks (Stam et al., 2014)

We examined the behavior and network position of ~61,000 wiki editors

Timeline of data collection

Network graph of the Spongebob wiki from Wikia

Many founders are learning the system

  • Nearly 90% of wikis were founded by new users
  • ~1% of existing users founded a wiki

Non-newbie founders are more active with more diverse experience, but at the periphery of social networks

Overall, past behavior and networks have little relationship with community growth

Project 4

Social exposure and participation processes in online communities

  • How do people decide which groups to participate in?
    • Exposure processes + decision processes

Social computing research theorizes that people decide to participate in a group based on expected utility

  • People estimate expected utility of joining based on future activity levels (Resnick et al.)
    • Join if expected benefits exceed expected costs
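The decision rule above can be sketched in a few lines. This is only an illustration: the linear benefit function and the `benefit_per_post` and `cost` parameters are assumptions made for the example, not values from the talk or from the cited work.

```python
def expected_utility(expected_activity, benefit_per_post=1.0, cost=5.0):
    """Hypothetical utility: benefit grows linearly with expected activity,
    minus a fixed cost of participating."""
    return benefit_per_post * expected_activity - cost


def should_join(expected_activity, benefit_per_post=1.0, cost=5.0):
    """Join only if expected benefits exceed expected costs."""
    return expected_utility(expected_activity, benefit_per_post, cost) > 0


# A group expected to be active clears the cost threshold; a quiet one does not.
print(should_join(10), should_join(3))
```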

People are exposed to groups via social ties

  • Two categories of exposure to COO (Kraut et al.)
    • Impersonal exposure
    • Interpersonal exposure

These theories focus on individual level outcomes

  • Decision rules should predict
    • Group level outcomes
    • Population level outcomes
  • These are rarely tested

Online group sizes have heavy-tailed distributions

The number of groups each user belongs to is also heavy-tailed

A simple model that produces heavy-tailed distributions

  • Cumulative advantage (Merton, 1968; Barabási, 1999)
    • Future activity levels are based probabilistically on current activity
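A minimal sketch of a cumulative advantage process, assuming a Simon-style setup in which each new unit of activity either starts a new group (with hypothetical probability `p_new`) or goes to an existing group with probability proportional to that group's current size. The parameter names and default values are illustrative and are not taken from the talk.

```python
import random


def cumulative_advantage(n_steps=5000, p_new=0.1, seed=0):
    """Simulate group sizes when new activity favors already-large groups."""
    rng = random.Random(seed)
    sizes = [1]  # start with a single group holding one unit of activity
    for _ in range(n_steps):
        if rng.random() < p_new:
            sizes.append(1)  # a new group is founded
        else:
            # Pick an existing group with probability proportional to its size.
            winner = rng.choices(range(len(sizes)), weights=sizes, k=1)[0]
            sizes[winner] += 1
    return sizes


sizes = cumulative_advantage()
print(len(sizes), max(sizes), sum(sizes) / len(sizes))
```

Comparing the largest group to the mean group size gives a quick sense of how skewed the resulting distribution is.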

Do social computing theories of exposure and participation decisions explain heavy-tailed participation?

  • Possible cumulative advantage mechanisms
    • Expected utility is based on current size
    • Large COO have a larger set of neighbors to share with
    • People in many communities have access to more neighbors

Agent-based modeling

  • Simplified, simulated system of interacting agents
  • Allows us to:
    • Explicitly model micro processes
    • Test the macro-level implications of theories

Our agent-based model

  • Start with \(N\) potential contributors and \(X\) potential communities
    • Every month:
      • Each user is presented with an “exposure set” of \(x\) communities (exposure)
        • The user decides which communities to participate in (decision)
  • Naive versions of each theory
  • Combined version
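The monthly exposure and decision loop above could be sketched as follows. This is a simplified illustration, not the model used in the project: there is no explicit social network, social exposure is approximated by drawing communities in proportion to their membership, and the join probabilities (`base_join_prob` and the square-root scaling) are hypothetical choices.

```python
import random


def simulate(n_users=200, n_communities=50, exposure_size=3, months=12,
             social_exposure=True, utility_decision=True,
             base_join_prob=0.05, seed=0):
    """Return community sizes (largest first) after the simulated months."""
    rng = random.Random(seed)
    members = [set() for _ in range(n_communities)]
    for _ in range(months):
        for user in range(n_users):
            # Exposure step: build this user's exposure set for the month.
            if social_exposure:
                # Assumption: larger communities are more likely to be heard
                # about; +1 lets empty communities be discovered. Draws are
                # with replacement, so the set may hold fewer than
                # exposure_size communities.
                weights = [len(m) + 1 for m in members]
                drawn = rng.choices(range(n_communities), weights=weights,
                                    k=exposure_size)
                exposure_set = set(drawn)
            else:
                # Null exposure: communities drawn uniformly at random.
                exposure_set = set(rng.sample(range(n_communities),
                                              exposure_size))
            # Decision step: choose whether to join each exposed community.
            for community in exposure_set:
                if user in members[community]:
                    continue
                if utility_decision:
                    # Assumption: join probability rises with current size.
                    size = len(members[community])
                    p_join = min(1.0, base_join_prob * (1 + size) ** 0.5)
                else:
                    p_join = base_join_prob
                if rng.random() < p_join:
                    members[community].add(user)
    return sorted((len(m) for m in members), reverse=True)


null_sizes = simulate(social_exposure=False, utility_decision=False)
combined_sizes = simulate(social_exposure=True, utility_decision=True)
print(null_sizes[:5], combined_sizes[:5])
```

Switching the two flags gives rough analogues of the null, exposure-only, utility-only, and combined variants, so their size distributions can be compared under the same seed.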

Simulation Results

Null model as baseline

Expected utility models are skewed but not heavy-tailed

Naive versions of social exposure are not skewed

A combined version is robust with community sizes roughly similar to reddit


  • Word of mouth exposure plus expected utility participation is a partial explanation for heavy-tailed community size distributions
  • Two main weaknesses
    • No model was as skewed as the observed data at both the head and the tail
    • No model explained the heavy tail of participation rates


  • Framework for theories to be informed by higher-level behavior
    • Could test whether people actually share larger groups
    • Are people more likely to join when they already belong to many COO?
  • Simulation can enrich social computing theories

Overall Implications

Systems of COO

  • Communities are interdependent
    • Founders and joiners are influenced by the state of other COO
    • Past experience and luck can influence future behavior
  • COO data can help us understand these recursive processes

Small, temporary organizations

  • Most COO are intentionally small
    • In aggregate, these are valuable
    • Create narrow public goods without requiring oligarchy

Social motivations aren’t all that important

  • Founders cared more about the artifact than the community
  • New COO didn’t require integrative networks to be productive or survive
  • Explanations
    • Strong selection effects mean only the motivated join
    • Ease of leaving means dissenters leave

Affordances matter

  • Low costs to join, leave, and create COO
    • Porous boundaries of COO
    • Also influences individuals and populations

The End


Productivity Model

Survival Model

Robustness tests

Cutoff @ 500

Robustness tests

Cutoff @ 900

Robustness tests

Dichotomize edges @ 3

Coreness description

Degenerate graph example

Density with size quartiles

ABM Project

Word of mouth results are fragile

Combined models are much less fragile