The Formation and Growth of Collaborative Online Organizations

Jeremy Foote
Northwestern University / Purdue University

September 26, 2019

The Plan

Overall theoretical approach
Brief summary of three projects
Detailed summary of ABM project
Overall conclusions

The Big Question

Why do some collaborative online organizations succeed?

Collaborative Online Organizations (COO)

Examples

Why do some COO succeed?

Most COO do not succeed

COO size is highly skewed, with top organizations getting most of the contributions

There have been three main approaches

Approach 1: Why are Wikipedia and Linux so successful?

Why people contribute (Nov, von Krogh, Lakhani, Lampe)
Who contributes (Antin, Shaw and Hargittai)
How work is organized (Arazy, Butler, Crowston, Keegan, Matei, Zhu)

The weakness of Approach 1

Selecting on the dependent variable

Approach 2: Predicting COO outcomes based on membership, structure, and design

The weakness of Approach 2

Groups are not independent

Approach 3: Online Organizational Ecology predicts outcomes based on community-level relationships

The weakness of Approach 3

Organizational ecology treats organizations as agents and people as resources

Studying organizational outcomes using individual decisions

How do people allocate their efforts in a complex environment with lots of choices?

People are influenced by individual attributes, technology, and the state of the system

What is the “system”?

An earlier generation of communication scholars proposed an “open systems” approach (Katz and Kahn, 1966; Rogers and Agarwala-Rogers, 1976; Farace, Monge, and Russell, 1977)
- A system takes in inputs, processes them, and produces outputs
- Systems are composed of subsystems and compose suprasystems (e.g., a firm composed of departments composed of work groups)

Digital trace data lets us study the relationship between people and systems

Earlier researchers had difficulty gathering the type of data needed for open systems approaches
COO data is:
- Fine-grained data about behavior and interactions
- Within and between groups
- Unobtrusive

Four projects on individual decisions in new collaborative online organizations

Project 1

Early-stage communication networks and community outcomes

Integrative work groups are more productive and successful

Many theories suggest that integrative groups with low hierarchy should be successful:
- Coordination
  - Information flow (Katz, 2005)
  - Transactive memory / Shared mental models (Wegner, 1985; Mathieu et al., 2000)
- Social Integration
  - Legitimate peripheral participation (Lave and Wenger, 1991)
  - Group identity (Scott, 2007)

Edits taken from wiki talk pages on Wikia

Relationships between communication structures and productivity

Bootstrapped 95% confidence interval for β coefficients
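As an illustration of how a bootstrapped 95% interval for a β coefficient can be computed, here is a minimal sketch using a percentile bootstrap on a one-predictor least-squares slope. The function names, the single-predictor setup, and the synthetic data are assumptions made for the example; the slides do not specify the regression model or the bootstrap variant used.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope (the beta coefficient) for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        # Resample observations (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic stand-in data: x could be a network measure, y a productivity measure.
    rng = random.Random(42)
    xs = [rng.random() for _ in range(200)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    print("slope:", ols_slope(xs, ys))
    print("95% CI:", bootstrap_ci(xs, ys))
```

Resampling whole observations, rather than residuals, keeps the sketch free of assumptions about the error distribution.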

There is basically no relationship between communication structures and survival

Bootstrapped 95% confidence interval for β coefficients

Project 2

Why do people start new communities?

Foote, Gergle, and Shaw. (2017). Starting Online Communities: Motivations and Goals of Wiki Founders, CHI 2017

Previous research typically treats small communities as failures

A puzzle

Why do people keep starting communities if they are so likely to fail?

Learning from founders

300+ founders responded about their:
- Motivations
- Goals
- Experience

Top goals

High-quality information
Long-lasting community
High-growth community

Most projects are on niche topics for small communities

Projected contributors after 30 days

Project 3

Who starts new communities?

Foote and Contractor. (2018). The behavior and network position of peer production founders. Lecture Notes in Computer Science.

Starting new organizations

Entrepreneurs:
- Have more diverse experience than others (Backes-Gellner & Moog, 2013)
- Are more likely to have worked with entrepreneurs (Nanda & Sørensen, 2010)

Successful entrepreneurs:
- Have more experience (Cassar, 2014)
- Have large, diverse social networks (Stam et al., 2014)

We examined the behavior and network position of ~61,000 wiki editors

Timeline of data collection

Network graph of the Spongebob wiki from Wikia

Many founders are learning the system

Nearly 90% of wikis were founded by new users
~1% of existing users founded a wiki

Overall, past behavior and networks have little relationship with community growth

Project 4

Social computing theories of exposure and participation focus on individual-level outcomes

Decision rules should predict
- Group level outcomes
- Population level outcomes
These are rarely tested

Online group sizes have heavy-tailed distributions

Data from Stuck_in_the_Matrix on BigQuery

The number of groups each user belongs to is also heavy-tailed

Data from Stuck_in_the_Matrix on BigQuery

A simple model that produces heavy-tailed distributions

Cumulative advantage (Merton, 1968; Barabási, 1999)
- Future activity levels are based probabilistically on current activity
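A minimal sketch of a cumulative advantage process, assuming a Simon-style growth rule: each new unit of activity either starts a new group or goes to an existing group with probability proportional to that group's current size. The function name and the parameters `n_steps` and `p_new` are hypothetical, chosen only to show how such a rule yields a few very large groups and many tiny ones.

```python
import random
from collections import Counter


def simulate_cumulative_advantage(n_steps=50_000, p_new=0.05, seed=0):
    """Return group sizes, largest first, after n_steps units of activity."""
    rng = random.Random(seed)
    units = [0]      # one entry per unit of activity, holding its group id
    n_groups = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            # A new group is founded with a single unit of activity.
            units.append(n_groups)
            n_groups += 1
        else:
            # Copying the group of a uniformly chosen past unit selects a
            # group with probability proportional to its current size.
            units.append(rng.choice(units))
    sizes = Counter(units)
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    sizes = simulate_cumulative_advantage()
    print("groups:", len(sizes))
    print("largest five:", sizes[:5])
    print("median size:", sizes[len(sizes) // 2])
```

Storing one list entry per unit of activity makes the size-proportional draw a single uniform choice, which keeps each step constant-time.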

Agent-based modeling

Simplified, simulated system of interacting agents
Allows us to:
- Explicitly model micro processes
- Test the macro-level implications of theories

Our agent-based model

Start with \(N\) potential contributors and \(X\) potential communities
Every month:
- Each user is presented with an “exposure set” of \(x\) communities (exposure)
- The user decides which communities to participate in (decision)
Models tested:
- Naive versions of each theory
- Combined version
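The monthly exposure and decision loop can be sketched as follows. This is an illustrative toy and not the model from the talk: the `appeal` scores, the noise term, `join_threshold`, and the 0.9 word-of-mouth probability are all assumptions introduced for the example. Word-of-mouth exposure is approximated by drawing communities from the pool of existing memberships, and the expected-utility decision by joining when a noisy utility exceeds a threshold.

```python
import random


def run_model(n_users=300, n_communities=100, x=5, months=12,
              word_of_mouth=True, join_threshold=1.0, seed=0):
    """Toy exposure/decision simulation; returns (community sizes, memberships per user)."""
    rng = random.Random(seed)
    # Hypothetical latent appeal of each community, used by the decision rule.
    appeal = [rng.gauss(0, 1) for _ in range(n_communities)]
    memberships = [set() for _ in range(n_users)]

    for _ in range(months):
        # Snapshot of all memberships at the start of the month; a uniform
        # draw from it picks a community in proportion to its size.
        pool = [c for joined in memberships for c in joined]
        for user in range(n_users):
            # Exposure: up to x communities (repeated draws collapse in the set).
            exposure_set = set()
            for _ in range(x):
                if word_of_mouth and pool and rng.random() < 0.9:
                    exposure_set.add(rng.choice(pool))
                else:
                    exposure_set.add(rng.randrange(n_communities))
            # Decision: join when the noisy expected utility beats the threshold.
            for c in exposure_set:
                utility = appeal[c] + rng.gauss(0, 1)
                if utility > join_threshold:
                    memberships[user].add(c)

    sizes = [0] * n_communities
    for joined in memberships:
        for c in joined:
            sizes[c] += 1
    return sizes, [len(joined) for joined in memberships]


if __name__ == "__main__":
    sizes, per_user = run_model()
    print("largest communities:", sorted(sizes, reverse=True)[:5])
    print("most memberships held by one user:", max(per_user))
```

Setting `word_of_mouth=False` gives uniform exposure, which allows a comparison between the exposure rules while holding the decision rule fixed.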

Simulation Results

Null model as baseline

Expected utility models are skewed but not heavy-tailed

A combined version is robust with community sizes roughly similar to reddit

Conclusion

Word of mouth exposure plus expected utility participation is a partial explanation for heavy-tailed community size distributions
Two main weaknesses
- No model was as skewed as the observed data at both the head and the tail
- No model explained the heavy tail of participation rates

Conclusion

Framework for theories to be informed by higher-level behavior
- Could test whether people actually share larger groups
- Are people more likely to join when they already belong to many COO?
Simulation can enrich social computing theories

Overall Implications

Systems of COO

Communities are interdependent
- Founders and joiners are influenced by state of other COO
- Past experience and luck can influence future behavior
COO data can help us understand these recursive processes

Small, temporary organizations

Most COO are intentionally small
- In aggregate, these are valuable
- Create narrow public goods without requiring oligarchy

Affordances matter

Low costs to join, leave, and create COO
- Porous boundaries of COO
- Also influences individuals and populations

The Formation and Growth of Collaborative Online Organizations

The Plan

The Big Question

Collaborative Online Organizations (COO)

Examples

Why do some COO succeed?

Most COO do not succeed

There have been three main approaches

Approach 1: Why are Wikipedia and Linux so successful?

The weakness of Approach 1

Approach 2: Predicting COO outcomes based on membership, structure, and design

The weakness of Approach 2

Approach 3: Online Organizational Ecology predicts outcomes based on community-level relationships

The weakness of Approach 3

Studying organizational outcomes using individual decisions

People are influenced by individual attributes, technology, and the state of the system

What is the “system”?

Digital trace data lets us study the relationship between people and systems

Four projects on individual decisions in new collaborative online organizations

Project 1

Early-stage communication networks and community outcomes

Integrative work groups are more productive and successful

Integrative structures identified with social network analysis correlate with success

Edits taken from wiki talk pages on Wikia

Relationships between communication structures and productivity

There is basically no relationship between communication structures and survival

Project 2

Why do people start new communities?

Previous research typically treats small communities as failures

A puzzle

Learning from founders

Top goals

Most projects are on niche topics for small communities

Project 3

Who starts new communities?

Starting new organizations

We examined the behavior and network position of ~61,000 wiki editors

Many founders are learning the system

Non-newbie founders are more active with more diverse experience, but at the periphery of social networks

Overall, past behavior and networks have little relationship with community growth

Project 4

Social exposure and participation processes in online communities

Social computing research theorizes that people decide to participate in a group based on expected utility

People are exposed to groups via social ties

These theories focus on individual level outcomes

Online group sizes have heavy-tailed distributions

The number of groups each user belongs to is also heavy-tailed

A simple model that produces heavy-tailed distributions

Do social computing theories of exposure and participation decisions explain heavy-tailed participation?

Agent-based modeling

Our agent-based model

Simulation Results

Null model as baseline

Expected utility models are skewed but not heavy-tailed

Naive versions of social exposure are not skewed

A combined version is robust with community sizes roughly similar to reddit

Conclusion

Conclusion

Overall Implications

Systems of COO

Small, temporary organizations

Social motivations aren’t all that important

Affordances matter

The End

Appendix

Productivity Model

Survival Model

Robustness tests

Robustness tests

Robustness tests

Coreness description

Degenerate graph example

Density with size quartiles

ABM Project

Word of mouth results are fragile

Combined models are much less fragile