The Formation and Growth of Collaborative Online Groups

Jeremy Foote
Northwestern University

Purdue University
January 17, 2018

The Big Questions

How (and why) do new groups form?

Why do some groups succeed?

The Takeaways

  • New types of organizations
  • Coordination without structure
  • Data science complements organizational communication research

The Plan

  • Theories of group formation
  • Founding processes
  • Early-stage communication networks
  • Joining processes + group ecology
  • Q & A

Theories of group formation

Many groups undergo similar processes

Constraints differ in different contexts

  • Group goals and outputs
  • Input resources and skills
  • Incentives for participation
  • Task interdependencies
  • Communication and coordination costs

Online peer production organizations have a unique set of constraints

  • No direct financial incentives
  • Almost no formal hierarchy or roles
  • Text-based communication

Most theories are focused on explaining processes post-formation

  • Forming, Storming, Norming, Performing (Tuckman, 1965)
  • Input-Process-Output models (Ilgen et al., 2005)
  • Decision development in small groups (Poole, 1989)

Data science can provide new insight into organizational processes

  • Typical methods:
    • Ethnography
    • Surveys
    • Content analysis
    • Experiments
  • Data science approaches:
    • Traces of behavior
    • Large-scale testing of organizational theories
    • Analyses of ecosystems of organizations
    • Generation of hypotheses

Small and early-stage online groups are important research sites

Founding processes

Who starts new communities?

Foote and Contractor (2018). The behavior and network position of peer production founders. Lecture Notes in Computer Science.

Starting new organizations

  • Entrepreneurs:
    • Have more diverse experience than others (Backes-Gellner & Moog, 2013)
    • Are more likely to have worked with entrepreneurs (Nanda & Sørensen, 2010)
  • Successful entrepreneurs:
    • Have more experience (Cassar, 2014)
    • Have large, diverse social networks (Stam et al., 2014)


We examined the behavior and network position of ~61,000 wiki editors

Timeline of data collection

Network graph of the Spongebob wiki from Wikia

Many founders are learning the system

  • Nearly 90% of wikis were founded by new users
  • ~1% of existing users founded a wiki

Non-newbie founders are more active with more diverse experience, but at the periphery of social networks

Overall, past behavior and networks have little relationship with community growth

How do people perceive the communities they start?

Foote, Gergle, and Shaw (2017). Starting Online Communities: Motivations and Goals of Wiki Founders. CHI 2017.

Previous research typically treats small communities as failures

Learning from founders

  • 300+ founders responded about their:
    • Motivations
    • Goals
    • Experience

Top goals

  • High-quality information
  • Long-lasting community
  • High-growth community

Most projects are on niche topics for small communities

Projected contributors after 30 days

Organizational Implications

  • Low costs → organizational diversity
  • Small projects are not failures
  • Insight via platform-wide methods

Early-stage community structures

Moving back to post-founding processes

Integrative work groups are more productive and successful

  • Many theories suggest that integrative groups with low hierarchy should be successful:
    • Coordination
      • Information flow (Katz, 2005)
      • Transactive memory / Shared mental models (Wegner, 1985; Mathieu, 2000)
    • Social Integration
      • Legitimate peripheral participation (Lave and Wenger, 1991)
      • Group identity (Scott, 2007)

Integrative structures identified with social network analysis correlate with success

  • Low hierarchy, few people on the periphery (Cummings and Cross, 2003)
  • High density (Balkundi and Harrison, 2006)

Early-stage peer production communities should benefit even more from integrative networks

  • Coordination
    • Unclear goals and norms
    • No formal hierarchy to organize work
  • Socialization
    • Only text-based communication
    • No financial incentives
    • Very low barriers to exit

Hypothesis: Integrative communication networks should be associated with successful peer production communities


Edits taken from wiki talk pages

Converted to edit summaries

"anon"  "articleid" "date_time" "editor"  "minor" "namespace" "title" "tokens_added"  "tokens_removed"
"FALSE" 1461  "2009-05-29 17:06:15" "CreateWiki script" "TRUE"  0 "Glee TV Show Wiki" 0 0
"FALSE" 1461  "2009-05-29 17:15:43" "Eurekapedia" "FALSE" 0 "Glee TV Show Wiki" 861 0
"TRUE"  1461  "2009-05-29 18:56:32" ""  "FALSE" 0 "Glee TV Show Wiki" 306 81
"FALSE" 1461  "2009-09-14 22:19:33" "Angela"  "TRUE"  0 "Glee TV Show Wiki" 0 0
"FALSE" 1461  "2009-09-17 00:29:25" "Oompa-Loompa"  "FALSE" 0 "Glee TV Show Wiki" 301 1036
"FALSE" 1461  "2009-09-21 16:58:53" "Eurekapedia" "FALSE" 0 "Glee TV Show Wiki" 76  0
"TRUE"  1461  "2009-09-21 19:24:49" "" "FALSE" 0 "Glee TV Show Wiki" 8 13
"FALSE" 1461  "2009-09-21 20:44:54" "Oompa-Loompa"  "FALSE" 0 "Glee TV Show Wiki" 39  35
"FALSE" 1461  "2009-09-22 17:12:05" "Eurekapedia" "FALSE" 0 "Glee TV Show Wiki" 17  0
"FALSE" 1461  "2009-09-27 22:16:52" "Jbbdude" "FALSE" 0 "Glee TV Show Wiki" 25  0
"FALSE" 1461  "2009-10-19 17:37:33" "Eurekapedia" "FALSE" 0 "Glee TV Show Wiki" 35  0
"FALSE" 1461  "2009-11-27 22:03:05" "Oompa-Loompa"  "FALSE" 0 "Glee TV Show Wiki" 18  19
"FALSE" 1461  "2009-11-28 02:20:08" "Homer-simpson" "FALSE" 0 "Glee TV Show Wiki" 0 0
"FALSE" 1461  "2009-12-04 17:58:06" "Oompa-Loompa"  "FALSE" 0 "Glee TV Show Wiki" 167 19
"FALSE" 1461  "2009-12-04 20:17:37" "HeartOfOblivion" "TRUE"  0 "Glee TV Show Wiki" 7 180
"FALSE" 1461  "2009-12-10 00:38:13" "BroadwayBoy22" "FALSE" 0 "Glee TV Show Wiki" 106 30
"FALSE" 1461  "2009-12-10 01:45:44" "Homer-simpson" "FALSE" 0 "Glee TV Show Wiki" 5 2
"FALSE" 1461  "2009-12-10 20:22:18" "BroadwayBoy22" "FALSE" 0 "Glee TV Show Wiki" 550 9
"FALSE" 1461  "2009-12-10 23:47:54" "Oompa-Loompa"  "FALSE" 0 "Glee TV Show Wiki" 49  49
"FALSE" 2021  "2009-09-28 15:46:59" "Jbbdude" "FALSE" 0 "Glee"  6 952
"TRUE"  2022  "2009-05-29 19:43:32" ""  "FALSE" 0 "Pilot" 1584  0
"FALSE" 2022  "2009-09-17 00:23:02" "Oompa-Loompa"  "FALSE" 0 "Pilot" 1953  1511
"TRUE"  2022  "2009-09-17 19:59:56" ""  "FALSE" 0 "Pilot" 5 5

We use measures of success taken from our survey research

  • Snapshot of activity up to 700 edits used to predict:
    • Productivity
      • Number of words added in first 700 edits
    • Survival
      • How long until a wiki goes 30 days without multiple editors
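
One way to read the survival measure is sketched below. It is an illustration only, not the code behind these results: it assumes consecutive, non-overlapping 30-day windows counted from the first edit, and the function name `days_until_inactive` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def days_until_inactive(edits, window_days=30, min_editors=2):
    """Days from the first edit to the start of the first window that has
    fewer than `min_editors` distinct editors.

    `edits` is an iterable of (datetime, editor_name) pairs.
    Returns None if no such window occurs before the last observed edit
    (the wiki is still "alive" when the data end).
    Assumption: windows are consecutive and non-overlapping.
    """
    edits = sorted(edits)
    if not edits:
        return None
    start, end = edits[0][0], edits[-1][0]
    window = timedelta(days=window_days)
    window_start = start
    while window_start <= end:
        window_end = window_start + window
        editors = {name for when, name in edits
                   if window_start <= when < window_end}
        if len(editors) < min_editors:
            return (window_start - start).days
        window_start = window_end
    return None

# Two editors are active in the first 30 days, only one in the second window.
founding = datetime(2009, 5, 29)
example = [
    (founding, "A"),
    (founding + timedelta(days=10), "B"),
    (founding + timedelta(days=40), "A"),
]
print(days_until_inactive(example))
```

The final window can be cut short by the end of data collection, so a real analysis would need to treat such cases as censored rather than as community death.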

Communication networks from 76,000 online wikis

  • Look at all talk edits at the point the wiki had 700 edits (~1,200 wikis)
  • Created communication networks for each wiki:
    • People “talk to” the last 5 editors of a page
    • People also “talk to” the owner of a user talk page
    • Tie from A to B exists if A talks to B 2+ times
  • Kept networks with 4+ people that were complex enough to compute the measures (1,002 wikis)
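
The tie-building rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: "the last 5 editors" is read as the authors of the previous five edits to the same page, user talk pages are recognized by a `User talk:` title prefix, and the name `build_talk_network` and the `(page_title, editor)` input format are hypothetical.

```python
from collections import Counter, defaultdict, deque

USER_TALK_PREFIX = "User talk:"  # assumed title convention for user talk pages

def build_talk_network(talk_edits, lookback=5, min_weight=2):
    """Build directed ties from chronologically ordered talk edits.

    `talk_edits` is an iterable of (page_title, editor) pairs.
    Returns {(source, target): count} for ties with count >= min_weight.
    """
    recent = defaultdict(lambda: deque(maxlen=lookback))  # page -> last editors
    counts = Counter()
    for page, editor in talk_edits:
        # The editor "talks to" the authors of the previous edits on the page.
        targets = set(recent[page])
        # On a user talk page, the editor also "talks to" the page owner.
        if page.startswith(USER_TALK_PREFIX):
            targets.add(page[len(USER_TALK_PREFIX):])
        targets.discard(editor)  # no self-ties
        for target in targets:
            counts[(editor, target)] += 1
        recent[page].append(editor)
    # Keep a tie from A to B only if A talked to B at least min_weight times.
    return {edge: n for edge, n in counts.items() if n >= min_weight}

example = [
    ("Talk:Pilot", "A"), ("Talk:Pilot", "B"),
    ("Talk:Pilot", "A"), ("Talk:Pilot", "B"),
    ("User talk:C", "A"), ("User talk:C", "A"),
]
print(build_talk_network(example))
```

Raising `min_weight` gives a stricter dichotomization of ties, which is one way to check how sensitive the resulting networks are to the threshold.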

Network measures: Density and Hierarchy

Density: proportion of possible ties that exist

Hierarchy: proportion of non-cyclical paths
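
A small sketch of both measures on a directed network follows. It assumes a Krackhardt-style reading of hierarchy: among pairs of members connected by a directed path in at least one direction, the share that are not reachable in both directions. The function names are hypothetical, and the slides do not confirm the exact formula used.

```python
from itertools import combinations

def density(nodes, edges):
    """Proportion of possible directed ties that exist (no self-loops assumed)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(set(edges)) / (n * (n - 1))

def reachable_from(source, successors):
    """All nodes reachable from `source` by following directed ties."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def hierarchy(nodes, edges):
    """Share of path-connected pairs that are connected in one direction only."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    reach = {v: reachable_from(v, successors) for v in nodes}
    connected = mutual = 0
    for a, b in combinations(nodes, 2):
        a_to_b, b_to_a = b in reach[a], a in reach[b]
        if a_to_b or b_to_a:
            connected += 1
            if a_to_b and b_to_a:
                mutual += 1
    if connected == 0:
        return 0.0  # undefined for an empty network; 0.0 is an arbitrary choice
    return 1 - mutual / connected

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "A"), ("B", "C")]
print(density(nodes, edges), hierarchy(nodes, edges))
```

In the example, A and B can reach each other (a cycle), while C is only reachable from A and B, so two of the three connected pairs are hierarchical.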

Network measures: Core members and Centralization

Core Ratio: proportion of members with coreness > 2

Betweenness Centralization: gini of betweenness centrality scores

Degree Centralization: gini of indegree centrality scores
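
The core ratio and the Gini-based centralization scores could be computed along these lines. This is a sketch under assumptions: coreness is taken on the undirected version of the network, every tie endpoint is listed in `nodes`, and all function names are hypothetical. Betweenness centralization would apply the same `gini` function to betweenness scores, which are not computed here.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)

def coreness(nodes, edges):
    """k-core number of each node, ignoring tie direction and self-loops."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {v: len(ns) for v, ns in neighbors.items()}
    core, remaining, k = {}, set(nodes), 0
    while remaining:
        # Peel off the lowest-degree node; k never decreases.
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in neighbors[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def core_ratio(nodes, edges, threshold=2):
    """Proportion of members whose coreness exceeds `threshold`."""
    core = coreness(nodes, edges)
    return sum(c > threshold for c in core.values()) / len(nodes)

def indegree_centralization(nodes, edges):
    """Gini coefficient of indegree across all members."""
    indegree = {v: 0 for v in nodes}
    for _, target in set(edges):
        indegree[target] += 1
    return gini(indegree.values())

# Four fully connected members plus one peripheral member tied only to A.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("E", "A")]
print(core_ratio(nodes, edges), indegree_centralization(nodes, edges))
```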


  • Speed to reach 700 edits
  • Number of contributors
  • Inequality in contribution amount
  • Number of talk edits
  • Size of network
  • Average edge weight
  • Months since founding


What should we expect?

Bootstrapped 95% confidence interval for β coefficients

Relationships between communication structures and productivity

Bootstrapped 95% confidence interval for β coefficients

There is basically no relationship between communication structures and productivity

There is basically no relationship between communication structures and survival

Bootstrapped 95% confidence interval for β coefficients

These results are not due to collinearity or sample size
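
For readers unfamiliar with bootstrapped confidence intervals, the idea can be sketched with a single predictor. This is a simplified illustration, not the models reported here, which include several network measures and controls: it resamples wikis with replacement, refits an ordinary least squares slope each time, and reports percentile bounds. The function names and defaults are hypothetical.

```python
import random
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on a single predictor x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        # Resample observations (wikis) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when x does not vary
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    tail = (1 - level) / 2
    lower = slopes[int(tail * n_boot)]
    upper = slopes[int((1 - tail) * n_boot) - 1]
    return lower, upper

# Toy data: density scores and a noiseless outcome, so the interval is tight.
density_scores = [i / 20 for i in range(20)]
outcome = [2 * d + 1 for d in density_scores]
print(bootstrap_slope_ci(density_scores, outcome))
```

An interval that comfortably spans zero, as in the results summarized above, indicates no reliable relationship between the predictor and the outcome.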


Threats and Limitations

  • A single platform
    • Wikis represent information aggregation rather than creation
  • Trace data interactions may be biased measures of relationships

Social structures are less important drivers of group processes


Communication can happen through comments on the artifact

Coordination and learning through transparent processes

Stigmergic coordination

Stigmergic coordination on Wikia

Social Integration

Contributors may not need integrative networks

  • Learning can happen through the artifact
  • Participants are volunteers
  • Communication networks are opaque
    • People are de-motivated by being on the periphery

Organizational Implications

  • Constraints change the importance of social relationships (and communication)
  • Peer production and well-defined outputs (Hill, 2011)

Current and Future Work

How do perceptions of the environment influence decisions and how do decisions recreate environments?

Organizational Communication +
Collective Action +
Human-Computer Interaction

The Takeaways

  • New types of organizations
  • Coordination without structure
  • Data science complements organizational communication research



Aaron Shaw
Mako Hill
Darren Gergle
Noshir Contractor
Nate TeBlunthuis
Seungyoon Lee
Institutional support


Book Chapters

Working Papers

  • Foote, J., Shaw, A., Hill, B.M. Early-stage communication networks of productive and long-lived online volunteer communities (Presented at ICA, ASA, OCMC, Sunbelt).
  • Foote, J., Hill, B. M., TeBlunthuis, N., Shaw, A. An agent-based model of online community joining (Presented at ICA, IC2S2, OCMC).
  • Lee, S., Foote, J., Zhao, S., French, D. The intersection of groups and individuals: Effects of adolescents’ perception of peer groups on future network structure.
  • Treem, J., Foote, J., van den Hooff, B. Enterprise Social Media as a Multifunction Public Good: The Role of Perceived Critical Mass in Motivating Differential Use.


Productivity Model

Survival Model

Robustness tests

Cutoff @ 500

Robustness tests

Cutoff @ 900

Robustness tests

Dichotomize edges @ 3

Coreness description

Degenerate graph example

Density with size quartiles

Upcoming projects

Organizational ecology looks at how environments influence organizational outcomes

  • Organization-level analysis
  • However, environments also influence individual decisions

Online communities present a unique opportunity to study the relationship between environment and group processes

  • Platforms of thousands of communities
  • Detailed individual-level trace data across communities

Project 1: How do adolescents perceive peer groups and how does this influence which groups they join?

Project 2: Conditions for participation in enterprise social media

  • Enterprise social media provide both “connective” and “communal” goods
  • Combination of interviews and agent-based simulation

Project 3: Exposure and joining processes in Reddit

Project 4: Ecology of Online Groups

  • Data science approach to relationships between groups
  • How does growth in one group relate to growth in another?