```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_knit$set(root.dir = './')
source("resources/preamble.R")
```

Understanding the emergence of new and small online communities

Jeremy Foote
Purdue University

::: notes
Mention CDSC and my students
:::

# Intro

## Group size distributions vary

Universities are log-normally distributed

```{r, echo=FALSE, message=F, cache = T, fig.height=5}
library(tidyverse)

dir = "/home/jeremy/Projects/exposure_and_joining/presentation/"
df = read.csv(paste0(dir, './university_sizes.csv'))

# Histogram of university sizes as proportions, on a logged x axis
df %>%
  filter(!is.na(students_count)) %>%
  ggplot() +
  geom_histogram(aes(x=students_count, y=after_stat(count/sum(count))),
                 fill='#ff5f05', color='#13294b', bins = 40) +
  scale_x_continuous(trans = 'log', breaks = 10 ^ (1:5), labels = round) +
  ylab('Proportion of universities') +
  xlab('Students per university') +
  theme_minimal()
```

::: notes
X axis is logged - most universities are small

We can imagine constraints that lead to this distribution
:::

## Online group sizes have long-tail distributions

```{r, echo=F, message=F, cache = T, fig.height=5}
df = read.csv(paste0(dir, '../data/reddit_users_per_subreddit_gt4_comments_201701.csv'))

# Histogram of subreddit sizes as proportions, on a logged x axis
df %>%
  ggplot() +
  geom_histogram(aes(x=n, y=after_stat(count/sum(count))),
                 # Fill with Purdue gold
                 fill = '#CEB888', color = 'black', bins = 40) +
  scale_x_continuous(trans = 'log', breaks = 10 ^ (1:6), labels = round) +
  ylab('Proportion of subreddits') +
  xlab('Unique authors per subreddit (at least 5 comments)') +
  theme_minimal()
```

::: notes
Online communities are much more skewed; most are very small but a few are very large

Brings up 2 questions - What about this context leads to this distribution? Why do some communities grow while most don't?
:::

## Why do some online communities grow? {auto-animate="true"}

::: notes
Four projects that address this question in different ways

Not all designed to answer this question, but have implications for it
:::

## Why do some online communities grow? {auto-animate="true"}

The efforts of devoted founders?

## Why do some online communities grow? {auto-animate="true"}

Integrative early networks?

The efforts of devoted founders?

## Why do some online communities grow? {auto-animate="true"}

Attributes of the information space?

Integrative early networks?

The efforts of devoted founders?

## Why do some online communities grow? {auto-animate="true"}

Benefits of smallness?

Attributes of the information space?

Integrative early networks?

The efforts of devoted founders?

# The role of founders

::: notes
First study - plan is to talk for ~10 minutes about each with time for questions and discussion
:::

## Two survey studies, on Wikia and Reddit

- There are hundreds of attempts to start new online communities every day
- Why do people start them?
- What can they do to help them grow?

## Methods

- Survey of ~600 Wikia founders (2016) and ~900 Reddit founders (2023)
- Contacted directly after founding
- Asked about motivations, goals, and plans
- On Reddit, also tracked community outcomes

## Foundings are cheap and easy

![](./images/planning.png)

## Founders have diverse motivations and goals

![](./images/motivations.png)

## They aren't trying to start giant communities

![](./images/goal_size.png)

## Founders' motivations and plans predict outcomes

- Founders motivated by topical interest get more visitors, contributors, and subscribers
- Founders who planned to raise awareness get more visitors, contributors, and subscribers

# Early Network Structures

## Integrative workgroups are more successful

- Low hierarchy, high density, and few peripheral members
- Coordination benefits
    - Transactive Memory / Shared Mental Models
    - Information sharing
- Social integration
    - Legitimate peripheral participation
    - Group identity

## Wikis should also benefit from integrative networks

- Similar to workgroups
    - Group of people working together on a shared task
    - Need to coordinate and make decisions
- Also
    - Anonymous
    - Text-based, asynchronous communication
    - Little formal hierarchy
    - Low barriers to exit

::: notes
Attributes of wikis should make integrative networks even more important

Both coordination and social integration are more difficult, so we'd think that groups that are tight-knit would be more productive and more likely to survive
:::

## Methods

- ~1,000 new wiki communities
- Look at first 700 edits
- Edges between editors who edited the same page or edited a user talk page
- Measure network structure
    - Density, centralization, hierarchy, and coreness

## Network structures don't matter! {auto-animate="true"}

> - No relationship between network structure and productivity

![](images/productivity_est.png)

## Network structures don't matter! {auto-animate="true"}

> - No relationship between network structure and community survival

![](images/survival_est.png)

## Why?

- Some guesses
    - Specificity of topic (less need for coordination)
    - Shared artifact helps to coordinate (stigmergy)
- Why is social integration not important?

::: notes
Pause for questions
:::

# Attributes of the information space

## Enterprise social media are multifunctional public goods

- Public goods in economic sense - non-rivalrous and non-excludable
- Fulk et al. (1996) argue that electronic communication systems are public goods
    - Connective goods
        - Mostly one-to-one communication
    - Communal goods
        - Shared information repositories

::: notes
Non-rivalrous - doesn't get used up

Non-excludable - can't keep people from using it
:::

## Why do many ESMs fail?

- How do people decide whether to use them?

## Interview + agent-based modeling

- 39 interviews with employees after the introduction of an ESM
- Based on findings, we developed an agent-based model

## People use ESM based on whether they perceive critical mass

- Connective good
    - Perceive whether others respond to posts or accept friend requests
- Communal good
    - Determine whether information helps them do their work

## Agent-based model

- Agents and an information space
- Each timestep, agents probabilistically
    - Decide whether to be active
    - Whether to try to connect with others
    - Whether to try to find information
    - Whether to add information
- Information probabilistically decays

## Agent-based model

- If successful, they are more likely to be active and to try the given task again

::: fragment
![](./images/abm_model.png){ width=60% }
:::

## The contours of the information space are important

![](./images/abm_results.png)

## The contours of the information space are important

- Information space can act as semi-permanent store of value
- Encourages activity which can support connective goods
- Decay rate was very important; hard to support an information community that is quickly outdated
- Perceptions of what others are doing are important

# Benefits of smallness

## Are small communities failures?

- Size is usually seen as a proxy for success
- But most communities start small and stay small

::: fragment
![](images/subreddit_size_dist.png){ width=80% }
:::

## Interview study of small community participants

- 20 interviews with participants in small communities
- Asked about
    - Experiences in the small community
    - Experiences in other online communities
    - Reflections on differences between the two

## Small communities are different

- Highly specialized (you know what to expect)
- Places to find experts
- Less negativity (trolls ignore small communities)
- Often offshoots of larger communities
- Still few dyadic relationships
- Provide sense of control in a chaotic information environment

# Toward a theory of new and small communities

## How do new and small communities work?

- Founding explores the space of possible communities
- Very high competition in early days
- Path dependencies and rich-get-richer dynamics
- But also persistence of small communities

## How do new and small communities work?

- Group identity is important
- Dyadic relationships don't seem to be
- Low costs of entry and exit
- Lots of interdependence between communities (TeBlunthuis et al., 2023)

## Implications for organizational research

- Shared artifacts can do lots of coordination work
- Puzzle of why social integration isn't important
    - Online communities may identify people who are already eager to participate
    - Maybe early-stage members are the devoted early adopters
- Need to develop new tools to understand interdependence

## Implications for organizational research

- Online communities are a _rich_ context
    - Data about many organizations
    - Data about individuals moving between organizations
    - Rich text data

## Implications for platform design

- New metrics for success and community health
- More support for diverse types of communities

## Important open questions

- What causes the skewed distributions of participation?
- How does additional structure emerge as communities grow?
- What is AI going to do to small communities?

## Thanks!

Collaborators for this work:

::: collaborators
![](./images/mako.jpeg)
![](./images/sohyeon.jpeg)
![](./images/nate.jpg)
![](./images/aaron.jpeg)
![](./images/darren.jpeg)
![](./images/jeff.jpg)
![](./images/noshir.jpg)
![](./images/ryan.jpg)
:::

# Extra slides

## Reddit founders regression

![](./images/reddit_founder_regression_motivations.png)

![](./images/reddit_founder_regression_plans.png)

## Early networks scatterplot

![](images/scatter.png)
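
## Agent-based model: a minimal sketch

The loop described on the "Agent-based model" slides can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the model from the study: the function name `simulate_esm`, every parameter (`p_add`, `boost`, `half_sat`, the 0.5 starting activity, the 0.01 activity floor), and the success rule are hypothetical, and the "connect with others" choice is left out to keep it short.

```r
# Hypothetical sketch: all names and parameter values are illustrative assumptions.
simulate_esm <- function(n_agents = 50, n_steps = 200, decay_rate = 0.05,
                         p_add = 0.3, boost = 0.05, half_sat = 20, seed = 1) {
  set.seed(seed)
  p_active <- rep(0.5, n_agents)  # each agent's probability of being active
  info <- 0                       # items currently in the information space
  history <- numeric(n_steps)     # information-space size after each timestep

  for (t in seq_len(n_steps)) {
    # Each agent probabilistically decides whether to be active this timestep
    active <- which(runif(n_agents) < p_active)
    for (i in active) {
      if (runif(1) < p_add) {
        info <- info + 1          # agent adds an item of information
      } else {
        # Assumed rule: searching succeeds more often when more information exists
        found <- runif(1) < info / (info + half_sat)
        # Success makes the agent more likely to be active; failure less likely
        change <- if (found) boost else -boost
        p_active[i] <- min(1, max(0.01, p_active[i] + change))
      }
    }
    # Information decays: each item survives with probability 1 - decay_rate
    info <- rbinom(1, info, 1 - decay_rate)
    history[t] <- info
  }
  list(info = history, p_active = p_active)
}

# Compare a slowly and a quickly decaying information space
slow <- simulate_esm(decay_rate = 0.02)
fast <- simulate_esm(decay_rate = 0.50)
c(slow = tail(slow$info, 1), fast = tail(fast$info, 1))
```

Varying `decay_rate` while holding the other assumed parameters fixed is one way to probe the decay-rate result; what the sketch shows depends on those assumed values.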