Day 2

Dad Joke

  • Where do you take someone who has been injured in a hide-and-seek accident?
  • To the I.C.U.


  • Export HW to DOCX or PDF
  • Identifying a dataset
  • We will cover:
    • Reddit (briefly)
    • Twitter
  • Also, consider existing datasets
    • Link on wiki
  • How to lead discussions?
    • PowerPoint
    • “Handout”
    • Be creative
  • Need one more discussant on Dec 1

Plan for the day

  • Topic review
  • Problem set review

HW Review

  • Select respondents by bot
  • A few minutes to think/prepare
    • Identify things that you are still confused about
  • Person assigned will make sure there is a good response on Piazza

Wordplay project

Supplemental slides


  • Python holds things in RAM (lost when the program ends), and writes to secondary memory / disk to keep them
  • Running Python
    • Terminal
      • Interactive
    • Command line
    • Jupyter
  • Python is interpreted, not compiled


  • Common bugs
    • Syntax
    • Intermediate objects
    • Complexity
  • Strategies
    • Read through the code
    • Make things visible
    • Simplify
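
A minimal sketch of the "make things visible" strategy, assuming a made-up list of survey ratings stored as strings (the variable names are hypothetical, not from the problem set). Printing each intermediate object, along with its type, shows where a value stops looking the way you expect:

```python
# Hypothetical data: ratings read from a file arrive as strings.
raw_ratings = ["4", "5", "3"]

# Convert to integers, then make the intermediate object visible.
ratings = [int(r) for r in raw_ratings]
print("ratings:", ratings, type(ratings))

# Check each step separately instead of writing one long expression.
total = sum(ratings)
print("total:", total)

average = total / len(ratings)
print("average:", average)
```

Once the code works, the extra print calls can be removed or commented out.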


  • A variable is a name that stores data
  • Has different types, e.g.:
    • Strings
    • Integers
    • Floats
    • Lists
    • Dictionaries
  • Variable names can’t start with a number
  • Usually written_like_this (snake_case)
  • If it’s not saved to a variable (or to disk), it’s gone!
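
A short sketch of the variable types listed above, using invented names and values purely for illustration. The last lines show the difference between computing a value and saving it:

```python
# All names and values below are hypothetical examples.
dataset_name = "reddit_comments"             # string
num_posts = 250                              # integer
average_score = 3.7                          # float
topics = ["news", "sports", "music"]         # list
post_counts = {"news": 120, "sports": 80}    # dictionary

# type() reports what kind of object each variable refers to.
for value in (dataset_name, num_posts, average_score, topics, post_counts):
    print(type(value).__name__, value)

# This result is computed and then thrown away: nothing refers to it.
num_posts * 2

# This result is saved to a variable, so it can be reused later.
doubled_posts = num_posts * 2
print(doubled_posts)
```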


  • Control flow
    • Run different parts of the code depending on the “state”
  • Conditionals depend on booleans
    • Expressions that evaluate to True or False
    • Comparison and membership operators: <, >, >=, <=, ==, !=, in, not in
    • Conditions joined with and / or are evaluated left to right and can “short circuit”

Example of a conditional

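# input() returns a string, so convert it before comparing with a number:
#x = int(input("How many hours have you been working on this homework? "))
x = 6
# xy is never defined, but x > 5 is already True, so "or" short-circuits
# and xy == 45 is never evaluated (no NameError).
if x > 5 or xy == 45:
  print("That's too long!")
## That's too long!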
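
To make the short-circuit behavior easier to see, here is a small sketch. The helper function noisy() and the list evaluated are hypothetical names introduced only for this illustration; the helper records when an operand is actually evaluated.

```python
evaluated = []  # records which operands were actually evaluated


def noisy(label, value):
    """Record that this operand was evaluated, then return its value."""
    evaluated.append(label)
    return value


# "or" stops at the first True operand, so the right side never runs.
result_or = noisy("or-left", True) or noisy("or-right", False)

# "and" stops at the first False operand, so the right side never runs.
result_and = noisy("and-left", False) and noisy("and-right", True)

print(evaluated)

# Membership tests with "in" and "not in" also evaluate to booleans.
sources = ["reddit", "twitter"]
has_twitter = "twitter" in sources
missing_facebook = "facebook" not in sources
print(result_or, result_and, has_twitter, missing_facebook)
```

Only the left-hand operands appear in the printed list, which is why an undefined name on the right-hand side of an "or" can go unnoticed until the left-hand side is False.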


  • Functions are pieces of code that you want to reuse
    • Often take in “arguments”
    • They do something with the arguments, and often “return” something

Types of functions

  • Built in functions
    • print()
    • type()
  • Modules / libraries
    • import random
    • random.randint(1,5)
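
A brief sketch of calling built-in functions and a function from an imported module. The seed value and the list of data sources are assumptions added for illustration; seeding only makes the random draws repeatable from run to run.

```python
import random  # the module must be imported before its functions can be used

# Assumption: fix the seed so repeated runs give the same draws.
random.seed(42)

# randint(a, b) returns an integer between a and b, with both ends included.
roll = random.randint(1, 5)
print(roll)
print(type(roll))

# random.choice picks one item from a list (hypothetical list of sources).
sources = ["reddit", "twitter"]
picked = random.choice(sources)
print(picked)
```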

Example function

def exclaim(s):
  """Change a normal string into an exclamation!"""
  s = s.upper()
  return s + '!!'
print(exclaim("Hello, everyone"))
## HELLO, EVERYONE!!

Bit by Bit


  • Blumenstock et al. example
    • Using call records of 1.5 million people to estimate wealth
  • The world is being measured
    • It is very, very easy to store digital data, and more and more things are becoming digital
    • Computational capacity is also increasing quickly
  • This is ushering in a new era for social scientists

Research design

  • Ready-made vs. custom-made
    • Ready-made data was created for another purpose (e.g., digital trace data)
    • Custom-made data is designed to answer the question (e.g., surveys or experiments)
  • Observing behavior
  • Asking questions
  • Running experiments
  • Creating mass collaborations


  • What did you think of the distinction between ready-made and custom-made data? Are there kinds of research that don’t fit neatly into one or the other? Along what other dimensions can we classify research?
  • Are there different ethical concerns when using one type of data or the other?
  • What are the different analytic concerns for each type of data?