I recently taught a two-session workshop introducing R to Kellogg MBA students. I had a few goals for the workshops:
- Convince students of the benefits of using text-based programming for data exploration and analysis
- Introduce basic programming concepts (e.g., variables, functions)
- Give students a basic understanding of how to do some fundamental data analysis tasks in R: importing, cleaning, visualizing, and modeling
Those are really big goals for only four hours. I decided to use the tidyverse as much as possible and not even teach base R syntax like ‘[,]’, apply, etc. I used the first session to show and explain code using the nycflights13 dataset. For the second session we did a few more examples but mostly worked on exercises using a dataset from Wikia that I created (with help from Mako and Aaron Halfaker’s code and data).
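To give a sense of the difference in style, here is a minimal sketch (not the actual workshop code) that computes the mean departure delay per carrier in nycflights13, first with tidyverse verbs and then with base R indexing and tapply. It assumes the dplyr and nycflights13 packages are installed; the object names (delay_by_carrier, non_missing, base_means) are made up for illustration.

```r
library(dplyr)
library(nycflights13)

# Tidyverse style: each step is a named verb, read top to bottom.
delay_by_carrier <- flights %>%
  filter(!is.na(dep_delay)) %>%            # drop flights with no recorded delay
  group_by(carrier) %>%                    # one group per airline code
  summarise(
    mean_dep_delay = mean(dep_delay),      # average delay in minutes
    n_flights = n()                        # number of flights in the group
  ) %>%
  arrange(desc(mean_dep_delay))            # worst average delay first

# Base R equivalent: '[,]' indexing plus tapply.
non_missing <- flights[!is.na(flights$dep_delay), ]
base_means <- tapply(non_missing$dep_delay, non_missing$carrier, mean)

print(delay_by_carrier)
```

Both versions give the same numbers. The tidyverse pipeline is longer, but for non-programmers the verbs (filter, group_by, summarise) map more directly onto the question being asked.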
Overall, I think that the workshops went pretty well. I think that students definitely have a better understanding and a better set of tools than I did after I had used R for four hours!
That being said, there was plenty of room for improvement. I am scheduled to teach another set of workshops early next year and I’m planning to make a few changes:
- Make both of the workshops more hands-on and interactive. I think I'll divide the topics covered: the first workshop will be on importing, cleaning, and grouping data and the second will be on visualizing and creating inferential models.
- Get more help. Teaching non-programmers R requires some hand-holding and individual attention. To be successful, I think a workshop like this requires 1 "TA" for every 8-10 students.
- Find a more relevant dataset. Although I actually learned a few things about my dataset that will help with my papers that use it, I think it would be better to have a dataset that is as similar as possible to what students will be working with in their careers.
- Connect the visualization and regression more directly to a specific analysis problem rather than presenting them as syntax-learning exercises.
Reuse this workshop!
I found some pretty good existing resources for introducing students to R, but none of them quite fit the scope of what I was looking for. All of the code that I used (as well as some slides for the beginning of class) is on GitHub and GPL-licensed. Please reuse my work and submit pull requests!