My First Pull Request - Some Clever Stats Name

A couple of weeks ago I made my first open source pull request. This was something I’ve wanted to do for quite a while but didn’t really have the time because of school, work, a family, … But I finally decided to stop making excuses and just do it.

I chose the broom package because I was familiar with it and there are usually issues tagged as beginner friendly. These issues are meant to help novices like me to get their feet wet. I looked around a bit and found a lot of issues related to models I’m not familiar with. I didn’t want to start with something like that. But then I found issue # 53 and thought “well I can do that!” So I asked if I could tackle it and then set to work.

The issue was very simple: replace every instance of base R’s match.arg with the rlang version arg_match. This was a perfect first issue because it seemed so easy that there was no way I could screw it up. After looking around at the R files I quickly realized it would be tedious to look through them all until I found the relevant ones. So I decided to use the command line.

I had heard of grep before and knew what it was used for but never really found myself with a need for it. I was kind of excited to use the command line for something other than git or the basics like ls, pwd, etc. So after some reading up I used

grep -c -i 'match.arg' *

to generate a list of files and the number of times match.arg occurs in each file. I wanted to count the occurrences so that I didn’t bother with any files that had zero counts. There are a lot of R files and all of them were listed so there was still a bunch of noise. After some more Googling I found a post on Stack Exchange that helped me get rid of all the files that didn’t have any matches. The final command was

grep -c -i 'match.arg' * | grep -v ':0$'

It takes the output of the first command, a list of files and counts, and then passes that to another search that only returns the files that have counts greater than zero. I managed to figure this out pretty quickly so I thought I would take it to another level and also replace all the matches found. This led me to the sed command. I spent way too much time trying to figure out how to get this work work. Eventually I gave up because I didn’t want to waste any more time with it. So I took the list of files generated from the two stage grep, 12 in all, and use the find and replace feature of RStudio. This wasn’t as simple as I had hoped as I wanted to be able to do this with a few commands in Terminal but it got the job done and that’s what counts. In any case, it was a lot faster and easier than looking at every R file.

In terms of the GitHub stuff like branching, submitting a pull request, etc. I followed Nic Crane’s Ten Steps to Becoming a Tidyverse Contributor which is super helpful and encouraging. I had done some of this before but mostly on personal projects or small group projects from school with only two other people. Plus it had been a while since I’d done this so it was nice to have a well laid out refresher.

The pull request hasn’t been merged yet but hopefully will be soon. Actually, I went into a bit of a freak out because the continuous integration builds failed. But after taking a few breaths and having someone point out the builds failed because another reason, I had calmed down. I’m not sure where things go from here. As in, am I supposed to diagnose and fix the problem? I would like to help if I can but I’m not at all familiar with the package that caused the error. I’m sure the broom maintainer Alex Hayes has a lot of other things to worry about so for now I will just wait until he or someone else gets back to me.