Reflections on five years of the janitor R package

One thing led to another. In early 2016, I was participating in discussions on the #rstats Twitter hashtag, a community for users of the R programming language. There, Andrew Martin and I met and realized we were both R users working in K-12 education. That chance interaction led to me attending a meeting of education data users that April in NYC.

Going through security at LaGuardia for my return flight, I chatted with Chris Haid about data science and R. Chris affirmed that I’d earned the right to call myself a “data scientist.” He also suggested that writing an R package wasn’t anything especially difficult.

My plane home that night was hours late. Fired up and with unexpected free time on my hands, I took a few little helper functions I’d written for data cleaning in R and made my initial commits in assembling them into my first software package, janitor, following Hilary Parker’s how-to guide.

That October, the janitor package was accepted to CRAN, the official public repository of R packages. I celebrated and set a goal of someday attaining 10,000 downloads.

Yesterday janitor logged its one millionth download, wildly exceeding my expectations. I thought I’d take this occasion to crunch some usage numbers and write some reflections. This post is sort of a baby book for the project, almost five years in.

By The Numbers

This chart shows daily downloads since the package’s first CRAN release. The upper line (red) is weekdays, the lower line (green) is weekends. Each vertical line represents a new version published on CRAN.

From the very beginning I was excited to have users, but this chart makes that exciting early usage seem minuscule. janitor’s most substantive updates were published in March 2018, April 2019, and April 2020, and it felt more done each time, but most user adoption has occurred since then. I guess I didn’t have to worry so much about breaking changes.

Another way to look at the growth is year-over-year downloads:

Year                  Downloads   Ratio vs. Prior Year
2016-17                  13,284
2017-18                  47,304   3.56x
2018-19                 161,411   3.41x
2019-20                 397,390   2.46x
2020-21 (~5 months)     383,595

Download counts are from the RStudio mirror, which does not represent all R user activity. That said, it’s the only available count and the standard measure of usage.
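
For anyone who wants to check the arithmetic, the ratio column is each year's downloads divided by the prior year's. Here is a minimal sketch of that calculation, written in Python purely for illustration. The variable and function names are assumptions made up for this example, not the code used to build the table, and the partial 2020-21 year is kept out of the ratios because it covers only about five months.

```python
# Yearly download counts copied from the table above (RStudio mirror).
full_years = {
    "2016-17": 13_284,
    "2017-18": 47_304,
    "2018-19": 161_411,
    "2019-20": 397_390,
}
# Partial year (~5 months), kept separate so it is not compared to a full year.
partial_2020_21 = 383_595


def year_over_year(counts):
    """Return {year: downloads / prior year's downloads}, rounded to 2 places."""
    years = list(counts)  # dicts preserve insertion order
    return {
        cur: round(counts[cur] / counts[prev], 2)
        for prev, cur in zip(years, years[1:])
    }


ratios = year_over_year(full_years)
total = sum(full_years.values()) + partial_2020_21

for year, ratio in ratios.items():
    print(f"{year}: {ratio:.2f}x")
print(f"Total downloads: {total:,}")
```

Summing all five rows is also a quick sanity check that the counts add up to a little over one million.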

That feeling when your first user opens an issue

You know how new businesses frame the first dollar they earn?

I wrote an R package that interfaces with the SurveyMonkey API. I worked hard on it, on and off the clock, and it has a few subtle features of which I’m quite proud. It’s paying off, as my colleagues at TNTP have been using it to fetch and analyze their survey results.

The company and I open-sourced the project, deciding that since we had already invested the work, others might as well benefit. And maybe some indirect benefits will accrue to the company as a result. I made the package repository public, advertised it in a few places, then waited. Like a new store opening its doors and waiting for that first customer.

They showed up on Friday! With the project’s first GitHub star and a bug report that was good enough for me to quickly patch the problem. Others may have already been quietly using the package, but this was the first confirmed proof of use. It’s a great feeling as an open-source developer wondering, “I built it: will they come?”

Consider this blog post to be me framing that dollar.