Same Developer, New Stack

I’ve been fortunate to work with and on open-source software this year. That has been the case for most of a decade: I began using R in 2014. I hit a few milestones this summer that got me thinking about my OSS journey.

I became a committer on the Apache Superset project. I’ve written previously about deploying Superset at work as the City of Ann Arbor’s data visualization platform. The codebase (Python and JavaScript) was totally new to me but I’ve been active in the community and helped update documentation.

Those contributions were sufficient to get me voted in as a committer on the project. It’s a nice recognition and vote of confidence but more importantly gives me tools to have a greater impact. And I’m taking baby steps toward learning Superset’s backend. Yesterday I made my first contribution to the codebase, fixing a small bug just in time for the next major release.

Superset has great momentum and a pleasant and involved (and growing!) community. It’s a great piece of software to use daily and I look forward to being a part of the project for the foreseeable future.

I used pyjanitor for the first time today. I had known of pyjanitor’s existence for years but only from afar. It started off as a Python port of my janitor R package, then grew to encompass other functionality. My janitor is written for beginners, and that came full circle today as I, a true Python beginner, used pyjanitor to wrangle some data. That was satisfying, though I’m such a Python rookie that I struggled to import the dang package.

These days I spend more time in Python than R, which is fine by me: it’s been a pleasure to learn new things. My current work involves more extract-transform-load (ETL) work, where Python shines, than deep data analysis and statistics, where I believe R is superior. And learning Python has been handy for deploying and administering software written in Python, such as Apache Airflow and Superset.

My janitor R package is mostly in maintenance mode. This is good – I can spend the time on other things. It still seems to be widely used but it mostly “just works” and will stay in its lane. That said, I’m pleasantly surprised that the community keeps contributing incremental improvements.

One of the reasons I took my current job two years ago was that I wanted fresh challenges in terms of both domain (municipal operations instead of K-12 ed) and technology. The job has delivered on both fronts. Learning new software – Python, Docker, Linux, SQL – has both kept me engaged and enabled me to accomplish things for the city that I couldn’t otherwise.

Five years ago I was deep into R packages and that community (largely based around the Twitter hashtag #rstats, which was my license plate at the time). Now I’m invested in Superset. I wonder what my tech stack will be in 2028? If my job still centers on using a computer to count things, I hope I’ll be using high-quality open-source tools and helping make them even better.
