Categories
#rstats Data analysis ruminations Software Work

Same Developer, New Stack

I’ve been fortunate to work with and on open-source software this year. That has been the case for most of a decade: I began using R in 2014. I hit a few milestones this summer that got me thinking about my OSS journey.

I became a committer on the Apache Superset project. I’ve written previously about deploying Superset at work as the City of Ann Arbor’s data visualization platform. The codebase (Python and JavaScript) was totally new to me but I’ve been active in the community and helped update documentation.

Those contributions were sufficient to get me voted in as a committer on the project. It’s a nice recognition and vote of confidence but more importantly gives me tools to have a greater impact. And I’m taking baby steps toward learning Superset’s backend. Yesterday I made my first contribution to the codebase, fixing a small bug just in time for the next major release.

Superset has great momentum and a pleasant and involved (and growing!) community. It’s a great piece of software to use daily and I look forward to being a part of the project for the foreseeable future.

I used pyjanitor for the first time today. I had known of pyjanitor’s existence for years but only from afar. It started off as a Python port of my janitor R package, then grew to encompass other functionality. My janitor is written for beginners, and that came full circle today as I, a true Python beginner, used pyjanitor to wrangle some data. That was satisfying, though I’m such a Python rookie that I struggled to import the dang package.

Categories
Data analysis Local reporting Software Work

Making the Switch to Apache Superset

This is the story of how the City of Ann Arbor adopted Apache Superset as its business intelligence (BI) platform. Superset has been a superior product for both creators and consumers of our data dashboards and saves us 94% in costs compared to our prior solution.

Background

As the City of Ann Arbor’s data analyst, I spend a lot of time building charts and dashboards in our business intelligence / data visualization platform. When I started the job in 2021, we were halfway through a contract and I used that existing software as I completed my initial data reporting projects.

After using it for a year, I was feeling its pain points. Building dashboards was a cumbersome and finicky process and my customers wanted more flexible and aesthetically-pleasing results. I began searching for something better.

Being a government entity makes software procurement tricky – we can’t just shop and buy. Our prior BI platform was obtained via a long Request for Proposals (RFP) process. This time I wanted to try out products to make sure they would perform as expected: Would it work with our data warehouse? Could we embed charts in our public-facing webpages?

The desire to try before buying led me to consider open-source options as well as products that we already had access to through existing contracts (i.e., Microsoft Power BI).

Categories
Biking DIY Gardening

The biggest thing I’ll ever tote on a bike

I have carried a lot of things on my cargo bike. It’s become a game: what unlikely object can I next transport via bicycle? I clearly remember the rush of hauling my first big item, a suitcase, five years ago. That load was liberating then, pushing the boundaries of what I could do, but now I wouldn’t think twice about it.

I returned this suitcase to Macy’s and went shopping at Briarwood Mall. October 2016.

Yesterday I reached my high score in this game, if you will. Like in a heist movie, I sought to pull off the world’s greatest job before taking it easy evermore. And I did it.

I’m not done hauling – I’ll still carry things on this bike every day – but during the record-breaking ride I swore that if I made it home without incident, I’d not try anything this big again. This is the tale of hauling a 275-gallon plastic tote, in a metal pallet, six miles across Ann Arbor.

Categories
Writing

I completed NaNoWriMo 2021 – but my story’s not done

The last time I sat down at the blog it was to declare that I was going to attempt to write a 50,000 word novel in November. Since then I’ve written a lot, just not here. To be precise, I met the NaNo word goal a day early and finished the month with 51,553 words in my story, writing substantially on each of November’s thirty days.

It was a blast! The story has tumbled out. At times I feel like I’m reading it as it materializes in front of me. It will definitely need editing, but I think I was right about having an interesting plot, and my prose has not been as wretched as I feared it might be. I type fast and my natural tendency is to be wordy in both my speech and writing, so NaNo let me play to my strengths and pile up the words.

(There is a metaphor that makes the rounds in NaNo circles along the lines of, writing your book is like building a sandcastle. The first draft is digging up the sand to work with. Don’t worry about the quality yet, just get it out so you can shape it as you revise.)

Lessons learned include:

  • The targets and progress tracking were hugely motivating. This, plus talking with people about what I was doing, was the magic of NaNo.
  • I’d thought dialogue would be hard to write. Turns out it flows much better for me than descriptions of scenery.
  • Beginning with an outline that described 25+ chapters was essential. Once a good idea for a chapter was in place I was comfortable telling its story in detail.
  • Many of these ideas and plot points occurred when I was walking my dog and would tell her the story. Now if I get something juicy, I take care to dictate to my phone so there’s no risk of forgetting it.
  • I had success with an old digital typewriter (an AlphaSmart Neo) I’d had lying around. I wrote everything on there, transferring it to a computer later. The featurelessness of the Neo deterred me from editing, which kept my words flowing, and it entirely blocked me from getting distracted by the internet.

Despite having 50,000 words, I’m not done writing my story. I want to finish it, in part because I want to know how it ends! (I know the general ending, but want to know the details I’ll only think of while writing).

I’m guessing I’m three-quarters done with the story and I fear that if I take a day off, I’ll lose steam. So I’m going to continue writing, setting a target of averaging 1,000 words a day for the first half of December. That would take me to 67k, which might be enough.

I guess if I’m not done at that point, I’ll keep going. During NaNo I averaged 1,700 words per day. Sometimes that was difficult, and I relied on a few vacation days where I racked up several thousand. But averaging 1,000 per day feels sustainable.

Then I’ll take a little break before I come back and re-read what I’ve written. Editing will be a whole ’nother ordeal. But that’s for later. For now, here’s to my story – it ended up drawing on many of my interests, experiences, and dreams, and it’s a weird little story no one else could have written, for better and for worse.

P.S.: I typically edit blog posts for a while without making them better. One lesson I hope I’ve learned from NaNo is to rein that tendency in. So this post gets merely a quick read-through.

Categories
Life events

I ran a half-marathon!

On Sunday I ran the Dexter-Ann Arbor half marathon (DX-A2)! I finished in my goal time of under two hours (1:56:44), and feeling good.

It all came together: enough training, perfect weather, and good strategy in terms of pace, nutrition, etc. I even got a bib number that was an omen of good fortune: 777. Going into the race, my longest run ever had been 10 miles just a couple weeks earlier. This Sunday was the longest run of my life.

Here I am, barreling toward the finish line.

The Race

It was hard to pick a pace target to aim for. An online calculator suggested that based on some old Turkey Trot 5K times, I could run 13.1 miles in 1:52:00, and I’m more fit now than I was in those races. On the other hand, most of my training mileage was at paces of 10:00-10:30 per mile, so it seemed like a stretch to think I could maintain 8:35/mile for two hours. In the end, I shot for the classic target of sub-2 hours, and I’d felt good running big chunks of my long runs at that pace.

I didn’t want to go out too fast and jeopardize my chances of finishing, but it turns out I could have sped up. The race felt surprisingly easy, which felt bizarre then and still feels strange to type. I chatted with one of the 9:00/mi pacers during miles 6-12, agreeing around mile 10 that based on how I felt I should speed up in the last mile. My pace over my last 1.1 miles was more like 8:15/mi, uphill.

I like making new friends and it was fun to pass the time talking with my pacer, Mr. 1820.

I knew this race was a big deal for me, but I was surprised by how many friends and family encouraged me, and how much that meant to me. My wife and kids cheered me at the finish line (“daddy you ran so far, good job!”); my extended family asked questions and gave me props as I trained; my friends at the office and online congratulated me; and tons of strangers along the course shouted encouragement. It would be easy to look at runners who go faster and farther and decide this wasn’t such a big deal, so having friends and family show that kind of love was validating.

Categories
#rstats Data analysis ruminations Work

Reflections on five years of the janitor R package

One thing led to another. In early 2016, I was participating in discussions on the #rstats Twitter hashtag, a community for users of the R programming language. There, Andrew Martin and I met and realized we were both R users working in K-12 education. That chance interaction led to me attending a meeting of education data users that April in NYC.

Going through security at LaGuardia for my return flight, I chatted with Chris Haid about data science and R. Chris affirmed that I’d earned the right to call myself a “data scientist.” He also suggested that writing an R package wasn’t anything especially difficult.

My plane home that night was hours late. Fired up and with unexpected free time on my hands, I took a few little helper functions I’d written for data cleaning in R and made my initial commits in assembling them into my first software package, janitor, following Hilary Parker’s how-to guide.

That October, the janitor package was accepted to CRAN, the official public repository of R packages. I celebrated and set a goal of someday attaining 10,000 downloads.

Yesterday janitor logged its one millionth download, wildly exceeding my expectations. I thought I’d take this occasion to crunch some usage numbers and write some reflections. This post is sort of a baby book for the project, almost five years in.

By The Numbers

This chart shows daily downloads since the package’s first CRAN release. The upper line (red) is weekdays, the lower line (green) is weekends. Each vertical line represents a new version published on CRAN.

From the very beginning I was excited to have users, but this chart makes that exciting early usage seem minuscule. janitor’s most substantive updates were published in March 2018, April 2019, and April 2020, with the package feeling more done each time, but most user adoption has occurred more recently than that. I guess I didn’t have to worry so much about breaking changes.

Another way to look at the growth is year-over-year downloads:

Year                  Downloads   Ratio vs. Prior Year
2016-17               13,284      –
2017-18               47,304      3.56x
2018-19               161,411     3.41x
2019-20               397,390     2.46x
2020-21 (~5 months)   383,595     –
Download counts are from the RStudio mirror, which does not represent all R user activity. That said, it’s the only available count and the standard measure of usage.