One hell of a release party
Apr. 23rd, 2020 07:08 am

Yesterday I delivered the ID lists and checked them against what the automated system would have submitted had it been hooked up. There were substantial differences, so I tracked those down and fixed the bugs. At the meeting with F we decided on a policy for dealing with IDs that had been used multiple times, and on a possible change in Qualtrics that might cut down on the problem. They had also found a bug in their weights system that invalidated one of the files they'd already submitted, so we decided to re-run the affected aggregations accordingly. They also asked if CMU would attend a three-way meeting with F and another research group, M, which is handling the international survey going out today. M has never dealt with Qualtrics data before, and it made sense to spare them from repeating the two weeks we'd just spent fine-tuning the data processing pipeline with F. I got permission (and constraints) from R, L, and everyone in the group who was involved in constructing mapping files, then tarred up all our data processing code under GPL-3 with a nice fat README to send over and help get them started. They won't have the same problems we did with time zones or non-hierarchical geographical scopes, but they'll have to deal with non-English locales and sampling issues with small regions.
I got a walkthrough of the current aggregations pipeline from L, then spent about four hours more than expected integrating it into the automation system. There was a bizarre problem when adding the new filtering policy from F: it mysteriously resulted in 1M of our 1.3M responses developing invalid household counts (normally fewer than 100k are invalid). I went to L for help after 1.5 of those hours, and... the problem didn't persist on his machine. Nor on my work desktop. So it's just my laptop, sabotaging things. I finished debugging around 10:30pm, exported a fresh copy of all the aggregations, spent another half hour spot-checking the data to make sure nothing was excessively out of whack, then dropped all the files in the magic API automation folder on the server. I poked the automation bot for an off-cycle data import job, waited until it was done to verify nothing had exploded, then went the fuck to bed just before midnight.
I am not a night creature. Substantial credit to L for coaching me through being a neurotic mess. Proud of myself for asking for help well in advance of the deadline, though.