Data Quality & Testing – Some Thoughts from Others

I want to share two articles with you that I felt threw a nice spotlight on testing.

The first one, called TDD & “Professionalism” (I love that title!) by Jason Gorman, builds a Venn diagram based on 4 values, or 4 cornerstones, of what the author calls “professional”. A “professional”

  1. “doesn’t ship untested code”,
  2. “doesn’t make the customer wait for changes”,
  3. “minimises entropy in the code”, and
  4. “doesn’t write code the customer didn’t want”.

If you want to read the article and find out in detail what that is about, go ahead, I’ll wait.

I would love to tell you why I think that every single one of those 4 values is important in our jobs, too, and what exactly they mean for us.

“Doesn’t ship untested code”

At first, “doesn’t ship untested code” sounds a bit like “yeah, we know, you go on about testing quite a lot, yadda yadda yadda”. But when you read how Jason describes it (“it’s not a good thing to ship code that has been added or changed that hasn’t been tested since you added or changed it.” and “At the very least, it’s a courtesy to your customers.”), it sounds more like a “duh” kind of thing.

I’m actually sure all my colleagues do test. Unfortunately, I am also sure they do so using a browser, the console, and two or three F5s. The really good ones among us use more than one browser.

If we went for a Test-driven development (TDD) approach, we would automatically test our stuff, and I think we really should.

Or as Jason puts it:

There’s no magic bullet for rapidly testing (and re-testing) code. The only technique we’ve found after 70-odd years of writing software is to write programs that automate test execution.
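For analytics, a first step in that direction can be quite small. Here is a minimal sketch in TypeScript. It assumes that the tracking logic lives in a plain function that turns page information into a data layer object. That function is the hypothetical `buildPageViewEvent`, and its field names and expected values are also made up for this example. Because the function is separate from the page, a test can check its output on every change, with no browser and no F5 involved.

```typescript
// Hypothetical shape of a page-view event pushed to a data layer.
interface PageViewEvent {
  event: "pageView";
  pageName: string;
  siteSection: string;
}

// Hypothetical tracking logic, kept free of browser APIs so it can be tested in isolation.
function buildPageViewEvent(path: string, title: string): PageViewEvent {
  // Split the path into non-empty segments, e.g. "/products/shoes" -> ["products", "shoes"].
  const segments = path.split("/").filter((s) => s.length > 0);
  const first: string | undefined = segments[0];
  return {
    event: "pageView",
    pageName: title.trim(),
    siteSection: first ?? "home", // the root path has no segments and maps to "home"
  };
}

// A tiny assertion helper; a real setup would use a test runner instead.
function check(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(`Test failed: ${message}`);
  }
}

// Automated checks standing in for "open the page, look at the console, press F5".
const productView = buildPageViewEvent("/products/shoes", " Red Shoes ");
check(productView.event === "pageView", "event name");
check(productView.pageName === "Red Shoes", "page name is trimmed");
check(productView.siteSection === "products", "site section is the first path segment");

const homeView = buildPageViewEvent("/", "Home");
check(homeView.siteSection === "home", "root path maps to home");

console.log("All page-view checks passed");
```

These checks run in milliseconds and can run again after every change. Manual clicking through a browser does not scale the same way.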

“Doesn’t make the customer wait for changes”

For Jason, this is mostly about release cycles being short.

Translated to our needs, it is either about using a Tag Management System (TMS) to make ourselves as independent as possible or, in the worst case, it is simply out of our hands.

The latter then turns into a political struggle, in which our team should campaign for shorter release cycles or for a TMS.

Once you do have access to your code via a TMS, you are back to being able to change things quickly, and everything Jason writes applies.

“Minimises entropy in the code”

This is right up my alley, and I feel it is our big opportunity to improve: keeping the code clean and simple.

The rationale behind it is this: every time we add to our code, we make it more complex. Complexity brings problems such as side effects, or we may simply no longer understand pieces of our own code.

If that happens, we often have to break the second value (not making the customer wait for changes), and we don’t want that.

Minimising entropy in our code is about keeping it clean, so that we can make quick changes for as long as possible. It is the opposite of patching spaghetti code.

But if I constantly clean and simplify my code, how do I know it still works? Well, the tests tell me, don’t they?
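Here is a hedged sketch of that safety net. Both functions are hypothetical. The first maps URL paths to site sections and has the convoluted shape that tracking code tends to grow through years of patches. The second is a simplified rewrite. A small table of invented test cases checks that the cleanup did not change behaviour. The cases only vouch for the inputs they cover, so the table should grow whenever a new kind of URL turns up.

```typescript
// Hypothetical "before" version: grown through patches, hard to follow.
function sectionFromPathOld(path: string): string {
  let section = "";
  let found = false;
  let cleanPath = path;
  if (cleanPath.indexOf("?") !== -1) {
    cleanPath = cleanPath.substring(0, cleanPath.indexOf("?"));
  }
  const parts = cleanPath.split("/");
  for (const part of parts) {
    if (!found && part !== "") {
      section = part.toLowerCase();
      found = true;
    }
  }
  if (!found) {
    section = "home";
  }
  return section;
}

// Hypothetical "after" version: same behaviour, fewer moving parts.
function sectionFromPath(path: string): string {
  // Drop the query string, then take the first non-empty path segment.
  const withoutQuery = path.split("?")[0] ?? "";
  const first: string | undefined = withoutQuery.split("/").filter((part) => part !== "")[0];
  return first === undefined ? "home" : first.toLowerCase();
}

// Invented cases; each one must give the same answer before and after the cleanup.
const refactorCases: Array<[string, string]> = [
  ["/", "home"],
  ["/Products/shoes", "products"],
  ["/blog/?utm_source=newsletter", "blog"],
  ["?q=1", "home"],
  ["//news", "news"],
];

for (const [path, expected] of refactorCases) {
  const before = sectionFromPathOld(path);
  const after = sectionFromPath(path);
  if (before !== expected || after !== expected) {
    throw new Error(`Mismatch for "${path}": old=${before}, new=${after}, expected=${expected}`);
  }
}
console.log(`Refactoring kept behaviour on ${refactorCases.length} cases`);
```

Once the cases pass, the old version can be deleted with some confidence. That deletion is the entropy reduction.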

“Doesn’t write code the customer didn’t want”

Jason boils it down to cost. Don’t make your customer pay for stuff they don’t want.

Sounds like a pretty good rule to me.

And it helps with entropy, the keeping-it-simple aspect.

There is a big cultural aspect to this, of course. As an analytics person, you might have opinions on what your business should look at. If you’re a consultant, you’re actually paid to have those opinions.

But in my experience, it makes more sense to convince people that they need something before I put it into place, rather than putting it in and then hoping they’ll need it.

This is a somewhat grey area.

The Testing V

The second article, by Tom, is about when testing misses the mark.

Tom describes an embarrassing situation, then argues that testing has to happen on every level, from strategy down to individual functions, with each level of design matched by a level of testing. Laid out, this forms a sort of V shape of design and testing.

I agree with that wholeheartedly.

I also think those levels exist for us in the analytics industry, if maybe a bit differently.

The top level would be about a given business requirement. Underneath that comes the question “what data do we need for that?”, and at the bottom level, there is the JavaScript that collects that data.
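To make those levels more tangible, here is a sketch that pairs two of them with a check. Everything in it is invented for illustration. The business requirement is counting newsletter sign-ups per site section. The data specification is derived from that requirement. The tracking calls stand in for what the page’s JavaScript sent, and they are hard-coded here. In a real setup they might be captured by a browser automation tool or a debugging proxy. The bottom level, the JavaScript itself, would get unit tests of its own.

```typescript
// Top level: a hypothetical business requirement.
const requirement = "Count newsletter sign-ups per site section";

// Middle level: the data we agreed we need to answer it (hypothetical spec).
interface FieldSpec {
  name: string;
  type: "string" | "number";
  required: boolean;
}

const signupSpec: FieldSpec[] = [
  { name: "event", type: "string", required: true },
  { name: "siteSection", type: "string", required: true },
  { name: "formId", type: "string", required: false },
];

// Middle-level test: does a single tracking call match the data spec?
function validateAgainstSpec(call: Record<string, unknown>, spec: FieldSpec[]): string[] {
  const problems: string[] = [];
  for (const field of spec) {
    const value = call[field.name];
    if (value === undefined) {
      if (field.required) {
        problems.push(`missing required field "${field.name}"`);
      }
    } else if (typeof value !== field.type) {
      problems.push(`field "${field.name}" should be a ${field.type}, got ${typeof value}`);
    }
  }
  return problems;
}

// Top-level test: can the collected data actually answer the business question?
function signupsPerSection(calls: Array<Record<string, unknown>>): Record<string, number> {
  // Prototype-free object, so a section named e.g. "constructor" is counted correctly.
  const counts: Record<string, number> = Object.create(null);
  for (const call of calls) {
    const section = call["siteSection"];
    if (call["event"] === "newsletterSignup" && typeof section === "string") {
      counts[section] = (counts[section] ?? 0) + 1;
    }
  }
  return counts;
}

// Output of the bottom level: calls the page's JavaScript sent (hard-coded here).
const capturedCalls: Array<Record<string, unknown>> = [
  { event: "newsletterSignup", siteSection: "blog" },
  { event: "newsletterSignup", siteSection: "products", formId: "footer" },
  { event: "newsletterSignup" }, // broken call: no siteSection
];

for (const captured of capturedCalls) {
  const problems = validateAgainstSpec(captured, signupSpec);
  if (problems.length > 0) {
    console.log(`Spec violation: ${problems.join("; ")}`);
  }
}
console.log(requirement, signupsPerSection(capturedCalls));
```

The broken third call passes silently at the top level: it simply is not counted. Only the middle-level check makes it visible, which is why each level needs its own test.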

Notes

To make this really explicit: the first article is relevant in the context of deploying analytics. The testing described here is about what the analytics tool collects. I am (for once!) not talking about giving developers a tool that helps keep data stable.

There are tools out there that help you with the testing.

You can probably write tools yourself. I wouldn’t be surprised if the Extensions mechanism in Launch leads to the birth of new tools, built right into your TMS.

This is about analysts taking responsibility for the quality of the data they create for the business. Nothing less.

There is also a bonus third article by Jeremias Roessler, called Test Automation is not Automated Testing. Very true, and I have to admit that all the stuff I have written in the past is indeed about test automation, not automated testing. But that’s fine! Test automation is already a big step!

I guess I don’t really have to say it, but I really like the last paragraph of the first article. It asks a very good question:

Like I said at the start, TDD isn’t mandatory. But I do have to wonder, when teams aren’t doing TDD, what are they doing to keep themselves in that sweet spot?

5 thoughts on “Data Quality & Testing – Some Thoughts from Others”

  1. I am HUGE on the “Minimises entropy in the code”. I spend way, way too much of my time tracking backwards through complicated, convoluted code someone wrote in the past. If you are writing complex stuff, especially for analytics, you should always ask yourself these questions:
    – Is something this complicated really going to keep working for very long? Websites are in a constant state of change.
    – Am I making enough notes that someone in the future is going to easily be able to figure out what is going on, and track all of the code dependencies and interactions?
    – Is the code this complicated because I’m trying to prove to people how smart of a developer I am, but I could do it simpler if I really needed to?
    – Is this something a junior-level developer is going to be able to easily understand and update later on without me around to talk to? (Most TMS maintenance/update work is not done by top-level developers.)

    Part of good coding is thinking through the long term maintenance needs and lifecycle, not just getting the code working for today.

    1. Exactly!

      There is also the evergreen “Is this complicated because I am trying to cover 100% of all imaginable edge cases, even though data in Analytics is far from being precise anyway”.
