I haven’t written about data quality for ages! But the subject is as present as ever, and there is still such a long way to go! If Analytics has a big, bright pink elephant in the room, data quality is it!
A couple of months ago, Molly Vorwerck posted a link to an article called “The Right Way to Measure ROI on Data Quality” in the #measure Slack, and if it says “data quality” in the title, chances are I’ll read it.
What I took away from this specific article are the two metrics “time to detection” (TTD) and “time to resolution” (TTR). They make a lot of sense, and if you can put a monetary value on them, you’ll have much more leverage, as the article explains.
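As a rough illustration of the two metrics, here is a minimal Python sketch. It assumes a hypothetical issue log in which each entry records when a broken change went live, when it was detected, and when it was fixed. The cost model (a flat, assumed hourly cost of bad data) is a simplification for illustration and not necessarily the one the article uses, and the timestamps are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataIssue:
    """One entry in a hypothetical data quality issue log."""
    name: str
    introduced: datetime  # when the broken change went live
    detected: datetime    # when someone (or a test) noticed it
    resolved: datetime    # when correct data was flowing again

    @property
    def ttd(self) -> timedelta:
        # Time to detection: from the moment the data went bad until it was noticed.
        return self.detected - self.introduced

    @property
    def ttr(self) -> timedelta:
        # Time to resolution: from detection until the fix was live.
        return self.resolved - self.detected


def bad_data_cost(issue: DataIssue, cost_per_hour: float) -> float:
    """Assumes bad data costs a flat amount per hour, from introduction to resolution."""
    hours_bad = (issue.ttd + issue.ttr).total_seconds() / 3600
    return hours_bad * cost_per_hour


# Made-up timestamps, purely for illustration.
example = DataIssue(
    name="price tracked incorrectly",
    introduced=datetime(2021, 3, 1, 9, 0),
    detected=datetime(2021, 3, 8, 9, 0),
    resolved=datetime(2021, 3, 9, 9, 0),
)
print(example.ttd.days, example.ttr.days)  # 7 1
print(bad_data_cost(example, cost_per_hour=10.0))  # 1920.0
```

Logging these three timestamps for every issue is enough to start tracking TTD and TTR over time.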
For me, reading about TTD was a bit of a light bulb moment!
This is precisely what I always had in mind when I thought/spoke/wrote about why and where a testing framework would make most sense.
Your TTD, if you do not have any testing tool or framework, can easily be weeks, and sometimes, with Analytics, errors will not be detected at all. Frankly, that is unacceptable!
We are in this because data is what we use to make informed choices! We want to use data! We must be able to trust our data!
So, what can we do?
“To infinity and beyond!”
Well, we ideally want TTD to be as small as possible.
Your TTD, if you are using an end-to-end tool (ObservePoint would be a well-known example), can be closer to days, or even minutes. That is a huge step forward!
Whatever you determine your cost is for bad data, you have just cut it down to maybe 1/10, probably even less than that.
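Where does a figure like 1/10 come from? Under the simplifying assumption that bad data costs a roughly constant amount $c$ per day until it is detected (an assumption for illustration, not a claim from the article), the cost scales with TTD. Going from, say, a two-week TTD down to one day gives:

$$
C_{\text{bad}} \approx c \cdot \mathrm{TTD}
\quad\Rightarrow\quad
\frac{C_{\text{tool}}}{C_{\text{none}}} = \frac{\mathrm{TTD}_{\text{tool}}}{\mathrm{TTD}_{\text{none}}} = \frac{1\ \text{day}}{14\ \text{days}} \approx 0.07
$$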
A lot of people spend money on ObservePoint and similar tools, because overall, their data will be more valuable. They pay money so that they can reasonably say that they trust their data.
Can we go further? Yes, we can!
Your TTD, if you are using a tool for testing in UAT or during integration testing (DataTrue would be an example of such a tool), or if you have a regression test that runs before go-live, is close to 0.
That is where I would want to be. Make sure that if it is wrong, it cannot go live undetected.
When I built the “Site Infrastructure Tests” framework and then added a Puppeteer version, back in the day, and when I was banging on about running those tests in your CI/CD pipeline, if you have one, I was thinking about a 0 TTD.
I was thinking about a setup in which the Analytics team can be 100% sure that no one else breaks their stuff undetected.
Did I ever mention my favourite example? A retailer where we tracked prices as net prices. They had a data layer, of sorts, and the net price was part of that DL. We were happy and it worked. Then, one day, revenue went up 19%. For a moment, some people were happy, but then someone noted that 19% is the same number as VAT in Germany, ha ha, what a coincidence!
Wait a minute…
As it turned out, a developer had decided that the price in the DL must have been wrong, and replaced it with the gross price. They didn’t tell anyone, and so that change went live, and boom.
Two weeks later, with the next release, you guessed it, they “fixed” their change, obviously without telling us again. So for a couple of hours, we tracked net revenue minus a percentage, until we fixed it, again.
Any test run during integration, or in UAT, would have caught that, easily.
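To make that concrete, here is a minimal Python sketch of such a check. It assumes a hypothetical data layer snapshot (a dict with a `product.price` field) taken from a test product on a staging or UAT page, and a known net price for that product; how the snapshot is captured (for example with a headless browser) is left out. A CI/CD job could fail the build whenever the returned list is not empty.

```python
from __future__ import annotations

GERMAN_VAT = 0.19  # standard VAT rate in Germany
TOLERANCE = 0.005  # allow for float rounding on prices


def check_price_is_net(data_layer: dict, expected_net: float) -> list[str]:
    """Return a list of problems with the product price in a data layer snapshot.

    The field name `product.price` is a hypothetical data layer structure.
    """
    price = data_layer.get("product", {}).get("price")
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        return [f"product.price missing or not a number: {price!r}"]
    if abs(price - expected_net) < TOLERANCE:
        return []  # matches the known net price
    if abs(price - expected_net * (1 + GERMAN_VAT)) < TOLERANCE:
        return [f"product.price {price} looks like a gross price (net {expected_net} + 19% VAT)"]
    return [f"product.price {price} does not match expected net price {expected_net}"]


# A snapshot where someone has swapped net for gross:
staging_snapshot = {"product": {"sku": "TEST-001", "price": 119.0}}
for problem in check_price_is_net(staging_snapshot, expected_net=100.0):
    print(problem)
```

The check deliberately distinguishes “gross instead of net” from any other mismatch, so the failure message points straight at the likely cause.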
“Yes, we could!”
I’m not precious about my framework, others can do it better. But it is sad, and a little surprising, that testing hasn’t become an integral part of our culture by now.
Other people have made great progress, standardising implementations and making maintenance easier. Apollo, by the crazy guys at Search Discovery, is one such thing.
On top of that, we could, relatively easily, all have setups with 0 TTD! There is no technical reason why we couldn’t.
Data is so important, and especially data that can be trusted!
If your TTD is weeks, years, or potentially infinity, how can you be sure that your analysis is correct? How much time and effort do you put into second guessing, double-checking, cross-checking, filtering, and massaging your data?
How difficult is it for you to get people to use your data? To convince them it is right?
Or, if I want to really call it out: what is your work worth if you cannot trust your data?
Ah, well, you know what I mean.
If there is no technical reason why we couldn’t do this, and if a 0 TTD is such a valuable thing to aim for, what are we waiting for?
I would love to hear your reasons, and I have some ideas about what reasons people could have.
1 – Resources
You are part of a team that is already stretched. You do not have enough people, or enough means to really make a difference.
2 – Development don’t want to, or they have no resources
The second most common reason, I think. You may have spoken with dev, and they agree, but they simply have too much to do to accommodate you.
3 – Don’t care
Maybe you don’t care. Maybe you think that what you have is good enough.
I am guessing that resources are the top problem, both in the Analytics team itself, and on the dev side.
The article I linked above has ideas about how to put a number against better data quality. Maybe it can help you make the case.
I fear that reason number 3 is more common than most of us want to admit. A lot of us probably “make it work”, somehow.
Sorry if this sounds brutal, but I believe it is the truth: as an analyst, your work is worth only as much as the data that you work with.
If you have rubbish data, you will never be able to deliver gold.
So, read that article, estimate your TTD and what it costs you, then change it!
Opinions? Other reasons?
3 thoughts on “TTD is a big issue!”
This article is dear to my heart, thanks for it, Dr. Exner!!
But ‘we could “relatively easily” have a TTD of near 0’? I don’t know. Tracking setups these days are so complex and so multi-dimensional. Things can go wrong in so many places, and putting a check on all of them is a huge ongoing maintenance effort.
It can fail at several milestones:
1. in the data layer events that come from the site (developer) -> e.g. is content.language set and in the right syntax
2. in the processing of the data layer (TMS part 1) -> e.g. do we normalize it correctly (e.g. always lower-case it, set fallbacks if it is not there)
3. the mapping of the data layer to the individual Tags’ parameters and the configuration of these Tags and Rules (TMS part 2)
–> all this is multiplied by differences between browsers, so it may work in Chrome but not in Firefox or Safari (e.g. right now I am dealing with a tracking issue in mobile browsers on iOS below version 15)
4. in the journey of the request (configuration of the site, tracking preventions) -> e.g. Content Security Policies regularly kill all kinds of requests at some clients (https://lukas-oldenburg.medium.com/content-security-policies-need-to-stop-being-a-money-dump-33525867ec52 )
5. in the processing of the request (configuration of the tool) -> e.g. falsely configured Adobe Analytics Report Suite
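Milestones 1 and 2 can be illustrated with a short Python sketch using the `content.language` example from the list. The accepted format (a two-letter code with an optional region, e.g. “de” or “de-ch”) and the fallback value “unknown” are assumptions for illustration, not a standard any particular TMS enforces.

```python
from __future__ import annotations

import re

# Assumed format: two-letter language code, optionally with a region, e.g. "de" or "de-CH".
LANGUAGE_PATTERN = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def language_problems(data_layer: dict) -> list[str]:
    """Milestone 1 (site data layer): is content.language set and well-formed?"""
    raw = data_layer.get("content", {}).get("language")
    if raw is None:
        return ["content.language is not set"]
    if not isinstance(raw, str) or not LANGUAGE_PATTERN.fullmatch(raw):
        return [f"content.language {raw!r} is not in the expected format"]
    return []


def normalize_language(raw: object, fallback: str = "unknown") -> str:
    """Milestone 2 (TMS processing): trim, lower-case, and fall back if missing."""
    if not isinstance(raw, str) or not raw.strip():
        return fallback
    return raw.strip().lower()


print(language_problems({"content": {"language": "german"}}))
print(normalize_language(" DE-CH "))  # de-ch
```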
Most of these milestones require a different approach to automated testing. I am mostly focusing on 1 and 2 via automated Data Layer tests in the TMS, because those are at the root and hardest to detect, and I supplement that a bit with Adobe Analytics Alerts and QA scripts.
But even so, I am not covering even 15% of data layer variables and logic; most rules, for example, just check whether a value is set and whether it is a string. Checking whether, e.g., the “Site Section” is not only there but also CORRECT requires a whole other set of rules. Etc. etc.
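The gap between “it is there and it is a string” and “it is correct” can be shown with two rules side by side. This Python sketch assumes a hypothetical data layer field `page.siteSection` and a hypothetical convention that the site section equals the first segment of the URL path (with “home” for the root); real setups will have their own mapping.

```python
from urllib.parse import urlparse


def presence_rule(data_layer: dict) -> bool:
    """The kind of rule most setups have: the value is there and is a non-empty string."""
    value = data_layer.get("page", {}).get("siteSection")
    return isinstance(value, str) and value != ""


def correctness_rule(data_layer: dict, url: str) -> bool:
    """A stricter rule, assuming siteSection should equal the first URL path segment."""
    value = data_layer.get("page", {}).get("siteSection")
    segments = [s for s in urlparse(url).path.split("/") if s]
    expected = segments[0] if segments else "home"
    return value == expected


dl = {"page": {"siteSection": "shoes"}}
url = "https://shop.example/bags/leather-tote"
print(presence_rule(dl), correctness_rule(dl, url))  # True False
```

The presence rule passes while the correctness rule catches the mismatch, which is exactly the kind of error that slips through when rules only check type and existence.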
Anyway, thanks again for this. I will see if I can establish TTD as a metric for our data quality efforts (e.g. we have a log of issues, and every time we see an issue, we could log the TTD as well to start with).
Hi Lukas, thanks for the thoughts! I may have stretched the sense of “relatively” a little, yes.
I see tests and automation, such as with the frameworks I wrote, mainly at your 1, maybe also part of your 2, so really at the foundation of “where dev meets analytics”. I still think that testing that, in the environment that the devs work in, will prevent a lot of regressions, and therefore be incredibly helpful.
But you’re right: it won’t help find or prevent any mistakes made in a TMS or by the analytics team.
We are still artisans.