If your friendly marketer asks you to implement purchase IDs or to serialise events, you’ll have to generate 20-character IDs that should be as unique as possible.
The straightforward approach is to use existing IDs that the back end provides, but those are often longer than 20 characters.
So what can you do?
Hash Them!
The obvious answer is to use a hash function, ideally a perfect one.
But even those hash functions return more than 20 characters, right? Even a simple SHA-1 represented in Base64 is longer.
The thing is, though, that with a good hash function, the unpredictability of any part of the hash is proportional to that part's size (see this article on Stack Overflow). In other words, we can use any hash function we like and simply truncate the result to 20 characters.
Nice, hm?
For those of you who use Dynamic Tag Management (or any tag management system, really), you could load a helper function that calculates a hash in JavaScript, then truncates it.
You could use code like this, this or this, a library like CryptoJS, or roll your own. The latter was a joke, of course.
Happy New Year, Jan! Very interesting article.
Do you know if there is a rule of thumb or a function available to estimate under which circumstances you might get a duplicate? For instance, whether for over 1 million hashed values there is a 1% chance of getting a duplicate when truncating to 20 characters.
Hi Panagiotis!
I am in no way an expert on hashing, so no guarantees here.
A quick Google search found this calculator: http://davidjohnstone.net/pages/hash-collision-probability and this discussion on Stack Overflow: http://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code-collisions
The thing to keep in mind: if you cut down your hash to 20 chars, the collision probability goes up. By the usual birthday-bound reasoning it roughly doubles for every bit you throw away, so halving the length of a hash increases the risk dramatically, not just twofold.
I think.
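For a rough feel, the standard birthday-bound approximation can be sketched as below (this code is not from the original thread; the 120-bit figure assumes 20 Base64 characters at 6 bits each):

```javascript
// Birthday-bound approximation for hash collisions:
//   p ≈ 1 - exp(-n^2 / 2^(b+1))
// where n is the number of IDs generated and b the number of hash bits kept.
// Using expm1 keeps the result accurate when the probability is tiny.
function collisionProbability(n, bits) {
  return -Math.expm1(-(n * n) / Math.pow(2, bits + 1));
}

// 20 Base64 characters carry about 20 * 6 = 120 bits, so even a million
// IDs leave the collision probability vanishingly small.
console.log(collisionProbability(1e6, 120));
```

This also shows why truncating matters: each bit removed halves the denominator, doubling the approximate collision probability.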
Thank you very much for the swift reply! Both links are very useful!