Last week, I started a discussion about a problem a client had with unique IDs generated from random numbers. They expected some collisions, but the expected scalability benefits outweighed the rare conflicts, which would be lost in the insignificant digits of their analytics. In practice, the collisions were much more frequent and were polluting their data. They needed to know why.
Once I had convinced the engineers and engineering managers of the underlying mathematics, they were pleased to see that their intuition was sound: a very large random number (256 bits!) was a good solution. In fact, the chance of a conflict was much lower than they had expected. As they continued digging through the data, however, they found more and more evidence of duplication that they could not explain.
The millions of users coming in as flash crowds were real. The next thing to look at was the size of the random number.
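To see why the size of the random number matters so much, here is a minimal sketch of the standard birthday-problem approximation. It assumes the IDs are independent and uniformly random over all 2^bits values, which is exactly the assumption worth questioning when real data shows more duplicates than the math predicts. The function name and the sample figures are illustrative, not taken from the client's system.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_ids values
    drawn uniformly at random from a space of 2**bits possible values.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2N)).
    Assumes every ID is independent and truly uniform (hypothetical
    ideal generator).
    """
    space = 2 ** bits
    exponent = -n_ids * (n_ids - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Illustrative population size: one billion IDs.
    for bits in (32, 64, 128, 256):
        p = collision_probability(10**9, bits)
        print(f"{bits:>3} bits: p(collision) ~= {p:.3e}")
```

With a full 256 bits of real randomness, the probability for a billion IDs is vanishingly small. If only 32 of those bits were effectively random, a collision would be close to certain at the same scale, so the effective size of the random number is a natural next suspect.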