Near certainty vs absolute certainty
So I was recently reading about Kullback-Leibler divergence in the hopes of finding something to help with the problem I'm working on (I have a feeling my next presentation will need the subtitle "My descent into statistics"). Along the way I found a passage on Wikipedia that resonated with me (appropriate links mostly added by me):
On the entropy scale of information gain there is very little difference between near certainty and absolute certainty—coding according to a near certainty requires hardly any more bits than coding according to an absolute certainty. On the other hand, on the logit scale implied by weight of evidence, the difference between the two is enormous – infinite perhaps; this might reflect the difference between being almost sure (on a probabilistic level) that, say, the Riemann hypothesis is correct, compared to being certain that it is correct because one has a mathematical proof. These two different scales of loss function for uncertainty are both useful, according to how well each reflects the particular circumstances of the problem in question.
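To make the contrast concrete, here's a quick numeric sketch (my own illustration, not from the article). On the entropy scale, the cost of encoding a near-certain outcome is measured in bits via -log2(p); on the logit scale, the weight of evidence is the log-odds log(p/(1-p)). The specific probability below is just an arbitrary stand-in for "near certainty":

```python
import math

p = 0.999999  # "near certainty" (arbitrary choice)

# Entropy scale: bits needed to encode the likely outcome.
bits_near = -math.log2(p)       # a tiny fraction of a bit
bits_certain = -math.log2(1.0)  # exactly 0 bits for absolute certainty
# The difference between the two is negligible.

# Logit scale (weight of evidence): log-odds of the outcome.
logit_near = math.log(p / (1 - p))  # a large but finite number
# At p = 1 the odds are infinite, so the logit diverges:
# the gap between near certainty and certainty is unbounded here.

print(bits_near, bits_certain, logit_near)
```

A millionth of a bit versus zero bits is no difference worth speaking of, while a finite log-odds versus an infinite one is as large a difference as there is, which is exactly the proof-versus-strong-evidence distinction the quote draws.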