Kullback-Leibler divergence as histogram distance function

Go To StackoverFlow.com


I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.

I have a question on the Kullback-Leibler divergence formula (on which JS is based on): what should I return when Pi or Qi are zero?

Here is the implementation in F#:

let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. then ?   // ?
                             elif qi = 0. then ? // ?
                             else pi * log (pi / qi)) p q
    |> Array.sum

and the Jensen-Shannon distance that uses it:

let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

Wikipedia says that it should returns 0 when pi=0 and qi>0, and is not defined when qi=0, but for histogram distance it does not make much sense. What values would make sense in this case?


here's the correct version as per Whatang's answer, for future reference:

let dKL p q =
    Array.map2 (fun pi qi -> if pi = 0. && qi = 0. then 0.
                             else pi * log (pi / qi)) p q
    |> Array.sum
2012-04-03 22:18
by Francesco De Vittori
I'm curious, I've been taking some stats night classes (for reference: we're studying MLE/MVUE/Sufficiency/etc at this point), but I don't see how you can shoehorn this distribution distance into one about relative frequency. Keep in mind my limited knowledge before you call me silly - Ritch Melton 2012-04-03 22:39
There isn't really a good alternative from what I am reading pi=0 -> 0 is just to avoid 0 * log 0 which is undefined, and qi=0 -> undefined is because otherwise you have division by zero - Guvante 2012-04-03 23:19
There's a question similar to yours with a good answer on the Stats StackExchange: http://stats.stackexchange.com/a/1413 - Jack P. 2012-04-04 12:46
@Guvante the issue is what value is meaningful in those cases. When qi is 0 and pi is 0 there's no problem because 1) the values are equals thus distance is obviously 0, and 2) it's common to consider 0 log 0 as 0. On the other hand the problem is when only qi is 0, but as Whatang shown, this can never happen in this particular case - Francesco De Vittori 2012-04-04 14:13
@RitchMelton I'm not an expert, but the idea is that a relative frequency distribution is pretty much the same thing as a probability distribution, so Jensen-Shannon, Kullback-Leibler, chi-squared & co. are ok. The practical implementation I'm testing confirms this, JS works very well (slightly better than chi-squared) - Francesco De Vittori 2012-04-04 14:19
@FrancescoDeVittori - Yea, that makes sense - Ritch Melton 2012-04-04 14:25


Since you're using this to build the Jensen-Shannon divergence the only way that you can have qi equal to zero in the calculation of the Kullback-Leibler divergence is if the pi value is also zero. This is because really you're calculating the average of dKL(p,m) and dKL(q,m), where m=(p+q)/2. So mi=0 implies both pi=0 and qi=0.

Expand the definition of dKL to be p log p - p log m, and use the convention/limit that 0 log 0 = 0 and you'll see that there's no problem: m can only be zero when p also is.

To make a long story short, when you call dKL from dJS the second clause elif qi = 0 will never be executed: put whatever you like in there (probably a good idea to make it zero unless you're going to call dKL from somewhere else).

2012-04-03 23:44
by Whatang
Correct, hadn't thought of that. With that correction the algorithm works pretty well - Francesco De Vittori 2012-04-04 14:08