### Kullback-Leibler divergence as histogram distance function


I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.

I have a question about the Kullback-Leibler divergence formula (on which JS is based): what should I return when `pi` or `qi` is zero?

Here is the implementation in F#:

```
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then ?   // ?
        elif qi = 0. then ? // ?
        else pi * log (pi / qi)) p q
    |> Array.sum
```

and the Jensen-Shannon distance that uses it:

```
let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.
```

Wikipedia says that it should return 0 when `pi = 0` and `qi > 0`, and that it is undefined when `qi = 0`, but for a histogram distance that does not make much sense. What values would make sense in this case?

## Edit

Here's the corrected version as per Whatang's answer, for future reference:

```
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0. // 0 * log 0 is taken to be 0; also avoids NaN when qi > 0
        else pi * log (pi / qi)) p q
    |> Array.sum
```
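As a quick self-contained sanity check (the histogram values below are made up for illustration), the corrected function gives finite, symmetric results even when the two histograms have different zero bins:

```fsharp
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0. // 0 * log 0 is taken to be 0
        else pi * log (pi / qi)) p q
    |> Array.sum

let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

// two normalized histograms whose zero bins don't coincide
let p = [| 0.5; 0.5; 0.0 |]
let q = [| 0.0; 0.5; 0.5 |]

printfn "%f" (dJS p q) // 0.346574, i.e. (log 2) / 2
printfn "%f" (dJS p p) // 0.000000: identical histograms are at distance 0
```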
2012-04-03 22:18
by Francesco De Vittori
I'm curious: I've been taking some stats night classes (for reference, we're studying MLE/MVUE/sufficiency/etc. at this point), but I don't see how you can shoehorn this distribution distance into one about relative frequency. Keep in mind my limited knowledge before you call me silly - Ritch Melton 2012-04-03 22:39
There isn't really a good alternative from what I am reading: `pi=0 -> 0` is just to avoid `0 * log 0`, which is undefined, and `qi=0 -> undefined` is because otherwise you have division by zero - Guvante 2012-04-03 23:19
There's a question similar to yours with a good answer on the Stats StackExchange: http://stats.stackexchange.com/a/1413 - Jack P. 2012-04-04 12:46
@Guvante the issue is what value is meaningful in those cases. When `qi` is 0 and `pi` is 0 there's no problem, because 1) the values are equal, thus the distance is obviously 0, and 2) it's common to consider `0 log 0` as 0. On the other hand, the problem is when only `qi` is 0, but as Whatang showed, this can never happen in this particular case - Francesco De Vittori 2012-04-04 14:13
@RitchMelton I'm not an expert, but the idea is that a relative frequency distribution is pretty much the same thing as a probability distribution, so Jensen-Shannon, Kullback-Leibler, chi-squared & co. are OK. The practical implementation I'm testing confirms this; JS works very well (slightly better than chi-squared) - Francesco De Vittori 2012-04-04 14:19
@FrancescoDeVittori - Yea, that makes sense - Ritch Melton 2012-04-04 14:25


Since you're using this to build the Jensen-Shannon divergence, the only way you can have `qi` equal to zero in the calculation of the Kullback-Leibler divergence is if the `pi` value is also zero. This is because you're really calculating the average of `dKL(p,m)` and `dKL(q,m)`, where `m=(p+q)/2`. So `mi=0` implies both `pi=0` and `qi=0`.

Expand the per-element definition of `dKL` to `pi log pi - pi log mi`, use the convention/limit that `0 log 0 = 0`, and you'll see that there's no problem: `mi` can only be zero when `pi` also is.

To make a long story short: when you call `dKL` from `dJS`, the second clause `elif qi = 0` will never be executed. Put whatever you like in there (it's probably a good idea to make it zero, unless you're going to call `dKL` from somewhere else).
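To see this in action, one can put a `failwith` in the supposedly unreachable branch and confirm it never fires when `dKL` is called through `dJS`, even for histograms with completely disjoint support (the sample arrays below are made up for illustration):

```fsharp
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0. // convention: 0 * log 0 = 0
        elif qi = 0. then failwith "unreachable from dJS: mi = 0 implies pi = 0"
        else pi * log (pi / qi)) p q
    |> Array.sum

let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

// disjoint supports: every bin is zero in one of the two histograms,
// yet mi = (pi + qi) / 2 is zero only when both pi and qi are
let p = [| 1.0; 0.0 |]
let q = [| 0.0; 1.0 |]
printfn "%f" (dJS p q) // 0.693147, i.e. log 2: the maximum JS divergence in nats
```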

2012-04-03 23:44
by Whatang
Correct, hadn't thought of that. With that correction the algorithm works pretty well - Francesco De Vittori 2012-04-04 14:08