I would like to use the Jensen-Shannon divergence as a histogram distance function. I'm implementing a simple image similarity search, and the histograms are normalized RGB color distributions.
I have a question on the Kullback-Leibler divergence formula (on which JS is based on): what should I return when Pi or Qi are zero?
Here is the implementation in F#:
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then ?       // what should go here?
        elif qi = 0. then ?     // and here?
        else pi * log (pi / qi)) p q
    |> Array.sum
and the Jensen-Shannon distance that uses it:
let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.
Wikipedia says that it should return 0 when pi = 0 and qi > 0, and that it is undefined when qi = 0, but for a histogram distance that does not make much sense. What values would make sense in this case?
here's the correct version, for future reference. Per Whatang's answer below, qi can only be zero inside dJS when pi is too, so guarding on pi = 0 alone is enough. (Guarding on pi = 0. && qi = 0. is not: for pi = 0 and qi > 0 the else branch evaluates 0. * log 0. = 0. * -infinity, which is nan in floating point.)

let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0.
        else pi * log (pi / qi)) p q
    |> Array.sum
"pi=0 -> 0" is just to avoid 0 * log 0, which is undefined, and "qi=0 -> undefined" is because otherwise you have division by zero. - Guvante 2012-04-03 23:19
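For reference, a quick self-contained sketch (the histogram values are made up, and the pi = 0 guard is the 0 * log 0 = 0 convention discussed above) of the fixed dKL composed into dJS:

```fsharp
// Sketch only: dKL guarded on pi = 0 (0 * log 0 = 0 convention), composed
// into dJS exactly as in the question. Histograms are hypothetical 4-bin
// normalized color distributions.
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0.              // a pi = 0 bin contributes nothing; avoids 0 * -infinity
        else pi * log (pi / qi)) p q
    |> Array.sum

let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

// Two normalized histograms that share an empty bin:
let p = [| 0.5; 0.5; 0.0; 0.0 |]
let q = [| 0.0; 0.5; 0.5; 0.0 |]
printfn "%f" (dJS p q)   // finite and symmetric, between 0 and log 2
printfn "%f" (dJS p p)   // 0 for identical histograms
```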
Since you're using this to build the Jensen-Shannon divergence, the only way you can have qi equal to zero in the calculation of the Kullback-Leibler divergence is if the pi value is also zero. This is because the second argument is really m, the average of p and q, so mi = 0 implies both pi = 0 and qi = 0.
Expand the definition of dKL to be p log p - p log m, and use the convention/limit that 0 log 0 = 0, and you'll see that there's no problem: m can only be zero when p also is.
To make a long story short, when you call dJS the second clause elif qi = 0 will never be executed: put whatever you like in there (probably a good idea to make it zero, unless you're going to call dKL from somewhere else).
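This claim can be made observable — here's a minimal sketch where the qi = 0 branch throws, showing that dJS never reaches it even for histograms with completely disjoint support (where the JS divergence hits its maximum of log 2 in nats):

```fsharp
// Sketch: inside dJS the second argument of dKL is always m = (p + q) / 2,
// which is zero only where p is, so the qi = 0 branch below can never fire.
let dKL p q =
    Array.map2 (fun pi qi ->
        if pi = 0. then 0.
        elif qi = 0. then failwith "unreachable from dJS: qi = 0 implies pi = 0"
        else pi * log (pi / qi)) p q
    |> Array.sum

let dJS p q =
    let m = Array.map2 (fun pi qi -> (pi + qi) / 2.) p q
    (dKL p m) / 2. + (dKL q m) / 2.

// Completely disjoint supports: no exception, and the result is the
// maximum of log 2 rather than an infinity or nan.
let p = [| 1.0; 0.0 |]
let q = [| 0.0; 1.0 |]
printfn "%f" (dJS p q)   // log 2 ~ 0.693147
```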