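Definition (Relative entropy): The relative entropy between two discrete probability distributions $p$ and $q$ on the probability simplex $\mathcal{P} = \{ p \in \mathbb{R}^d_+ : \sum_{i=1}^d p_i = 1 \}$ is defined to be the nonnegative quantity
$$D(p, q) = \sum_{i=1}^d p_i \log\left(\frac{p_i}{q_i}\right).$$
The relative entropy is finite only when the distribution $p$ is absolutely continuous with respect to the distribution $q$, that is, when $q_i = 0$ implies $p_i = 0$. Whenever $p_i$ is zero, the contribution of the $i$th term is taken to be zero, as $\lim_{x \downarrow 0} x \log x = 0$.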
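The two conventions above (zero contribution when $p_i = 0$, infinite value when absolute continuity fails) are easy to get wrong numerically. The following is a minimal NumPy sketch; the function name `relative_entropy` and the example vectors are illustrative choices, not part of the original text, and the natural logarithm is assumed.

```python
import numpy as np

def relative_entropy(p, q):
    """D(p, q) = sum_i p_i * log(p_i / q_i), using the natural logarithm."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                    # terms with p_i = 0 contribute zero
    if np.any(q[support] == 0):        # p is not absolutely continuous w.r.t. q
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(relative_entropy(p, q))  # finite, equal to log(2)
print(relative_entropy(q, p))  # inf, since q puts mass where p has none
```

The example also shows the asymmetry of the relative entropy: swapping the arguments turns a finite value into an infinite one.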
Properties (Relative entropy): The relative entropy enjoys the following properties:
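- Information inequality: $D(p, q) \geq 0$ for all $p$, $q$ in the simplex $\mathcal{P}$, while $D(p, q) = 0$ if and only if $p = q$.
- Convexity: $D(p, q)$ is convex in $(p, q)$.
- Lower semicontinuity: $D(p, q)$ is lower semicontinuous in $(p, q)$.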
The relative entropy defines two convex measures of distance on the unit simplex. Indeed, because the relative entropy is asymmetric, we can define two distinct kinds of pseudo balls in this context. The sublevel set of the first kind
$$B_1(q, r) = \{ p \in \mathcal{P} : D(p, q) \leq r \}$$
and the sublevel set of the second kind
$$B_2(p, r) = \{ q \in \mathcal{P} : D(p, q) \leq r \}$$
characterize two distinct convex geometries of distance on the unit simplex. From the convexity property of the relative entropy it follows that either kind of pseudo ball, $B_1(q, r)$ or $B_2(p, r)$, is convex for any positive $r$.
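To see that the two kinds of pseudo balls really differ, one can test membership of a single point in both. The sketch below assumes that a ball of the first kind varies the first argument of the relative entropy and a ball of the second kind varies the second argument; the function names, the center, the candidate point and the radius are all hypothetical.

```python
import numpy as np

def relative_entropy(p, q):
    """D(p, q) = sum_i p_i * log(p_i / q_i), with 0 * log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def in_ball_first_kind(x, center, r):
    # Assumed convention: the candidate x is the first argument of D.
    return relative_entropy(x, center) <= r

def in_ball_second_kind(x, center, r):
    # Assumed convention: the candidate x is the second argument of D.
    return relative_entropy(center, x) <= r

center = [0.5, 0.5]
x = [0.9, 0.1]
print(in_ball_first_kind(x, center, r=0.4))   # True
print(in_ball_second_kind(x, center, r=0.4))  # False
```

Here the same point lies inside the ball of the first kind but outside the ball of the second kind of equal radius, because the two relative entropies are roughly 0.37 and 0.51.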
Pseudo balls of the first kind
Pseudo balls of the first kind were encountered previously in the context of Sanov’s theorem. The figure below illustrates the pseudo balls of the first kind for various radii $r$.
Theorem (Pseudo balls of the first kind): Pseudo balls of the first kind can be characterized as
$$B_1(q, r) = \left\{ p \in \mathcal{P} : \sum_{i=1}^d h(p_i) - \sum_{i=1}^d p_i \log q_i \leq r \right\}$$
using the entropy function $h(x) = x \log x$.
The entropy function is a convex function which can be canonically represented using the exponential cone. We will see that the pseudo balls of the second kind admit a representation in terms of the geometric mean.
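As a sketch of what such an exponential cone representation can look like, assume the common convention $K_{\exp} = \mathrm{cl}\{ (x_1, x_2, x_3) : x_1 \geq x_2 e^{x_3 / x_2},\ x_2 > 0 \}$ (other texts order the coordinates differently). For $x > 0$ and $y > 0$, dividing by $x$ and exponentiating gives

$$
\begin{aligned}
t \geq x \log\left(\frac{x}{y}\right)
&\iff -\frac{t}{x} \leq \log\left(\frac{y}{x}\right) \\
&\iff y \geq x \, e^{-t / x} \\
&\iff (y, x, -t) \in K_{\exp}.
\end{aligned}
$$

Under this convention, the constraint that the relative entropy $\sum_i p_i \log(p_i / q_i)$ is at most $r$ holds exactly when there exist auxiliary variables $t_1, \dots, t_d$ with $\sum_i t_i \leq r$ and $(q_i, p_i, -t_i) \in K_{\exp}$ for every $i$. Taking $y = 1$ recovers the single-variable case $t \geq x \log x \iff (1, x, -t) \in K_{\exp}$.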
Pseudo balls of the second kind
The figure below illustrates the pseudo balls of the second kind for various radii $r$.
In practice, pseudo balls of the second kind are mostly encountered around empirical distributions
$$(\hat{p}_n)_i = \frac{1}{n} \sum_{k=1}^n \mathbb{1}\{ x_k = i \}$$
of data samples $x_1, \dots, x_n$. In this case, the elements of the probability vector $\hat{p}_n$ are fractions with $n$ as a common denominator. The set of all such distributions is denoted further as $\mathcal{P}_n$.
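An empirical distribution of this form can be computed by counting how often each outcome occurs. The sketch below assumes the outcomes are encoded as integers $0, \dots, d-1$; the helper name `empirical_distribution` and the sample values are made up for illustration.

```python
import numpy as np

def empirical_distribution(samples, d):
    """Empirical distribution of samples taking values in {0, ..., d-1}."""
    counts = np.bincount(np.asarray(samples), minlength=d)
    return counts / len(samples)

samples = [0, 2, 2, 1, 2, 0, 2, 1]        # n = 8 hypothetical observations
p_hat = empirical_distribution(samples, d=4)
print(p_hat)                               # entries 2/8, 2/8, 4/8, 0/8
print(p_hat * len(samples))                # integer counts: common denominator n
```

Every entry of `p_hat` is a multiple of $1/n$, which is exactly the property used in the theorem for pseudo balls of the second kind.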
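Theorem (Pseudo balls of the second kind): Pseudo balls of the second kind around distributions $\hat{p}$ in $\mathcal{P}_n$ can be characterized as
$$B_2(\hat{p}, r) = \left\{ q \in \mathcal{P} : \prod_{i=1}^d q_i^{\hat{p}_i} \geq e^{-r} \prod_{i=1}^d \hat{p}_i^{\hat{p}_i} \right\},$$
where $0^0$ is read as $1$.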
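A short derivation shows where the geometric mean comes from. It assumes that the pseudo ball of the second kind around $\hat{p}$ collects all $q$ whose relative entropy $\sum_i \hat{p}_i \log(\hat{p}_i / q_i)$ is at most $r$, and that $\hat{p}_i = n_i / n$ for integer counts $n_i$ summing to $n$:

$$
\begin{aligned}
\sum_{i=1}^d \hat{p}_i \log\left(\frac{\hat{p}_i}{q_i}\right) \leq r
&\iff \sum_{i=1}^d \hat{p}_i \log q_i \geq \sum_{i=1}^d \hat{p}_i \log \hat{p}_i - r \\
&\iff \prod_{i=1}^d q_i^{\hat{p}_i} \geq e^{-r} \prod_{i=1}^d \hat{p}_i^{\hat{p}_i} \\
&\iff \left( \prod_{i=1}^d q_i^{n_i} \right)^{1/n} \geq e^{-r} \prod_{i=1}^d \hat{p}_i^{\hat{p}_i}.
\end{aligned}
$$

The left-hand side of the last line is the geometric mean of $n$ nonnegative numbers, namely each $q_i$ repeated $n_i$ times, while the right-hand side is a constant that depends only on $\hat{p}$ and $r$.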
Note that as $\mathcal{P}_n$ becomes dense in $\mathcal{P}$ with increasing $n$, the previous theorem can be used to construct a second-order cone representation (of arbitrary precision) of the pseudo balls $B_2(p, r)$ for any $p$ in $\mathcal{P}$. Indeed, the function $q \mapsto \prod_{i=1}^d q_i^{\hat{p}_i}$ is recognized as a geometric mean, which is a nonnegative concave function and is canonically represented using the second-order cone.
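The equivalence between the relative entropy test and the geometric mean test can be checked numerically. The sketch below is only a sanity check under the assumption that the ball of the second kind varies the second argument of the relative entropy; it does not build the second-order cone representation itself, and the counts, candidate distribution and radii are hypothetical.

```python
import numpy as np

def relative_entropy(p, q):
    """D(p, q) = sum_i p_i * log(p_i / q_i), with 0 * log(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def in_ball_via_geometric_mean(counts, q, r):
    """Membership test using the geometric mean of q_i repeated n_i times."""
    counts = np.asarray(counts, dtype=int)
    q = np.asarray(q, dtype=float)
    n = counts.sum()
    p_hat = counts / n
    repeated = np.repeat(q, counts)            # n numbers in total
    geo_mean = np.prod(repeated) ** (1.0 / n)
    pos = counts > 0                           # 0^0 is read as 1
    threshold = np.exp(-r) * np.prod(p_hat[pos] ** p_hat[pos])
    return bool(geo_mean >= threshold)

counts = [2, 2, 4, 0]                          # n = 8 hypothetical observations
q = [0.3, 0.3, 0.3, 0.1]
p_hat = np.asarray(counts) / sum(counts)
for r in (0.1, 0.2):
    direct = relative_entropy(p_hat, q) <= r
    print(r, direct, in_ball_via_geometric_mean(counts, q, r))
```

For this example the relative entropy is roughly 0.164, so both tests reject $r = 0.1$ and accept $r = 0.2$.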