The Representability of the Relative Entropy

Definition (Relative entropy): The relative entropy between two discrete probability distributions on the probability simplex \mc P is defined to be the positive quantity

    \[   D(\mb P', \mb P) = \sum_{i\in \Xi} \mb P'(i) \log \left( \frac{\mb P'(i)}{\mb P(i)} \right). \]

The relative entropy is only finite when the distribution \mb P' is absolutely continuous with respect to the distribution \mb P. Whenever \mb P'(i) is zero, the contribution of the ith term is taken to be zero as \lim_{x\to 0} x \log x = 0.

Properties (Relative entropy): The relative entropy enjoys the following properties:

  1. Information inequality: D(\mb P', \mb P)\geq 0 for all \mb P', \mb P in \mc P, while D(\mb P', \mb P)=0 if and only if \mb P'=\mb P.
  2. Convexity: D(\mb P', \mb P) is convex in (\mb P', \mb P) \in \mc P\times \mc P.
  3. Lower semicontinuity: D(\mb P', \mb P) is lower semicontinuous in (\mb P', \mb P) \in \mc P\times \mc P.

The relative entropy defines two convex measures of distance on the unit simplex. Indeed, we can define in this context two distinct kinds of pseudo balls due to the asymmetry of the relative entropy. The sublevel set of the first kind

    \[   \mb B^1_r(\mb P) \defn \set{\mb P'\in\mc P}{D(\mb P', \mb P)\leq r}  \]

and the sublevel set of the second kind

    \[ \mb B^2_r(\mb P') \defn \set{\mb P\in\mc P}{D(\mb P', \mb P)\leq r} \]

indeed characterize two distinct convex geometries of distance on the unit simplex. From the convexity property of the relative entropy it follows that either kind of pseudo ball \mb B^1_r(\mb P) or \mb B^2_r(\mb P') is convex for any positive r.

Pseudo balls of the first kind

Pseudo balls \mb B^1_r(\mb P) of the first kind were encountered previously in the context of Sanov’s theorem. The figure below illustrates the pseudo balls of the first kind \mb B^1_r(\mb P) for various r.

Relative entropy contours on the unit simplex.
Relative entropy ball of the first kind

Theorem (Pseudo balls of the first kind): Pseudo balls of the first kind \mb B^1_r(\mb P) can be characterized as

    \[   \mb B^1_r(\mb P) = \{\mb P'\in \mc P : H(\mb P') \leq r + \textstyle\sum_{i\in\Xi} \mb P'(i) \log \mb P(i) \} \]

using the entropy function H(\mb P') \defn \sum_{i\in \Xi} \mb P'(i)\log \mb P'(i).

The entropy function H(\mb P') is a positive convex function which can be canonically represented using the exponential cone. We will see that the pseudo balls \mb B^2_r(\mb P') of the second kind admit a representation in terms of the geometric mean.

Pseudo balls of the second kind

The figure below illustrates the pseudo balls of the second kind \mb B^2_r(\mb P') for various r.

Relative entropy ball of the second kind

In practice, pseudo balls of the second type \mb B^2_r(\mb P') are mostly encountered around empirical distributions

    \[   \mb P'(i)=\textstyle\frac1T \sum_{k=1}^T \mb 1_{\xi_k=i} \]

of T data samples (\xi_1, \dots, \xi_T). In this case, the elements of the probability vector \mb P' are fractions with T as a common denominator. The set of all such distributions \mb P' is denoted further as \mc P_T.

Theorem (Pseudo balls of the second kind): Pseudo balls of the second kind \mb B^2_r(\mb P) around distributions \mb P' \in \mc P_T can be characterized as

    \[   \mb B^2_r(\mb P') = \{\mb P\in \mc P : (\textstyle\prod_{i\in\Xi} \mb P(i)^{T\cdot\mb P'(i)})^{1/T} \geq e^{-(r - H(\mb P'))} \}. \]

Note that as \mc P_T becomes dense in \mc P with increasing T, the previous theorem can be used to a construct second-order cone representation (of arbitrary precision) of the pseudo balls \mb B^2_r(\mb P') for any \mb P' in \mc P. Indeed, the function (\prod_{i\in\Xi} \mb P(i)^{T\cdot\mb P'(i)})^{1/T} is recognized as a geometric mean which is a positive concave function and is canonically represented using the second-order cone.

Leave a Reply

Your email address will not be published. Required fields are marked *