Csiszar and Sanov

It turns out that the Chernoff bound previously discussed has a neat generalization in large deviation theory. In this setting the object under study is the entire empirical distribution \hat{\mb P}\defn \frac{1}{n} \sum_{i=1}^n \delta_{Y_i} instead of merely the empirical average of the data. Now we can study extreme events in which the empirical distribution realizes in a convex set \mc C.

Csizar’s Theorem

Let Y_i be data samples in \Re^d drawn independently and identically from the distribution \mb P_0 with mean \mu_0. An important class of an undesirable events can be expressed as the empirical average of data Y_i realizing in a closed convex set \mc C. The closed convex set \mc C is a subset of the probability simplex \mc P_d of distributions on \Re^d. For the topology employed here and further technical conditions on the set \mc C, the reader is referred to the seminal paper of Csiszar. For the sake of readability, these technical points are supressed here.

Csizar’s Theorem : The probability that the empirical distribution of Y_i realizes in a closed convex set \mc C is bounded by

(1)   \begin{equation*}   \log \mb P_0^n \left( \hat{\mb P} \in \mc C \right) \leq -n \cdot \textstyle\inf_{\mb P\in \mc C} \, D(\mb P, \mb P_0) \end{equation*}

where D(\mb P, \mb P_0) is the Kullback-Leibler divergence between \mb P and \mb P_0.

Geometric Interpretation

Csizar’s theorem can be seen to subsume Chernoff’s result by recognizing that the empirical average \frac1n \sum_{i=1}^n Y_i realizing in closed convex set C \subset \Re^d is stated equivalently as the empirical distribution \hat{\mb P} realizing in the closed convex set \set{\mb P}{\int y \, \mb P(\d y) \in C}. In fact both bounds are exactly the same as it can be shown that

    \[   \inf_{\mu\in C} \Lambda^\star(\mu-\mu_0) = \inf_{\mb P} \{D(\mb P, \mb P_0) : \textstyle\int y \, \mb P(\d y) \in C\}, \]

where \Lambda^\star is the convex dual of the log moment generating function \Lambda(\lambda) \defn \log \mb E_{\mb P}[e^{\lambda\tpose Y}] of the distribution \mb P of Y_i - \mu_0. Observe that Csizar’s inequality (1) admits a nice geometrical interpretation as done in the figure below. The probability of the empirical distribution realizing in a set \mc C is bounded above by its distance as measured by the Kullback-Leibler divergence from the distribution generating the data \mb P and the set of extreme events \mc C.

As the Kullback-Leibler divergence is jointly convex in both its arguments, Csizar’s bound (1) is stated in terms of a convex optimization problem. That is, a convex optimization problem over distributions on \Re^n, rather than vectors in \Re^n.

Rendered by QuickLaTeX.com

Sanov’s Theorem

Sanov’s bound is furthermore exponentially tight as it correctly identifies the exponential rate \inf_{\mb P} \set{D(\mb P, \mb P_0)}{\mb P \in \mc C} with which the probability of the event \hat{\mb P}\in \mc C diminishes to zero.

Sanov’s Theorem : The probability that the empirical distribution of Y_i realizes in a open convex set \mc C diminishes with exponential rate

(2)   \begin{equation*}   \frac{1}{n} \log \mb P_0^n \left( \hat{\mb P} \in \mc C \right) \to - \textstyle\inf_{\mb P \in \mc C} \, D(\mb P, \mb P_0). \end{equation*}

The previous makes that the Chernoff and Csizar inequality accurately quantify the probability of extreme events taking place. Furthermore, due to the convexity of the objects involved, computing these bounds amounts to solving a convex optimization problem.


  1. Csiszár, I. “Sanov property, generalized I-projection and a conditional limit theorem.” The Annals of Probability (1984): 768-793.

Leave a Reply

Your email address will not be published. Required fields are marked *