It turns out that the Chernoff bound previously discussed has a neat generalization in large deviation theory. In this setting the object under study is the entire empirical distribution instead of merely the empirical average of the data. Now we can study extreme events in which the empirical distribution realizes in a convex set .
Let be data samples in drawn independently and identically from the distribution with mean . An important class of an undesirable events can be expressed as the empirical average of data realizing in a closed convex set . The closed convex set is a subset of the probability simplex of distributions on . For the topology employed here and further technical conditions on the set , the reader is referred to the seminal paper of Csiszar. For the sake of readability, these technical points are supressed here.
where is the Kullback-Leibler divergence between and .
Csizar’s theorem can be seen to subsume Chernoff’s result by recognizing that the empirical average realizing in closed convex set is stated equivalently as the empirical distribution realizing in the closed convex set . In fact both bounds are exactly the same as it can be shown that
where is the convex dual of the log moment generating function of the distribution of . Observe that Csizar’s inequality (1) admits a nice geometrical interpretation as done in the figure below. The probability of the empirical distribution realizing in a set is bounded above by its distance as measured by the Kullback-Leibler divergence from the distribution generating the data and the set of extreme events .
As the Kullback-Leibler divergence is jointly convex in both its arguments, Csizar’s bound (1) is stated in terms of a convex optimization problem. That is, a convex optimization problem over distributions on , rather than vectors in .
Sanov’s bound is furthermore exponentially tight as it correctly identifies the exponential rate with which the probability of the event diminishes to zero.
Sanov’s Theorem : The probability that the empirical distribution of realizes in a open convex set diminishes with exponential rate
The previous makes that the Chernoff and Csizar inequality accurately quantify the probability of extreme events taking place. Furthermore, due to the convexity of the objects involved, computing these bounds amounts to solving a convex optimization problem.
- Csiszár, I. “Sanov property, generalized -projection and a conditional limit theorem.” The Annals of Probability (1984): 768-793.