The aim is to estimate the regression function $m(x) = \mathbb{E}[Y \mid X = x]$ from data samples $(X_1, Y_1), \dots, (X_n, Y_n)$ drawn independently and identically from the data distribution of $(X, Y)$.
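Among the most popular nonparametric regression approaches are the Nadaraya-Watson kernel regressors. This family of regressors was introduced by Nadaraya and Watson and traces back to the regressogram of Tukey. The Nadaraya-Watson kernel regressor is defined by

$$\hat{m}_n(x) = \frac{\sum_{i=1}^{n} Y_i \, K\!\left(\frac{x - X_i}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right)}. \tag{2}$$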
The kernel function $K$ is a bounded and integrable real-valued function such that

$$\lim_{|u| \to \infty} |u| \, K(u) = 0.$$
The previous condition ensures that the kernel estimate (2) is local. Indeed, the weight assigned to observation $(X_i, Y_i)$ decreases at least inversely proportionally to its distance from the point of interest $x$. The constant $h_n > 0$ is commonly referred to as the bandwidth. An appropriate choice of the bandwidth parameter is crucial in practice.
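To illustrate how such a kernel-weighted average can be computed, here is a minimal NumPy sketch for a scalar covariate. It assumes the usual Nadaraya-Watson form, a ratio of kernel-weighted sums, and uses a Gaussian kernel by default. The function names `gaussian_kernel` and `nadaraya_watson` and the toy data are illustrative choices, not part of the original text.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel; it decays faster than 1/|u|, so the estimate is local."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nadaraya_watson(x_query, x_obs, y_obs, bandwidth, kernel=gaussian_kernel):
    """Kernel-weighted average of y_obs, evaluated at each point of x_query."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    # weights[j, i] = K((x_query[j] - x_obs[i]) / bandwidth)
    weights = kernel((x_query[:, None] - x_obs[None, :]) / bandwidth)
    numer = weights @ y_obs
    denom = weights.sum(axis=1)
    # Where no observation receives any weight the ratio is 0/0; report NaN there.
    out = np.full(x_query.shape, np.nan)
    np.divide(numer, denom, out=out, where=denom > 0)
    return out

# Toy data (hypothetical): four observations, one query point.
x_obs = np.array([0.0, 1.0, 2.0, 3.0])
y_obs = np.array([1.0, 3.0, 2.0, 5.0])
estimate = nadaraya_watson([1.5], x_obs, y_obs, bandwidth=0.5)
```

Because the estimate is a weighted average with nonnegative weights, it always lies between the smallest and largest observed response, and a smaller bandwidth concentrates the weight on the observations closest to the query point.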
To make the Nadaraya-Watson estimator (2) more concrete, we list here four kernels which are popular in practice.
It should be remarked that all four stated kernels are nonnegative. Although kernels that take negative values exist, they are not as easily interpretable. For a better understanding of the different properties among all four defined kernels, we have visualized them in the figure below.
Note that the naive kernel is discontinuous, which implies that the resulting regression estimate is discontinuous as well. Discontinuous regression estimates may not be desirable in practice. The Gaussian kernel and all its derivatives are continuous functions, which implies a smooth regression estimate. However, as the Gaussian kernel has unbounded support, the regression estimate depends at every point on all observed samples.
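The contrast between the two kernels can be checked numerically. The sketch below assumes the naive kernel is the indicator of $|u| \le 1$ (any normalising constant cancels in the ratio) and uses a small hypothetical data set. It evaluates the estimate just left and right of a point where an observation enters or leaves the naive kernel's window, and then perturbs the response of a far-away sample.

```python
import numpy as np

def naive_kernel(u):
    # Assumed form of the naive (box) kernel: indicator of |u| <= 1.
    return (np.abs(u) <= 1.0).astype(float)

def gaussian_kernel(u):
    # The normalising constant is omitted because it cancels in the ratio.
    return np.exp(-0.5 * u**2)

def nw_estimate(x, x_obs, y_obs, h, kernel):
    """Nadaraya-Watson estimate at a single point x."""
    w = kernel((x - x_obs) / h)
    return float(np.sum(w * y_obs) / np.sum(w))

x_obs = np.array([0.0, 1.0, 2.0, 5.0])
y_obs = np.array([0.0, 1.0, 4.0, 100.0])
h = 1.0

# Naive kernel: crossing x = 1 swaps the sample at 0 for the sample at 2,
# so the estimate jumps from the mean of {0, 1} to the mean of {1, 4}.
naive_left = nw_estimate(0.999, x_obs, y_obs, h, naive_kernel)
naive_right = nw_estimate(1.001, x_obs, y_obs, h, naive_kernel)

# Gaussian kernel: the same small step in x changes the estimate only slightly.
gauss_left = nw_estimate(0.999, x_obs, y_obs, h, gaussian_kernel)
gauss_right = nw_estimate(1.001, x_obs, y_obs, h, gaussian_kernel)

# Flip the response of the distant sample at x = 5.
y_far = y_obs.copy()
y_far[-1] = -100.0
naive_far = nw_estimate(1.001, x_obs, y_far, h, naive_kernel)     # unaffected
gauss_far = nw_estimate(1.001, x_obs, y_far, h, gaussian_kernel)  # shifts
```

With the naive kernel the distant sample has exactly zero weight, while with the Gaussian kernel its weight is small but nonzero, so the estimate moves when that response changes.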
Pointwise Bias Properties
It is assumed that the data distribution is well behaved. That is, it is absolutely continuous with respect to the Lebesgue measure, with continuous joint density function $f_{X,Y}$. The marginal density of $X$ is denoted by $f_X$. It should be clear that the regression function $m$ is only properly defined on the support of $f_X$, i.e. on $\{x : f_X(x) > 0\}$. The marginal distribution of $Y$ is assumed to have bounded support.
- A.1 There exists a constant $M < \infty$ such that $|Y| \le M$ almost surely.
Due to the local nature of the kernel estimate (2), the continuity of the function to be estimated plays a prominent role in any statement of asymptotic properties. More precisely, we need the assumption that
- A.2 the regression function $m$ and the marginal density $f_X$ of $X$ are continuous at $x$, and $f_X(x) > 0$.
A condition on the bandwidth parameter must be introduced as well. The bandwidth $h_n$ should shrink as more data becomes available, but not faster than $1/n$. To that end we introduce the following condition
- A.3 $h_n \to 0$ and $n h_n \to \infty$ as $n \to \infty$.
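As a concrete illustration, one family of bandwidths that shrinks with the sample size, yet more slowly than $1/n$, is the polynomial schedule $h_n = c\, n^{-\alpha}$ with $0 < \alpha < 1$. This schedule, the scalar-covariate setting, and the default values of `c` and `alpha` are assumptions made for the sketch, not choices taken from the text.

```python
import numpy as np

def bandwidth(n, c=1.0, alpha=0.2):
    """Hypothetical polynomial schedule h_n = c * n**(-alpha)."""
    return c * n ** (-alpha)

sample_sizes = np.array([10, 100, 1_000, 10_000, 100_000], dtype=float)

h = bandwidth(sample_sizes)      # shrinks towards 0 as n grows
effective = sample_sizes * h     # n * h_n grows like n**(1 - alpha)

# With alpha > 1 the bandwidth shrinks too fast and n * h_n goes to 0 instead.
too_fast = sample_sizes * bandwidth(sample_sizes, alpha=1.5)
```

The product $n h_n$ can be read as a rough count of observations falling within one bandwidth of the point of interest, so requiring it to grow means each local average is eventually based on many samples.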
Asymptotic bias: Suppose that assumptions A.1–A.3 hold. Then the kernel estimator $\hat{m}_n$ is asymptotically unbiased at the point $x$:

$$\lim_{n \to \infty} \mathbb{E}\left[\hat{m}_n(x)\right] = m(x).$$
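A small Monte Carlo experiment can make this statement tangible. The sketch below is an illustration under assumed conditions rather than a verification of the result: the regression function $m(x) = x^2$, the uniform design on $[0, 1]$, the bounded uniform noise, the Gaussian kernel, and the bandwidth $h_n = n^{-1/5}$ are all hypothetical choices. It estimates the bias at $x_0 = 0.5$ by averaging the estimation error over many independent data sets.

```python
import numpy as np

def nw_at_point(x0, x_obs, y_obs, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0 - x_obs) / h) ** 2)
    return np.sum(w * y_obs) / np.sum(w)

def monte_carlo_bias(n, x0=0.5, reps=200, seed=0):
    """Average of (estimate - m(x0)) over independent synthetic data sets."""
    rng = np.random.default_rng(seed)
    m = lambda x: x ** 2      # assumed regression function
    h = n ** (-0.2)           # assumed bandwidth: h -> 0 while n * h -> infinity
    errors = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 1.0, size=n)           # design density positive at x0
        y = m(x) + rng.uniform(-0.5, 0.5, size=n)   # bounded, zero-mean noise
        errors[r] = nw_at_point(x0, x, y, h) - m(x0)
    return errors.mean()

# Estimated bias at x0 = 0.5 for increasing sample sizes.
biases = [monte_carlo_bias(n) for n in (50, 500, 5000)]
```

In this setup the estimated bias shrinks as the sample size grows, in line with the asymptotic statement. The rate at which it shrinks depends on the bandwidth schedule and on the smoothness of the regression function.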
- P. Sarda and P. Vieu, “Kernel Regression”, in Smoothing and Regression: Approaches, Computation, and Application, John Wiley & Sons (2000)