Technically, function approximation means finding a function or hypothesis $h$ which minimizes a given error/energy functional $E(h)$. Using a Bayesian interpretation (footnote 2), we understand the error functional, up to a constant, as proportional to the negative posterior log-probability for the function $h$ given training data $D$ and prior information $D_0$, i.e.

$$E(h) \propto -\ln p(h \mid D, D_0). \qquad (1)$$
The error functional usually depends both on a finite number of training data $D$ and on additional prior information $D_0$. Special cases of function approximation include density estimation, where $h$ has to fulfill an additional normalization condition, and classification or pattern recognition, where the function $h$ takes only discrete values representing the possible classes or patterns of $x$.
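As an example, consider an error functional with mean square data terms and a typical smoothness constraint for $m$-dimensional $x$:

$$E(h) = \sum_{i} \big(h(x_i) - y_i\big)^2 + \lambda \int d^m x \, \|\nabla h(x)\|^2. \qquad (2)$$

Each mean square data term can be written as a quadratic form,

$$\big(h(x_i) - y_i\big)^2 = \langle h - t_i \,|\, P_i \,|\, h - t_i \rangle, \qquad (3)$$

with the constant function $t_i(x) \equiv y_i$ and the projector $P_i = |x_i\rangle\langle x_i|$ onto the training point $x_i$. For the smoothness constraint in (2) we find the quite similar form

$$\int d^m x \, \|\nabla h(x)\|^2 = \langle h - t_0 \,|\, {-\Delta} \,|\, h - t_0 \rangle, \qquad (4)$$

with the zero function $t_0 \equiv 0$, the Laplacian $\Delta$, and boundary conditions under which the surface term of the partial integration vanishes.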
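To make the example concrete, the following minimal sketch discretizes a functional of this type (mean square data terms plus a gradient smoothness penalty) on a one-dimensional grid and finds its minimizer by solving the linear stationarity equation. The grid, the forward-difference gradient, the weight `lam`, and all function names are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def error_functional(h, x_idx, y, lam, dx):
    """Discretized E(h): squared data residuals plus lam * integral of (h')^2."""
    data_term = np.sum((h[x_idx] - y) ** 2)
    grad = np.diff(h) / dx                      # forward-difference approximation of h'
    smooth_term = lam * np.sum(grad ** 2) * dx  # Riemann sum of the squared gradient
    return data_term + smooth_term

def minimize_error(n, x_idx, y, lam, dx):
    """Minimize the quadratic functional by solving (P + K_smooth) h = P t."""
    D = (np.eye(n, k=1) - np.eye(n))[:-1] / dx  # (n-1) x n difference matrix
    K_smooth = lam * dx * D.T @ D               # discrete version of -lam * Laplacian
    P = np.zeros((n, n))
    P[x_idx, x_idx] = 1.0                       # projector onto the data grid points
    t = np.zeros(n)
    t[x_idx] = y                                # template carrying the data values
    return np.linalg.solve(P + K_smooth, P @ t)

# Hypothetical data: three observations on a grid of 21 points.
x_idx = np.array([2, 10, 18])
y = np.array([0.0, 1.0, 0.0])
h_star = minimize_error(21, x_idx, y, lam=0.1, dx=0.05)
print(error_functional(h_star, x_idx, y, lam=0.1, dx=0.05))
```

Because the functional is quadratic in `h`, its stationarity condition is linear, which is why a single call to `np.linalg.solve` suffices here. The system is solvable as long as at least one data point is present, since the data projector then removes the constant zero mode of the smoothness term.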
These examples motivate the following general definitions. Let $\mathcal{H}$ denote a Hilbert space (footnote 3) of hypothesis functions $h$ of $m$-dimensional $x$.
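Definition 1 ((prior) concept $c$ with (prior) template $t$ and template distance $d$):
A prior concept is a pair $c = (t, d)$ with a function $t$ in $\mathcal{H}$ and a (``distance'') functional $d$ on $\mathcal{H} \times \mathcal{H}$ with $d(t, t) = 0$. Note that this allows for $d(t, h) = 0$ with $h \neq t$. We write $d(h) = d(t, h)$. The function $t$ will be called a (prior) template.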
Template functions $t$ will be used to represent function prototypes. Notice that templates include standard training data, which is the reason for the brackets around the word ``prior''. We are especially interested in distances quadratic in $h$, for which the functional derivative with respect to $h$ is linear. Such distances can be defined by positive semi-definite operators $K$. Such operators have a decomposition $K = W^T W$, with $W$ invertible if $K$ is positive definite. More precisely, $\sqrt{\langle h | K | h \rangle} = \sqrt{\langle W h | W h \rangle} = \|W h\|$ defines a semi-norm on $\mathcal{H}$, with $\|W h\| = 0$ if $h$ is in the zero space of $K$, i.e. if $K h = 0$. Typical $W$ are projectors into the space of training data, as in (3), and generators of infinitesimal transformations of continuous Lie groups, like the gradient $\nabla$ for translations in (4), with $-\Delta = \nabla^T \nabla$ under appropriate boundary conditions. Thus, we define:
Definition 2 (quadratic (prior) concept $c$ with template distance operator $K$):
A quadratic (prior) concept is a pair $c = (t, K)$ with a function $t$ in $\mathcal{H}$ and a symmetric and positive semi-definite operator $K$, which will be called a template distance operator. The concept $c$ defines the square template distance:

$$d^2(h) = \langle h - t \,|\, K \,|\, h - t \rangle. \qquad (6)$$
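As a finite-dimensional illustration of Definition 2, the sketch below builds a template distance operator from a decomposition $K = W^T W$ and evaluates the square template distance as a quadratic form. Choosing `W` as a forward-difference matrix (a discrete gradient with free boundaries), the grid spacing, and the function names are assumptions for illustration only.

```python
import numpy as np

def difference_operator(n, dx):
    """Forward-difference matrix W of shape (n-1, n), a discrete gradient."""
    return (np.eye(n, k=1) - np.eye(n))[:-1] / dx

def square_template_distance(h, t, K):
    """Quadratic form <h - t| K |h - t>."""
    r = h - t
    return float(r @ K @ r)

n, dx = 50, 0.1
W = difference_operator(n, dx)
K = W.T @ W                      # symmetric and positive semi-definite

rng = np.random.default_rng(0)
h = rng.standard_normal(n)
t = np.zeros(n)                  # zero template, as for a smoothness term

# The quadratic form equals the squared norm of W(h - t).
d2 = square_template_distance(h, t, K)
d2_via_W = float(np.sum((W @ (h - t)) ** 2))

# Constants lie in the zero space of K, so shifting h by a constant
# leaves the square distance unchanged (semi-norm, not norm).
d2_shifted = square_template_distance(h + 3.0, t, K)
print(d2, d2_via_W, d2_shifted)
```

The three printed values agree up to rounding, which shows both the decomposition and the effect of the zero mode: the distance cannot tell `h` apart from `h` plus a constant.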
A quadratic concept thus defines a Gaussian process over $m$-dimensional $x$ with mean $t$ and covariance operator $K^{-1}$. Its matrix elements $K^{-1}(x, x')$ are sometimes also called Green's function, propagator or two-point correlation function. The Laplacian in (4), for example, corresponds to the Wiener measure known from Brownian motion or diffusion and is also used as kinetic energy for Euclidean scalar fields in physics (see for example [2]). The zero modes of $K$ represent the projections of $h - t$ which do not contribute to $d^2(h)$. The projector in the mean square error term (3), for example, measures the distance only at the (training data) point $x_i$. Continuous template functions may also be restricted to subspaces, e.g. parts of an image or a specific resolution.
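The link between the template distance operator and the covariance can be checked numerically. The sketch below assumes a density proportional to $\exp(-\frac{1}{2}\langle h - t | K | h - t\rangle)$, takes $K$ to be the finite-difference negative Laplacian on $[0, 1]$, and imposes Dirichlet boundary conditions at both ends so that $K$ is invertible. Under these assumptions, which are not stated in the text, the process is pinned at both ends (a Brownian bridge rather than free Brownian motion), and the Green's function is $\min(x, x') - x x'$.

```python
import numpy as np

def dirichlet_laplacian(n):
    """Finite-difference -d^2/dx^2 on n interior points of [0, 1], h(0) = h(1) = 0."""
    dx = 1.0 / (n + 1)
    K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
    x = dx * np.arange(1, n + 1)
    return K, x, dx

def greens_function(K, dx):
    """Kernel K^{-1}(x, x'): the matrix inverse divided by the grid spacing."""
    return np.linalg.inv(K) / dx

K, x, dx = dirichlet_laplacian(99)
G = greens_function(K, dx)

# Covariance of a Brownian bridge on [0, 1]: min(x, x') - x * x'.
bridge = np.minimum.outer(x, x) - np.outer(x, x)
print(np.max(np.abs(G - bridge)))
```

The division by `dx` converts the matrix inverse into a kernel, because applying an integral kernel on a grid carries one factor of the grid spacing.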
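Definition 3 (template space $\mathcal{H}_T$ and template projector $P_T$):
The maximal subspace $\mathcal{H}_T \subseteq \mathcal{H}$ on which the positive semi-definite $K$ is positive definite will be called the template space of $K$. The corresponding hermitian projector $P_T$ onto this subspace, i.e. $P_T^\dagger = P_T$, $P_T^2 = P_T$, and $P_T \mathcal{H} = \mathcal{H}_T$, will be called the template projector.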
Hence $K$ commutes with the template projector $P_T$. Maximality of $\mathcal{H}_T$ means that $1 - P_T$ is the projector onto the zero space of $K$, i.e. $K (1 - P_T) = 0$. Our aim is to build an error functional $E(h)$ depending on $h$ through square distances $d_i^2(h)$.
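In finite dimensions the template projector of Definition 3 can be obtained from an eigendecomposition of $K$. The following sketch is one way to do this, assuming a symmetric matrix `K` and a relative tolerance `rel_tol` below which eigenvalues are treated as zero; the function name, the tolerance, and the example operator (a discrete gradient with free boundaries) are hypothetical choices for illustration.

```python
import numpy as np

def template_projector(K, rel_tol=1e-10):
    """Orthogonal projector onto the eigenvectors of symmetric K with positive eigenvalue.

    Eigenvalues not exceeding rel_tol times the largest eigenvalue count as zero.
    """
    evals, evecs = np.linalg.eigh(K)
    keep = evals > rel_tol * evals.max()
    V = evecs[:, keep]               # orthonormal basis of the template space
    return V @ V.T

n, dx = 20, 0.1
W = (np.eye(n, k=1) - np.eye(n))[:-1] / dx   # discrete gradient, free boundaries
K = W.T @ W                                   # zero space: constant functions
P_T = template_projector(K)
P_0 = np.eye(n) - P_T                         # projector onto the zero space of K

print(np.allclose(K @ P_T, P_T @ K))          # K commutes with P_T
print(np.allclose(K @ P_0, 0.0, atol=1e-8))   # K (1 - P_T) = 0
```

For this example the zero space consists of the constant functions, so `P_0` is the matrix with all entries equal to `1/n`, i.e. the operator that averages over the grid.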