If an energy or error functional $E$ is interpreted probabilistically, i.e., if $-E$ is assumed to be, up to a constant, the logarithm of a posterior probability under study, the training data term has to be the log-loss $-\sum_i \ln P(x_i, y_i)$. Technically, however, it would be easier to replace that data term by one which is quadratic in the function $P$ of interest.
Indeed, we have mentioned in Section 2.5 that such functionals can be justified within the framework of empirical risk minimization.
From that Frequentist point of view, an error functional $E(P)$ is not derived from a log-posterior but represents an empirical risk $\hat r(P)$, approximating an expected risk $r(P)$ for the action $a = P$.
This is possible under the assumption that the training data are sampled according to the true $p(x, y)$.
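In that interpretation one is therefore not restricted to a log-loss for training data but may as well choose a quadratic loss like
$$\frac{1}{2}\Big(P - P^{\mathrm{emp}},\; K_D\,(P - P^{\mathrm{emp}})\Big),$$
with a reference density $P^{\mathrm{emp}}$ and a real symmetric positive (semi-)definite operator $K_D$, where $(\cdot,\cdot)$ denotes the scalar product, i.e., integration over $x$ and $y$.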
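The difference between the two data terms can be illustrated with a minimal sketch. It assumes a single discrete variable $y$ with $m$ outcomes, so that densities become probability vectors and the operator $K_D$ becomes a matrix. The function names and the identity default for $K_D$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def empirical_density(samples, m):
    """Relative frequencies of the m discrete outcomes 0..m-1."""
    counts = np.bincount(samples, minlength=m)
    return counts / counts.sum()

def log_loss_risk(P, samples):
    """Empirical risk under log-loss: -sum_i ln P(y_i)."""
    return -np.sum(np.log(P[samples]))

def quadratic_risk(P, P_emp, K_D=None):
    """Quadratic data term 0.5 * (P - P_emp)^T K_D (P - P_emp)."""
    d = P - P_emp
    if K_D is None:
        K_D = np.eye(len(P))  # assumption: identity operator
    return 0.5 * d @ K_D @ d

# Toy training data (hypothetical): six observations of y in {0, 1, 2, 3}.
samples = np.array([0, 1, 1, 2, 1, 0])
m = 4
P_emp = empirical_density(samples, m)
P_uniform = np.full(m, 1.0 / m)

print("log-loss risk of uniform P:  ", log_loss_risk(P_uniform, samples))
print("quadratic risk of uniform P: ", quadratic_risk(P_uniform, P_emp))
print("quadratic risk of P_emp:     ", quadratic_risk(P_emp, P_emp))
```

Unlike the log-loss, the quadratic term stays finite even when the candidate $P$ assigns zero probability to an observed outcome, which is one reason it is technically easier to handle.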
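When approximating a joint probability $p(x, y)$, the reference density $P^{\mathrm{emp}}$ would have to be the joint empirical density
$$P^{\mathrm{emp}}_{\mathrm{joint}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i)\, \delta(y - y_i).$$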
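Conditional empirical densities, in contrast, are defined only at $x$-values that occur in the training data.
Hence, when approximating conditional empirical densities, either non-data $x$-values must be excluded from the integration in (234) by using an operator $K_D$ containing the projector onto the data $x$-values, or $P^{\mathrm{emp}}$ must be defined also for such non-data $x$-values.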
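For an existing (finite) volume $V_Y = \int dy$ of $y$-space, a possible extension $\tilde P^{\mathrm{emp}}$ of $P^{\mathrm{emp}}$ would be to assume a uniform density for non-data $x$-values, yielding
$$\tilde P^{\mathrm{emp}}(x, y) = \begin{cases} P^{\mathrm{emp}}(x, y), & x \in x_D, \\ 1/V_Y, & x \notin x_D, \end{cases}$$
where $x_D$ denotes the set of $x$-values occurring in the training data.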
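The uniform extension can be sketched for discrete variables, where the volume of $y$-space reduces to the number `ny` of $y$-values. This is only an illustration under that discretization assumption; the function name `conditional_empirical_density` and the toy data are hypothetical.

```python
import numpy as np

def conditional_empirical_density(xs, ys, nx, ny):
    """Conditional empirical density on an nx-by-ny grid.

    Rows with data hold relative frequencies of y given x.
    Rows without data fall back to the uniform density 1/ny.
    """
    counts = np.zeros((nx, ny))
    np.add.at(counts, (xs, ys), 1.0)          # accumulate repeated (x, y) pairs
    n_x = counts.sum(axis=1, keepdims=True)   # number of data points per x
    has_data = n_x[:, 0] > 0
    P = np.full((nx, ny), 1.0 / ny)           # uniform extension for non-data x
    P[has_data] = counts[has_data] / n_x[has_data]
    return P, has_data

# Toy data (hypothetical): x = 1 never occurs in the training set.
xs = np.array([0, 0, 2])
ys = np.array([1, 1, 0])
P, has_data = conditional_empirical_density(xs, ys, nx=3, ny=2)
print(P)
print(has_data)
```

Every row of `P` sums to one, so the extended reference density is normalized for each $x$, whether or not that $x$ appears in the data.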
Instead of a quadratic term in $P$, one might consider a quadratic term in the log-probability $L = \ln P$.
The log-probability of the empirical density, however, is minus infinity at all non-data points $(x, y)$.
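To work with a finite expression, one can choose a small $\epsilon > 0$ and approximate $P^{\mathrm{emp}}$ by a strictly positive density $P^{\mathrm{emp}}_\epsilon$, so that $L^{\mathrm{emp}}_\epsilon = \ln P^{\mathrm{emp}}_\epsilon$ is finite everywhere.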
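One way to obtain such a finite log-probability is additive smoothing of the empirical counts. The sketch below uses this as an assumed choice for discrete $y$; it is not necessarily the regularization intended in the text, and the name `smoothed_log_empirical` is hypothetical.

```python
import numpy as np

def smoothed_log_empirical(counts, eps):
    """Return (ln P_eps, P_eps) with P_eps = (counts + eps) / (n + m * eps).

    For any eps > 0 all entries of P_eps are positive, so the log is finite.
    """
    counts = np.asarray(counts, dtype=float)
    P_eps = (counts + eps) / (counts.sum() + eps * counts.size)
    return np.log(P_eps), P_eps

counts = np.array([3.0, 0.0, 1.0])   # toy counts; the second outcome is a non-data point

# Without smoothing, the log-probability diverges at the non-data point.
with np.errstate(divide="ignore"):
    L_naive = np.log(counts / counts.sum())

L_eps, P_eps = smoothed_log_empirical(counts, eps=1e-3)
print(L_naive)
print(L_eps)
```

Smaller values of `eps` keep `P_eps` closer to the empirical density but push the log-probability at non-data points toward large negative values, so the choice of `eps` trades fidelity against numerical range.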
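A quadratic data term in $L$ then results in an error functional whose data term has the form $\frac{1}{2}\big(L - L^{\mathrm{emp}}_\epsilon,\; K_D\,(L - L^{\mathrm{emp}}_\epsilon)\big)$, with $L^{\mathrm{emp}}_\epsilon$ the finite approximation to the logarithm of the empirical density.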
Positive (semi-)definite operators $K_D$ have a square root and can be written in the form $K_D = R^T R$.
One possibility, skipping $x$ in the following for the sake of simplicity, is to choose as square root $R$ the integration operator, i.e., $R = \bigotimes_k R_k$ and $R_k(y_k, y_k') = \theta(y_k - y_k')$, with $\theta$ the step function.
Thus, $R$ transforms the density function $P$ into the distribution function $RP$, and we have $(P, K_D P) = (RP, RP)$.
Here the inverse $R^{-1}$ is the differentiation operator (with appropriate boundary conditions), and $K_D^{-1} = R^{-1} (R^T)^{-1}$ is, up to sign, the product of one-dimensional Laplacians $\Delta_k = \partial^2 / \partial y_k^2$.
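These operator relations can be checked numerically in one dimension. The sketch below assumes a uniform grid with spacing `h` and the convention $\theta(0) = 1$, so that the integration operator becomes a lower-triangular matrix. The grid size and variable names are illustrative choices.

```python
import numpy as np

n, h = 6, 0.5                               # grid size and spacing (illustrative)

# Integration operator: R[j, k] = h * theta(y_j - y_k), with theta(0) = 1.
R = h * np.tril(np.ones((n, n)))

# A uniform density on the grid, normalized so that h * sum(P) = 1.
P = np.full(n, 1.0 / (n * h))
F = R @ P                                   # distribution function, F[-1] = 1

# The inverse of R is the backward-difference (differentiation) operator.
D = (np.eye(n) - np.eye(n, k=-1)) / h
R_inv = np.linalg.inv(R)

# K_D = R^T R, and its inverse is D D^T: a discrete negative Laplacian
# whose first and last rows encode the boundary conditions.
K_D = R.T @ R
K_D_inv = np.linalg.inv(K_D)
neg_laplacian = D @ D.T

print(F)
print(neg_laplacian)
```

The interior rows of `neg_laplacian` carry the familiar stencil $(-1, 2, -1)/h^2$, while the first and last rows differ, which is the discrete counterpart of the "appropriate boundary conditions" mentioned above.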
A regularizing term, for example as in (165), can then be added to this quadratic data term.