Loss functions for approximation
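Log-loss:
A typical loss function for density estimation problems is the log-loss
$$ l(x,y,a) = -b_1(x) \ln p(y|x,a) + b_2(x,y) \tag{46} $$
with some $a$-independent $b_1(x) > 0$, $b_2(x,y)$, and actions $a$ describing probability densities
$$ \int dy\, p(y|x,a) = 1, \quad \forall x \in X,\ \forall a \in A. \tag{47} $$
Choosing $b_2(x,y) = \ln p(y|x,f)$ and $b_1(x) = 1$ gives
\begin{align}
r(a,f) &= \int dx\, dy\, p(x)\, p(y|x,f) \ln \frac{p(y|x,f)}{p(y|x,a)} \tag{48} \\
&= \left\langle \ln \frac{p(y|x,f)}{p(y|x,a)} \right\rangle_{X,Y} \tag{49} \\
&= \left\langle KL\big( p(y|x,f),\, p(y|x,a) \big) \right\rangle_{X}, \tag{50}
\end{align}
which shows that minimizing log-loss is equivalent to minimizing the ($x$-averaged) Kullback-Leibler entropy $KL\big( p(y|x,f),\, p(y|x,a) \big)$ [122,123,13,46,53].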
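The relation between log-loss and Kullback-Leibler entropy can be checked numerically. The following is a minimal sketch for a single $x$ and a discrete $y$, assuming $b_1 = 1$ and an offset $b_2$ equal to the log of the true density; the function names and the probability values are hypothetical and serve only as illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p, q) = sum_y p(y) ln(p(y) / q(y)) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def expected_log_loss(p_true, p_action, b1=1.0, use_b2=True):
    """Risk of the log-loss at a single x for discrete y.

    With use_b2=True the a-independent offset b2(y) = ln p_true(y) is added,
    which is the choice that turns the risk into the KL entropy.
    """
    risk = 0.0
    for pt, pa in zip(p_true, p_action):
        if pt == 0:
            continue
        b2 = math.log(pt) if use_b2 else 0.0
        risk += pt * (-b1 * math.log(pa) + b2)
    return risk

# Hypothetical "true" density and candidate actions (illustrative numbers only).
p_true = [0.5, 0.3, 0.2]
candidates = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "close": [0.45, 0.35, 0.20],
    "exact": [0.5, 0.3, 0.2],
}

risks_with_b2 = {k: expected_log_loss(p_true, q) for k, q in candidates.items()}
risks_without_b2 = {
    k: expected_log_loss(p_true, q, use_b2=False) for k, q in candidates.items()
}
kls = {k: kl_divergence(p_true, q) for k, q in candidates.items()}

# The offset b2 shifts every risk by the same constant, so the minimizer is unchanged.
best_by_risk = min(risks_without_b2, key=risks_without_b2.get)
best_by_kl = min(kls, key=kls.get)
```

Because $b_2$ does not depend on the action, dropping it changes the value of the risk but not which candidate is optimal.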
While the paper will concentrate on log-loss, we will also give a short summary of loss functions for regression problems. (See for example [16,201] for details.)
Regression problems are special density estimation problems where the considered possible actions are restricted to $y$-independent functions $a(x)$.
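Squared-error loss:
The most common loss function for regression problems (see Sections 3.7, 3.7.2) is the squared-error loss. For one-dimensional $y$ it reads
$$ l(x,y,a) = b_1(x) \big( y - a(x) \big)^2 + b_2(x,y) \tag{51} $$
with arbitrary $b_1(x) > 0$ and $b_2(x,y)$.
In that case the optimal function $a(x)$ is the regression function of the posterior, which is the mean of the predictive density
$$ a^*(x) = \int dy\, y\, p(y|x,f) = \langle y \rangle_{Y|x,f}. \tag{52} $$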
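This can be easily seen by writing
\begin{align}
\big( y - a(x) \big)^2 &= \big( y - \langle y \rangle_{Y|x,f} + \langle y \rangle_{Y|x,f} - a(x) \big)^2 \tag{53} \\
&= \big( y - \langle y \rangle_{Y|x,f} \big)^2 + \big( a(x) - \langle y \rangle_{Y|x,f} \big)^2 - 2 \big( y - \langle y \rangle_{Y|x,f} \big) \big( a(x) - \langle y \rangle_{Y|x,f} \big), \tag{54}
\end{align}
where the first term in (54) is independent of $a$ and the last term vanishes after integration over $y$ according to the definition of $\langle y \rangle_{Y|x,f}$.
Hence,
$$ r(a,f) = \int dx\, b_1(x)\, p(x) \big( a(x) - \langle y \rangle_{Y|x,f} \big)^2 + \mathrm{const.} \tag{55} $$
This is minimized by $a(x) = \langle y \rangle_{Y|x,f}$.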
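As a numerical illustration of this argument, the sketch below evaluates the squared-error risk at a single $x$ for a small discrete predictive distribution and compares a grid search over actions with the mean. The distribution, the grid, and the helper names are assumptions made for the example, and $b_1 = 1$, $b_2 = 0$ are chosen for simplicity.

```python
def expected_squared_loss(a, ys, probs, b1=1.0):
    """Risk of the squared-error loss at a single x for a discrete predictive distribution."""
    return sum(p * b1 * (y - a) ** 2 for y, p in zip(ys, probs))

# Hypothetical predictive distribution over y (illustrative values only).
ys = [0.0, 1.0, 2.0, 10.0]
probs = [0.4, 0.3, 0.2, 0.1]

# Mean of the predictive distribution, the claimed minimizer.
mean = sum(p * y for y, p in zip(ys, probs))

# The risk at the mean is the variance, i.e. the a-independent constant.
variance = expected_squared_loss(mean, ys, probs)

# Brute-force search over candidate actions on a grid with spacing 0.01.
grid = [i / 100 for i in range(1001)]
best_action = min(grid, key=lambda a: expected_squared_loss(a, ys, probs))

def decomposition_gap(a):
    """Difference between the risk and variance + (a - mean)^2."""
    return abs(expected_squared_loss(a, ys, probs) - (variance + (a - mean) ** 2))
```

The grid minimizer coincides with the mean, and `decomposition_gap` stays at rounding-error level for every action, reflecting that the cross term integrates to zero.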
Notice that for Gaussian $p(y|x,a)$ with fixed variance, log-loss and squared-error loss are equivalent.
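This equivalence can be made concrete with a short check. The sketch assumes a Gaussian density with mean $a$ and a fixed standard deviation `sigma` (an illustrative value), and identifies $b_1 = 1/(2\sigma^2)$ and $b_2 = \frac{1}{2}\ln(2\pi\sigma^2)$ in the squared-error loss; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def gaussian_log_loss(y, a, sigma):
    """Log-loss -ln p(y|a) for a Gaussian with mean a and standard deviation sigma."""
    return -math.log(NormalDist(mu=a, sigma=sigma).pdf(y))

def squared_error_loss(y, a, b1=1.0, b2=0.0):
    """Squared-error loss b1 * (y - a)^2 + b2 with a-independent b1, b2."""
    return b1 * (y - a) ** 2 + b2

sigma = 0.8  # fixed, a-independent standard deviation (illustrative value)
b1 = 1.0 / (2.0 * sigma ** 2)
b2 = 0.5 * math.log(2.0 * math.pi * sigma ** 2)

# A few (y, a) pairs at which the two losses are compared.
pairs = [(0.0, 0.3), (1.5, -0.2), (-2.0, 1.0)]
max_gap = max(
    abs(gaussian_log_loss(y, a, sigma) - squared_error_loss(y, a, b1, b2))
    for y, a in pairs
)
```

Since both $b_1$ and $b_2$ depend only on the fixed variance, the two losses share the same minimizing action.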
For multi-dimensional $y$, one-dimensional loss functions like Eq. (51) can be used when the component index of $y$ is considered part of the $x$-variables.
Alternatively, loss functions depending explicitly on multidimensional $y$ can be defined.
For instance, a general quadratic loss function would be
$$ l(x,y,a) = \sum_{k,k'} \big( y_k - a_k(x) \big)\, K(k,k')\, \big( y_{k'} - a_{k'}(x) \big) \tag{56} $$
with symmetric, positive definite kernel $K(k,k')$.
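A general quadratic loss of this kind is a quadratic form in the residual vector. The following minimal sketch evaluates it for plain Python lists; the function name, the example vectors, and the kernel matrices are assumptions chosen for illustration.

```python
def quadratic_loss(y, a, K):
    """Quadratic loss sum_{k,k'} (y_k - a_k) K[k][k'] (y_k' - a_k').

    y, a: sequences of equal length; K: symmetric, positive definite matrix
    given as a list of rows.
    """
    d = [yk - ak for yk, ak in zip(y, a)]
    n = len(d)
    return sum(d[k] * K[k][kp] * d[kp] for k in range(n) for kp in range(n))

# Hypothetical two-dimensional target and action.
y = [1.0, 2.0]
a = [0.5, 3.0]

# With the identity kernel the loss reduces to the sum of squared residuals.
identity = [[1.0, 0.0], [0.0, 1.0]]
loss_identity = quadratic_loss(y, a, identity)

# A non-diagonal kernel couples the components of the residual.
K = [[2.0, 0.5], [0.5, 1.0]]
loss_coupled = quadratic_loss(y, a, K)
```

Positive definiteness of the kernel guarantees that the loss is zero only when the action matches $y$ in every component.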
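Absolute loss:
For absolute loss
$$ l(x,y,a) = b_1(x)\, |y - a(x)| + b_2(x,y) \tag{57} $$
with arbitrary $b_1(x) > 0$ and $b_2(x,y)$, the risk becomes
\begin{align}
r(a,f) &= \int dx\, b_1(x)\, p(x) \int_{a(x)}^{\infty} dy\, \big( y - a(x) \big)\, p(y|x,f) + \int dx\, b_1(x)\, p(x) \int_{-\infty}^{a(x)} dy\, \big( a(x) - y \big)\, p(y|x,f) + \mathrm{const.} \tag{58} \\
&= 2 \int dx\, b_1(x)\, p(x) \int_{m(x)}^{a(x)} dy\, \big( a(x) - y \big)\, p(y|x,f) + \mathrm{const.}', \tag{59}
\end{align}
where the integrals have been rewritten as $\int_{a(x)}^{\infty} = \int_{a(x)}^{m(x)} + \int_{m(x)}^{\infty}$ and $\int_{-\infty}^{a(x)} = \int_{-\infty}^{m(x)} + \int_{m(x)}^{a(x)}$, introducing a median function $m(x)$ which satisfies
$$ \int_{-\infty}^{m(x)} dy\, p(y|x,f) = \frac{1}{2}, \quad \forall x \in X, \tag{60} $$
so that
$$ a(x) \left( \int_{-\infty}^{m(x)} dy\, p(y|x,f) - \int_{m(x)}^{\infty} dy\, p(y|x,f) \right) = 0, \quad \forall x \in X. \tag{61} $$
The remaining integral in (59) is nonnegative and vanishes for $a(x) = m(x)$.
Thus the risk is minimized by any median function $m(x)$.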
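The role of the median can be illustrated numerically. The sketch below uses a discrete predictive distribution at a single $x$ as a stand-in for the continuous case, with $b_1 = 1$ and $b_2 = 0$; the distribution, the grid, and the helper names are assumptions made for the example.

```python
def expected_absolute_loss(a, ys, probs, b1=1.0):
    """Risk of the absolute loss at a single x for a discrete predictive distribution."""
    return sum(p * b1 * abs(y - a) for y, p in zip(ys, probs))

def discrete_median(ys, probs):
    """Smallest y whose cumulative probability reaches 1/2 (ys sorted ascending)."""
    cumulative = 0.0
    for y, p in zip(ys, probs):
        cumulative += p
        if cumulative >= 0.5:
            return y
    return ys[-1]

# Hypothetical skewed predictive distribution (illustrative values only).
ys = [0.0, 1.0, 2.0, 10.0]
probs = [0.4, 0.3, 0.2, 0.1]

median = discrete_median(ys, probs)
mean = sum(p * y for y, p in zip(ys, probs))

# Brute-force search over candidate actions on a grid with spacing 0.01.
grid = [i / 100 for i in range(1001)]
best_action = min(grid, key=lambda a: expected_absolute_loss(a, ys, probs))

risk_at_median = expected_absolute_loss(median, ys, probs)
risk_at_mean = expected_absolute_loss(mean, ys, probs)
```

For this skewed distribution the median and the mean differ, and the absolute-loss risk is smaller at the median, in contrast to the squared-error case.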
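$\delta$-loss and 0-1 loss:
Another possible loss function, typical for classification tasks (see Section 3.8), like for example image segmentation [153], is the $\delta$-loss for continuous $y$ or 0-1 loss for discrete $y$
$$ l(x,y,a) = -b_1(x)\, \delta\big( y - a(x) \big) + b_2(x,y) \tag{62} $$
with arbitrary $b_1(x) > 0$ and $b_2(x,y)$.
Here $\delta$ denotes the Dirac $\delta$-functional for continuous $y$ and the Kronecker $\delta$ for discrete $y$.
Then,
$$ r(a,f) = -\int dx\, b_1(x)\, p(x)\, p\big( y = a(x) \,|\, x, f \big) + \mathrm{const.}, \tag{63} $$
so the optimal $a$ corresponds to any mode function of the predictive density.
For Gaussians, mode and median are unique and coincide with the mean.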
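For discrete $y$ the statement about the mode is easy to verify directly. The following minimal sketch computes the 0-1 risk at a single $x$ for a small set of class labels, with $b_1 = 1$ and $b_2 = 0$; the labels, probabilities, and function name are hypothetical.

```python
def expected_zero_one_loss(a, labels, probs, b1=1.0):
    """Risk of the loss -b1 * delta_{y,a} at a single x: equals -b1 * p(y = a)."""
    return sum(p * (-b1 if y == a else 0.0) for y, p in zip(labels, probs))

# Hypothetical predictive distribution over class labels (illustrative values only).
labels = ["background", "edge", "object"]
probs = [0.2, 0.5, 0.3]

# Action minimizing the risk, found by enumerating all labels.
best_action = min(labels, key=lambda a: expected_zero_one_loss(a, labels, probs))

# Mode of the predictive distribution.
mode = labels[probs.index(max(probs))]

risk_at_mode = expected_zero_one_loss(mode, labels, probs)
```

If several labels share the maximal probability, each of them is a mode and each minimizes the risk, which is why the optimal action is not necessarily unique.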
Joerg_Lemm
2001-01-21