Next we consider the example
$$ E(\phi,\theta) = \theta\, E_0(\phi), $$
where the hyperparameter has been denoted $\theta$, representing a regularization parameter or an inverse temperature variable for the specific prior.
For a $d$-dimensional Gaussian integral with $E_0(\phi) = \frac{1}{2}\big(\phi,\,{\bf K}_0\,\phi\big)$ the normalization factor becomes
$$ Z_\phi(\theta) = \int\! d\phi\; e^{-\theta E_0(\phi)} = \left(\frac{2\pi}{\theta}\right)^{d/2} \left(\det {\bf K}_0\right)^{-1/2}. $$
For positive (semi-)definite ${\bf K}_0$ the dimension $d$ is given by the rank of ${\bf K}_0$ under a chosen discretization. Skipping $\theta$-independent constants results in a normalization energy
$$ E_N(\theta) = \ln Z_\phi(\theta) = -\frac{d}{2}\ln\theta. $$
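As a numerical sketch of this normalization for $d = 1$ (the prior coefficient $k$ and the values of $\theta$ below are illustrative choices, not taken from the text):

```python
import numpy as np

# Numerical check, for d = 1, of the Gaussian normalization factor:
# with specific prior energy E_0(phi) = 0.5 * k * phi**2 one has
#   Z(theta) = integral of exp(-theta * E_0(phi)) dphi = sqrt(2*pi/(theta*k)),
# so the theta-dependent part of the normalization energy is -(1/2)*ln(theta).
# k and the theta values are illustrative choices.

k = 2.0
phi = np.linspace(-20.0, 20.0, 200001)
dphi = phi[1] - phi[0]

def Z_numeric(theta):
    # Riemann-sum approximation of the one-dimensional Gaussian integral
    return np.sum(np.exp(-theta * 0.5 * k * phi**2)) * dphi

def Z_analytic(theta):
    return np.sqrt(2.0 * np.pi / (theta * k))

for theta in (0.5, 1.0, 4.0):
    assert np.isclose(Z_numeric(theta), Z_analytic(theta), rtol=1e-6)

# ln Z(theta) depends on theta only through -(1/2)*ln(theta):
diff = np.log(Z_numeric(4.0)) - np.log(Z_numeric(0.5))
print(np.isclose(diff, -0.5 * np.log(4.0 / 0.5)))  # True
```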
With a hyperprior $p(\theta) \propto e^{-E_\theta(\theta)}$ and likelihood (data) energy $E_D(\phi)$, the total energy becomes
$$ E_{\rm tot}(\phi,\theta) = E_D(\phi) + \theta\, E_0(\phi) - \frac{d}{2}\ln\theta + E_\theta(\theta) \eqno(482) $$
and we obtain the stationarity equations
$$ 0 = \frac{\partial E_D(\phi)}{\partial \phi} + \theta\, \frac{\partial E_0(\phi)}{\partial \phi}, \eqno(483) $$
$$ E_0(\phi) = \frac{d}{2\theta} - \frac{\partial E_\theta(\theta)}{\partial \theta}. \eqno(484) $$
For a compensating hyperprior, $E_\theta(\theta) = \frac{d}{2}\ln\theta$, the right-hand side of Eq. (484) vanishes, thus giving no stationary point for $\theta$.
Using however the condition $\theta \ge 0$, one sees that for positive definite ${\bf K}_0$ the total energy is then monotonically increasing in $\theta$ and thus minimized for $\theta = 0$, corresponding to the `prior-free' case. For example, in the case of Gaussian regression the solution would then be the data template, $\phi = t_D$.
This is also known as ``$\theta = 0$-catastrophe''.
To get a nontrivial solution for $\theta$, a noncompensating hyperparameter energy $E_\theta \ne \frac{d}{2}\ln\theta + {\rm const.}$ must be used, so that the combination $e^{-E_\theta(\theta)}/Z_\phi(\theta)$ is nonuniform [16,24].
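The collapse under a compensating hyperprior can be made concrete in a zero-dimensional toy model. In the following sketch the quadratic likelihood energy and the numbers $\beta$, $y$ are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Zero-dimensional toy model of the theta -> 0 collapse under a
# compensating hyperprior: the normalization term -(d/2)*ln(theta) is
# cancelled by E_theta, leaving the joint energy
#   F(phi, theta) = 0.5*beta*(phi - y)**2 + 0.5*theta*phi**2,
# which has no stationary point in theta. (beta, y are illustrative.)

beta, y = 1.0, 2.0

def profile(theta):
    # Minimize F over phi at fixed theta (quadratic, so in closed form).
    phi_star = beta * y / (beta + theta)
    F_star = 0.5 * beta * (phi_star - y) ** 2 + 0.5 * theta * phi_star**2
    return F_star, phi_star

thetas = [2.0, 1.0, 0.5, 0.1, 0.01]
energies = [profile(t)[0] for t in thetas]

# The profiled energy decreases monotonically as theta -> 0 ...
print(all(a > b for a, b in zip(energies, energies[1:])))  # True
# ... and the minimizer approaches the `prior-free' data template phi = y:
print(abs(profile(0.01)[1] - y) < abs(profile(1.0)[1] - y))  # True
```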
The other limiting case is a vanishing $E_\theta$, for which Eq. (484) becomes
$$ E_0(\phi) = \frac{d}{2\theta}. \eqno(485) $$
For $\theta \rightarrow \infty$ one sees that $E_0(\phi) \rightarrow 0$.
Moreover, in case the specific prior represents a normalized probability, the solution $\phi_0$ minimizing $E_0$ is also a solution of the first stationarity equation (483) in the limit $\theta \rightarrow \infty$. Thus, for vanishing $E_\theta$ the `data-free' solution $\phi = \phi_0$ is a selfconsistent solution of the stationarity equations (483,484).
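For a concrete instance, the two stationarity conditions can be iterated to selfconsistency in a zero-dimensional ($d = 1$) toy model; the quadratic likelihood energy and the numbers $\beta$, $y$ below are illustrative assumptions:

```python
import numpy as np

# Solving the stationarity equations for vanishing hyperprior energy
# E_theta = 0 in a zero-dimensional (d = 1) toy model with total energy
#   F(phi, theta) = 0.5*beta*(phi - y)**2 + 0.5*theta*phi**2 - 0.5*ln(theta),
# so the theta-stationarity equation (cf. Eq. (485)) reads
#   E_0(phi) = 0.5*phi**2 = 1/(2*theta).  (beta, y are illustrative.)

beta, y = 1.0, 3.0

# Alternate the two stationarity conditions until selfconsistency:
phi, theta = y, 1.0
for _ in range(200):
    theta = 1.0 / phi**2             # from E_0(phi) = d/(2*theta), d = 1
    phi = beta * y / (beta + theta)  # from the phi-stationarity equation

# Both equations hold simultaneously at the fixed point:
print(np.isclose(0.5 * phi**2, 1.0 / (2.0 * theta)))    # True
print(np.isclose(beta * (phi - y) + theta * phi, 0.0))  # True
```

For these values the iteration converges to a finite stationary point; the `data-free' limit $\theta \rightarrow \infty$ is approached instead when the data term is switched off.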
Fig. 6 shows a joint posterior surface for a uniform and for a compensating hyperprior for a one-dimensional regression example. The Maximum A Posteriori Approximation corresponds to the highest point of the joint posterior over $\phi$ and $\theta$ in these figures. Alternatively, one can treat the $\theta$-integral by Monte Carlo methods [236].
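As a sketch of the Monte Carlo alternative, the $\theta$-integral of a zero-dimensional Gaussian toy model can be estimated by sampling $\theta$ from its hyperprior and weighting by the marginal likelihood; all model choices below (exponential hyperprior, noise precision $\beta$, data value $y$) are illustrative assumptions:

```python
import numpy as np

# Monte Carlo treatment of the theta-integral for a zero-dimensional
# Gaussian toy model (all choices illustrative): datum y with noise
# precision beta, prior phi ~ N(0, 1/theta), hyperprior theta ~ Exp(1).
# The marginal posterior mean of phi is a ratio of theta-integrals with
# p(y|theta) = N(y; 0, 1/beta + 1/theta) and phibar = beta*y/(beta+theta).

beta, y = 1.0, 2.0

def marg_lik(theta):
    var = 1.0 / beta + 1.0 / theta
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2.0 * np.pi * var)

def phibar(theta):
    # posterior mean of phi at fixed theta
    return beta * y / (beta + theta)

# Importance-weighted Monte Carlo estimate, sampling theta from Exp(1):
rng = np.random.default_rng(0)
t = rng.exponential(1.0, size=200_000)
w = marg_lik(t)
mc_mean = np.sum(w * phibar(t)) / np.sum(w)

# Reference value by direct quadrature over theta:
tg = np.linspace(1e-4, 50.0, 500_001)
wg = np.exp(-tg) * marg_lik(tg)
quad_mean = np.sum(wg * phibar(tg)) / np.sum(wg)

print(abs(mc_mean - quad_mean) < 0.05)  # True
```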
Figure 6: Shown is the joint posterior density of $\phi$ and $\theta$ for a zero-dimensional example of Gaussian regression with training data and prior data. L.h.s.: uniform hyperprior $p(\theta)$. R.h.s.: compensating hyperprior.
Finally we remark that
in the setting of empirical risk minimization,
due to the different interpretation of the error functional,
regularization parameters are usually determined by
cross-validation or similar techniques
[166,6,230,216,217,81,39,211,228,54,83].
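A minimal sketch of the cross-validation approach just mentioned, here selecting a ridge regularization parameter on synthetic data (the data, the candidate grid, and all numbers are illustrative assumptions):

```python
import numpy as np

# k-fold cross-validation for choosing a ridge regularization parameter
# (synthetic data; all values illustrative).

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
Y = X @ true_w + 0.5 * rng.normal(size=60)

def ridge_fit(Xtr, Ytr, lam):
    # Closed-form ridge solution (X^T X + lam*I)^(-1) X^T Y
    d = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ Ytr)

def cv_error(lam, k=5):
    # Average held-out mean squared error over k folds
    folds = np.array_split(np.arange(len(Y)), k)
    err = 0.0
    for f in folds:
        train = np.setdiff1d(np.arange(len(Y)), f)
        w = ridge_fit(X[train], Y[train], lam)
        err += np.mean((X[f] @ w - Y[f]) ** 2)
    return err / k

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=cv_error)
print(best in lams)  # True
```

Here the regularization parameter is chosen purely by predictive performance, with no hyperprior, which is the different interpretation alluded to above.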
Joerg_Lemm
2001-01-21