Due to the presence of the logarithmic data term $-(N,\ln P)$ and the normalization constraint, the stationarity equations of density estimation problems are in general nonlinear, even for Gaussian specific priors. An exception is given by the Gaussian regression problems discussed in Section 3.7, for which the data term becomes quadratic and the normalization constraint can be skipped.
However, the nonlinearities arising from the data term are restricted to a finite number of training data points, and for Gaussian specific priors one may expect them, like those arising from the normalization constraint, not to be numerically very harmful. Severe nonlinearities can, however, appear for general non-Gaussian specific priors or general nonlinear parameterizations. Being nonlinear, the stationarity equations have in general to be solved by iteration.
In the context of empirical learning, iteration procedures that minimize an error functional represent possible learning algorithms.
In the previous sections we have encountered stationarity equations

$$ G(\phi) = 0 \qquad (607) $$

for error functionals $E_\phi$, e.g., $\phi = L$ or $\phi = P$, written in the form

$$ {\bf K}\,\phi = T \qquad (608) $$

with $\phi$-dependent $T$ (and possibly $\phi$-dependent ${\bf K}$), so that $G(\phi) = {\bf K}\,\phi - T(\phi)$.
For the stationarity Eqs. (143), (172), and (193) the operator ${\bf K}$ is a $\phi$-independent inverse covariance of a Gaussian specific prior.
It has already been mentioned that for an existing (and not too ill-conditioned) ${\bf K}^{-1}$ (representing the covariance of the prior process) Eq. (608) suggests the iteration scheme

$$ \phi^{i+1} = {\bf K}^{-1}\, T(\phi^i) \qquad (609) $$

for discretized $\phi$, starting from some initial guess $\phi^0$.
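As a concrete illustration, the following minimal sketch implements the scheme (609) for a discretized $\phi$; the matrix K, the mapping T, and all function and variable names are illustrative assumptions, not taken from the text.

    import numpy as np

    def iterate_prior_covariance(K, T, phi0, n_iter=200, tol=1e-10):
        """Fixed-point iteration phi^{i+1} = K^{-1} T(phi^i), Eq. (609).

        K    : (n, n) invertible (not too ill-conditioned) matrix
        T    : callable mapping phi (shape (n,)) to the phi-dependent
               right-hand side of Eq. (608)
        phi0 : (n,) initial guess
        """
        phi = np.asarray(phi0, dtype=float)
        for _ in range(n_iter):
            # solve K phi_new = T(phi) rather than forming K^{-1} explicitly
            phi_new = np.linalg.solve(K, T(phi))
            if np.max(np.abs(phi_new - phi)) < tol:
                break
            phi = phi_new
        return phi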
In general, as for the non-Gaussian specific priors discussed in Section 6, ${\bf K}$ can be $\phi$-dependent. Eq. (368) shows that general nonlinear parameterizations lead to nonlinear operators ${\bf K}$.
Clearly, if a $\phi$-dependent $T$ is allowed, the form (608) is no restriction of generality. One can always choose an arbitrary invertible (and not too ill-conditioned) operator ${\bf A}$, define

$$ T_{\bf A}(\phi) = T(\phi) + ({\bf A} - {\bf K})\,\phi , \qquad (610) $$

write the stationarity equation as

$$ {\bf A}\,\phi = T_{\bf A}(\phi) , \qquad (611) $$

and discretize and iterate with ${\bf A}^{-1}$.
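Indeed, substituting the definition (610) shows that the fixed points of (611) are exactly those of (608):

$$ {\bf A}\,\phi = T_{\bf A}(\phi) = T(\phi) + ({\bf A} - {\bf K})\,\phi \quad\Longleftrightarrow\quad {\bf K}\,\phi = T(\phi) . $$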
To obtain a numerical iteration scheme we will choose a linear, positive definite learning matrix ${\bf A}$. The learning matrix may depend on $\phi$ and may also change during the iteration.
To connect a stationarity equation given in form (608) to an iteration scheme with an arbitrary learning matrix ${\bf A}$ we define

$$ {\bf B} = {\bf A} - \eta\,{\bf K} , \qquad (612) $$

i.e., we split ${\bf K}$ into two parts,

$$ {\bf K} = \frac{1}{\eta}\,({\bf A} - {\bf B}) , \qquad (613) $$

where the factor $\eta$ is introduced for later convenience. Then we obtain from the stationarity equation (608)

$$ {\bf A}\,\phi = {\bf B}\,\phi + \eta\, T . \qquad (614) $$
To iterate we start by inserting an approximate solution $\phi^i$ on the right-hand side and obtain a new $\phi^{i+1}$ by calculating the left-hand side. This can be written in one of the following equivalent forms,

$$ \begin{aligned} \phi^{i+1} &= {\bf A}^{-1}\left( {\bf B}\,\phi^i + \eta\, T^i \right) && (615) \\ &= \left( {\bf I} - \eta\,{\bf A}^{-1}{\bf K} \right) \phi^i + \eta\,{\bf A}^{-1} T^i && (616) \\ &= \phi^i - \eta\,{\bf A}^{-1} G(\phi^i) , && (617) \end{aligned} $$

where $\eta$ plays the role of a learning rate or step width, and $\eta = \eta^i$ may be iteration dependent. The update equations (615-617) can be written

$$ {\bf A}\,\Delta\phi^i = -\eta\, G(\phi^i) \qquad (618) $$

with $\Delta\phi^i = \phi^{i+1} - \phi^i$ and $T^i = T(\phi^i)$.
Eq. (617) does not require the calculation of ${\bf B}$ or ${\bf A}$ itself, so that instead of ${\bf A}$ the matrix ${\bf A}^{-1}$ can be given directly, without the need to calculate an inverse. For example, operators that approximate ${\bf K}^{-1}$ and are easy to calculate may be good choices for ${\bf A}^{-1}$.
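The following sketch shows one update step in the form (617); in line with the remark above, only a routine applying ${\bf A}^{-1}$ and the gradient $G$ are assumed, and ${\bf A}$ itself never appears. The names are again illustrative.

    import numpy as np

    def learning_matrix_step(phi, G, A_inv, eta=0.1):
        """One step of Eq. (617): phi^{i+1} = phi^i - eta * A^{-1} G(phi^i).

        G     : callable, gradient of the error functional at phi
        A_inv : callable applying the inverse learning matrix to a vector
        eta   : learning rate (step width), possibly iteration dependent
        """
        return phi - eta * A_inv(G(phi))

    # Example choice: a cheap diagonal (Jacobi-like) approximation of K^{-1},
    # assuming a discretized matrix K is available:
    # A_inv = lambda v: v / np.diag(K)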
For positive definite ${\bf A}$ (and thus also positive definite inverse ${\bf A}^{-1}$) convergence can be guaranteed, at least theoretically. Indeed, multiplying Eq. (618) with $1/\eta$ and projecting onto an infinitesimal $\Delta\phi$ gives

$$ \frac{1}{\eta}\left( \Delta\phi ,\, {\bf A}\,\Delta\phi \right) = -\left( \Delta\phi ,\, G(\phi^i) \right) . \qquad (619) $$

In an infinitesimal neighborhood of $\phi^i$, where the change of the functional becomes equal to $dE_\phi = (\Delta\phi, G(\phi^i))$ in first order, the left-hand side is larger than (or equal to) zero for positive (semi-)definite ${\bf A}$. This shows that, at least for small enough $\eta$, the posterior log-probability increases, i.e., the differential $dE_\phi$ is smaller than or equal to zero, and the value of the error functional decreases.
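Written out to first order, using the update (617) and $G = \delta E_\phi / \delta\phi$ (the symbols follow the reconstruction above):

$$ dE_\phi = \left( \Delta\phi ,\, G(\phi^i) \right) + O\!\left( \|\Delta\phi\|^2 \right) = -\,\eta \left( {\bf A}^{-1} G(\phi^i) ,\, G(\phi^i) \right) + O\!\left( \eta^2 \right) \;\le\; 0 , $$

since a positive definite ${\bf A}$ has a positive definite inverse.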
Stationarity equation (127) for minimizing $E_L$, i.e., ${\bf K}\,L = N - \Lambda_X\, e^{L}$, yields for (615,616,617)

$$ \begin{aligned} L^{i+1} &= {\bf A}^{-1}\left( {\bf B}\,L^i + \eta \left( N - \Lambda_X\, e^{L^i} \right) \right) && (620) \\ &= \left( {\bf I} - \eta\,{\bf A}^{-1}{\bf K} \right) L^i + \eta\,{\bf A}^{-1}\left( N - \Lambda_X\, e^{L^i} \right) && (621) \\ &= L^i + \eta\,{\bf A}^{-1}\left( N - \Lambda_X\, e^{L^i} - {\bf K}\,L^i \right) . && (622) \end{aligned} $$

The function $\Lambda_X$ is also unknown and is part of the variables we want to solve for. The normalization conditions provide the necessary additional equations, and the matrix ${\bf A}$ can be extended to include the iteration procedure for $\Lambda_X$. For example, we can insert the stationarity equation for $\Lambda_X$, obtained by integrating ${\bf K}\,L = N - \Lambda_X\, e^L$ over $y$ for normalized $e^L$, i.e., $\Lambda_X = \int\! dy \left( N - {\bf K}\,L \right)$, in (622) to get

$$ L^{i+1} = L^i + \eta\,{\bf A}^{-1}\left( N - {\bf K}\,L^i - e^{L^i} \!\int\! dy \left( N - {\bf K}\,L^i \right) \right) . \qquad (623) $$

If $e^{L^i}$ is normalized at each iteration, this corresponds to an iteration procedure for $\Lambda_X$.
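A minimal sketch of one such iteration for $E_L$ on a grid, with $e^{L}$ re-normalized after each step. The grid discretization (a plain sum with unit spacing in $y$), the array shapes, and all names are assumptions for illustration only.

    import numpy as np

    def update_L(L, K_op, N, A_inv, eta=0.05):
        """One iteration of Eq. (623) for E_L, followed by re-normalization.

        L     : (nx, ny) array, log-probabilities L(x, y) on a grid
        K_op  : callable applying the inverse prior covariance to L
        N     : (nx, ny) array of data counts (delta peaks at training points)
        A_inv : callable applying the inverse learning matrix
        """
        KL = K_op(L)
        # Lambda_X from integrating the stationarity equation over y,
        # valid for normalized exp(L):
        Lam = (N - KL).sum(axis=1, keepdims=True)
        G = KL - N + Lam * np.exp(L)      # gradient entering Eq. (622)
        L_new = L - eta * A_inv(G)        # update step of Eq. (617)
        # re-impose the normalization sum_y exp(L) = 1
        L_new -= np.log(np.exp(L_new).sum(axis=1, keepdims=True))
        return L_new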
Similarly, for the functional $E_P$ we have to solve (166), i.e., ${\bf K}\,P = N/P - \Lambda_X$, and obtain for (617)

$$ P^{i+1} = P^i + \eta\,{\bf A}^{-1}\left( \frac{N}{P^i} - \Lambda_X - {\bf K}\,P^i \right) . \qquad (624) $$

Again, normalizing $P^i$ at each iteration is equivalent to solving for $\Lambda_X$, and the update procedure for $\Lambda_X$ can be varied.
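A corresponding step for $E_P$ might look as follows (conventions and caveats as for update_L above); eliminating $\Lambda_X$ by re-normalizing $P$ after the step is one of the possible variants mentioned in the text:

    import numpy as np

    def update_P(P, K_op, N, A_inv, eta=0.05, eps=1e-12):
        """One iteration of Eq. (624) for E_P, with Lambda_X handled
        by re-normalizing P after the step."""
        G = K_op(P) - N / np.maximum(P, eps)         # gradient without Lambda_X
        P_new = np.maximum(P - eta * A_inv(G), eps)  # keep P positive
        return P_new / P_new.sum(axis=1, keepdims=True)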