Bayesian approaches require the calculation of posterior densities.
Model states $h$ are commonly specified
by giving the data generating probabilities or likelihoods $p(y|x,h)$.
Posteriors are linked to likelihoods
by Bayes' theorem
$$p(A|B) = \frac{p(B|A)\,p(A)}{p(B)}, \tag{14}$$
which follows at once from
the definition of conditional probabilities, i.e.,
$p(A,B) = p(A|B)\,p(B) = p(B|A)\,p(A)$.
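Thus, one finds
$$p(h|f) = p(h|D,D_0) = \frac{p(D|h)\,p(h|D_0)}{p(D|D_0)} = \frac{p(y_D|x_D,h)\,p(h|D_0)}{p(y_D|x_D,D_0)} \tag{15}$$
$$= \frac{p(h|D_0)\prod_i p(x_i,y_i|h)}{\int\!dh\,p(h|D_0)\prod_i p(x_i,y_i|h)} = \frac{p(h|D_0)\prod_i p(y_i|x_i,h)}{\int\!dh\,p(h|D_0)\prod_i p(y_i|x_i,h)}, \tag{16}$$
using
$p(y_D|x_D,D_0,h) = p(y_D|x_D,h)$
for the training data likelihood of $h$ and
$p(h|D_0,x_i) = p(h|D_0)$.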
The terms of Eq. (15)
are in a Bayesian context often referred to as
$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}. \tag{17}$$
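As a minimal sketch of Eq. (17), assume a finite set of model states, so that the integral over $h$ in the evidence becomes a sum. The two coin models, their names and the data below are hypothetical and chosen only for illustration.

```python
from math import prod

def posterior(prior, likelihoods):
    """Return (posterior, evidence) for a finite set of model states.

    prior:       dict mapping h -> p(h|D_0)
    likelihoods: dict mapping h -> list of single-datum likelihoods p(y_i|x_i,h)
    """
    # numerator of Eq. (17): likelihood times prior, for each h
    unnormalized = {h: prior[h] * prod(likelihoods[h]) for h in prior}
    # evidence: sum over h (stands in for the integral over h)
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}, evidence

# Hypothetical example: two model states for a coin, labeled by p(heads)
prior = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.8}
data = ["H", "H", "T", "H"]

likelihoods = {
    h: [p if y == "H" else 1.0 - p for y in data]
    for h, p in p_heads.items()
}
post, evidence = posterior(prior, likelihoods)
print(post, evidence)
```

The evidence is the same for every $h$, so it only fixes the normalization; the ranking of model states is determined by likelihood times prior.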
Eq. (16) shows that
the posterior can be expressed equivalently
by the joint likelihoods
$p(y_i,x_i|h)$
or the conditional
likelihoods
$p(y_i|x_i,h)$.
When working with joint likelihoods, a distinction between
$y$ and $x$ variables is not necessary.
In that case $x$ can be included in $y$ and skipped from the notation.
If, however, $p(x)$ is already known or is not of interest,
working with conditional likelihoods is preferable.
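The equivalence of joint and conditional likelihoods can be checked numerically. The following sketch assumes binary $x$ and $y$, two hypothetical model states, and an input density $p(x)$ that does not depend on $h$; all numbers are made up for illustration.

```python
from math import prod, isclose

P_X = {0: 0.3, 1: 0.7}  # assumed input density p(x), independent of h
P_Y1 = {"h1": {0: 0.2, 1: 0.9}, "h2": {0: 0.6, 1: 0.4}}  # p(y=1|x,h)
PRIOR = {"h1": 0.5, "h2": 0.5}  # p(h|D_0)

def cond_lik(y, x, h):
    """Conditional likelihood p(y|x,h)."""
    p1 = P_Y1[h][x]
    return p1 if y == 1 else 1.0 - p1

def joint_lik(y, x, h):
    """Joint likelihood p(x,y|h) = p(y|x,h) p(x)."""
    return cond_lik(y, x, h) * P_X[x]

def posterior(data, lik):
    """Posterior over h for data given as pairs (x_i, y_i)."""
    unnormalized = {
        h: PRIOR[h] * prod(lik(y, x, h) for x, y in data) for h in PRIOR
    }
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}

data = [(0, 0), (1, 1), (1, 1), (0, 1)]
post_cond = posterior(data, cond_lik)
post_joint = posterior(data, joint_lik)
print(post_cond, post_joint)
```

The factors $p(x_i)$ appear in both numerator and denominator of the joint form and cancel, which is why the two posteriors agree.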
Eqs. (15,16) can be interpreted
as an updating (or learning) formula
used to obtain a new posterior $p(h|D,D_0)$
from a given prior probability $p(h|D_0)$
when new data $D$ arrive.
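Read as an updating rule, the posterior after one batch of data serves as the prior for the next batch. The sketch below illustrates this for a finite set of model states, assuming the two batches are independent given $h$; the likelihood values are hypothetical.

```python
from math import isclose

def update(prior, lik):
    """One Bayesian update: prior p(h|D_0) and likelihood p(D|h) -> posterior."""
    unnormalized = {h: prior[h] * lik[h] for h in prior}
    z = sum(unnormalized.values())  # evidence p(D|D_0)
    return {h: w / z for h, w in unnormalized.items()}

prior = {"h1": 0.2, "h2": 0.5, "h3": 0.3}
lik_a = {"h1": 0.10, "h2": 0.40, "h3": 0.25}  # hypothetical p(D_a|h)
lik_b = {"h1": 0.30, "h2": 0.05, "h3": 0.20}  # hypothetical p(D_b|h)

# Update with D_a first, then use the result as prior for D_b
sequential = update(update(prior, lik_a), lik_b)

# Update once with the combined likelihood p(D_a|h) p(D_b|h)
batch = update(prior, {h: lik_a[h] * lik_b[h] for h in prior})
print(sequential, batch)
```

Both routes give the same posterior, because the intermediate normalization constant drops out in the second update.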
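In terms of energies Eq. (16) reads
$$p(h|f) = \frac{\dfrac{e^{-\beta \sum_i E(y_i|x_i,h) - \beta E(h|D_0)}}{Z(Y_D|x_D,h)\,Z(H|D_0)}}{\displaystyle\int\!dh\,\dfrac{e^{-\beta \sum_i E(y_i|x_i,h) - \beta E(h|D_0)}}{Z(Y_D|x_D,h)\,Z(H|D_0)}}, \tag{18}$$
where the same temperature $1/\beta$
has been chosen for both energies
and the normalization constants are
$$Z(Y_D|x_D,h) = \prod_i \int\!dy\, e^{-\beta E(y|x_i,h)}, \tag{19}$$
$$Z(H|D_0) = \int\!dh\, e^{-\beta E(h|D_0)}. \tag{20}$$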
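The predictive density we are interested in can be written as
the ratio of two correlation functions under $p_0(h)$,
$$p(y|x,f) = \langle\, p(y|x,h) \,\rangle_f \tag{21}$$
$$= \frac{\langle\, p(y|x,h) \prod_i p(y_i|x_i,h) \,\rangle_0}{\langle\, \prod_i p(y_i|x_i,h) \,\rangle_0} \tag{22}$$
$$= \frac{\int\!dh\, p(y|x,h)\, e^{-\beta E_{\rm comb}}}{\int\!dh\, e^{-\beta E_{\rm comb}}}, \tag{23}$$
where
$\langle \cdots \rangle_f$ denotes the expectation under the posterior $p(h|f)$,
$\langle \cdots \rangle_0$
denotes the expectation
under the prior density
$p_0(h) = p(h|D_0)$,
and the combined likelihood and prior energy $E_{\rm comb}$
collects the $h$-dependent energy and free energy terms
$$E_{\rm comb} = E_D + E_0 - F(Y_D|x_D,h), \tag{24}$$
with
$$E_D = \sum_i E(y_i|x_i,h), \quad E_0 = E(h|D_0), \quad F(Y_D|x_D,h) = -\frac{1}{\beta} \ln Z(Y_D|x_D,h). \tag{25}$$
Going from Eq. (22)
to Eq. (23)
the prior normalization factor $Z(H|D_0)$,
appearing in numerator and denominator,
has been canceled.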
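The step from Eq. (22) to Eq. (23) can be verified on a toy model. The sketch below assumes a finite set of model states, discrete $y$, and no $x$ dependence (so $x$ is skipped from the notation); the energy tables and the value of $\beta$ are hypothetical. The combined energy is taken as the sum of the likelihood energies and the prior energy minus the free energy of the $h$-dependent likelihood normalization, which is one way to read Eq. (24).

```python
from math import exp, log, isclose

BETA = 2.0
Y_VALUES = (0, 1, 2)
# Hypothetical likelihood energies E(y|h) and prior energies E(h|D_0)
E_LIK = {"h1": {0: 0.0, 1: 1.0, 2: 3.0}, "h2": {0: 2.0, 1: 0.5, 2: 0.1}}
E_PRIOR = {"h1": 0.3, "h2": 0.0}

def z_lik(h):
    """h-dependent normalization of the single-datum likelihood."""
    return sum(exp(-BETA * E_LIK[h][y]) for y in Y_VALUES)

def p_lik(y, h):
    """Likelihood p(y|h) = exp(-beta E(y|h)) / Z(Y|h)."""
    return exp(-BETA * E_LIK[h][y]) / z_lik(h)

def prior(h):
    """Prior p(h|D_0) = exp(-beta E(h|D_0)) / Z(H|D_0)."""
    z0 = sum(exp(-BETA * e) for e in E_PRIOR.values())
    return exp(-BETA * E_PRIOR[h]) / z0

def predictive_ratio(y, data):
    """Ratio of two expectations under the prior, as in Eq. (22)."""
    num = den = 0.0
    for h in E_PRIOR:
        w = prior(h)
        for y_i in data:
            w *= p_lik(y_i, h)
        num += p_lik(y, h) * w
        den += w
    return num / den

def e_comb(h, data):
    """Combined energy: data energy + prior energy - likelihood free energy."""
    e_data = sum(E_LIK[h][y_i] for y_i in data)
    free_energy = -len(data) * log(z_lik(h)) / BETA
    return e_data + E_PRIOR[h] - free_energy

def predictive_energy(y, data):
    """Energy form as in Eq. (23); the prior normalization has canceled."""
    w = {h: exp(-BETA * e_comb(h, data)) for h in E_PRIOR}
    return sum(p_lik(y, h) * w[h] for h in w) / sum(w.values())

data = [1, 2, 2]
for y in Y_VALUES:
    print(y, predictive_ratio(y, data), predictive_energy(y, data))
```

Note that the likelihood normalization cannot be dropped in the same way as the prior normalization, because it depends on $h$ and therefore stays under the sum over model states.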
We remark that for continuous $x$ and/or $y$
the likelihood energy term
$E(y_i|x_i,h)$
describes an ideal,
non-realistic measurement because
realistic measurements cannot be arbitrarily sharp.
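Considering the function
$p(\cdot|\cdot,h)$
as an element of a Hilbert space,
its values may be written as a scalar product
$p(y|x,h) = (\,v_{xy},\, p(\cdot|\cdot,h)\,)$
with a function $v_{xy}$ that is also an element of that Hilbert space.
For continuous $x$ and/or $y$ this notation is only formal,
as $v_{xy}$ becomes unnormalizable.
In practice a measurement of
$p(\cdot|\cdot,h)$
corresponds to a normalizable
$\tilde v_{\tilde x \tilde y} = \int\!dy \int\!dx\, \vartheta(x,y)\, v_{xy}$,
where the kernel $\vartheta(x,y)$
has to ensure normalizability.
(Choosing normalizable
$\tilde v_{\tilde x \tilde y}$
as coordinates,
the Hilbert space of
$p(\cdot|\cdot,h)$
is also called a reproducing kernel Hilbert space
[183,112,113,228,144].)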
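The data terms then become
$$p(\tilde y_i|\tilde x_i,h) = \int\!dy \int\!dx\, \vartheta_i(x,y)\, p(y|x,h), \tag{26}$$
where $\vartheta_i(x,y)$ is the kernel of the $i$-th measurement.
The notation
$p(y_i|x_i,h)$
is understood as the limit
$\vartheta_i \to \delta$
(a delta function centered at $(x_i,y_i)$)
and means in practice
that
$\vartheta_i$
is very sharply centered.
We will assume that the discretization,
finally necessary to do numerical calculations,
will implement such an averaging.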
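The effect of a finite measurement kernel can be illustrated on a grid. The sketch below is an assumption-laden toy: it uses a one-dimensional density in $y$ only (a standard normal), a Gaussian kernel normalized on the grid, and hypothetical function names. It shows the kernel-averaged value approaching the point value as the kernel becomes sharply centered.

```python
from math import exp, pi, sqrt

def density(y):
    """Toy density p(y|h): standard normal (an assumption for illustration)."""
    return exp(-0.5 * y * y) / sqrt(2.0 * pi)

def smeared_value(y_i, width, dy=0.01, half_range=6.0):
    """Kernel average of the density around y_i on a uniform grid.

    The Gaussian kernel of the given width is normalized on the grid,
    so a constant density would be reproduced exactly.
    """
    n = int(round(2.0 * half_range / dy)) + 1
    grid = [-half_range + k * dy for k in range(n)]
    kernel = [exp(-0.5 * ((y - y_i) / width) ** 2) for y in grid]
    norm = sum(kernel) * dy
    weighted = sum(k * density(y) for k, y in zip(kernel, grid)) * dy
    return weighted / norm

y_i = 0.5
exact = density(y_i)
wide = smeared_value(y_i, width=0.5)
narrow = smeared_value(y_i, width=0.05)
print(exact, wide, narrow)
```

In this setup the kernel width must stay larger than the grid spacing; otherwise the discretization, not the kernel, determines the averaging.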
Joerg_Lemm
2001-01-21