Next we consider the example
$$ E(\phi,\theta) = \theta\, E_0(\phi), $$
where the hyperparameter has been denoted $\theta$, representing a regularization parameter or an inverse temperature variable for the specific prior.
For a $d$-dimensional Gaussian integral with $E_0(\phi) = \frac{1}{2}\big(\phi,\,{\bf K}_0\,\phi\big)$ the normalization factor becomes
$$ Z_\phi(\theta) = \int\! d\phi\; e^{-\theta E_0(\phi)} = \left(\frac{2\pi}{\theta}\right)^{d/2} \left(\det {\bf K}_0\right)^{-1/2}. $$
For positive (semi-)definite ${\bf K}_0$ the dimension $d$ is given by the rank of ${\bf K}_0$ under a chosen discretization. Skipping $\theta$-independent constants results in a normalization energy
$$ E_N(\theta) = \ln Z_\phi(\theta) = -\frac{d}{2}\ln\theta. $$
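As a numerical sketch of this normalization for $d = 1$ (the prior coefficient $k$ and the values of $\theta$ below are illustrative choices, not taken from the text):

```python
import numpy as np

# Numerical check, for d = 1, of the Gaussian normalization factor:
# with specific prior energy E_0(phi) = 0.5 * k * phi**2 one has
#   Z(theta) = integral of exp(-theta * E_0(phi)) dphi = sqrt(2*pi/(theta*k)),
# so the theta-dependent part of the normalization energy is -(1/2)*ln(theta).
# k and the theta values are illustrative choices.

k = 2.0
phi = np.linspace(-20.0, 20.0, 200001)
dphi = phi[1] - phi[0]

def Z_numeric(theta):
    # Riemann-sum approximation of the one-dimensional Gaussian integral
    return np.sum(np.exp(-theta * 0.5 * k * phi**2)) * dphi

def Z_analytic(theta):
    return np.sqrt(2.0 * np.pi / (theta * k))

for theta in (0.5, 1.0, 4.0):
    assert np.isclose(Z_numeric(theta), Z_analytic(theta), rtol=1e-6)

# ln Z(theta) depends on theta only through -(1/2)*ln(theta):
diff = np.log(Z_numeric(4.0)) - np.log(Z_numeric(0.5))
print(np.isclose(diff, -0.5 * np.log(4.0 / 0.5)))  # True
```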
With a hyperprior $p(\theta) \propto e^{-E_\theta(\theta)}$ and likelihood (data) energy $E_D(\phi)$, the total energy becomes
$$ E_{\rm tot}(\phi,\theta) = E_D(\phi) + \theta\, E_0(\phi) - \frac{d}{2}\ln\theta + E_\theta(\theta) \eqno(482) $$
and we obtain the stationarity equations
$$ 0 = \frac{\partial E_D(\phi)}{\partial \phi} + \theta\, \frac{\partial E_0(\phi)}{\partial \phi}, \eqno(483) $$
$$ E_0(\phi) = \frac{d}{2\theta} - \frac{\partial E_\theta(\theta)}{\partial \theta}. \eqno(484) $$
For a compensating hyperprior, $E_\theta(\theta) = \frac{d}{2}\ln\theta$, the right-hand side of Eq. (484) vanishes, thus giving no stationary point for $\theta$.
Using however the condition $\theta \ge 0$, one sees that for positive definite ${\bf K}_0$ the total energy is then monotonically increasing in $\theta$ and thus minimized for $\theta = 0$, corresponding to the `prior-free' case. For example, in the case of Gaussian regression the solution would then be the data template, $\phi = t_D$.
This is also known as ``$\theta = 0$-catastrophe''.
To get a nontrivial solution for $\theta$, a noncompensating hyperparameter energy $E_\theta \ne \frac{d}{2}\ln\theta + {\rm const.}$ must be used, so that the combination $e^{-E_\theta(\theta)}/Z_\phi(\theta)$ is nonuniform [16,24].
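The collapse under a compensating hyperprior can be made concrete in a zero-dimensional toy model. In the following sketch the quadratic likelihood energy and the numbers $\beta$, $y$ are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Zero-dimensional toy model of the theta -> 0 collapse under a
# compensating hyperprior: the normalization term -(d/2)*ln(theta) is
# cancelled by E_theta, leaving the joint energy
#   F(phi, theta) = 0.5*beta*(phi - y)**2 + 0.5*theta*phi**2,
# which has no stationary point in theta. (beta, y are illustrative.)

beta, y = 1.0, 2.0

def profile(theta):
    # Minimize F over phi at fixed theta (quadratic, so in closed form).
    phi_star = beta * y / (beta + theta)
    F_star = 0.5 * beta * (phi_star - y) ** 2 + 0.5 * theta * phi_star**2
    return F_star, phi_star

thetas = [2.0, 1.0, 0.5, 0.1, 0.01]
energies = [profile(t)[0] for t in thetas]

# The profiled energy decreases monotonically as theta -> 0 ...
print(all(a > b for a, b in zip(energies, energies[1:])))  # True
# ... and the minimizer approaches the `prior-free' data template phi = y:
print(abs(profile(0.01)[1] - y) < abs(profile(1.0)[1] - y))  # True
```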
The other limiting case is a vanishing $E_\theta$, for which Eq. (484) becomes
$$ E_0(\phi) = \frac{d}{2\theta}. \eqno(485) $$
For $\theta \rightarrow \infty$ one sees that $E_0(\phi) \rightarrow 0$.
Moreover, in case the specific prior represents a normalized probability, the solution $\phi_0$ minimizing $E_0$ is also a solution of the first stationarity equation (483) in the limit $\theta \rightarrow \infty$. Thus, for vanishing $E_\theta$ the `data-free' solution $\phi = \phi_0$ is a selfconsistent solution of the stationarity equations (483,484).
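For a concrete instance, the two stationarity conditions can be iterated to selfconsistency in a zero-dimensional ($d = 1$) toy model; the quadratic likelihood energy and the numbers $\beta$, $y$ below are illustrative assumptions:

```python
import numpy as np

# Solving the stationarity equations for vanishing hyperprior energy
# E_theta = 0 in a zero-dimensional (d = 1) toy model with total energy
#   F(phi, theta) = 0.5*beta*(phi - y)**2 + 0.5*theta*phi**2 - 0.5*ln(theta),
# so the theta-stationarity equation (cf. Eq. (485)) reads
#   E_0(phi) = 0.5*phi**2 = 1/(2*theta).  (beta, y are illustrative.)

beta, y = 1.0, 3.0

# Alternate the two stationarity conditions until selfconsistency:
phi, theta = y, 1.0
for _ in range(200):
    theta = 1.0 / phi**2             # from E_0(phi) = d/(2*theta), d = 1
    phi = beta * y / (beta + theta)  # from the phi-stationarity equation

# Both equations hold simultaneously at the fixed point:
print(np.isclose(0.5 * phi**2, 1.0 / (2.0 * theta)))    # True
print(np.isclose(beta * (phi - y) + theta * phi, 0.0))  # True
```

For these values the iteration converges to a finite stationary point; the `data-free' limit $\theta \rightarrow \infty$ is approached instead when the data term is switched off.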
Fig. 6 shows a joint posterior surface for a uniform and for a compensating hyperprior for a one-dimensional regression example. The Maximum A Posteriori Approximation corresponds to the highest point of the joint posterior over $\phi$ and $\theta$ in these figures. Alternatively, one can treat the $\theta$-integral by Monte Carlo methods [236].
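As a sketch of the Monte Carlo alternative, the $\theta$-integral of a zero-dimensional Gaussian toy model can be estimated by sampling $\theta$ from its hyperprior and weighting by the marginal likelihood; all model choices below (exponential hyperprior, noise precision $\beta$, data value $y$) are illustrative assumptions:

```python
import numpy as np

# Monte Carlo treatment of the theta-integral for a zero-dimensional
# Gaussian toy model (all choices illustrative): datum y with noise
# precision beta, prior phi ~ N(0, 1/theta), hyperprior theta ~ Exp(1).
# The marginal posterior mean of phi is a ratio of theta-integrals with
# p(y|theta) = N(y; 0, 1/beta + 1/theta) and phibar = beta*y/(beta+theta).

beta, y = 1.0, 2.0

def marg_lik(theta):
    var = 1.0 / beta + 1.0 / theta
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2.0 * np.pi * var)

def phibar(theta):
    # posterior mean of phi at fixed theta
    return beta * y / (beta + theta)

# Importance-weighted Monte Carlo estimate, sampling theta from Exp(1):
rng = np.random.default_rng(0)
t = rng.exponential(1.0, size=200_000)
w = marg_lik(t)
mc_mean = np.sum(w * phibar(t)) / np.sum(w)

# Reference value by direct quadrature over theta:
tg = np.linspace(1e-4, 50.0, 500_001)
wg = np.exp(-tg) * marg_lik(tg)
quad_mean = np.sum(wg * phibar(tg)) / np.sum(wg)

print(abs(mc_mean - quad_mean) < 0.05)  # True
```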
Figure 6: Shown is the joint posterior density of $\phi$ and $\theta$ for a zero-dimensional example of Gaussian regression with training data and prior data. L.h.s.: uniform hyperprior $p(\theta)$. R.h.s.: compensating hyperprior.
Finally we remark that
in the setting of empirical risk minimization,
due to the different interpretation of the error functional,
regularization parameters are usually determined by
cross-validation or similar techniques
[166,6,230,216,217,81,39,211,228,54,83].
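A minimal sketch of the cross-validation approach just mentioned, here selecting a ridge regularization parameter on synthetic data (the data, the candidate grid, and all numbers are illustrative assumptions):

```python
import numpy as np

# k-fold cross-validation for choosing a ridge regularization parameter
# (synthetic data; all values illustrative).

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
true_w = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
Y = X @ true_w + 0.5 * rng.normal(size=60)

def ridge_fit(Xtr, Ytr, lam):
    # Closed-form ridge solution (X^T X + lam*I)^(-1) X^T Y
    d = Xtr.shape[1]
    return np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(d), Xtr.T @ Ytr)

def cv_error(lam, k=5):
    # Average held-out mean squared error over k folds
    folds = np.array_split(np.arange(len(Y)), k)
    err = 0.0
    for f in folds:
        train = np.setdiff1d(np.arange(len(Y)), f)
        w = ridge_fit(X[train], Y[train], lam)
        err += np.mean((X[f] @ w - Y[f]) ** 2)
    return err / k

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=cv_error)
print(best in lams)  # True
```

Here the regularization parameter is chosen purely by predictive performance, with no hyperprior, which is the different interpretation alluded to above.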
Joerg_Lemm
2001-01-21