In the previous sections the error functionals to be minimized in the following have been given a Bayesian interpretation in terms of the log-posterior density. There is, however, an alternative justification of error functionals using the Frequentist approach of empirical risk minimization [224,225,226].
Common to both approaches is the aim to minimize the expected risk for an action $a$ under a loss $l(x,y,a)$,

$$r(a,f) = \int \! dx \, dy \; p(x,y|f) \, l(x,y,a), \qquad (104)$$

which, since the true $p(x,y|f)$ is unknown, is in practice approximated by the empirical risk over the $n$ training data $(x_i,y_i)$,

$$E(a) = \frac{1}{n} \sum_{i=1}^{n} l(x_i,y_i,a). \qquad (105)$$
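As a minimal numerical sketch of Eqs. (104) and (105), assuming a simple squared-error loss and a Gaussian data-generating distribution (both choices are illustrative, not from the text), the empirical risk over a large sample approximates the analytically known expected risk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical squared-error loss l(y, a) for a constant action a.
def loss(y, a):
    return (y - a) ** 2

# Assumed data-generating model p(y|f): Gaussian with mean 2.0, std 1.0.
true_mean, sigma, n = 2.0, 1.0, 10_000
y = rng.normal(true_mean, sigma, size=n)

a = 1.5
# Empirical risk (105): (1/n) sum_i l(y_i, a).
empirical_risk = np.mean(loss(y, a))
# Expected risk (104) for this model, computed analytically:
# E[(y - a)^2] = sigma^2 + (true_mean - a)^2.
expected_risk = sigma**2 + (true_mean - a) ** 2
```

For growing $n$ the empirical risk converges to the expected risk, which is what licenses minimizing (105) as a proxy for (104).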
From this Frequentist point of view one is not restricted to the logarithmic data terms that arise from the posterior-related Bayesian interpretation. However, as in the Bayesian approach, training data terms alone are not enough to make the minimization problem well defined. Indeed, this is a typical inverse problem [224,115,226] which, according to the classical regularization approach [220,221,165], can be treated by including additional regularization (stabilizer) terms in the loss function. Those regularization terms, which correspond to the prior terms of a Bayesian approach, are thus, from the point of view of empirical risk minimization, a technical tool to make the minimization problem well defined.
The empirical generalization error for a test or validation data set independent of the training data, on the other hand, is measured using only the data terms of the error functional, without regularization terms. In empirical risk minimization this empirical generalization error is used, for example, to determine adaptive (hyper-)parameters of regularization terms. A typical example is a factor multiplying the regularization terms, controlling the trade-off between data and regularization terms. Common techniques that use the empirical generalization error to determine such parameters are cross-validation or bootstrap-like techniques [166,6,230,216,217,81,39,228,54]. From a strict Bayesian point of view those parameters would have to be integrated out after defining an appropriate prior [16,147,149,24].
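The hyperparameter-selection procedure described above can be sketched as follows, again assuming a ridge-type regularizer and a held-out validation set (the candidate grid `lams` and the split are illustrative assumptions): the trade-off factor is chosen by minimizing the empirical generalization error, i.e. the data term alone on data independent of the training set.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 40, 30
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

# Hold-out split: validation data independent of the training data.
X_tr, y_tr = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

def fit_ridge(X, y, lam):
    """Minimize data term + lam * ||w||^2 on the training data."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_error(lam):
    """Empirical generalization error: data term only, no regularizer."""
    w = fit_ridge(X_tr, y_tr, lam)
    return np.mean((y_val - X_val @ w) ** 2)

# Choose the trade-off factor minimizing the validation data term.
lams = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam = min(lams, key=val_error)
```

$k$-fold cross-validation repeats this with several train/validation splits and averages the validation errors; the strict Bayesian alternative mentioned in the text would instead place a prior on the trade-off factor and integrate it out.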