Publication view

OVERVIEW (2007)

Abstract
Modeling with flexible models, such as neural networks, requires careful control of model complexity and of the generalization ability of the resulting model. Although general asymptotic estimators of generalization ability have been developed in recent years (e.g., [9]), it is widely acknowledged that in most modeling scenarios there is not enough data to use these estimators reliably for assessing generalization or for selecting and optimizing models. As a consequence, one resorts to resampling techniques such as cross-validation [3, 8, 14], the jackknife, or the bootstrap [2]. In this paper, we address a crucial problem of cross-validation estimators: how to split the data into various sets. The set D of all available data is usually split into two parts: the design set E and the test set F. The test set is reserved exclusively for a final assessment of the model that has been designed on E (using, e.g., optimization and model selection). This usually requires that the design set in turn be split into two parts: the training set T and the validation set V. The objective of the design/test split is both to obtain a model with high generalization ability and to assess the generalization error reliably. The second split is the training/validation split of the design set. Model parameters are trained on the training data, while the validation set provides an estimator of generalization error that is used, e.g., to choose between alternative models or to optimize additional (hyper)parameters such as regularization or robustness parameters [10, 12]. The aim is to select the split so that the generalization ability of the resulting model is as high as possible. This paper studies the very different behavior of the two data splits using hold-out cross-validation, K-fold cross-validation [3, 14], and randomized permutation cross-validation.
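The two nested splits described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, split fractions, fold count, and seed are hypothetical, and "randomized permutation cross-validation" is interpreted here, as an assumption, as hold-out repeated on freshly permuted copies of the design set. Only index bookkeeping is shown; no model is trained.

```python
import random


def design_test_split(n, test_fraction, rng):
    """Shuffle indices 0..n-1 and reserve a test set F; the rest is the design set E."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(round(test_fraction * n))
    return indices[n_test:], indices[:n_test]  # (E, F)


def holdout_split(design, val_fraction):
    """Single training/validation split (T, V) of the design set."""
    n_val = int(round(val_fraction * len(design)))
    return [(design[n_val:], design[:n_val])]


def kfold_splits(design, k):
    """K disjoint validation sets; each training set is the rest of the design set."""
    folds = [design[j::k] for j in range(k)]
    splits = []
    for j in range(k):
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        splits.append((train, folds[j]))
    return splits


def permutation_splits(design, val_fraction, repeats, rng):
    """Assumed interpretation: hold-out repeated on random permutations of E."""
    splits = []
    for _ in range(repeats):
        permuted = design[:]
        rng.shuffle(permuted)
        splits.extend(holdout_split(permuted, val_fraction))
    return splits


rng = random.Random(0)  # fixed seed so the illustration is reproducible
E, F = design_test_split(100, 0.2, rng)          # design/test split of D
holdout = holdout_split(E, 0.25)                 # one (T, V) pair
kfold = kfold_splits(E, 5)                       # five (T, V) pairs
permuted = permutation_splits(E, 0.25, 10, rng)  # ten (T, V) pairs
```

In every scheme the test indices F never appear in any training or validation set, which matches the abstract's requirement that the test set be reserved for the final assessment only.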

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.28.9692
Source http://eivind.imm.dtu.dk/publications/1999/larsen.nnsp99.ps.gz
Contributor CiteSeerX
Archive CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relations 10.1.1.40.5708, 10.1.1.40.1718, 10.1.1.20.7156, 10.1.1.28.8481, 10.1.1.88.5750, 10.1.1.29.5180, 10.1.1.28.8190