Overfitting

This MedLibrary.org supplementary page on Overfitting is provided directly from the open source Wikipedia as a service to our readers. Please see the note below on authorship of this content, as well as the Wikipedia usage guidelines.

Noisy (roughly linear) data fitted with both a linear and a polynomial function. Although the polynomial passes through every data point and the line passes through few, the line is the better fit because it does not have the polynomial's large excursions at the ends. If the regression curves were used to extrapolate beyond the data, the overfitted polynomial would do much worse.
Overfitting/overtraining in supervised learning (e.g. a neural network). Training error is shown in blue, validation error in red. If the validation error increases while the training error steadily decreases, overfitting may have occurred.

In statistics, overfitting means fitting a statistical model that has too many parameters relative to the amount of data available. An absurd and false model may fit perfectly if it is complex enough compared with the amount of data. Overfitting is generally recognized as a violation of Occam's razor. When the degrees of freedom in parameter selection exceed the information content of the data, the final fitted parameters become arbitrary, which reduces or destroys the model's ability to generalize beyond the fitting data. The likelihood of overfitting depends not only on the number of parameters and the amount of data but also on how well the model structure conforms to the shape of the data, and on the magnitude of model error compared with the expected level of noise or error in the data.
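The point about degrees of freedom can be illustrated with a small sketch. It assumes hypothetical data (a true relation y = 2x + 1 with hand-picked perturbations) and compares a two-parameter least-squares line with a polynomial that has as many coefficients as there are data points. All names and values here are illustrative assumptions, not taken from the article.

```python
# Hypothetical data: y = 2x + 1 plus small hand-picked perturbations ("noise").
xs = [float(i) for i in range(8)]
noise = [0.3, -0.4, 0.2, -0.1, 0.5, -0.3, 0.1, -0.2]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]


def true_function(x):
    """The assumed noise-free relation behind the data."""
    return 2.0 * x + 1.0


def fit_line(xs, ys):
    """Ordinary least-squares line: two parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Degree n-1 polynomial through all n points (Lagrange form).

    It has as many free coefficients as there are data points,
    so it reproduces the noise exactly.
    """
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(xs, ys)
poly = fit_interpolating_polynomial(xs, ys)

train_mse_line = mean_squared_error(line, xs, ys)
train_mse_poly = mean_squared_error(poly, xs, ys)

# Extrapolate beyond the fitted range (the data stop at x = 7).
x_new = 10.0
extrap_err_line = abs(line(x_new) - true_function(x_new))
extrap_err_poly = abs(poly(x_new) - true_function(x_new))

print("training MSE      line:", train_mse_line, " polynomial:", train_mse_poly)
print("error at x = 10   line:", extrap_err_line, " polynomial:", extrap_err_poly)
```

Under these assumptions the polynomial fits the training points essentially perfectly yet extrapolates far worse than the line, because its extra coefficients were spent on reproducing the perturbations.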

The concept of overfitting is also important in machine learning. Usually a learning algorithm is trained on a set of training examples, i.e. exemplary situations for which the desired output is known. The learner is assumed to reach a state where it can also predict the correct output for other examples, thus generalizing to situations not presented during training (based on its inductive bias). However, especially when learning was performed for too long or when training examples are scarce, the learner may adjust to very specific random features of the training data that have no causal relation to the target function. In this process of overfitting, performance on the training examples still increases while performance on unseen data becomes worse.
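A toy sketch of a learner adjusting to random features of its training data follows. It assumes a hypothetical one-dimensional task (the label is 1 when x >= 5) in which two training labels have been flipped to stand in for noise. A memorizing learner (1-nearest-neighbour) is compared with a single-threshold rule; both learners and all data are illustrative assumptions.

```python
def true_label(x):
    """Assumed target function: 1 when x >= 5, otherwise 0."""
    return 1 if x >= 5 else 0


# Training set: x = 0..9, with the labels at x = 2 and x = 7 flipped ("noise").
train_xs = [float(i) for i in range(10)]
train_ys = [true_label(x) for x in train_xs]
train_ys[2] = 1
train_ys[7] = 0

# Unseen examples, labelled by the true target function.
test_xs = [i + 0.4 for i in range(10)]
test_ys = [true_label(x) for x in test_xs]


def memorizer(x):
    """1-nearest-neighbour: returns the label of the closest training point."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def fit_threshold(xs, ys):
    """Pick the threshold t (predict 1 when x >= t) with the fewest training errors."""
    candidates = [x - 0.5 for x in xs] + [max(xs) + 0.5]

    def errors(t):
        return sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))

    best_t = min(candidates, key=errors)
    return lambda x: 1 if x >= best_t else 0


def accuracy(model, xs, ys):
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


threshold_rule = fit_threshold(train_xs, train_ys)

train_acc_memorizer = accuracy(memorizer, train_xs, train_ys)
train_acc_rule = accuracy(threshold_rule, train_xs, train_ys)
test_acc_memorizer = accuracy(memorizer, test_xs, test_ys)
test_acc_rule = accuracy(threshold_rule, test_xs, test_ys)

print("training accuracy  memorizer:", train_acc_memorizer, " rule:", train_acc_rule)
print("unseen accuracy    memorizer:", test_acc_memorizer, " rule:", test_acc_rule)
```

In this sketch the memorizer scores higher on the training examples but lower on unseen data, because it has reproduced the two flipped labels; the simpler rule cannot fit them and so generalizes better.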

In both statistics and machine learning, avoiding overfitting requires additional techniques (e.g. cross-validation, early stopping, Bayesian priors on parameters, or model comparison) that can indicate when further training is not resulting in better generalization. Overfitting of a neural network during training is also known as overtraining. In treatment learning, overfitting is avoided by using a minimum best support value.
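Of the techniques listed, early stopping is the simplest to sketch. The example below assumes that a validation error has already been recorded after each training epoch; the function name, the `patience` parameter, and the error values are hypothetical and chosen only for illustration.

```python
def early_stopping_epoch(validation_errors, patience=2):
    """Return the epoch with the lowest validation error seen before stopping.

    Scanning stops once the validation error has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical error curves: training error keeps falling,
# validation error turns upward after epoch 3.
training_errors = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18, 0.15]
validation_errors = [0.95, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]

stop_at = early_stopping_epoch(validation_errors, patience=2)
print("keep the model from epoch", stop_at)
```

A larger `patience` tolerates short-lived bumps in the validation error at the cost of training for longer before stopping.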

Wikipedia content modification information:

  • This page was last modified on 20 July 2008, at 07:48.

Wikipedia Authorship and Review

Wikipedia content provided here is not reviewed directly by MedLibrary.org. Wikipedia content is authored by an open community of volunteers and is not produced by or in any way affiliated with MedLibrary.org.

Wikipedia Usage Guidelines

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article on "Overfitting".

The URL for this specific entry is:

All Wikipedia text is available under the terms of the GNU Free Documentation License. (See Copyrights for details). Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc.