In machine learning we try to fit a model to data and, based on that model, make predictions on new examples. Sometimes the model has overly high bias and barely responds to the data (underfitting). Conversely, the model can have overly high variance and follow the data too closely (overfitting). For example, in regression, overfitting and underfitting look like:

![[Pasted image 20250322172355.png]]

And here is what they might look like in classification:

![[Pasted image 20250322172622.png]]

### Combat overfitting

1. *Collecting more training data* helps against overfitting: with more data, each individual point has less influence, so the model is less likely to fit noise.
2. *Using fewer features* (feature selection) also helps, especially when we don't have enough training data for the feature set to be covered diversely and thoroughly.
3. *Regularization* is the practice of **reducing** the effect of some features. Feature selection sets the effect of some features to exactly 0; regularization instead shrinks their effect to avoid overfitting.

![[Pasted image 20250322173535.png]]

### Intuition

To combat overfitting and keep the parameters ($w_j$) near zero, we increase their cost in the cost function. Ideally, only the parameters of the higher-degree terms would be regularized, like this:

![[Pasted image 20250322191601.png]]

However, to make this a general algorithm, all parameters are regularized by adding a regularization term to the cost function: $\frac{\lambda}{2m}\sum_{j=1}^{n}w_j^2$, where $\lambda$ is the regularization rate to tune.

![[Pasted image 20250322191849.png]]

> [!info]
> Choosing a large $\lambda$ will underfit the model, while choosing a small $\lambda$ will have little effect and let the model overfit.
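The regularized cost can be sketched in NumPy. This is a minimal illustration, assuming linear regression with a mean-squared-error cost; the function and variable names (`regularized_cost`, `lam`, etc.) are my own, not from any particular library:

```python
import numpy as np

def regularized_cost(X, y, w, b, lam):
    """MSE cost plus the L2 regularization term (lambda / 2m) * sum(w_j^2).

    By convention only the weights w are regularized, not the bias b.
    """
    m = X.shape[0]
    predictions = X @ w + b
    mse_term = np.sum((predictions - y) ** 2) / (2 * m)
    reg_term = (lam / (2 * m)) * np.sum(w ** 2)
    return mse_term + reg_term

# Toy data that the weights fit perfectly, so the MSE part is zero
# and any remaining cost comes purely from the regularization term.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.array([2.0])
b = 0.0

print(regularized_cost(X, y, w, b, lam=0.0))   # pure MSE, no penalty
print(regularized_cost(X, y, w, b, lam=10.0))  # same fit, but large w is now penalized
```

Increasing `lam` makes large weights more expensive, which pushes gradient descent toward smaller $w_j$ and a smoother model.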