Master regularization techniques like Ridge and Lasso to prevent overfitting. Learn how to tune alpha and build simpler, more reliable machine learning models.
Previously in this course, we explored the Bias-Variance Tradeoff and how too much or too little model complexity leads to Overfitting and Underfitting. In this lesson, we move from diagnosis to treatment: we will use regularization to mathematically penalize complex models, forcing them to favor simplicity and better generalization.
In linear regression, our goal is to minimize the sum of squared errors between predictions and actual targets. When we have many features, or features that are highly correlated, the fitted model often ends up with very large coefficients that "chase" noise in the training data. This is classic overfitting.
Regularization addresses this by adding a penalty term to the loss function. Instead of just minimizing the error, we minimize: Loss = (Model Error) + (Penalty for Large Weights)
By constraining how large the weights (coefficients) can grow, we prevent the model from relying too heavily on any single feature or noise pattern.
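To make the correlated-features case concrete, here is a sketch on hypothetical synthetic data: two nearly identical features where only the first actually drives y. Because least squares cannot reliably tell the twins apart, it may split the effect into large weights of opposite sign; Ridge (one of the two penalties introduced next) keeps both weights small and shares the effect roughly evenly. The data, seed, and alpha value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data: x1 is a near-duplicate of x0, and only x0 drives y
rng = np.random.default_rng(1)
n = 100
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x0, x1])
y = x0 + rng.normal(scale=0.5, size=n)

ols_coef = LinearRegression().fit(X, y).coef_
ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_

# OLS is free to use large, offsetting weights on the twins;
# the L2 penalty favors sharing the effect evenly between them.
print("OLS coefficients:  ", ols_coef.round(2))
print("Ridge coefficients:", ridge_coef.round(2))
```

Both models agree on the combined effect (the coefficients sum to roughly 1); only the penalized model also keeps each individual weight small.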
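The primary difference between Ridge and Lasso lies in how they penalize the coefficients:
- Ridge (L2 regularization) penalizes the sum of squared coefficients. It shrinks all coefficients toward zero but rarely sets any of them exactly to zero, which makes the model more stable when features are correlated.
- Lasso (L1 regularization) penalizes the sum of absolute coefficient values. It can drive some coefficients exactly to zero, effectively performing feature selection.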
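To make the (Model Error) + (Penalty for Large Weights) structure concrete, these are the objectives that scikit-learn's user guide gives for its Ridge and Lasso estimators, where X is the n-by-p feature matrix, y the target vector, w the coefficient vector, and alpha the penalty strength. The intercept is fit separately and is not penalized.

```latex
\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,
\qquad \lVert w \rVert_2^2 = \sum_{j=1}^{p} w_j^2

\text{Lasso:}\quad \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1,
\qquad \lVert w \rVert_1 = \sum_{j=1}^{p} \lvert w_j \rvert
```

Because the Lasso error term is divided by 2n and the Ridge term is not, the same alpha value is not directly comparable between the two estimators.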
The alpha parameter controls the strength of this penalty. A higher alpha increases the penalty, producing a simpler, more biased model; an alpha near zero effectively removes the penalty, so the model behaves like standard linear regression.
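A quick way to see alpha at work is to refit Ridge over a range of alpha values and watch the size of the coefficient vector. This sketch uses synthetic data from make_regression purely for illustration (not the course project data), tracks the L2 norm of coef_, and checks that a near-zero alpha reproduces LinearRegression's coefficients.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data for illustration only
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=42)

alphas = [1e-6, 1, 10, 100, 1000]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    # The L2 norm of the coefficient vector shrinks as alpha grows
    norms.append(np.linalg.norm(coef))
    print(f"alpha={alpha:<8g} coefficient L2 norm = {norms[-1]:.2f}")

# With alpha close to zero, Ridge essentially reproduces ordinary least squares
ols_coef = LinearRegression().fit(X, y).coef_
near_zero_coef = Ridge(alpha=1e-6).fit(X, y).coef_
print("Largest coefficient gap vs. LinearRegression:",
      np.abs(near_zero_coef - ols_coef).max())
```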
Let’s apply these techniques to our project pipeline. We will use Ridge and Lasso from sklearn.linear_model.
```python
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assume 'X_train' and 'y_train' are already prepared
# We use a pipeline to scale features: crucial for regularization!
ridge_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', Ridge(alpha=1.0))
])

lasso_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', Lasso(alpha=0.1))
])

ridge_pipe.fit(X_train, y_train)
lasso_pipe.fit(X_train, y_train)

# Checking the impact:
print(f"Ridge coefficients: {ridge_pipe.named_steps['regressor'].coef_}")
print(f"Lasso coefficients: {lasso_pipe.named_steps['regressor'].coef_}")
```
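The pipeline code assumes X_train and y_train already exist. For a quick, self-contained run, here is a sketch that swaps in synthetic data from make_regression (a hypothetical stand-in for the project data, with only 3 of 10 features informative) and counts how many coefficients each model sets exactly to zero. The alpha values are chosen for this toy data, not tuned recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the project data: 10 features, only 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same scaler + regressor structure; alpha values are illustrative, not tuned
ridge_pipe = Pipeline([("scaler", StandardScaler()), ("regressor", Ridge(alpha=1.0))])
lasso_pipe = Pipeline([("scaler", StandardScaler()), ("regressor", Lasso(alpha=1.0))])
ridge_pipe.fit(X_train, y_train)
lasso_pipe.fit(X_train, y_train)

ridge_coef = ridge_pipe.named_steps["regressor"].coef_
lasso_coef = lasso_pipe.named_steps["regressor"].coef_

# Ridge shrinks every coefficient but keeps all features;
# Lasso can set coefficients of uninformative features to exactly 0.
print("Ridge coefficients set to 0:", int(np.sum(ridge_coef == 0)))
print("Lasso coefficients set to 0:", int(np.sum(lasso_coef == 0)))
print("Test R^2 (Ridge):", round(ridge_pipe.score(X_test, y_test), 3))
print("Test R^2 (Lasso):", round(lasso_pipe.score(X_test, y_test), 3))
```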
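Why the StandardScaler? Regularization is scale-sensitive because the penalty acts on coefficient size, and a coefficient's size depends on its feature's units. A feature measured in "millions" needs only a tiny coefficient to move the prediction, so the penalty barely touches it, while a feature measured in "decimals" needs a large coefficient and is punished disproportionately. Always scale your data before applying Ridge or Lasso.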
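To see this effect, the sketch below builds two equally informative synthetic features (a hypothetical setup), then records the second one in units 1,000 times smaller, as if switching from meters to kilometers. It compares each feature's effect per standard deviation (coefficient times feature standard deviation) for Ridge with and without a StandardScaler; the alpha value is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: both features contribute equally to y
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 3 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Same information, but feature 1 is recorded in units 1,000x smaller,
# so it needs a ~1,000x larger coefficient to have the same effect.
X_mixed = X.copy()
X_mixed[:, 1] *= 0.001

alpha = 10.0
raw = Ridge(alpha=alpha).fit(X_mixed, y)
scaled = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X_mixed, y)

# Effect per standard deviation = coefficient x feature std. In the scaled
# pipeline the coefficients already apply to standardized features.
raw_effect = raw.coef_ * X_mixed.std(axis=0)
scaled_effect = scaled.named_steps["ridge"].coef_
print("Unscaled Ridge, effect per feature:", raw_effect.round(3))
print("Scaled Ridge, effect per feature:  ", scaled_effect.round(3))
```

Without scaling, the penalty on the large coefficient the small-unit feature needs is so heavy that its effect is shrunk almost to nothing; after scaling, both features receive similar effects.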
Try it yourself:
1. Replace the LinearRegression step in your pipeline with a Ridge regressor.
2. Test alpha values of [0.01, 0.1, 1, 10, 100], comparing training and test scores for each.
3. Which alpha provides the best balance? (Hint: you are looking for the point where the gap between training and test scores narrows without a significant drop in test performance.)

Common pitfalls:
- Skipping the StandardScaler: without it, your regularization penalty will fall disproportionately on features with smaller numerical ranges.
- Setting alpha too high: if alpha is too large, you will over-penalize the coefficients, leading to underfitting. Your model will become too simple to capture the underlying signal.

Regularization is your primary defense against overfitting. By choosing Ridge (for stability) or Lasso (for sparsity and feature selection) and carefully tuning your alpha hyperparameter, you ensure your model focuses on the signal rather than the noise. After the analysis in Evaluating Feature Importance, regularization serves as the final step in refining a lean, production-ready model.
Up next: We will benchmark these linear models against tree-based algorithms to see if we can squeeze out more performance.