Learn how models measure performance using loss functions. Master Mean Squared Error and Log Loss to understand the core mechanics of machine learning optimization.
Previously in this course, we explored the mechanics of linear regression and classification. We now know how models generate predictions, but we haven't yet addressed how they improve: how does a model know whether it is doing a good job?
To build a functional model, we need a mathematical compass. That compass is the loss function.
A loss function is a mathematical formula that quantifies the difference between a model's prediction ($\hat{y}$) and the actual ground truth ($y$). During training, the model's goal is optimization: it iteratively adjusts its internal parameters (weights) to minimize this error.
Think of it as a penalty system. If the model predicts a house price of \$500k but the actual value is \$600k, the loss function assigns a "cost" to that \$100k gap. If the model predicts \$599k, the loss is tiny. Training is a search for the parameter values that produce the lowest total loss across the training dataset. In practice that search is guided by the loss itself rather than by trying every possible value.
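As a rough illustration of "searching for the parameters with the lowest loss", the sketch below scores a few candidate weights for a one-parameter model and keeps the best one. The data and every name in it (`sizes`, `prices`, `total_loss`, `candidate_weights`) are made up for this example, and the cost used is the average squared gap that the next paragraphs define formally. Real training normally adjusts the weights step by step instead of scanning a fixed list.

```python
import numpy as np

# Hypothetical toy data: house size (arbitrary units) and price (in thousands).
sizes = np.array([1.0, 2.0, 3.0])
prices = np.array([200.0, 400.0, 600.0])

def total_loss(weight):
    """Average squared gap between predictions (weight * size) and actual prices."""
    predictions = weight * sizes
    return np.mean((prices - predictions) ** 2)

# Score a handful of candidate weights and keep the one with the lowest loss.
candidate_weights = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
losses = np.array([total_loss(w) for w in candidate_weights])
best_weight = candidate_weights[np.argmin(losses)]

for w, loss in zip(candidate_weights, losses):
    print(f"weight={w:6.1f}  loss={loss:10.1f}")
print(f"Best candidate weight: {best_weight}")
```

On this toy data the weight 200 reproduces every price exactly, so its loss is zero and it is the one selected.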
When predicting continuous values, such as housing prices or temperature, a standard choice is Mean Squared Error (MSE). The logic is straightforward: square the difference between the prediction and the actual value, then average those squares across all $n$ samples.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Imagine you have three houses. You predict their prices, and you have the actual sales data.
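| House | Actual $y$ (thousands) | Predicted $\hat{y}$ (thousands) | Error $y - \hat{y}$ (thousands) | Squared Error |
|---|---|---|---|---|
| 1 | 300 | 310 | -10 | 100 |
| 2 | 450 | 440 | 10 | 100 |
| 3 | 600 | 650 | -50 | 2500 |
Total squared error = 100 + 100 + 2500 = 2700, so MSE = 2700 / 3 = 900. Because the prices are in thousands of dollars, the MSE is in thousands of dollars squared. Note that House 3's single 50k miss accounts for more than 90% of the total.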
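In classification tasks (e.g., "Is this email spam?"), MSE is a poor fit. The target is binary (0 or 1) and the model outputs a probability, so the squared error can never exceed 1, which means even a confidently wrong prediction is penalized only mildly. Instead, we use Log Loss (also called Binary Cross-Entropy).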
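Log Loss scores how well the model's predicted probabilities match the true labels. It doesn't just care whether you got the answer right; it cares about your confidence. If your model predicts a 0.9 probability of "Spam" for a real spam email, the loss is low (about 0.11 using the natural logarithm). If it predicts a 0.1 probability for that same spam email, the loss is roughly twenty times larger (about 2.30), and it grows without bound as the predicted probability approaches 0.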
Mathematically, for a single example with true label $y$ and predicted probability $\hat{y}$, it looks like this:
$$LogLoss = -(y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y}))$$
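Because $y$ is either 0 or 1, only one of the two terms is active for any given example. Because the logarithm of a number between 0 and 1 is negative, the leading negative sign makes the loss positive. Minimizing it pushes the predicted probability toward 1 when the true label is 1 and toward 0 when the true label is 0. The loss for a whole dataset is the average of this quantity over all examples.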
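The spam example can be checked numerically. The following is a minimal sketch of the formula above using NumPy and the natural logarithm. The function name `log_loss` and the small clipping constant `eps` are choices made for this example: clipping keeps `np.log` from receiving exactly 0 or 1, and it is not part of the formula itself.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy over all examples (natural log)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never sees exactly 0 or 1 (assumption of this sketch).
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_example = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return per_example.mean()

# A real spam email (y = 1) scored with high and low predicted probability.
confident_right = log_loss([1], [0.9])
confident_wrong = log_loss([1], [0.1])

print(f"p=0.9 for true spam: {confident_right:.3f}")  # 0.105
print(f"p=0.1 for true spam: {confident_wrong:.3f}")  # 2.303
```

The same function accepts whole arrays, for example `log_loss([1, 0, 1], [0.9, 0.2, 0.6])`, and returns the average loss across them.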
Using a Python snippet, let's calculate the MSE for a small set of predictions.
```python
import numpy as np

# Ground truth values
y_true = np.array([100, 200, 300])

# Model predictions
y_pred = np.array([110, 190, 350])

# Calculate squared differences
squared_errors = (y_true - y_pred) ** 2

# Calculate the mean
mse = np.mean(squared_errors)

print(f"The MSE is: {mse}")
```
Exercise: Change the `y_pred` values so that one prediction is significantly further away (e.g., 400 instead of 350). Observe how the MSE increases disproportionately: doubling that one error from 50 to 100 quadruples its squared contribution, from 2,500 to 10,000. This is why MSE is sensitive to outliers.
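One way to check your answer is to wrap the calculation in a small helper and compare the two cases side by side. This is only a sketch. The helper name `mse` and the variables `baseline` and `with_outlier` are introduced here for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

y_true = [100, 200, 300]

baseline = mse(y_true, [110, 190, 350])      # largest error is 50
with_outlier = mse(y_true, [110, 190, 400])  # largest error is 100

print(f"Baseline MSE:     {baseline}")      # 900.0
print(f"With outlier MSE: {with_outlier}")  # 3400.0
```

Only one prediction changed, yet the MSE nearly quadrupled, because the squared term for that sample dominates the average.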
Up next, we will move beyond simple calculations and look at how to properly organize our data using Training and Testing Data Splits to ensure our model doesn't just memorize the past but learns to generalize to new information.