Cost-Sensitive Learning: Optimize for Profit, Not Just Accuracy

Learn how to align your ML models with business objectives by moving beyond accuracy to cost-sensitive learning. Define custom cost matrices and maximize profit.


Previously in this course, we explored Confusion Matrices and Beyond to diagnose model errors and used Mastering Precision-Recall Curves for Production ML Pipelines to tune classification thresholds. While these tools show how a model errs, they don't explicitly tell you what those errors cost the business.

In this lesson, we shift from optimizing for statistical metrics like F1-score or accuracy to optimizing for actual profit.

The Problem with Default Metrics

Standard metrics know nothing about dollar costs. Accuracy counts a False Positive (FP) and a False Negative (FN) as equally "bad", and the F1-score simply balances precision against recall. In production, the two error types rarely cost the same.

Imagine a fraud detection model. A False Negative (missing a fraudulent transaction) costs the bank the full transaction amount, while a False Positive (blocking a legitimate user) costs only the customer service overhead of a support call. If the average fraud amount is $500 and the support cost is $20, a missed fraud costs 25 times as much as a false alarm, so treating the two errors equally is an expensive mistake.

Cost-sensitive learning allows us to inject these business realities directly into the model evaluation and training process.
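
To make the asymmetry concrete, here is a small sketch using the $500 and $20 figures above. The labels and the two prediction vectors are made up for illustration, and the helper names are hypothetical. Both "models" reach the same accuracy, yet their error costs differ widely.

```python
import numpy as np

FRAUD_LOSS = 500    # cost of a missed fraud (FN), from the example above
SUPPORT_COST = 20   # cost of blocking a legitimate user (FP)

def error_cost(y_true, y_pred):
    """Dollar cost of the errors only (correct predictions cost nothing here)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    return fn * FRAUD_LOSS + fp * SUPPORT_COST

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Hypothetical labels: 8 legitimate transactions, 2 fraudulent ones
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses both frauds (2 FN)
model_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]  # blocks two good users (2 FP)

for name, y_pred in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, y_pred), error_cost(y_true, y_pred))
```

Both rows report an accuracy of 0.8, but model A's errors cost $1,000 while model B's cost $40.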

Defining a Cost Matrix

A cost matrix is a simple table that assigns a dollar value (or utility score) to every outcome in your confusion matrix: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

	Predicted Negative	Predicted Positive
Actual Negative	$0 (TN)	-$20 (FP)
Actual Positive	-$500 (FN)	+$50 (TP)

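To optimize for profit, your goal is to maximize the total value of all predictions, weighting each confusion-matrix count by its signed entry in the cost matrix: $TotalValue = (TP \times V_{TP}) + (TN \times V_{TN}) + (FP \times V_{FP}) + (FN \times V_{FN})$. Losses enter as negative values, as in the table above, so every term is added. Dividing the total by the number of predictions gives the expected value per prediction.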
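
The sum above is an element-wise product of the confusion matrix and the cost matrix. A minimal sketch, assuming binary labels 0/1 and the dollar values from the table; the name `VALUE_MATRIX`, the helper function, and the counts in the usage lines are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Rows = actual class, columns = predicted class, matching the table above
# and the layout returned by confusion_matrix(..., labels=[0, 1]).
VALUE_MATRIX = np.array([
    [0, -20],    # actual negative: TN, FP
    [-500, 50],  # actual positive: FN, TP
])

def total_and_mean_value(y_true, y_pred, value_matrix=VALUE_MATRIX):
    """Return (total dollars, dollars per prediction)."""
    counts = confusion_matrix(y_true, y_pred, labels=[0, 1])
    total = int((counts * value_matrix).sum())
    return total, total / counts.sum()

# Made-up predictions: 90 TN, 5 FP, 2 FN, 3 TP
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 90 + [1] * 5 + [0] * 2 + [1] * 3)

total, per_prediction = total_and_mean_value(y_true, y_pred)
print(total, per_prediction)  # -950 -9.5
```

Keeping the values in one array makes it easy to swap in a different cost matrix without touching the scoring logic.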

Worked Example: Optimizing for Profit

Let’s implement a custom scorer in scikit-learn. We will take a hypothetical fraud dataset and evaluate a model based on our cost matrix above.


PYTHON
import numpy as np
from sklearn.metrics import confusion_matrix

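def total_business_cost(y_true, y_pred):
    """Net profit in dollars for a set of hard predictions (negative = net loss)."""
    # labels=[0, 1] keeps the matrix 2x2 even if a class is missing from a batch
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

    # Signed dollar values from the cost matrix (negative = loss)
    cost_fp = -20
    cost_fn = -500
    profit_tp = 50
    profit_tn = 0

    return (tp * profit_tp) + (tn * profit_tn) + (fp * cost_fp) + (fn * cost_fn)

# Example usage with a dummy prediction: 1 TN, 1 FP, 1 FN, 2 TP
y_true = np.array([0, 0, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 1])

profit = total_business_cost(y_true, y_pred)
print(f"Total profit: ${profit}")  # 2*50 + 1*0 - 1*20 - 1*500 = -420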

Optimizing the Threshold

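The classifier's default threshold is usually 0.5. However, if False Negatives are expensive, we should lower the threshold to catch more fraud, even if it increases False Positives. To compare models by profit, we can wrap our function with make_scorer (greater_is_better=True) and pass it to GridSearchCV or cross_val_score. Note that such a scorer only sees the hard labels returned by predict(), so it ranks models and hyperparameters at their default threshold; the threshold itself has to be tuned separately on the predicted probabilities.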
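
How far should the threshold move? If we assume the model outputs a calibrated probability p, predicting Positive is worth p·V_TP + (1−p)·V_FP in expectation and predicting Negative is worth p·V_FN + (1−p)·V_TN. Setting the first greater than the second gives p > (V_TN − V_FP) / ((V_TN − V_FP) + (V_TP − V_FN)). The sketch below evaluates that break-even point for the cost matrix above; the function name is hypothetical, and the result is only a starting point when calibration is in doubt.

```python
def break_even_threshold(v_tn, v_fp, v_fn, v_tp):
    """Probability above which predicting Positive has the higher expected value.

    Assumes calibrated probabilities and signed dollar values (negative = loss).
    """
    cost_of_false_alarm = v_tn - v_fp  # value lost by flagging a true negative
    cost_of_miss = v_tp - v_fn         # value lost by missing a true positive
    if cost_of_false_alarm < 0 or cost_of_miss <= 0:
        raise ValueError("correct decisions must be worth more than errors")
    return cost_of_false_alarm / (cost_of_false_alarm + cost_of_miss)

# Values from the fraud cost matrix: TN = 0, FP = -20, FN = -500, TP = +50
t = break_even_threshold(v_tn=0, v_fp=-20, v_fn=-500, v_tp=50)
print(f"{t:.3f}")  # 0.035
```

With these costs the break-even point is 20 / 570, far below 0.5, which is why the default threshold leaves money on the table here.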


PYTHON
from sklearn.metrics import make_scorer

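# Wrap our function as a custom scorer; higher profit is better
profit_scorer = make_scorer(total_business_cost, greater_is_better=True)

# Pass scoring=profit_scorer to GridSearchCV or cross_val_score to select the
# model or hyperparameters that maximize profit. The scorer calls predict(),
# so it does not tune the decision threshold by itself.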
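
To tune the threshold itself, one option is to sweep candidate thresholds over predict_proba outputs on held-out data and keep the most profitable one. The sketch below assumes the cost matrix from this lesson and uses a synthetic dataset from make_classification as a stand-in for real fraud data; `profit_at_threshold` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def profit_at_threshold(y_true, proba, threshold):
    """Profit when every score >= threshold is flagged as Positive."""
    y_pred = (np.asarray(proba) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int(tp * 50 + tn * 0 + fp * -20 + fn * -500)

# Synthetic, imbalanced stand-in for a fraud dataset (about 5% positives)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]

# Evaluate profit on a grid of thresholds and keep the best one
thresholds = np.linspace(0.01, 0.99, 99)
profits = [profit_at_threshold(y_val, proba, t) for t in thresholds]
best = int(np.argmax(profits))

print(f"Best threshold: {thresholds[best]:.2f}, profit: ${profits[best]}")
print(f"Profit at 0.5:  ${profit_at_threshold(y_val, proba, 0.5)}")
```

Pick the threshold on validation data and report the final profit on a separate test set, otherwise the estimate is optimistic. scikit-learn 1.5 and later also provide TunedThresholdClassifierCV, which automates this kind of search with cross-validation.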

Hands-on Exercise

Define your costs: Assume you are building a churn prediction model. If you predict a customer will churn (Positive), you offer a $50 incentive. If they actually churned, you save $200 in customer lifetime value. If they didn't, you wasted $50. Calculate the cost matrix for this scenario.
Implementation: Create a profit_scorer function for this scenario and evaluate a simple LogisticRegression model using cross_val_score with your custom scorer.
Comparison: Compare the profit achieved at a 0.5 threshold vs. the profit achieved at a threshold of 0.3.

Common Pitfalls

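Ignoring Non-Linear Costs: Sometimes costs aren't static. A False Negative might cost $500, but 1,000 False Negatives might cost you your reputation and your business license. A fixed per-error cost cannot express this, so account for the "tail risk" separately, for example by also capping the number of False Negatives you are willing to accept.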
Overfitting to the Matrix: If your cost matrix is based on noisy data or bad estimates, you will optimize your model toward a "fantasy" profit. Always perform sensitivity analysis: vary your costs by ±20% to see if your best model remains the same.
Ignoring Calibration: If you are adjusting thresholds, you must ensure your model is well-calibrated. A model whose 0.9-probability predictions turn out to be correct only 50% of the time will lead to disastrous threshold decisions.
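
The ±20% sensitivity check mentioned above can be sketched as follows. The two candidate models and their confusion-matrix counts are invented for illustration, and the function names are hypothetical; the idea is to rescale each cost independently and see whether the winner changes.

```python
import itertools

def profit_from_counts(tn, fp, fn, tp, v_fp=-20, v_fn=-500, v_tp=50, v_tn=0):
    """Total profit for given confusion-matrix counts and signed dollar values."""
    return tp * v_tp + tn * v_tn + fp * v_fp + fn * v_fn

# Hypothetical (tn, fp, fn, tp) counts for two candidate models
candidates = {
    "conservative": (900, 50, 20, 30),
    "aggressive": (800, 150, 5, 45),
}

def best_model(v_fp, v_fn, v_tp):
    """Name of the candidate with the highest profit under the given values."""
    return max(
        candidates,
        key=lambda name: profit_from_counts(
            *candidates[name], v_fp=v_fp, v_fn=v_fn, v_tp=v_tp
        ),
    )

# Scale each value by -20%, 0%, +20% and record the winner for every combination
scales = (0.8, 1.0, 1.2)
winners = {
    (s_fp, s_fn, s_tp): best_model(-20 * s_fp, -500 * s_fn, 50 * s_tp)
    for s_fp, s_fn, s_tp in itertools.product(scales, repeat=3)
}

print(set(winners.values()))  # {'aggressive'}
```

Here the same model wins in all 27 scenarios, so the choice is robust to these cost estimates. If the set contained more than one name, the decision would hinge on getting the costs right.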

Recap

Accuracy is a vanity metric: In business, we care about the bottom line.
Cost Matrices are maps: They translate domain knowledge into a mathematical objective.
Thresholding is dynamic: By shifting the classification threshold, you can balance the trade-off between FP and FN costs without retraining the model.
Custom Scorers: Use make_scorer to wrap your business logic so that standard tools like GridSearchCV can optimize for your specific profit goals.

Up next: Handling Class Imbalance with Resampling, where we look at how to prepare your data so your models learn the minority class effectively before we apply these cost-sensitive metrics.
