Architecture · June 22, 2026 · 4 min read

API Design: Implementing Dry-Run Modes for Safe State Mutations

Dry-run modes let API clients validate complex state mutations before executing them. Learn how to implement safe validation in your distributed systems.

API design, distributed systems, defensive programming, backend engineering, architecture, API, Backend, System Design

When you’re pushing a complex state change to a production system, the "hope for the best" strategy eventually fails. We’ve all been there: a massive batch job or a nested resource update triggers an unintended side effect and leaves the database in an inconsistent state. Implementing an API dry-run mode is a form of defensive programming: it lets clients verify that a request is valid, authorized, and logically sound without committing the mutation to the underlying store.

Why Dry-Run Matters for Distributed Systems

In distributed systems, the cost of a failed mutation is high. You aren't just dealing with a simple row update; you're coordinating cache invalidations, event bus emissions, and third-party webhooks.

A dry-run mode acts as a pre-flight check. It executes the validation logic (schema checks, business rule validation, and permission verification) but short-circuits the process before the transaction commits. This is particularly useful when a client needs to know whether a sequence of operations will succeed before it locks the resource.

The Wrong Turn: The "Simulation" Trap

When we first added dry-run capabilities to a core billing service, we tried to mock the entire persistence layer. We created a "Shadow Repository" that implemented our interface but redirected writes to /dev/null.

It failed spectacularly.

Maintaining two versions of the data access layer, one real and one mock, was unsustainable. We ended up with "drift": the validation logic in the mock repository was no longer identical to the production code. We eventually abandoned the mock approach in favor of a transaction-based rollback strategy.
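To make the rollback idea concrete, here is a minimal sketch of what a transaction-based dry-run could look like. It is not our billing service code: the `Tx` interface, `MemStore`, and this `CreateOrder` signature are hypothetical stand-ins, and the in-memory store only imitates a real database so the example stays self-contained.

```go
package main

import "fmt"

// Tx is a hypothetical, minimal transaction interface used only for this sketch.
type Tx interface {
	Insert(key, value string) error
	Commit() error
	Rollback() error
}

// MemStore is an in-memory stand-in for a real database.
type MemStore struct {
	Rows map[string]string
}

// memTx buffers writes until Commit, loosely imitating a real transaction.
type memTx struct {
	store   *MemStore
	pending map[string]string
}

// Begin opens a new transaction against the store.
func (s *MemStore) Begin() Tx {
	return &memTx{store: s, pending: map[string]string{}}
}

// Insert enforces the same uniqueness rule for real and dry-run requests.
func (t *memTx) Insert(key, value string) error {
	if _, exists := t.store.Rows[key]; exists {
		return fmt.Errorf("duplicate key %q", key)
	}
	t.pending[key] = value
	return nil
}

// Commit makes the buffered writes visible in the store.
func (t *memTx) Commit() error {
	for k, v := range t.pending {
		t.store.Rows[k] = v
	}
	t.pending = map[string]string{}
	return nil
}

// Rollback discards the buffered writes.
func (t *memTx) Rollback() error {
	t.pending = map[string]string{}
	return nil
}

// CreateOrder runs the same write path for real and dry-run requests.
// The only difference is whether the transaction commits or rolls back.
func CreateOrder(store *MemStore, id, body string, dryRun bool) error {
	tx := store.Begin()
	if err := tx.Insert(id, body); err != nil {
		_ = tx.Rollback()
		return err
	}
	if dryRun {
		return tx.Rollback()
	}
	return tx.Commit()
}
```

Because the dry-run goes through the real `Insert` path, it surfaces the same constraint violations a real request would hit, which is exactly what the mock repository failed to guarantee.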

Pragmatic Implementation: The "Validation-Only" Pattern

Instead of mocking the infrastructure, we designed our services to support a dry_run boolean flag in the request body (or as a custom header). Here is the pattern we settled on:


Go
func (s *OrderService) CreateOrder(ctx context.Context, req OrderRequest, dryRun bool) (*OrderResponse, error) {
    // 1. Validation Logic
    if err := s.validate(req); err != nil {
        return nil, err
    }

    // 2. Business Logic (Check state, calculate totals)
    order, err := s.process(req)
    if err != nil {
        return nil, err
    }

    // 3. Short-circuit if dry-run
    if dryRun {
        return &OrderResponse{Status: "VALIDATED", Details: order.Summary()}, nil
    }

    // 4. Persistence
    return s.repo.Save(order)
}

This approach ensures that your validation logic remains identical for both real and dry-run requests. If the business logic changes, the dry-run behavior updates automatically.

Handling Side Effects

The biggest challenge with this pattern is ensuring that non-database side effects—like firing an event to Kafka or calling a payment gateway—don't trigger.

You must wrap these calls in a conditional check. If you're using dependency injection, consider injecting a "No-op" implementation of your event publisher when the dryRun flag is true. This keeps your core service logic clean and avoids accidental production noise.

When to Avoid Dry-Run

Don’t treat dry-run as a silver bullet. If your system requires heavy locking or complex distributed transactions, a dry-run might provide a false sense of security. A successful dry-run only shows that the request was valid against the state at that moment; it doesn't guarantee that the system state won't change between the dry-run and the actual execution.

For highly volatile systems, you might find more value in API traffic shadowing to test how your system handles real-world requests in a safe environment, rather than relying on client-initiated dry-runs.

FAQ

Does a dry-run consume API rate limits? Yes, it should. Since a dry-run still executes significant backend validation logic, it incurs a performance cost. Counting it against the user's quota prevents abuse.

Should I return a 200 or a 204 for a successful dry-run? I prefer a 200 OK with a metadata field indicating that the operation was validated but not executed. A 204 No Content is technically correct, but it provides no feedback to the client about what the system actually "saw."
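One possible shape for that response is sketched below. The `MutationResult` envelope and the `dry_run` field name are assumptions for illustration, as is the choice of 201 Created for the real path; only the 200 OK for dry-runs comes from the answer above.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// MutationResult is a hypothetical response envelope.
type MutationResult struct {
	Status  string `json:"status"`
	DryRun  bool   `json:"dry_run"`
	Details string `json:"details,omitempty"`
}

// BuildResult returns the HTTP status code and JSON body for a mutation.
// Dry-runs get 200 OK plus an explicit dry_run marker, so the client can
// see what was validated without mistaking it for a committed write.
func BuildResult(dryRun bool, details string) (int, []byte, error) {
	res := MutationResult{Status: "CREATED", DryRun: dryRun, Details: details}
	code := http.StatusCreated // assumption: the real path creates a resource
	if dryRun {
		res.Status = "VALIDATED"
		code = http.StatusOK
	}
	body, err := json.Marshal(res)
	return code, body, err
}
```

Keeping `dry_run` in the body even when it is `false` means a client can always tell from the response alone whether anything was persisted.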

How do I handle state-dependent dry-runs? If your dry-run depends on the current state of the database, be aware that the state might change by the time the actual request arrives. Always include a version or timestamp check to ensure the dry-run results are still relevant.

Final Thoughts

We’ve found that implementing a dry_run flag is roughly 1.5 days of work for a mid-sized service, but the reduction in support tickets and manual data cleanups is worth every hour. I'm still not entirely happy with how we handle validation errors in dry-run mode (sometimes the error messages are too generic), but it's a massive step up from the "execute and pray" method.

Start by exposing the flag in your most volatile endpoints. You’ll be surprised how quickly your clients start building better, more resilient integrations once they have a way to verify their payloads before they hit your production database.
