API design dry-run modes allow you to validate complex state mutations before execution. Learn to implement safe validation for your distributed systems.
When you’re pushing a complex state change to a production system, the "hope for the best" strategy eventually fails. We’ve all been there: a massive batch job or a nested resource update triggers an unintended side effect, leaving the database in an inconsistent state. Implementing an API dry-run mode is a form of defensive programming that allows clients to verify if a request is valid, authorized, and logically sound without actually committing the mutation to the underlying store.
In distributed systems, the cost of a failed mutation is high. You aren't just dealing with a simple row update; you're coordinating cache invalidations, event bus emissions, and third-party webhooks.
A dry-run mode acts as a pre-flight check. It executes the validation logic—schema checks, business rule validation, and permission verification—but short-circuits the process before the transaction commits. This is particularly useful when building complex API design patterns where the client needs to know if a sequence of operations will succeed before they lock the resource.
When we first added dry-run capabilities to a core billing service, we tried to mock the entire persistence layer. We created a "Shadow Repository" that implemented our interface but redirected writes to /dev/null.
It failed spectacularly.
The complexity of maintaining two versions of the data access layer—one real, one mock—was unsustainable. We ended up with "drift," where the validation logic in the mock repository wasn't strictly identical to the production code. We eventually abandoned the mock approach in favor of a transaction-based rollback strategy.
Instead of mocking the infrastructure, we designed our services to support a dry_run boolean flag in the request body (or as a custom header). Here is the pattern we settled on:
Gofunc (s *OrderService) CreateOrder(ctx context.Context, req OrderRequest, dryRun bool) (*OrderResponse, error) { // 1. Validation Logic if err := s.validate(req); err != nil { return nil, err } // 2. Business Logic (Check state, calculate totals) order, err := s.process(req) if err != nil { return nil, err } // 3. Short-circuit if dry-run if dryRun { return &OrderResponse{Status: "VALIDATED", Details: order.Summary()}, nil } // 4. Persistence return s.repo.Save(order) }
This approach ensures that your validation logic remains identical for both real and dry-run requests. If the business logic changes, the dry-run behavior updates automatically.
The biggest challenge with this pattern is ensuring that non-database side effects—like firing an event to Kafka or calling a payment gateway—don't trigger.
You must wrap these calls in a conditional check. If you're using dependency injection, consider injecting a "No-op" implementation of your event publisher when the dryRun flag is true. This keeps your core service logic clean and avoids accidental production noise.
Don’t treat dry-run as a silver bullet. If your system requires heavy locking or complex distributed transactions, a dry-run might provide a false sense of security. Validating that a request is structurally valid doesn't guarantee that the system state won't change between the dry-run and the actual execution.
For highly volatile systems, you might find more value in API traffic shadowing to test how your system handles real-world requests in a safe environment, rather than relying on client-initiated dry-runs.
Does a dry-run consume API rate limits? Yes, it should. Since a dry-run still executes significant backend validation logic, it incurs a performance cost. Counting it against the user's quota prevents abuse.
Should I return a 200 or a 204 for a successful dry-run? I prefer a 200 OK with a metadata field indicating that the operation was validated but not executed. A 204 No Content is technically correct, but it provides no feedback to the client about what the system actually "saw."
How do I handle state-dependent dry-runs? If your dry-run depends on the current state of the database, be aware that the state might change by the time the actual request arrives. Always include a version or timestamp check to ensure the dry-run results are still relevant.
We’ve found that implementing a dry_run flag is about roughly 1.5 days of work for a mid-sized service, but the reduction in support tickets and manual data cleanups is worth every hour. I'm still not entirely happy with how we handle validation errors in dry-run mode—sometimes the error messages are too generic—but it's a massive step up from the "execute and pray" method.
Start by exposing the flag in your most volatile endpoints. You’ll be surprised how quickly your clients start building better, more resilient integrations once they have a way to verify their payloads before they hit your production database.
Master API request batching to reduce network overhead and slash latency in high-traffic systems. Learn pragmatic patterns for building efficient APIs.
Read moreMaster API design caching strategies to balance performance and consistency. Learn how to implement read-through caching and handle invalidation in systems.