Laravel · PHP · June 22, 2026 · 4 min read

Laravel Queues: Building a Dead Letter Queue for Production Jobs

Master Laravel Queues by implementing a robust Dead Letter Queue (DLQ) pattern. Learn how to use Redis for reliable job failure handling and automated replay.

Laravel · Redis · Queues · Architecture · PHP · Backend

When a production job hits its final attempt limit in Laravel, it usually just vanishes into the failed_jobs table. For a long time, I treated that table as a graveyard: a place where data went to die until a support ticket surfaced it. If you're running high-throughput distributed systems, that "fire and forget" mentality eventually causes a major incident.

I recently refactored a payment processing pipeline where we were losing roughly 0.4% of failed jobs due to silent upstream API timeouts. We needed a better way to handle these failures, so I moved away from standard retries and toward a deterministic dead-letter routing strategy.

Rethinking Laravel Queues and Failure Handling

Standard Laravel retries are fine for transient issues like a momentary network hiccup. But if you have a job that fails because of a malformed payload or a circuit-breaker trip, retrying it immediately is a waste of CPU cycles.
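The split between transient and permanent failures can be sketched as a small routing decision. This is a plain-PHP sketch, not the pipeline's actual code: `routeFailure()`, `MalformedPayloadException`, and `CircuitOpenException` are hypothetical names. Inside a real Laravel job, the dead-letter branch would roughly correspond to calling `$this->fail($e)`, which marks the job as failed without spending its remaining attempts.

```php
<?php
declare(strict_types=1);

// Hypothetical exception types for this sketch; substitute your own.
final class MalformedPayloadException extends RuntimeException {}
final class CircuitOpenException extends RuntimeException {}

/**
 * Decide what to do after a failed attempt: retry transient errors while
 * attempts remain, but dead-letter permanent errors straight away.
 */
function routeFailure(Throwable $e, int $attempt, int $maxAttempts): string
{
    // Retrying the same payload (or hitting an open circuit) cannot succeed right now.
    if ($e instanceof MalformedPayloadException || $e instanceof CircuitOpenException) {
        return 'dead-letter';
    }

    // Anything else is treated as transient until the attempt budget runs out.
    return $attempt < $maxAttempts ? 'retry' : 'dead-letter';
}

echo routeFailure(new RuntimeException('timeout'), 1, 3), PHP_EOL;          // retry
echo routeFailure(new MalformedPayloadException('bad json'), 1, 3), PHP_EOL; // dead-letter
echo routeFailure(new RuntimeException('timeout'), 3, 3), PHP_EOL;          // dead-letter
```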

We first tried using a simple failed() method in our jobs to log errors to Sentry. That didn't work because it didn't provide a mechanism to replay the data once the underlying issue was fixed. We needed a proper Dead Letter Queue that acts as a buffer.

Instead of letting jobs die, we now catch them, serialize the state, and push them into a Redis-backed holding area. This allows us to inspect the failure, apply a fix to the code, and trigger a bulk replay without touching the database directly.

Implementing a Redis-Backed Dead Letter Queue

To build this, I use a combination of a custom failed() handler and a Redis sorted set. Redis works well here because each entry's expiry timestamp can be stored as its score. Redis TTLs apply to whole keys rather than to individual set members, so expired entries are pruned by score, which keeps storage from growing indefinitely.

If you’re interested in the storage side of things, I’ve previously written about Database TTL Strategies: Optimizing Expiring Data Workflows to keep these buffers clean.

Here is how I structure the capture process in a base job class:


PHP
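use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Str;

public function failed(\Throwable $exception): void
{
    $payload = [
        'id' => (string) Str::uuid(),   // keeps otherwise identical failures distinct in the ZSET
        'job' => get_class($this),
        'data' => serialize($this),     // PHP's serialize(); queued jobs have no serialize() method
        'error' => $exception->getMessage(),
        'failed_at' => now()->timestamp,
    ];

    // Score = expiry time, 7 days out. Sorted-set members have no per-member TTL,
    // so expired entries are pruned separately with ZREMRANGEBYSCORE.
    Redis::zadd('dlq:pending', now()->addDays(7)->timestamp, json_encode($payload, JSON_THROW_ON_ERROR));
}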

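By pushing to a Redis sorted set (ZSET), I can use the expiry timestamp as the score. Because every entry expires exactly seven days after it failed, a score-range query (ZRANGEBYSCORE) finds jobs that have been sitting in the "dead" state for too long, and ZREMRANGEBYSCORE prunes the ones past their window.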
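Because the score is an expiry timestamp, both the "too old" query and the cleanup are score-range operations. The sketch below models that behaviour in plain PHP so it runs without a Redis server; `ExpiringDeadLetterSet` is a hypothetical in-memory stand-in, with each method mapped to the Redis command it imitates (ZADD, ZRANGEBYSCORE, ZREMRANGEBYSCORE, ZCARD).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory stand-in for the dlq:pending sorted set.
 * Each member's score is its expiry timestamp.
 */
final class ExpiringDeadLetterSet
{
    /** @var array<string, int> member => score (expiry timestamp) */
    private array $members = [];

    // Like ZADD key score member: re-adding a member updates its score.
    public function add(string $member, int $expiresAt): void
    {
        $this->members[$member] = $expiresAt;
    }

    // Like ZRANGEBYSCORE key -inf $cutoff: members expiring at or before $cutoff, soonest first.
    public function expiringBy(int $cutoff): array
    {
        $due = array_filter($this->members, fn (int $score) => $score <= $cutoff);
        asort($due);
        return array_keys($due);
    }

    // Like ZREMRANGEBYSCORE key -inf $now: drop expired members and return how many were removed.
    public function prune(int $now): int
    {
        $expired = $this->expiringBy($now);
        foreach ($expired as $member) {
            unset($this->members[$member]);
        }
        return count($expired);
    }

    // Like ZCARD key.
    public function count(): int
    {
        return count($this->members);
    }
}

$now = 1_700_000_000;
$day = 86_400;
$set = new ExpiringDeadLetterSet();
$set->add('{"id":"a"}', $now + 6 * $day); // failed a day ago
$set->add('{"id":"b"}', $now + 1 * $day); // failed six days ago
$set->add('{"id":"c"}', $now - 60);       // already past its 7-day window

print_r($set->expiringBy($now + 2 * $day)); // c, then b (ordered by expiry)
echo $set->prune($now), PHP_EOL;            // 1
echo $set->count(), PHP_EOL;                // 2
```

In production, the prune step would run on a schedule as `ZREMRANGEBYSCORE dlq:pending -inf <now>`.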

Automating the Replay Workflow

Capturing the failure is only half the battle. You need a reliable way to get those jobs back into your Laravel Queues.

I created a custom Artisan command that reads from the dlq:pending set. It filters by the job class name so we can replay specific types of failures without flushing the entire queue.


PHP
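use Illuminate\Contracts\Bus\Dispatcher;
use Illuminate\Support\Facades\Redis;

// Illustrative command name; --class limits the replay to one job type
protected $signature = 'dlq:replay {--class= : Only replay jobs of this class}';

public function handle(Dispatcher $dispatcher): int
{
    $onlyClass = $this->option('class');
    $jobs = Redis::zrange('dlq:pending', 0, -1);

    foreach ($jobs as $rawJob) {
        $data = json_decode($rawJob, true);

        // Skip entries for other job classes
        if ($onlyClass && $data['job'] !== $onlyClass) {
            continue;
        }

        // Dispatch back to the queue
        $dispatcher->dispatch(unserialize($data['data']));

        // Remove from the DLQ only after dispatch succeeded
        Redis::zrem('dlq:pending', $rawJob);
    }

    return self::SUCCESS;
}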
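The two properties the replay loop relies on, filtering by class and removing an entry only after its dispatch succeeds, can be checked without Laravel or Redis. `replayDeadLetters()` is a hypothetical plain-PHP helper: the `$dispatch` callable stands in for the bus dispatcher, and returning the surviving entries stands in for ZREM. The job class names are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical replay helper. $pending holds raw JSON entries; $dispatch
 * re-queues one decoded entry. Returns the entries left in the DLQ.
 *
 * @param list<string> $pending
 * @return list<string>
 */
function replayDeadLetters(array $pending, ?string $onlyClass, callable $dispatch): array
{
    $remaining = [];

    foreach ($pending as $rawJob) {
        $data = json_decode($rawJob, true, 512, JSON_THROW_ON_ERROR);

        // Leave entries of other job classes untouched.
        if ($onlyClass !== null && $data['job'] !== $onlyClass) {
            $remaining[] = $rawJob;
            continue;
        }

        try {
            $dispatch($data);
        } catch (Throwable $e) {
            // Dispatch failed: keep the entry so a later run can retry it.
            $remaining[] = $rawJob;
        }
        // On success the entry is simply not kept (the ZREM step).
    }

    return $remaining;
}

$pending = [
    json_encode(['job' => 'App\Jobs\ChargeCard', 'data' => 'serialized-job']),
    json_encode(['job' => 'App\Jobs\SendReceipt', 'data' => 'serialized-job']),
];

$left = replayDeadLetters($pending, 'App\Jobs\ChargeCard', function (array $job): void {
    echo "re-dispatching {$job['job']}", PHP_EOL;
});
echo count($left), PHP_EOL; // 1 (only the SendReceipt entry remains)
```

Keeping the entry when dispatch throws means a broken queue connection delays the replay instead of losing the job; the cost is that a crash between dispatch and removal can replay a job twice, which is one more reason the jobs need to be idempotent.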

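This approach also gives us a deterministic path for the idempotency concerns I covered in Laravel API integration idempotency: Handling Webhooks with Redis. Since we re-dispatch the original serialized job object, its state carries over intact (if the job uses SerializesModels, Eloquent models are re-fetched from the database by ID rather than restored from the snapshot).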
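Re-dispatching the serialized object depends on the `serialize()` output surviving the JSON envelope intact. Protected and private property names contain NUL bytes, which `json_encode` escapes as `\u0000` and `json_decode` restores. Here is a minimal plain-PHP check, with `ChargeCardJob` as a hypothetical stand-in for a queued job (PHP 8.0+ for constructor promotion):

```php
<?php
declare(strict_types=1);

// Hypothetical job-like class with non-public state, as real jobs usually have.
final class ChargeCardJob
{
    public function __construct(
        protected int $orderId,
        private string $currency,
    ) {}

    public function describe(): string
    {
        return "{$this->orderId}:{$this->currency}";
    }
}

$original = new ChargeCardJob(42, 'EUR');

// Capture: same envelope shape as the failed() payload.
$raw = json_encode(['job' => ChargeCardJob::class, 'data' => serialize($original)], JSON_THROW_ON_ERROR);

// Replay: decode the envelope and rebuild the object, allowing only the recorded class.
$data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
$restored = unserialize($data['data'], ['allowed_classes' => [$data['job']]]);

echo $restored->describe(), PHP_EOL; // 42:EUR
```

The `allowed_classes` option limits what `unserialize()` may instantiate; a real job with nested objects (such as the `ModelIdentifier` that SerializesModels produces) would need those classes in the list too.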

Why This Beats Standard Retries

The main advantage here is decoupling. When a service goes down, you don't want your workers spinning up constantly, hitting the same failing API endpoint.

By pushing failed jobs to a dedicated Dead Letter Queue, you:

Clear the main queue: Your workers stay free to process healthy jobs.
Gain visibility: You can monitor the size of the DLQ to alert on systemic failures.
Control the timing: You can wait for the upstream service to recover before triggering the replay.
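A DLQ-size alert gets more useful when it is broken down by job class, since one misbehaving integration usually dominates. `dlqAlerts()` is a hypothetical helper over the raw payloads stored by `failed()`; in production the entries would come from ZRANGE on dlq:pending (or just ZCARD if a total is enough), and the threshold is an assumed value you would tune.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical alert check: count DLQ entries per job class and return the
 * classes at or above $threshold, largest first.
 *
 * @param list<string> $entries raw JSON payloads as stored by failed()
 * @return array<string, int> job class => count
 */
function dlqAlerts(array $entries, int $threshold): array
{
    $counts = [];
    foreach ($entries as $raw) {
        $job = json_decode($raw, true, 512, JSON_THROW_ON_ERROR)['job'];
        $counts[$job] = ($counts[$job] ?? 0) + 1;
    }

    arsort($counts);
    return array_filter($counts, fn (int $n) => $n >= $threshold);
}

$entries = array_merge(
    array_fill(0, 120, json_encode(['job' => 'App\Jobs\ChargeCard'])),
    array_fill(0, 3, json_encode(['job' => 'App\Jobs\SendReceipt'])),
);

// Only App\Jobs\ChargeCard (120 entries) crosses the threshold.
print_r(dlqAlerts($entries, 100));
```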

One caveat: ensure your jobs are fully idempotent. If you’re replaying a payment job, you must check if the payment was actually processed before attempting it again. We use a unique job ID stored in Redis to check for existing transactions before the job executes its logic.
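The unique-job-ID check can be sketched as a claim-before-run guard. `IdempotencyStore`, `InMemoryIdempotencyStore`, and `runOnce()` are hypothetical names; in Redis, `claim()` would map to `SET processed:<id> 1 NX EX <ttl>` (atomic, succeeds only if the key is absent) and `release()` to `DEL`. This guards against running the job twice; for payments it complements, rather than replaces, checking the provider's record of the transaction.

```php
<?php
declare(strict_types=1);

interface IdempotencyStore
{
    /** Atomically record $key; return false if it was already recorded. */
    public function claim(string $key): bool;

    /** Forget $key so a later attempt may run. */
    public function release(string $key): void;
}

// In-memory stand-in for Redis, for illustration only.
final class InMemoryIdempotencyStore implements IdempotencyStore
{
    /** @var array<string, true> */
    private array $keys = [];

    public function claim(string $key): bool
    {
        if (isset($this->keys[$key])) {
            return false;
        }
        $this->keys[$key] = true;
        return true;
    }

    public function release(string $key): void
    {
        unset($this->keys[$key]);
    }
}

/** Run $work at most once per job ID. Returns true if it ran, false if skipped. */
function runOnce(IdempotencyStore $store, string $jobId, callable $work): bool
{
    $key = "processed:{$jobId}";
    if (!$store->claim($key)) {
        return false; // an earlier attempt or replay already handled this job
    }

    try {
        $work();
    } catch (Throwable $e) {
        $store->release($key); // let a later retry or replay try again
        throw $e;
    }
    return true;
}

$store = new InMemoryIdempotencyStore();
$charges = 0;
$charge = function () use (&$charges): void { $charges++; };

runOnce($store, 'job-123', $charge); // runs: first attempt
runOnce($store, 'job-123', $charge); // skipped: replay of the same job
echo $charges, PHP_EOL;              // 1
```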

Final Thoughts

Building this custom routing wasn't without its headaches. We initially tried storing the failed jobs in a separate database table, but querying and cleaning up that table under load was roughly 1.5x slower than doing the same work in Redis.

I’m still experimenting with the state-machine approach from Laravel Workflow: Architecting Asynchronous State Machines for Reliability to handle the retry logic itself, since it offers a more declarative way to define what happens after a failure. For now, the Redis-backed buffer is keeping our production systems stable and our data loss at nearly zero.

If you're dealing with high-volume background tasks, stop relying on the default failed_jobs table. Build something that allows you to control the lifecycle of your failures.
