Database performance drops when hot rows cause lock contention. Learn how to identify bottlenecks and implement effective throttling to keep your system stable.
Last month, our primary orders table hit a wall during a flash sale. We were seeing query latency spike from 40ms to over 2 seconds, and the Postgres logs were flooded with lock wait timeouts. It turned out that a single row—the global counter for our inventory tracking—was being hammered by thousands of concurrent transactions, leading to severe hot row contention.
When your application architecture relies on frequent updates to a single record, you’re eventually going to hit the limits of standard row-level locking. In high-concurrency environments, Postgres must serialize these updates. If 500 connections are trying to increment the same integer, 499 of them are sitting in a queue, waiting for the first one to commit. This creates a convoy effect that destroys your database performance.
We initially tried to solve this by simply wrapping our updates in shorter transactions. We thought that if we committed faster, the lock would be held for less time. That was a mistake. Even with a sub-10ms transaction, the overhead of context switching and lock management at the database engine level becomes the bottleneck.
If you're dealing with similar issues, you've likely looked into database deadlocks: how to minimize lock contention in production, but deadlocks are only half the battle. High-concurrency update workloads don't always deadlock—they just slow down until the entire application becomes unresponsive.
Before you change a single line of code, you need to see the contention in real-time. I use the pg_stat_activity view to catch these processes in the act:
SQLSELECT pid, wait_event_type, wait_event, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';
If you see a long list of PIDs waiting on tuple locks, you’ve confirmed that your application is suffering from hot row contention. At this point, tweaking your transaction isolation levels won't help. READ COMMITTED is the default, and moving to REPEATABLE READ or SERIALIZABLE will actually increase the frequency of serialization failures, forcing your application to handle more retries.
Instead of hammering the database, we shifted our strategy. We stopped trying to update the row for every single event and started aggregating updates in the application layer.
UPDATE query immediately, we push the increment to a Redis list.counter_shard_1 through counter_shard_10). Each update hits a random shard, spreading the locking pressure.This approach is similar to the logic used in database performance: how to implement write-combining for hot rows, where you prioritize throughput over strict real-time consistency.
If you're using a message queue, you might be tempted to use database queueing with SELECT FOR UPDATE SKIP LOCKED. While that’s excellent for distributing tasks, it won't solve the problem of a single row being the destination for all those tasks.
If your workload is truly write-heavy, consider using an atomic increment approach with UPDATE ... SET count = count + 1. This is faster than a SELECT followed by an UPDATE because it minimizes the time the row is locked. However, even atomic increments eventually hit a ceiling when the contention is extreme.
Looking back, we spent too much time trying to tune Postgres parameters like max_locks_per_transaction or deadlock_timeout. Those are useful for specific edge cases, but they don't fix the underlying architectural flaw of having a "hot" record.
Next time, I'd implement the Redis-based batching pattern sooner. It’s more complex to manage, but it decouples the application's write volume from the database's locking capacity. If you're still seeing issues, check if your concurrency control strategy is actually fighting the database rather than working with it.
We’re still debating whether to move our inventory counters to a dedicated time-series database or an in-memory store permanently. For now, the batching approach keeps the orders table fast, but it’s definitely something we'll keep iterating on as our traffic grows.
Database partitioning is the key to high-performance scale. Learn how to optimize your execution plans and use partition pruning to slash query latency.
Read moreDatabase queueing with SELECT FOR UPDATE SKIP LOCKED is the secret to high-concurrency task processing. Learn to eliminate race conditions in your workers.