DatabasesJune 24, 20264 min read

Database performance: Mitigating Hot Row Contention in Postgres

Database performance drops when hot rows cause lock contention. Learn how to identify bottlenecks and implement effective throttling to keep your system stable.

databasespostgresperformancescalingconcurrencyPostgreSQLMySQLRedisDatabase

Last month, our primary orders table hit a wall during a flash sale. We were seeing query latency spike from 40ms to over 2 seconds, and the Postgres logs were flooded with lock wait timeouts. It turned out that a single row—the global counter for our inventory tracking—was being hammered by thousands of concurrent transactions, leading to severe hot row contention.

When your application architecture relies on frequent updates to a single record, you’re eventually going to hit the limits of standard row-level locking. In high-concurrency environments, Postgres must serialize these updates. If 500 connections are trying to increment the same integer, 499 of them are sitting in a queue, waiting for the first one to commit. This creates a convoy effect that destroys your database performance.

Why Row-Level Locking Fails at Scale

We initially tried to solve this by simply wrapping our updates in shorter transactions. We thought that if we committed faster, the lock would be held for less time. That was a mistake. Even with a sub-10ms transaction, the overhead of context switching and lock management at the database engine level becomes the bottleneck.

If you're dealing with similar issues, you've likely looked into database deadlocks: how to minimize lock contention in production, but deadlocks are only half the battle. High-concurrency update workloads don't always deadlock—they just slow down until the entire application becomes unresponsive.

Identifying the Bottleneck

Before you change a single line of code, you need to see the contention in real-time. I use the pg_stat_activity view to catch these processes in the act:


SQL
SELECT pid, wait_event_type, wait_event, query 
FROM pg_stat_activity 
WHERE wait_event_type = 'Lock';

If you see a long list of PIDs waiting on tuple locks, you’ve confirmed that your application is suffering from hot row contention. At this point, tweaking your transaction isolation levels won't help. READ COMMITTED is the default, and moving to REPEATABLE READ or SERIALIZABLE will actually increase the frequency of serialization failures, forcing your application to handle more retries.

Implementing Throttling Strategies

Instead of hammering the database, we shifted our strategy. We stopped trying to update the row for every single event and started aggregating updates in the application layer.

Application-Level Buffering: We introduced a local queue in our Node.js services. Instead of executing an UPDATE query immediately, we push the increment to a Redis list.
Batch Processing: A background worker pulls from that Redis list every 200ms and performs a single bulk update. This reduces the number of database roundtrips by roughly 90%.
Database-Level Sharding of Counters: If we absolutely need the data in the database, we split the counter into multiple rows (e.g., counter_shard_1 through counter_shard_10). Each update hits a random shard, spreading the locking pressure.

This approach is similar to the logic used in database performance: how to implement write-combining for hot rows, where you prioritize throughput over strict real-time consistency.

A Note on Tooling

If you're using a message queue, you might be tempted to use database queueing with SELECT FOR UPDATE SKIP LOCKED. While that’s excellent for distributing tasks, it won't solve the problem of a single row being the destination for all those tasks.

If your workload is truly write-heavy, consider using an atomic increment approach with UPDATE ... SET count = count + 1. This is faster than a SELECT followed by an UPDATE because it minimizes the time the row is locked. However, even atomic increments eventually hit a ceiling when the contention is extreme.

What I’d Do Differently

Looking back, we spent too much time trying to tune Postgres parameters like max_locks_per_transaction or deadlock_timeout. Those are useful for specific edge cases, but they don't fix the underlying architectural flaw of having a "hot" record.

Next time, I'd implement the Redis-based batching pattern sooner. It’s more complex to manage, but it decouples the application's write volume from the database's locking capacity. If you're still seeing issues, check if your concurrency control strategy is actually fighting the database rather than working with it.

We’re still debating whether to move our inventory counters to a dedicated time-series database or an in-memory store permanently. For now, the batching approach keeps the orders table fast, but it’s definitely something we'll keep iterating on as our traffic grows.

Back to Blog

Database performance: Mitigating Hot Row Contention in Postgres

Why Row-Level Locking Fails at Scale

Identifying the Bottleneck

Implementing Throttling Strategies

A Note on Tooling

What I’d Do Differently

Similar Posts

Database Partitioning Strategies: Optimizing Your Query Execution Plans

Database queueing with SELECT FOR UPDATE SKIP LOCKED

When to denormalize your database for production performance