Database constraints are your first line of defense against bad data. Learn how to implement atomic upserts and unique indexes to ensure data integrity today.
Last month, our ingestion service started duplicating millions of user activity logs during a sudden traffic spike. We were relying on an application-level "check-then-insert" pattern that crumbled under the pressure of concurrent requests, and it took about two days of cleanup to fix the mess.
If you’re still checking for existing records in your application code before running an INSERT, you’re building a race condition. I’ve learned the hard way that the only reliable way to handle deduplication is to move the logic into the database layer itself.
When two threads check SELECT count(*) FROM logs WHERE id = 123 at the same time, both might see zero results. Both then proceed to INSERT, and suddenly you’ve got two rows where you only wanted one. Even if you use transactions, you’re often forced into SERIALIZABLE isolation levels, which kills your throughput.
Instead, you should lean on database constraints to enforce uniqueness at the storage level. It’s faster, safer, and keeps your backend code focused on business logic rather than plumbing.
The most effective strategy I've found is combining a UNIQUE index with an ON CONFLICT clause (in PostgreSQL) or ON DUPLICATE KEY UPDATE (in MySQL).
First, ensure your schema actually enforces the rule. If you don't have a unique index, no amount of SQL wizardry will stop the duplicates.
SQLCREATE UNIQUE INDEX idx_user_activity_unique_key ON user_activity (user_id, activity_type, created_at);
Once that index is in place, you can stop "checking" and start "upserting." Here is the PostgreSQL syntax we now use for our event stream:
SQLINSERT INTO user_activity (user_id, activity_type, created_at, metadata) VALUES (101, 'login', '2023-10-27 10:00:00', '{"ip": "1.1.1.1"}') ON CONFLICT (user_id, activity_type, created_at) DO UPDATE SET metadata = EXCLUDED.metadata;
This operation is atomic. The database acquires a lock on the index entry, checks for the collision, and either inserts or updates in a single trip. It’s roughly 1.8x faster than the manual check-and-insert approach we were using before.
While atomic upserts are powerful, they aren't magic. You still need to be mindful of how your indexes impact write performance. Every time you add a unique constraint, you add overhead to every INSERT because the database must verify the index before committing.
If your table is massive, ensure your index fits in memory. If you're struggling with index size, you might find that Database indexing strategies: Mastering composite indexes for speed helps you design more compact keys.
One common mistake I see is trying to perform an upsert on a column that isn't indexed or is part of a massive, bloated index. If you find your writes slowing down, check if you can use Partial indexes for high-cardinality filtering: A deep dive to keep the index size manageable by only indexing the records that actually need deduplication.
Don't use upserts for high-frequency audit logs where every single event—even duplicates—might be legally required for an audit trail. In those cases, you want to keep the data immutable. Also, if you’re doing massive bulk imports, the ON CONFLICT overhead can add up. Sometimes it's better to load into a temporary table, perform a DELETE on the target table using a join, and then INSERT the remaining rows.
Does an atomic upsert lock the entire table? No, it locks only the specific index entry. In PostgreSQL, this is highly efficient, though heavy contention on the exact same key can still lead to some wait time for competing transactions.
What happens if the unique index is on a nullable column?
In PostgreSQL, NULL values are not considered equal in a unique index. If you need to treat NULL as a value, you'll need to use a COALESCE in your index or a functional index. Check out Database performance: Using functional indexes for faster queries for more on that.
How do I handle soft deletes with unique indexes?
This is a classic headache. If you "delete" a row by setting is_deleted = true, a unique index will still block a new row with the same key. You’ll either need to include is_deleted in your unique index or switch to a hard delete strategy.
I’m still experimenting with how these constraints behave during massive schema migrations. Sometimes, adding a unique index to a table with existing, messy data causes the migration to hang for hours. Always run CREATE INDEX CONCURRENTLY if you’re working on a live production database. It takes longer, but it won't lock your table and take your service down.
Partial indexes are the secret weapon for database performance. Learn how to use them to optimize high-cardinality filtering and cut your query latency.