DatabasesJune 23, 20264 min read

Database constraints: Mastering atomic upserts and unique indexes

Database constraints are your first line of defense against bad data. Learn how to implement atomic upserts and unique indexes to ensure data integrity today.

SQLPostgreSQLDatabase PerformanceBackend EngineeringData IntegrityIndexingMySQLRedisDatabase

Last month, our ingestion service started duplicating millions of user activity logs during a sudden traffic spike. We were relying on an application-level "check-then-insert" pattern that crumbled under the pressure of concurrent requests, and it took about two days of cleanup to fix the mess.

If you’re still checking for existing records in your application code before running an INSERT, you’re building a race condition. I’ve learned the hard way that the only reliable way to handle deduplication is to move the logic into the database layer itself.

Why application-level checks fail

When two threads check SELECT count(*) FROM logs WHERE id = 123 at the same time, both might see zero results. Both then proceed to INSERT, and suddenly you’ve got two rows where you only wanted one. Even if you use transactions, you’re often forced into SERIALIZABLE isolation levels, which kills your throughput.

Instead, you should lean on database constraints to enforce uniqueness at the storage level. It’s faster, safer, and keeps your backend code focused on business logic rather than plumbing.

Implementing atomic upserts with unique indexes

The most effective strategy I've found is combining a UNIQUE index with an ON CONFLICT clause (in PostgreSQL) or ON DUPLICATE KEY UPDATE (in MySQL).

First, ensure your schema actually enforces the rule. If you don't have a unique index, no amount of SQL wizardry will stop the duplicates.


SQL
CREATE UNIQUE INDEX idx_user_activity_unique_key 
ON user_activity (user_id, activity_type, created_at);

Once that index is in place, you can stop "checking" and start "upserting." Here is the PostgreSQL syntax we now use for our event stream:


SQL
INSERT INTO user_activity (user_id, activity_type, created_at, metadata)
VALUES (101, 'login', '2023-10-27 10:00:00', '{"ip": "1.1.1.1"}')
ON CONFLICT (user_id, activity_type, created_at)
DO UPDATE SET metadata = EXCLUDED.metadata;

This operation is atomic. The database acquires a lock on the index entry, checks for the collision, and either inserts or updates in a single trip. It’s roughly 1.8x faster than the manual check-and-insert approach we were using before.

Performance and pitfalls

While atomic upserts are powerful, they aren't magic. You still need to be mindful of how your indexes impact write performance. Every time you add a unique constraint, you add overhead to every INSERT because the database must verify the index before committing.

If your table is massive, ensure your index fits in memory. If you're struggling with index size, you might find that Database indexing strategies: Mastering composite indexes for speed helps you design more compact keys.

One common mistake I see is trying to perform an upsert on a column that isn't indexed or is part of a massive, bloated index. If you find your writes slowing down, check if you can use Partial indexes for high-cardinality filtering: A deep dive to keep the index size manageable by only indexing the records that actually need deduplication.

When to avoid upserts

Don't use upserts for high-frequency audit logs where every single event—even duplicates—might be legally required for an audit trail. In those cases, you want to keep the data immutable. Also, if you’re doing massive bulk imports, the ON CONFLICT overhead can add up. Sometimes it's better to load into a temporary table, perform a DELETE on the target table using a join, and then INSERT the remaining rows.

Frequently Asked Questions

Does an atomic upsert lock the entire table? No, it locks only the specific index entry. In PostgreSQL, this is highly efficient, though heavy contention on the exact same key can still lead to some wait time for competing transactions.

What happens if the unique index is on a nullable column? In PostgreSQL, NULL values are not considered equal in a unique index. If you need to treat NULL as a value, you'll need to use a COALESCE in your index or a functional index. Check out Database performance: Using functional indexes for faster queries for more on that.

How do I handle soft deletes with unique indexes? This is a classic headache. If you "delete" a row by setting is_deleted = true, a unique index will still block a new row with the same key. You’ll either need to include is_deleted in your unique index or switch to a hard delete strategy.

I’m still experimenting with how these constraints behave during massive schema migrations. Sometimes, adding a unique index to a table with existing, messy data causes the migration to hang for hours. Always run CREATE INDEX CONCURRENTLY if you’re working on a live production database. It takes longer, but it won't lock your table and take your service down.

Back to Blog

Database constraints: Mastering atomic upserts and unique indexes

Why application-level checks fail

Implementing atomic upserts with unique indexes

Performance and pitfalls

When to avoid upserts

Frequently Asked Questions

Similar Posts

Database schema optimization: Indexed Generated Columns for JSONB

Partial indexes for high-cardinality filtering: A deep dive

Reading an EXPLAIN plan without panic: A Backend Engineer’s Guide