DatabasesJune 22, 20264 min read

Database Schema Design: Choosing Primary Keys for Performance

Master database schema design by choosing the right primary keys. Learn how UUID vs BigInt affects indexing performance and storage in your production database.

databasepostgresqlperformanceschema designprimary keysengineeringMySQLRedis

I remember sitting at my desk at 3:00 AM, watching our primary database instance hit 98% CPU utilization during a routine bulk import. We were using random UUIDs as primary keys across our core tables, and the index fragmentation had reached a point where the B-tree was essentially thrashing. We had prioritized global uniqueness over physical storage efficiency, and it was costing us roughly 200ms of additional latency on every single write operation.

Choosing a primary key seems trivial until you're dealing with millions of rows. It’s the foundation of your database schema design, and getting it wrong creates technical debt that’s notoriously expensive to refactor later.

Understanding UUID vs BigInt Trade-offs

When you're building a system, you’ll usually choose between a BIGINT (auto-incrementing integer) and a UUID.

BIGINT is simple. It uses 8 bytes of storage and keeps your index compact. Because it’s sequential, it offers excellent indexing performance because new rows are always appended to the "right" side of the B-tree. This minimizes page splits and keeps the index tree balanced with minimal effort from the database engine.

UUID (v4), on the other hand, is 16 bytes. It’s globally unique, which is great for distributed systems where you don't want to coordinate key generation between services. However, because v4 UUIDs are random, they wreak havoc on index locality. Every insert potentially requires a random page to be loaded into memory, modified, and written back to disk. This is a massive overhead compared to appending to a tail page.

The Problem with Random Indexing Performance

If you’ve read my thoughts on database indexing strategies, you know that index bloat is the silent killer of performance. When you insert a random UUID, the database cannot simply append the data. It has to find the correct leaf node in the B-tree, which might already be full. This triggers a page split—a process where the engine creates a new page and redistributes existing records.

In our case, the overhead wasn't just about disk space. The randomized writes meant our database cache (the Buffer Pool in PostgreSQL) was constantly being invalidated. We were spending more time fetching index pages from disk than actually performing the business logic of the query.

Is UUIDv7 the Middle Ground?

We eventually experimented with UUIDv7. Unlike v4, UUIDv7 includes a timestamp component at the start of the identifier. This makes them roughly sequential.

Using v7 gives you the best of both worlds:

Global Uniqueness: You don't lose the ability to generate IDs in your application layer.
Index Locality: Because the leading bytes are time-based, inserts are naturally sorted. This keeps your B-tree balanced and significantly reduces page splits.

If you’re moving toward a multi-tenancy database schema design, using v7 can also help keep your tenant-specific data clustered together if you include the tenant ID in your composite keys.

Storage Overhead and Real-World Constraints

Don't ignore the storage cost. A BIGINT is 8 bytes. A UUID is 16 bytes. If you have a table with 100 million rows, the difference in the primary key column alone is 800MB. When you add that to your secondary indexes—which also store the primary key as a pointer—that overhead multiplies.

For a table with five secondary indexes, that 800MB difference becomes 4.8GB of wasted space. That’s cache space you could have used for actual query data. If you’re constrained by memory, that extra space might be the difference between a query running in memory or hitting the SSD.

Lessons Learned

We didn’t migrate every table to BIGINT, and we didn't switch everything to UUIDv7 overnight. We started by identifying our most write-heavy tables and refactoring those first.

If I were starting from scratch today, I’d default to BIGINT for internal tables where the application doesn't need to expose IDs to the outside world. For public-facing entities where we need to hide the total count of records (preventing ID enumeration attacks), I’d use UUIDv7.

I’m still not entirely convinced that UUIDv7 is a perfect replacement for BIGINT in high-throughput systems, though. We’ve seen minor fragmentation over extremely long periods compared to a strictly monotonic integer. It's a trade-off I'm willing to make for now, but I’m keeping a close eye on pg_stat_user_indexes to monitor the bloat factor.

Frequently Asked Questions

Q: Should I use a surrogate key or a natural key? A: Almost always use a surrogate key. Natural keys (like email addresses or usernames) are rarely immutable, and updating a primary key that's referenced by foreign keys is a nightmare.

Q: Does foreign key performance suffer with UUIDs? A: Yes, foreign key performance is definitely impacted by the size of the key. Larger keys mean larger indexes, which means more I/O during join operations. Keep your foreign keys as small as possible.

Q: Can I convert a primary key from UUID to BigInt later? A: You can, but it’s a heavy operation. You’ll need to create a new column, backfill it, update all foreign keys, and then swap the column names. It’s a multi-step migration that requires significant downtime or complex shadow-write logic. Do it right the first time.

Back to Blog

Database Schema Design: Choosing Primary Keys for Performance

Understanding UUID vs BigInt Trade-offs

The Problem with Random Indexing Performance

Is UUIDv7 the Middle Ground?

Storage Overhead and Real-World Constraints

Lessons Learned

Frequently Asked Questions

Similar Posts

Database schema design: Implementing Shadow Columns for Soft Deletes

Database performance: Using functional indexes for faster queries

Foreign key performance: Balancing Indexing and Write Throughput