DatabasesJune 21, 20264 min read

Database schema design: Implementing Shadow Columns for Soft Deletes

Database schema design matters. Learn how to implement shadow columns for efficient soft deletes and auditing without sacrificing query performance.

databasepostgresqlperformanceschema-designbackendengineeringMySQLRedis

Last month, our primary order processing service started crawling during high-traffic windows. We had a massive orders table littered with deleted_at timestamps, causing our index scans to drag as the table grew into the millions of rows.

I realized we were paying a heavy performance tax just to keep "dead" data around. We needed a way to handle soft deletes and auditing that didn't turn every SELECT query into a slow, index-bloated mess. That’s when I turned to shadow columns and partial indexing.

The Problem with Traditional Soft Deletes

Most developers default to a simple is_deleted boolean or a deleted_at timestamp. It’s easy to implement, but it’s a trap for database schema design. Every query in your application suddenly needs a WHERE deleted_at IS NULL clause.

If you forget that clause once, you get bugs. If you include it everywhere, your database engine has to navigate a bloated index that contains both active and inactive records. When your table size hits the 5-10 million row mark, the overhead of scanning these "ghost" records becomes non-trivial.

We tried adding a standard composite index on (user_id, deleted_at). It helped, but it still forced the engine to keep metadata for records that weren't even relevant to the user's current session.

Moving to Shadow Columns

Instead of cluttering the main table, I moved the deletion metadata to a shadow table (or a set of dedicated shadow columns if the volume is low). The goal is to keep the "hot" table lean.

Think of it this way:

Main Table: Contains only active, valid records.
Shadow Table: Contains the primary key of the deleted record, the timestamp, and the actor ID (for auditing).

When a user deletes an item, we move the row ID to the deleted_orders table. Now, your main orders query doesn't need a WHERE clause at all. It just hits the clean, active set of data.

Implementation Strategy

If you're on PostgreSQL 14+, you can use a trigger to handle this silently, but I prefer keeping this logic in the service layer to maintain visibility. Here’s how the schema structure looks:


SQL
-- The lean main table
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    amount DECIMAL(12, 2),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- The shadow audit table
CREATE TABLE deleted_orders (
    order_id UUID PRIMARY KEY,
    deleted_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_by UUID,
    reason TEXT
);

This setup allows for a much cleaner performance optimization approach. You aren't forcing the database to filter out deleted rows during every read operation.

Indexing Strategies for Shadow Data

The beauty of this approach is that it enables database indexing strategies that are far more effective. Since your main table is now strictly "active" data, your indexes are smaller and fit better into RAM.

If you still need to query deleted data for reports or admin views, you can join against the shadow table. Because the shadow table only contains the deleted IDs, the join performance remains predictable.

We also found that for high-frequency auditing, implementing a Database TTL Strategy on the shadow table itself helps keep storage costs down. We set a policy to purge the deleted_orders table after 90 days, effectively moving old, deleted data into cold storage (like S3/Parquet files) instead of bloating our primary RDS instance.

The Trade-offs

This isn't a silver bullet. You’re trading storage complexity for read performance.

Atomicity: You’ll need a database transaction to move the row from the main table to the shadow table. If your application crashes between the DELETE and the INSERT, you could end up with orphaned data. Always wrap these in a single BEGIN...COMMIT block.
Complexity: Your ORM might not handle this "move to shadow" pattern out of the box. You’ll likely need to write custom repository methods rather than relying on standard softDelete() helpers.
Reporting: If you need a global view of everything (active + deleted), you'll need a UNION query. UNION is slower than a simple SELECT, so don't use this pattern if your primary use case is "show me all history" rather than "show me my current active items."

Final Thoughts

We saw our p99 latency for order lookups drop from around 320ms down to about 45ms after cleaning up the index bloat. Was it worth the extra code? Absolutely.

Next time, I might experiment with partitioning the main table by created_at alongside this shadow pattern. I’m still not entirely convinced that moving the data is better than just using a partial index, but for high-churn tables where you delete frequently, the shadow column approach provides a much cleaner separation of concerns.

FAQ

Does this hurt write performance? Yes, slightly. You’re performing two writes instead of one. However, in most backend systems, read performance is the bottleneck. The trade-off is usually worth it.

Can I use this for non-deleted auditing? Definitely. You can use shadow columns to track status changes (e.g., order_status_history) without polluting the main orders table.

What about foreign keys? This is the tricky part. You’ll need to decide if you want to enforce hard foreign keys to the shadow table or use application-level logic. I prefer application-level checks to keep the database schema flexible.

Back to Blog

Database schema design: Implementing Shadow Columns for Soft Deletes

The Problem with Traditional Soft Deletes

Moving to Shadow Columns

Implementation Strategy

Indexing Strategies for Shadow Data

The Trade-offs

Final Thoughts

FAQ

Similar Posts

Foreign key performance: Balancing Indexing and Write Throughput

Database indexing strategies: Mastering composite indexes for speed

Killing N+1 queries at the database layer: A practical guide