DatabasesJune 22, 20264 min read

Database schema optimization: Indexed Generated Columns for JSONB

Database schema design with JSONB indexing is critical for performance. Learn how PostgreSQL generated columns can speed up your queries by orders of magnitude.

PostgreSQLDatabase DesignJSONBPerformance TuningSQLBackend EngineeringMySQLRedisDatabase

Last month, I was debugging an endpoint that took nearly 800ms to return a simple list of orders. The culprit? A massive JSONB column named metadata where we stored everything from tracking numbers to user-agent strings. We were filtering by metadata->>'order_type', and the database was doing a full table scan on every single request.

If you're dealing with JSONB in PostgreSQL, you’ve likely felt this pain. You want the flexibility of schemaless storage, but you eventually pay the "performance tax" when your table grows beyond a few hundred thousand rows.

The Wrong Turn: GIN Indexes

My first instinct was to slap a GIN (Generalized Inverted Index) on the entire metadata column. It’s the standard advice you see on StackOverflow.


SQL
CREATE INDEX idx_metadata_gin ON orders USING GIN (metadata);

It worked—initially. Query times dropped, but our write latency spiked. Because the GIN index has to track every single key and value inside that JSON blob, every INSERT or UPDATE became significantly heavier. We saw our write duration jump by roughly 1.5x. Plus, the index size was massive, bloating our disk usage unnecessarily.

We needed a surgical approach to JSONB indexing that didn't require indexing the entire object.

Enter Generated Columns

PostgreSQL 12 introduced a game-changer: stored generated columns. These allow you to extract a specific field from a JSON object and treat it as a first-class citizen in your database schema.

The beauty here is that the value is computed at write time and stored on disk. It’s essentially a materialized view of a single field.

Here is how we refactored our table to optimize query performance:


SQL
ALTER TABLE orders 
ADD COLUMN order_type TEXT 
GENERATED ALWAYS AS (metadata->>'order_type') STORED;

CREATE INDEX idx_orders_order_type ON orders (order_type);

By creating a standard B-tree index on the order_type column, the query planner can now use a standard index scan instead of a jsonb_path_ops search. The performance difference was immediate. Our 800ms query dropped to about 12ms.

Why this beats raw JSONB querying

When you query a raw JSONB column, the database has to deserialize the JSON and perform a comparison for every row. Even with an expression index, the planner often struggles to estimate costs correctly.

Using a generated column provides several benefits:

Type Safety: You can cast the result. If your order_type is actually an integer, you can cast it to INT during generation, making sorting and range queries (WHERE order_type > 10) significantly faster.
Standard Indexing: You aren't forced into using GIN or GiST indexes. A standard B-tree index is usually smaller, faster, and more efficient for equality or range searches.
Storage Efficiency: You only store the specific data you need to query, rather than an inverted map of the entire JSON blob.

If you’re interested in how this fits into a broader strategy, I’ve previously written about how Database schema design: Implementing Shadow Columns for Soft Deletes can help manage data lifecycles without hurting your performance. Similarly, if you're working with an ORM, you might want to look at Laravel Eloquent Performance: Mastering PostgreSQL Generated Columns to see how to abstract this away in your application code.

A Caveat on Write Overhead

While this approach is faster for reads, remember that STORED generated columns do cost extra CPU cycles during writes. Every time you write to the metadata column, PostgreSQL calculates the expression and writes it to the new order_type column.

If your table has a high write-to-read ratio, monitor your iowait metrics. In my experience, the trade-off is almost always worth it for analytical or dashboard-heavy tables, but test it on a replica first if your system is write-heavy.

FAQ

Does this work with nested JSON? Yes. Your expression can be as complex as you need, e.g., (metadata->'user'->>'id')::int. Just keep in mind that the expression must be immutable.

What happens if the JSON key is missing? The generated column will store NULL. You can handle this by using COALESCE in the generation expression if you need a default value.

Can I update the generated column directly? No. The column is read-only. You must update the source JSONB column, and PostgreSQL will automatically update the generated column for you.

Looking Ahead

I’m still experimenting with how this interacts with partitioning. If you have a massive table partitioned by date, generating columns across all partitions can add up in terms of storage. Next time, I plan to test whether indexing these columns per-partition provides a better balance than a global index.

Database performance is rarely a "set it and forget it" task. This generated column pattern is just one of many tools in the shed. Start small, verify with EXPLAIN ANALYZE, and don't be afraid to drop the index if your access patterns change.

Back to Blog

Database schema optimization: Indexed Generated Columns for JSONB

The Wrong Turn: GIN Indexes

Enter Generated Columns

Why this beats raw JSONB querying

A Caveat on Write Overhead

FAQ

Looking Ahead

Similar Posts

Partial indexes for high-cardinality filtering: A deep dive

Reading an EXPLAIN plan without panic: A Backend Engineer’s Guide

Transactional Outbox Pattern: Using WAL for Reliable Event-Driven Systems