How to Write Incremental Database Migrations for PostgreSQL Clusters

Introduction

Database migrations are the backbone of evolving database schemas in production environments. With the advent of ORMs, Data Migration and Schema Management tools like Atlas, schema migrations are somewhat assumed to be immune to failures. Yet, as the number of dependent services grow and more and more developers are onboarded, backward-incompatible database migrations become a leading cause for inconsistent and partial rollbacks in case of failures.

When working with PostgreSQL clusters, incremental migrations become critical for maintaining consistency across multiple nodes while ensuring zero-downtime deployments. This article explores best practices, common pitfalls, and practical implementation strategies for PostgreSQL database migrations.

This article will be geared more, towards, developers, and we will focus our attention to PostgreSQL because, a wise person once said “PostgreSQL is always the second best Database for every usecase“

Understanding Incremental Migrations

Incremental migrations are versioned database changes that can be applied sequentially to transform your database schema from one state to another. Unlike destructive schema changes that require complete rebuilds, incremental migrations allow you to evolve your database structure gradually while preserving data integrity.

Key Principles

Immutability: Once applied, migrations should never be modified
Idempotency: Migrations should be safe to run multiple times
Reversibility: Each migration should have a corresponding rollback strategy
Atomicity: Each migration should be a complete, self-contained unit of change

Setting Up Migration Infrastructure

Migration Table Schema

First, let's create a migration tracking table that will keep track of applied migrations across your cluster:

-- migrations/000001_create_migrations_table.up.sql
CREATE TABLE IF NOT EXISTS schema_migrations (
    version BIGINT PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    checksum VARCHAR(64) NOT NULL,
    applied_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    execution_time_ms INTEGER,
    success BOOLEAN DEFAULT TRUE
);

CREATE INDEX idx_schema_migrations_applied_at ON schema_migrations(applied_at);
CREATE INDEX idx_schema_migrations_success ON schema_migrations(success);

-- migrations/000001_create_migrations_table.down.sql
DROP INDEX IF EXISTS idx_schema_migrations_success;
DROP INDEX IF EXISTS idx_schema_migrations_applied_at;
DROP TABLE IF EXISTS schema_migrations;

Migration Naming Convention

Adopt a consistent naming convention for migration files:

{version}_{description}.{direction}.sql

Examples:

000001_create_users_table.up.sql
000001_create_users_table.down.sql
000002_add_user_email_index.up.sql
000002_add_user_email_index.down.sql

Case Study: E-commerce Platform Migration

Let's walk through a realistic case study of migrating an e-commerce platform's database schema across a PostgreSQL cluster.

Migration 1: Creating Base Tables

-- migrations/000001_create_base_tables.up.sql
BEGIN;

-- Users table
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Products table
CREATE TABLE products (
    id BIGSERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    description TEXT,
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    inventory_count INTEGER DEFAULT 0 CHECK (inventory_count >= 0),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Orders table
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount >= 0),
    status VARCHAR(50) DEFAULT 'pending' CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Create indexes for performance
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_products_name ON products(name);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);

COMMIT;

-- migrations/000001_create_base_tables.down.sql
BEGIN;

DROP INDEX IF EXISTS idx_orders_created_at;
DROP INDEX IF EXISTS idx_orders_status;
DROP INDEX IF EXISTS idx_orders_user_id;
DROP INDEX IF EXISTS idx_products_name;
DROP INDEX IF EXISTS idx_users_username;
DROP INDEX IF EXISTS idx_users_email;

DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS users;

COMMIT;

Migration 2: Adding Order Items Table

-- migrations/000002_create_order_items.up.sql
BEGIN;

CREATE TABLE order_items (
    id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

-- Composite index for efficient lookups
CREATE INDEX idx_order_items_order_product ON order_items(order_id, product_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Add constraint to ensure order total consistency
ALTER TABLE order_items 
ADD CONSTRAINT check_positive_quantity CHECK (quantity > 0);

COMMIT;

-- migrations/000002_create_order_items.down.sql
BEGIN;

DROP INDEX IF EXISTS idx_order_items_product_id;
DROP INDEX IF EXISTS idx_order_items_order_product;
DROP TABLE IF EXISTS order_items;

COMMIT;

Advanced Migration Patterns

Safe Column Additions

When adding columns to large tables in production, use this pattern to avoid long locks:

-- migrations/000003_add_user_profile_columns.up.sql
BEGIN;

-- Add columns with default values (fast operation)
ALTER TABLE users 
ADD COLUMN first_name VARCHAR(100),
ADD COLUMN last_name VARCHAR(100),
ADD COLUMN phone VARCHAR(20),
ADD COLUMN is_active BOOLEAN DEFAULT TRUE NOT NULL;

-- Create partial indexes for selective queries
CREATE INDEX CONCURRENTLY idx_users_active ON users(id) WHERE is_active = TRUE;
CREATE INDEX CONCURRENTLY idx_users_name ON users(first_name, last_name) WHERE first_name IS NOT NULL;

COMMIT;

Data Migration with Batching

For large data transformations, use batching to avoid long-running transactions:

-- migrations/000004_normalize_user_names.up.sql
BEGIN;

-- Add temporary column for migration tracking
ALTER TABLE users ADD COLUMN migration_batch_processed BOOLEAN DEFAULT FALSE;

-- Create function for batch processing
CREATE OR REPLACE FUNCTION migrate_user_names_batch(batch_size INTEGER DEFAULT 1000)
RETURNS INTEGER AS $$
DECLARE
    processed_count INTEGER := 0;
    total_processed INTEGER := 0;
BEGIN
    LOOP
        -- Process batch
        WITH batch_update AS (
            UPDATE users 
            SET 
                first_name = TRIM(SPLIT_PART(username, '.', 1)),
                last_name = TRIM(SPLIT_PART(username, '.', 2)),
                migration_batch_processed = TRUE
            WHERE migration_batch_processed = FALSE
                AND username LIKE '%.%'
            LIMIT batch_size
            RETURNING id
        )
        SELECT COUNT(*) INTO processed_count FROM batch_update;

        total_processed := total_processed + processed_count;

        -- Exit if no more rows to process
        EXIT WHEN processed_count = 0;

        -- Small delay to prevent overwhelming the system
        PERFORM pg_sleep(0.1);
    END LOOP;

    RETURN total_processed;
END;
$$ LANGUAGE plpgsql;

COMMIT;

-- Run the migration (this would typically be done by your migration runner)
-- SELECT migrate_user_names_batch(1000);

Zero-Downtime Column Type Changes

Changing column types safely requires a multi-step approach:

-- migrations/000005_change_price_precision.up.sql
-- Step 1: Add new column with desired type
BEGIN;

ALTER TABLE products ADD COLUMN price_new DECIMAL(12,4);

-- Step 2: Populate new column with converted data
UPDATE products SET price_new = price::DECIMAL(12,4);

-- Step 3: Add NOT NULL constraint after data is populated
ALTER TABLE products ALTER COLUMN price_new SET NOT NULL;

-- Step 4: Create new indexes on new column
CREATE INDEX CONCURRENTLY idx_products_price_new ON products(price_new);

COMMIT;

-- Step 5: Application deployment would happen here to use price_new
-- Step 6: In next migration, drop old column and rename new one

-- migrations/000006_complete_price_migration.up.sql
BEGIN;

-- Drop old index
DROP INDEX IF EXISTS idx_products_price;

-- Drop old column
ALTER TABLE products DROP COLUMN price;

-- Rename new column
ALTER TABLE products RENAME COLUMN price_new TO price;

-- Rename index
ALTER INDEX idx_products_price_new RENAME TO idx_products_price;

COMMIT;

Cluster-Specific Considerations

Replica Lag Management

When working with PostgreSQL clusters, consider replica lag during migrations:

-- migrations/000007_add_audit_columns.up.sql
BEGIN;

-- Function to check replica lag before proceeding
CREATE OR REPLACE FUNCTION check_replica_lag(max_lag_seconds INTEGER DEFAULT 30)
RETURNS BOOLEAN AS $$
DECLARE
    current_lag INTEGER;
BEGIN
    -- Check replication lag (simplified example)
    SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INTEGER 
    INTO current_lag;

    IF current_lag > max_lag_seconds THEN
        RAISE NOTICE 'Replica lag is % seconds, waiting...', current_lag;
        RETURN FALSE;
    END IF;

    RETURN TRUE;
END;
$$ LANGUAGE plpgsql;

-- Only proceed if replica lag is acceptable
DO $$
BEGIN
    IF NOT check_replica_lag(30) THEN
        RAISE EXCEPTION 'Migration aborted due to high replica lag';
    END IF;
END $$;

-- Add audit columns to all main tables
ALTER TABLE users 
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;

ALTER TABLE products 
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;

ALTER TABLE orders 
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;

COMMIT;

Migration Execution Strategies

Transactional vs Non-Transactional Operations

Some operations cannot be run within transactions. Handle these carefully:

-- migrations/000008_create_concurrent_indexes.up.sql
-- This migration contains operations that cannot be run in a transaction

-- Create indexes concurrently (cannot be done in transaction)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_full_name 
ON users(first_name, last_name) WHERE first_name IS NOT NULL AND last_name IS NOT NULL;

CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_products_inventory 
ON products(inventory_count) WHERE inventory_count > 0;

-- Analyze tables after index creation
ANALYZE users;
ANALYZE products;

Validation and Rollback Strategies

Implement validation checks within migrations:

-- migrations/000009_add_constraints_with_validation.up.sql
BEGIN;

-- Add check constraint with validation
ALTER TABLE products 
ADD CONSTRAINT check_price_range 
CHECK (price BETWEEN 0.01 AND 999999.99) NOT VALID;

-- Validate constraint (this scans the table but doesn't block writes)
ALTER TABLE products VALIDATE CONSTRAINT check_price_range;

-- Add foreign key constraint
ALTER TABLE order_items 
ADD CONSTRAINT fk_order_items_order_id 
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE;

-- Validate data integrity
DO $$
DECLARE
    invalid_count INTEGER;
BEGIN
    SELECT COUNT(*) INTO invalid_count
    FROM order_items oi
    LEFT JOIN orders o ON oi.order_id = o.id
    WHERE o.id IS NULL;

    IF invalid_count > 0 THEN
        RAISE EXCEPTION 'Found % orphaned order items. Migration aborted.', invalid_count;
    END IF;

    RAISE NOTICE 'Data validation passed. Migration completed successfully.';
END $$;

COMMIT;

Best Practices and Common Pitfalls

1. Always Use Transactions When Possible

-- Good: Wrapped in transaction
BEGIN;
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
UPDATE users SET email_verified = TRUE WHERE created_at < '2024-01-01';
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
COMMIT;

-- Avoid: Multiple separate transactions
-- ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
-- UPDATE users SET email_verified = TRUE WHERE created_at < '2024-01-01';
-- ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;

2. Handle Large Tables Carefully

-- migrations/000010_partition_large_table.up.sql
BEGIN;

-- Create partitioned table for orders
CREATE TABLE orders_partitioned (
    LIKE orders INCLUDING ALL
) PARTITION BY RANGE (created_at);

-- Create partitions for current and future months
CREATE TABLE orders_2024_q1 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

CREATE TABLE orders_2024_q3 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');

-- Migration strategy would involve gradually moving data
-- This is typically done in multiple migrations to minimize impact

COMMIT;

3. Implement Proper Error Handling

-- migrations/000011_complex_data_migration.up.sql
BEGIN;

-- Create backup table for safety
CREATE TABLE users_backup AS SELECT * FROM users;

-- Set up error handling
DO $$
DECLARE
    error_count INTEGER := 0;
    total_count INTEGER;
BEGIN
    -- Get total count for progress tracking
    SELECT COUNT(*) INTO total_count FROM users;

    -- Perform complex migration with error handling
    BEGIN
        -- Your complex migration logic here
        UPDATE users SET email = LOWER(TRIM(email));

        -- Validate results
        SELECT COUNT(*) INTO error_count 
        FROM users 
        WHERE email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$';

        IF error_count > 0 THEN
            RAISE EXCEPTION 'Found % invalid email addresses after migration', error_count;
        END IF;

        RAISE NOTICE 'Successfully migrated % user records', total_count;

    EXCEPTION WHEN OTHERS THEN
        RAISE NOTICE 'Migration failed: %', SQLERRM;
        RAISE;
    END;
END $$;

COMMIT;

-- migrations/000011_complex_data_migration.down.sql
BEGIN;

-- Restore original data from backup table created during UP migration
DO $$
DECLARE
    backup_exists BOOLEAN := FALSE;
    backup_count INTEGER := 0;
    current_count INTEGER := 0;
BEGIN
    -- Check if backup table exists
    SELECT EXISTS (
        SELECT 1 FROM information_schema.tables 
        WHERE table_name = 'users_backup' 
        AND table_schema = current_schema()
    ) INTO backup_exists;

    IF NOT backup_exists THEN
        RAISE EXCEPTION 'Backup table "users_backup" not found. Cannot safely rollback migration.';
    END IF;

    -- Get counts for validation
    SELECT COUNT(*) INTO backup_count FROM users_backup;
    SELECT COUNT(*) INTO current_count FROM users;

    RAISE NOTICE 'Found backup table with % records, current table has % records', backup_count, current_count;

    -- Restore original data
    TRUNCATE users;
    INSERT INTO users SELECT * FROM users_backup;

    -- Validate restoration
    SELECT COUNT(*) INTO current_count FROM users;

    IF current_count != backup_count THEN
        RAISE EXCEPTION 'Restoration failed: expected % records, got %', backup_count, current_count;
    END IF;

    -- Clean up backup table
    DROP TABLE users_backup;

    RAISE NOTICE 'Successfully restored % user records to original state', current_count;

EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'Rollback failed: %', SQLERRM;
    RAISE;
END $$;

COMMIT;

Monitoring and Observability

Migration Timing and Performance

-- Function to log migration performance
CREATE OR REPLACE FUNCTION log_migration_performance(
    migration_version BIGINT,
    migration_name VARCHAR(255),
    start_time TIMESTAMP WITH TIME ZONE
)
RETURNS VOID AS $$
DECLARE
    end_time TIMESTAMP WITH TIME ZONE := CURRENT_TIMESTAMP;
    duration_ms INTEGER;
BEGIN
    duration_ms := EXTRACT(EPOCH FROM (end_time - start_time)) * 1000;

    INSERT INTO schema_migrations (version, filename, checksum, applied_at, execution_time_ms)
    VALUES (migration_version, migration_name, 'calculated_checksum', start_time, duration_ms);

    RAISE NOTICE 'Migration % completed in % ms', migration_name, duration_ms;
END;
$$ LANGUAGE plpgsql;

Conclusion

Incremental database migrations are essential for maintaining PostgreSQL clusters in production environments. Key takeaways include:

Plan migrations carefully: Consider the impact on cluster performance and replica lag
Use transactions wisely: Group related changes but be aware of operations that cannot be transactional
Implement proper validation: Always verify data integrity after migrations
Design for rollbacks: Ensure every migration has a corresponding down migration
Monitor performance: Track execution times and system impact
Test thoroughly: Validate migrations in staging environments that mirror production

By following these patterns and best practices, you can safely evolve your PostgreSQL database schema while maintaining high availability and data integrity across your cluster.

Remember

The goal of incremental migrations is not just to change the schema, but to do so in a way that minimizes risk, maintains performance, and preserves the ability to quickly recover from issues. Always err on the side of caution and implement comprehensive monitoring to ensure your migrations are successful.

How to Write Incremental Database Migrations for PostgreSQL Clusters

Introduction

Understanding Incremental Migrations

Key Principles

Setting Up Migration Infrastructure

Migration Table Schema

Migration Naming Convention

Case Study: E-commerce Platform Migration

Migration 1: Creating Base Tables

Migration 2: Adding Order Items Table

Advanced Migration Patterns

Safe Column Additions

Data Migration with Batching

Zero-Downtime Column Type Changes

Cluster-Specific Considerations

Replica Lag Management

Migration Execution Strategies

Transactional vs Non-Transactional Operations

Validation and Rollback Strategies

Best Practices and Common Pitfalls

1. Always Use Transactions When Possible

2. Handle Large Tables Carefully

3. Implement Proper Error Handling

Monitoring and Observability

Migration Timing and Performance

Conclusion

Remember

Comments

More from this blog

Live Intercept Network Traffic in a Kubernetes Pod - No Service Mesh Needed.

To the Layman: Observability is not APM

Why I ditched Ubuntu?

What YouTube Tutorials Don't Tell You About DevOps & SRE (A Fresher's POV).

Command Palette

Introduction

Understanding Incremental Migrations

Key Principles

Setting Up Migration Infrastructure

Migration Table Schema

Migration Naming Convention

Case Study: E-commerce Platform Migration

Migration 1: Creating Base Tables

Migration 2: Adding Order Items Table

Advanced Migration Patterns

Safe Column Additions

Data Migration with Batching

Zero-Downtime Column Type Changes

Cluster-Specific Considerations

Replica Lag Management

Migration Execution Strategies

Transactional vs Non-Transactional Operations

Validation and Rollback Strategies

Best Practices and Common Pitfalls

1. Always Use Transactions When Possible

2. Handle Large Tables Carefully

3. Implement Proper Error Handling

Monitoring and Observability

Migration Timing and Performance

Conclusion

Remember

Comments

More from this blog