How to Write Incremental Database Migrations for PostgreSQL Clusters

I am just a student. My passion includes tinkering with the latest technologies. I love to dive into Distributed Systems, DevOps, Operating Systems, Backend Engineering and Networking.
Introduction
Database migrations are the backbone of evolving database schemas in production environments. With the advent of ORMs, Data Migration and Schema Management tools like Atlas, schema migrations are somewhat assumed to be immune to failures. Yet, as the number of dependent services grow and more and more developers are onboarded, backward-incompatible database migrations become a leading cause for inconsistent and partial rollbacks in case of failures.
When working with PostgreSQL clusters, incremental migrations become critical for maintaining consistency across multiple nodes while ensuring zero-downtime deployments. This article explores best practices, common pitfalls, and practical implementation strategies for PostgreSQL database migrations.
This article will be geared more, towards, developers, and we will focus our attention to PostgreSQL because, a wise person once said “PostgreSQL is always the second best Database for every usecase“
Understanding Incremental Migrations
Incremental migrations are versioned database changes that can be applied sequentially to transform your database schema from one state to another. Unlike destructive schema changes that require complete rebuilds, incremental migrations allow you to evolve your database structure gradually while preserving data integrity.
Key Principles
Immutability: Once applied, migrations should never be modified
Idempotency: Migrations should be safe to run multiple times
Reversibility: Each migration should have a corresponding rollback strategy
Atomicity: Each migration should be a complete, self-contained unit of change
Setting Up Migration Infrastructure
Migration Table Schema
First, let's create a migration tracking table that will keep track of applied migrations across your cluster:
-- migrations/000001_create_migrations_table.up.sql
CREATE TABLE IF NOT EXISTS schema_migrations (
version BIGINT PRIMARY KEY,
filename VARCHAR(255) NOT NULL,
checksum VARCHAR(64) NOT NULL,
applied_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
execution_time_ms INTEGER,
success BOOLEAN DEFAULT TRUE
);
CREATE INDEX idx_schema_migrations_applied_at ON schema_migrations(applied_at);
CREATE INDEX idx_schema_migrations_success ON schema_migrations(success);
-- migrations/000001_create_migrations_table.down.sql
DROP INDEX IF EXISTS idx_schema_migrations_success;
DROP INDEX IF EXISTS idx_schema_migrations_applied_at;
DROP TABLE IF EXISTS schema_migrations;
Migration Naming Convention
Adopt a consistent naming convention for migration files:
{version}_{description}.{direction}.sql
Examples:
000001_create_users_table.up.sql000001_create_users_table.down.sql000002_add_user_email_index.up.sql000002_add_user_email_index.down.sql
Case Study: E-commerce Platform Migration
Let's walk through a realistic case study of migrating an e-commerce platform's database schema across a PostgreSQL cluster.
Migration 1: Creating Base Tables
-- migrations/000001_create_base_tables.up.sql
BEGIN;
-- Users table
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Products table
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
inventory_count INTEGER DEFAULT 0 CHECK (inventory_count >= 0),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Orders table
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount >= 0),
status VARCHAR(50) DEFAULT 'pending' CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Create indexes for performance
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
CREATE INDEX idx_products_name ON products(name);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
COMMIT;
-- migrations/000001_create_base_tables.down.sql
BEGIN;
DROP INDEX IF EXISTS idx_orders_created_at;
DROP INDEX IF EXISTS idx_orders_status;
DROP INDEX IF EXISTS idx_orders_user_id;
DROP INDEX IF EXISTS idx_products_name;
DROP INDEX IF EXISTS idx_users_username;
DROP INDEX IF EXISTS idx_users_email;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS users;
COMMIT;
Migration 2: Adding Order Items Table
-- migrations/000002_create_order_items.up.sql
BEGIN;
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
quantity INTEGER NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
-- Composite index for efficient lookups
CREATE INDEX idx_order_items_order_product ON order_items(order_id, product_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);
-- Add constraint to ensure order total consistency
ALTER TABLE order_items
ADD CONSTRAINT check_positive_quantity CHECK (quantity > 0);
COMMIT;
-- migrations/000002_create_order_items.down.sql
BEGIN;
DROP INDEX IF EXISTS idx_order_items_product_id;
DROP INDEX IF EXISTS idx_order_items_order_product;
DROP TABLE IF EXISTS order_items;
COMMIT;
Advanced Migration Patterns
Safe Column Additions
When adding columns to large tables in production, use this pattern to avoid long locks:
-- migrations/000003_add_user_profile_columns.up.sql
BEGIN;
-- Add columns with default values (fast operation)
ALTER TABLE users
ADD COLUMN first_name VARCHAR(100),
ADD COLUMN last_name VARCHAR(100),
ADD COLUMN phone VARCHAR(20),
ADD COLUMN is_active BOOLEAN DEFAULT TRUE NOT NULL;
-- Create partial indexes for selective queries
CREATE INDEX CONCURRENTLY idx_users_active ON users(id) WHERE is_active = TRUE;
CREATE INDEX CONCURRENTLY idx_users_name ON users(first_name, last_name) WHERE first_name IS NOT NULL;
COMMIT;
Data Migration with Batching
For large data transformations, use batching to avoid long-running transactions:
-- migrations/000004_normalize_user_names.up.sql
BEGIN;
-- Add temporary column for migration tracking
ALTER TABLE users ADD COLUMN migration_batch_processed BOOLEAN DEFAULT FALSE;
-- Create function for batch processing
CREATE OR REPLACE FUNCTION migrate_user_names_batch(batch_size INTEGER DEFAULT 1000)
RETURNS INTEGER AS $$
DECLARE
processed_count INTEGER := 0;
total_processed INTEGER := 0;
BEGIN
LOOP
-- Process batch
WITH batch_update AS (
UPDATE users
SET
first_name = TRIM(SPLIT_PART(username, '.', 1)),
last_name = TRIM(SPLIT_PART(username, '.', 2)),
migration_batch_processed = TRUE
WHERE migration_batch_processed = FALSE
AND username LIKE '%.%'
LIMIT batch_size
RETURNING id
)
SELECT COUNT(*) INTO processed_count FROM batch_update;
total_processed := total_processed + processed_count;
-- Exit if no more rows to process
EXIT WHEN processed_count = 0;
-- Small delay to prevent overwhelming the system
PERFORM pg_sleep(0.1);
END LOOP;
RETURN total_processed;
END;
$$ LANGUAGE plpgsql;
COMMIT;
-- Run the migration (this would typically be done by your migration runner)
-- SELECT migrate_user_names_batch(1000);
Zero-Downtime Column Type Changes
Changing column types safely requires a multi-step approach:
-- migrations/000005_change_price_precision.up.sql
-- Step 1: Add new column with desired type
BEGIN;
ALTER TABLE products ADD COLUMN price_new DECIMAL(12,4);
-- Step 2: Populate new column with converted data
UPDATE products SET price_new = price::DECIMAL(12,4);
-- Step 3: Add NOT NULL constraint after data is populated
ALTER TABLE products ALTER COLUMN price_new SET NOT NULL;
-- Step 4: Create new indexes on new column
CREATE INDEX CONCURRENTLY idx_products_price_new ON products(price_new);
COMMIT;
-- Step 5: Application deployment would happen here to use price_new
-- Step 6: In next migration, drop old column and rename new one
-- migrations/000006_complete_price_migration.up.sql
BEGIN;
-- Drop old index
DROP INDEX IF EXISTS idx_products_price;
-- Drop old column
ALTER TABLE products DROP COLUMN price;
-- Rename new column
ALTER TABLE products RENAME COLUMN price_new TO price;
-- Rename index
ALTER INDEX idx_products_price_new RENAME TO idx_products_price;
COMMIT;
Cluster-Specific Considerations
Replica Lag Management
When working with PostgreSQL clusters, consider replica lag during migrations:
-- migrations/000007_add_audit_columns.up.sql
BEGIN;
-- Function to check replica lag before proceeding
CREATE OR REPLACE FUNCTION check_replica_lag(max_lag_seconds INTEGER DEFAULT 30)
RETURNS BOOLEAN AS $$
DECLARE
current_lag INTEGER;
BEGIN
-- Check replication lag (simplified example)
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::INTEGER
INTO current_lag;
IF current_lag > max_lag_seconds THEN
RAISE NOTICE 'Replica lag is % seconds, waiting...', current_lag;
RETURN FALSE;
END IF;
RETURN TRUE;
END;
$$ LANGUAGE plpgsql;
-- Only proceed if replica lag is acceptable
DO $$
BEGIN
IF NOT check_replica_lag(30) THEN
RAISE EXCEPTION 'Migration aborted due to high replica lag';
END IF;
END $$;
-- Add audit columns to all main tables
ALTER TABLE users
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;
ALTER TABLE products
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;
ALTER TABLE orders
ADD COLUMN created_by INTEGER,
ADD COLUMN updated_by INTEGER,
ADD COLUMN version INTEGER DEFAULT 1;
COMMIT;
Migration Execution Strategies
Transactional vs Non-Transactional Operations
Some operations cannot be run within transactions. Handle these carefully:
-- migrations/000008_create_concurrent_indexes.up.sql
-- This migration contains operations that cannot be run in a transaction
-- Create indexes concurrently (cannot be done in transaction)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_full_name
ON users(first_name, last_name) WHERE first_name IS NOT NULL AND last_name IS NOT NULL;
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_products_inventory
ON products(inventory_count) WHERE inventory_count > 0;
-- Analyze tables after index creation
ANALYZE users;
ANALYZE products;
Validation and Rollback Strategies
Implement validation checks within migrations:
-- migrations/000009_add_constraints_with_validation.up.sql
BEGIN;
-- Add check constraint with validation
ALTER TABLE products
ADD CONSTRAINT check_price_range
CHECK (price BETWEEN 0.01 AND 999999.99) NOT VALID;
-- Validate constraint (this scans the table but doesn't block writes)
ALTER TABLE products VALIDATE CONSTRAINT check_price_range;
-- Add foreign key constraint
ALTER TABLE order_items
ADD CONSTRAINT fk_order_items_order_id
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE;
-- Validate data integrity
DO $$
DECLARE
invalid_count INTEGER;
BEGIN
SELECT COUNT(*) INTO invalid_count
FROM order_items oi
LEFT JOIN orders o ON oi.order_id = o.id
WHERE o.id IS NULL;
IF invalid_count > 0 THEN
RAISE EXCEPTION 'Found % orphaned order items. Migration aborted.', invalid_count;
END IF;
RAISE NOTICE 'Data validation passed. Migration completed successfully.';
END $$;
COMMIT;
Best Practices and Common Pitfalls
1. Always Use Transactions When Possible
-- Good: Wrapped in transaction
BEGIN;
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
UPDATE users SET email_verified = TRUE WHERE created_at < '2024-01-01';
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
COMMIT;
-- Avoid: Multiple separate transactions
-- ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT FALSE;
-- UPDATE users SET email_verified = TRUE WHERE created_at < '2024-01-01';
-- ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
2. Handle Large Tables Carefully
-- migrations/000010_partition_large_table.up.sql
BEGIN;
-- Create partitioned table for orders
CREATE TABLE orders_partitioned (
LIKE orders INCLUDING ALL
) PARTITION BY RANGE (created_at);
-- Create partitions for current and future months
CREATE TABLE orders_2024_q1 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
CREATE TABLE orders_2024_q3 PARTITION OF orders_partitioned
FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
-- Migration strategy would involve gradually moving data
-- This is typically done in multiple migrations to minimize impact
COMMIT;
3. Implement Proper Error Handling
-- migrations/000011_complex_data_migration.up.sql
BEGIN;
-- Create backup table for safety
CREATE TABLE users_backup AS SELECT * FROM users;
-- Set up error handling
DO $$
DECLARE
error_count INTEGER := 0;
total_count INTEGER;
BEGIN
-- Get total count for progress tracking
SELECT COUNT(*) INTO total_count FROM users;
-- Perform complex migration with error handling
BEGIN
-- Your complex migration logic here
UPDATE users SET email = LOWER(TRIM(email));
-- Validate results
SELECT COUNT(*) INTO error_count
FROM users
WHERE email !~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$';
IF error_count > 0 THEN
RAISE EXCEPTION 'Found % invalid email addresses after migration', error_count;
END IF;
RAISE NOTICE 'Successfully migrated % user records', total_count;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Migration failed: %', SQLERRM;
RAISE;
END;
END $$;
COMMIT;
-- migrations/000011_complex_data_migration.down.sql
BEGIN;
-- Restore original data from backup table created during UP migration
DO $$
DECLARE
backup_exists BOOLEAN := FALSE;
backup_count INTEGER := 0;
current_count INTEGER := 0;
BEGIN
-- Check if backup table exists
SELECT EXISTS (
SELECT 1 FROM information_schema.tables
WHERE table_name = 'users_backup'
AND table_schema = current_schema()
) INTO backup_exists;
IF NOT backup_exists THEN
RAISE EXCEPTION 'Backup table "users_backup" not found. Cannot safely rollback migration.';
END IF;
-- Get counts for validation
SELECT COUNT(*) INTO backup_count FROM users_backup;
SELECT COUNT(*) INTO current_count FROM users;
RAISE NOTICE 'Found backup table with % records, current table has % records', backup_count, current_count;
-- Restore original data
TRUNCATE users;
INSERT INTO users SELECT * FROM users_backup;
-- Validate restoration
SELECT COUNT(*) INTO current_count FROM users;
IF current_count != backup_count THEN
RAISE EXCEPTION 'Restoration failed: expected % records, got %', backup_count, current_count;
END IF;
-- Clean up backup table
DROP TABLE users_backup;
RAISE NOTICE 'Successfully restored % user records to original state', current_count;
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Rollback failed: %', SQLERRM;
RAISE;
END $$;
COMMIT;
Monitoring and Observability
Migration Timing and Performance
-- Function to log migration performance
CREATE OR REPLACE FUNCTION log_migration_performance(
migration_version BIGINT,
migration_name VARCHAR(255),
start_time TIMESTAMP WITH TIME ZONE
)
RETURNS VOID AS $$
DECLARE
end_time TIMESTAMP WITH TIME ZONE := CURRENT_TIMESTAMP;
duration_ms INTEGER;
BEGIN
duration_ms := EXTRACT(EPOCH FROM (end_time - start_time)) * 1000;
INSERT INTO schema_migrations (version, filename, checksum, applied_at, execution_time_ms)
VALUES (migration_version, migration_name, 'calculated_checksum', start_time, duration_ms);
RAISE NOTICE 'Migration % completed in % ms', migration_name, duration_ms;
END;
$$ LANGUAGE plpgsql;
Conclusion
Incremental database migrations are essential for maintaining PostgreSQL clusters in production environments. Key takeaways include:
Plan migrations carefully: Consider the impact on cluster performance and replica lag
Use transactions wisely: Group related changes but be aware of operations that cannot be transactional
Implement proper validation: Always verify data integrity after migrations
Design for rollbacks: Ensure every migration has a corresponding down migration
Monitor performance: Track execution times and system impact
Test thoroughly: Validate migrations in staging environments that mirror production
By following these patterns and best practices, you can safely evolve your PostgreSQL database schema while maintaining high availability and data integrity across your cluster.
Remember
The goal of incremental migrations is not just to change the schema, but to do so in a way that minimizes risk, maintains performance, and preserves the ability to quickly recover from issues. Always err on the side of caution and implement comprehensive monitoring to ensure your migrations are successful.



