Schema Evolution

Schema evolution is the practice of changing data structures, API contracts, or message formats over time while maintaining compatibility with existing clients and services. This is critical for zero-downtime deployments in distributed systems.

Forward Compatibility

Forward compatibility means that old code can read data written by new code. The old system can safely ignore new fields it doesn’t understand.

When adding new fields to a schema:

New fields should be optional with sensible defaults
Old services can process new messages by ignoring unknown fields
Allows deploying new producers before updating consumers
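
As a minimal sketch of the idea above, the following shows an old consumer tolerating a field added by a newer producer. The field names and the `old_consumer_parse` helper are hypothetical, chosen only for illustration, and JSON is assumed as the message format.

```python
import json

# Fields the OLD consumer was written against (hypothetical v1 schema).
KNOWN_FIELDS_V1 = {"id", "name"}


def old_consumer_parse(raw: str) -> dict:
    """Parse a message with v1 code, silently ignoring unknown fields."""
    message = json.loads(raw)
    return {key: value for key, value in message.items() if key in KNOWN_FIELDS_V1}


# A NEW producer has started sending an optional "email" field.
new_message = json.dumps({"id": 7, "name": "Ada", "email": "ada@example.com"})

parsed_by_old_code = old_consumer_parse(new_message)
print(parsed_by_old_code)  # {'id': 7, 'name': 'Ada'}
```

The key design choice is that the old reader drops what it does not recognize instead of rejecting the message; a strict parser that fails on unknown keys would turn every added field into a breaking change.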

Backward Compatibility

Backward compatibility means that new code can read data written by old code. The new system must handle the absence of fields that didn’t exist in older versions.

When reading old data:

New code must provide defaults for missing fields
New services can process old messages correctly
Allows deploying new consumers before updating producers
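
The mirror case can be sketched the same way: a newer consumer reading a message produced before a field existed. The `UserV2` type, its fields, and the empty-string default are assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class UserV2:
    id: int
    name: str
    email: str = ""  # added in v2; the default covers data written by v1


def new_consumer_parse(raw: str) -> UserV2:
    """Parse with v2 code, supplying a default when 'email' is absent."""
    message = json.loads(raw)
    return UserV2(
        id=message["id"],
        name=message["name"],
        email=message.get("email", ""),
    )


# A message written by an OLD producer that has never heard of "email".
old_message = json.dumps({"id": 7, "name": "Ada"})
user = new_consumer_parse(old_message)
print(user.email == "")  # True
```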

Breaking Changes

Breaking changes break compatibility with existing readers or writers and force coordinated deployments. Avoid them whenever possible:

Removing required fields
Changing field types
Renaming fields without aliasing
Changing field semantics
Making optional fields required

Safe Schema Changes

Safe changes maintain compatibility:

Adding optional fields with defaults
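Removing optional fields (as long as the field name or identifier is never reused for something else)
Adding new enum values at the end (as long as consumers tolerate values they do not recognize)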
Adding new message types
Deprecating fields instead of removing them
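
The two lists above can be partly checked mechanically. The sketch below assumes a deliberately simple, hypothetical schema representation (a dict mapping field name to `type` and `required`) and flags the structural breaking changes. It also treats adding a new required field as breaking, which follows from the rule that new fields should be optional. Renames appear as a removal plus an addition, and semantic changes cannot be detected this way at all.

```python
from typing import Dict, List

Schema = Dict[str, dict]  # field name -> {"type": str, "required": bool}


def find_breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Return a description of each structural breaking change from old to new."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            if spec["required"]:
                problems.append(f"removed required field '{name}'")
            continue  # removing an optional field is treated as safe
        if new[name]["type"] != spec["type"]:
            problems.append(f"changed type of '{name}'")
        if new[name]["required"] and not spec["required"]:
            problems.append(f"made optional field '{name}' required")
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"added required field '{name}'")
    return problems


old_schema = {
    "id": {"type": "int", "required": True},
    "name": {"type": "string", "required": True},
    "nickname": {"type": "string", "required": False},
}
# Safe: drop an optional field, add an optional field.
safe_schema = {
    "id": {"type": "int", "required": True},
    "name": {"type": "string", "required": True},
    "email": {"type": "string", "required": False},
}
# Breaking: type change, required field removed, optional made required.
breaking_schema = {
    "id": {"type": "string", "required": True},
    "nickname": {"type": "string", "required": True},
}

print(find_breaking_changes(old_schema, safe_schema))      # []
print(find_breaking_changes(old_schema, breaking_schema))
```

A check like this could run in CI against the previously released schema, so that a breaking change is a deliberate decision and not an accident.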

Versioning Strategies

URL Versioning: Different versions in the URL path like /api/v1/users and /api/v2/users

Header Versioning: Version specified in request headers like Accept: application/vnd.api.v2+json

Content Negotiation: Different media types for different versions

No Versioning: Evolve the schema compatibly without explicit versions. This requires discipline but provides the most flexibility.
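
To make the first two strategies concrete, here is a small sketch that resolves the requested version from either the URL path or the Accept header, using the example formats shown above. The `resolve_version` function, the precedence of path over header, and the default of version 1 are all assumptions for illustration, not a standard.

```python
import re
from typing import Optional

# Patterns matching the example formats: /api/v2/users and application/vnd.api.v2+json
PATH_VERSION = re.compile(r"^/api/v(\d+)/")
ACCEPT_VERSION = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def resolve_version(path: str, accept: Optional[str] = None, default: int = 1) -> int:
    """Pick the API version: URL path first, then Accept header, then a default."""
    path_match = PATH_VERSION.match(path)
    if path_match:
        return int(path_match.group(1))
    if accept:
        header_match = ACCEPT_VERSION.search(accept)
        if header_match:
            return int(header_match.group(1))
    return default


print(resolve_version("/api/v2/users"))                              # 2
print(resolve_version("/api/users", "application/vnd.api.v2+json"))  # 2
print(resolve_version("/api/users"))                                 # 1
```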

Migration Strategies

Expand-Contract Pattern: A three-phase deployment that splits a breaking change into compatible steps:

Expand: Add the new field alongside the old field
Migrate: Update all services to use the new field
Contract: Remove the old field after the migration is complete
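
The three phases can be sketched for a field rename. The example assumes a hypothetical rename from `name` to `full_name` on a dict-shaped record; the function names are illustrative only.

```python
def write_user_expand(record: dict, full_name: str) -> dict:
    """Expand: write both fields so old and new readers both keep working."""
    updated = dict(record)
    updated["full_name"] = full_name
    updated["name"] = full_name  # still written for readers not yet migrated
    return updated


def read_full_name(record: dict) -> str:
    """Migrate: prefer the new field, fall back to the old one."""
    if "full_name" in record:
        return record["full_name"]
    return record["name"]


def write_user_contract(record: dict, full_name: str) -> dict:
    """Contract: once no reader uses 'name', stop writing it."""
    updated = {key: value for key, value in record.items() if key != "name"}
    updated["full_name"] = full_name
    return updated


legacy = {"id": 1, "name": "Ada Lovelace"}
expanded = write_user_expand(legacy, "Ada Lovelace")
contracted = write_user_contract(expanded, "Ada Lovelace")

print(read_full_name(legacy))      # Ada Lovelace
print(read_full_name(contracted))  # Ada Lovelace
print("name" in contracted)        # False
```

Each phase on its own is a compatible change, which is why the deployments do not need to be coordinated across services.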

Shadow Reading: New code reads both the old and new formats but writes only the new format
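
A minimal sketch of this read-both, write-new approach, assuming a hypothetical order record whose old format stores `amount` as a decimal string and whose new format stores `amount_cents` as an integer:

```python
import json
from decimal import Decimal


def read_order(raw: str) -> dict:
    """Accept either format and normalize to the new in-memory shape."""
    data = json.loads(raw)
    if "amount_cents" in data:  # new format
        return {"id": data["id"], "amount_cents": data["amount_cents"]}
    # Old format: a decimal string such as "12.50"
    cents = int(Decimal(data["amount"]) * 100)
    return {"id": data["id"], "amount_cents": cents}


def write_order(order: dict) -> str:
    """Always serialize in the new format."""
    return json.dumps({"id": order["id"], "amount_cents": order["amount_cents"]})


old_raw = '{"id": 1, "amount": "12.50"}'
order = read_order(old_raw)
new_raw = write_order(order)

print(order)    # {'id': 1, 'amount_cents': 1250}
print(new_raw)  # {"id": 1, "amount_cents": 1250}
```

Because every write uses the new format, old-format data shrinks over time as records are rewritten, and the old read path can eventually be removed.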

Feature Flags: Toggle between old and new behavior at runtime
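
A feature flag for a schema change might look like the sketch below. The flag name `USE_NEW_USER_SCHEMA`, the environment-variable lookup, and the payload shape are assumptions for illustration; a real system would more likely read the flag from a configuration or flag service.

```python
import os
from typing import Mapping, Optional

FLAG_NAME = "USE_NEW_USER_SCHEMA"  # hypothetical flag name


def new_schema_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the flag at call time so it can be flipped without a redeploy."""
    source = os.environ if env is None else env
    return source.get(FLAG_NAME, "false").lower() == "true"


def serialize_user(first: str, last: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Old behavior by default; the new fields are added only when the flag is on."""
    payload = {"name": f"{first} {last}"}
    if new_schema_enabled(env):
        payload["first_name"] = first
        payload["last_name"] = last
    return payload


print(serialize_user("Ada", "Lovelace", env={}))
print(serialize_user("Ada", "Lovelace", env={FLAG_NAME: "true"}))
```

Turning the flag off restores the old output immediately, which gives a rollback path that does not depend on a new deployment.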

Database Schema Evolution

Database schemas require special care because data persists:

Use migrations that can run without downtime
Add columns as nullable first, backfill the data, then add the NOT NULL constraint
Drop columns in a separate deployment, after the code has stopped using them
Use views or aliases to maintain old column names during transitions
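
The nullable-then-backfill sequence and the view technique can be sketched with Python's built-in sqlite3 module and an in-memory database. The `users` table, the `display_name` column, and the `users_legacy` view are hypothetical. SQLite is used only so the example is self-contained; the final step of enforcing NOT NULL is engine-specific and is left as a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])

# Step 1: add the column as nullable so existing rows and old writers keep working.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Old code that does not know about the new column can still insert rows.
conn.execute("INSERT INTO users (name) VALUES (?)", ("Linus",))

# Step 2: backfill rows that have no value yet.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# Step 3: confirm the backfill is complete before enforcing the constraint.
# In PostgreSQL the enforcement would be ALTER TABLE ... ALTER COLUMN ... SET NOT NULL;
# SQLite has no equivalent statement and needs a table rebuild instead.
remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]

# A view can keep an old column name readable while callers move to the new one.
conn.execute("CREATE VIEW users_legacy AS SELECT id, display_name AS name FROM users")
legacy_rows = conn.execute("SELECT name FROM users_legacy ORDER BY id").fetchall()
conn.commit()

print(remaining_nulls)  # 0
print(legacy_rows)      # [('Ada',), ('Grace',), ('Linus',)]
```

On a large production table the backfill would normally run in batches to avoid long locks; the single UPDATE here is a simplification.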

Schema evolution is not optional in production systems. Every change must consider compatibility to avoid outages during deployments.

Why do you need to know this?

Most backend systems are backed by a database, and a service should never share direct access to its database with other services. We still need to maintain and evolve database schemas without breaking existing applications. Following good schema evolution patterns keeps deployments smooth and systems reliable.

Backward compatibility matters more than forward compatibility in most backend systems, because the backend is the source of truth for data and must keep reading everything it has already stored. Compatible schema changes also make rollbacks cheap: if you need to roll back application code because of a bug, a mistake, or a change in business direction, you can do so without breaking the database. With careful design you can sometimes avoid a database migration altogether.

Forward compatibility is more important in event-driven systems, where multiple services consume the same events. In this case, you want to make sure that old services can still process new events without issues.

Diego Pacheco's Software Architecture Library