Schema Evolution
Schema evolution is the practice of changing data structures, API contracts, or message formats over time while maintaining compatibility with existing clients and services. This is critical for zero-downtime deployments in distributed systems.
Forward Compatibility
Forward compatibility means that old code can read data written by new code. The old system can safely ignore new fields it doesn’t understand.
When adding new fields to a schema:
- New fields should be optional with sensible defaults
- Old services can process new messages by ignoring unknown fields
- This allows new producers to be deployed before consumers are updated
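As a rough illustration of the points above, here is a minimal sketch of an old consumer tolerating a message from a newer producer. It assumes JSON messages; the UserV1 type, the parse_user_v1 helper, and the field names are hypothetical and only serve the example.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class UserV1:
    """Shape known to the OLD consumer (hypothetical)."""
    id: int
    name: str


def parse_user_v1(raw: str) -> UserV1:
    """Keep only the fields this version knows and drop the rest."""
    data = json.loads(raw)
    known = {f.name for f in fields(UserV1)}
    return UserV1(**{k: v for k, v in data.items() if k in known})


# A newer producer added "email"; the old consumer still parses the message.
new_message = '{"id": 7, "name": "Ada", "email": "ada@example.com"}'
user = parse_user_v1(new_message)
```

The key design choice is that the parser filters on the fields it knows rather than rejecting anything unexpected. A strict parser that raises on unknown keys would turn every added field into a breaking change.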
Backward Compatibility
Backward compatibility means that new code can read data written by old code. The new system must handle the absence of fields that didn’t exist in older versions.
When reading old data:
- New code must provide defaults for missing fields
- New services can process old messages correctly
- This allows new consumers to be deployed before producers are updated
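The mirror case can be sketched the same way: a new consumer reading a message written before the new fields existed. Again this assumes JSON messages, and UserV2, parse_user_v2, and the default values are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserV2:
    """Shape known to the NEW consumer (hypothetical)."""
    id: int
    name: str
    email: Optional[str] = None  # added in v2, so it needs a default
    locale: str = "en"           # added in v2, so it needs a default


def parse_user_v2(raw: str) -> UserV2:
    """Read old or new messages, filling defaults for absent fields."""
    data = json.loads(raw)
    return UserV2(
        id=data["id"],
        name=data["name"],
        email=data.get("email"),
        locale=data.get("locale", "en"),
    )


# An old producer never wrote "email" or "locale".
old_message = '{"id": 7, "name": "Ada"}'
user = parse_user_v2(old_message)
```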
Breaking Changes
Breaking changes destroy compatibility and require coordinated deployments. Avoid these whenever possible:
- Removing required fields
- Changing field types
- Renaming fields without aliasing
- Changing field semantics
- Making optional fields required
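Of the changes listed above, a rename is the one most easily softened with aliasing. The following is a small sketch of one possible approach, assuming records are plain dictionaries; the FIELD_ALIASES table and the "username" to "login" rename are made up for illustration.

```python
from typing import Any, Dict

# Hypothetical rename: "username" became "login". Maps old name -> new name.
FIELD_ALIASES = {"username": "login"}


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record that uses only the current field names."""
    result = dict(record)
    for old_name, new_name in FIELD_ALIASES.items():
        if old_name in result:
            old_value = result.pop(old_name)
            # Prefer the new name when both are present.
            result.setdefault(new_name, old_value)
    return result
```

For example, normalize({"username": "ada"}) yields {"login": "ada"}, so readers written against the new name keep working while old writers are still in service.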
Safe Schema Changes
Safe changes maintain compatibility:
- Adding optional fields with defaults
- Removing optional fields that no reader still depends on
- Adding new enum values at the end, provided existing readers tolerate unknown values
- Adding new message types
- Deprecating fields instead of removing them
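The breaking and safe lists can be turned into an automated check. Below is a minimal sketch, assuming a made-up schema representation (field name mapped to a type and a required flag). The find_breaking_changes function is hypothetical, and treating an added required field as breaking is this sketch's own reading of the rule that added fields should be optional. Renames and semantic changes are not detectable this way.

```python
from typing import Dict, List

# Hypothetical schema shape: field name -> {"type": str, "required": bool}
Schema = Dict[str, Dict[str, object]]


def find_breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the changes from old to new that would break compatibility."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            if spec["required"]:
                problems.append(f"removed required field: {name}")
            continue
        if new[name]["type"] != spec["type"]:
            problems.append(f"changed type of field: {name}")
        if new[name]["required"] and not spec["required"]:
            problems.append(f"made optional field required: {name}")
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"added required field: {name}")
    return problems


v1: Schema = {
    "id": {"type": "int", "required": True},
    "nickname": {"type": "str", "required": False},
}
v2_safe: Schema = {
    "id": {"type": "int", "required": True},
    "email": {"type": "str", "required": False},
}
v2_breaking: Schema = {
    "id": {"type": "str", "required": True},
    "nickname": {"type": "str", "required": True},
}
```

A check like this could run in CI against the previously released schema, so that a breaking change is caught before deployment rather than during it.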
Versioning Strategies
URL Versioning: Different versions in the URL path like /api/v1/users and /api/v2/users
Header Versioning: Version specified in request headers like Accept: application/vnd.api.v2+json
Content Negotiation: Different media types for different versions
No Versioning: Evolve the schema compatibly without explicit versions. This requires discipline but provides the most flexibility.
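For the header versioning option, the server has to extract the version from the media type. A minimal sketch of that step follows, assuming the application/vnd.api.vN+json pattern shown above. The version_from_accept helper and the fallback to version 1 are assumptions of this example, not part of any framework.

```python
import re
from typing import Optional

# Matches the media type shown above, e.g. "application/vnd.api.v2+json".
_VERSION_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")

# Assumption: requests that name no version get the oldest supported one.
DEFAULT_VERSION = 1


def version_from_accept(accept_header: Optional[str]) -> int:
    """Extract the API version from an Accept header value."""
    if not accept_header:
        return DEFAULT_VERSION
    match = _VERSION_RE.search(accept_header)
    return int(match.group(1)) if match else DEFAULT_VERSION
```

Defaulting unversioned requests to the oldest version keeps existing clients working; defaulting to the newest would silently change behavior for clients that never opted in.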
Migration Strategies
Expand-Contract Pattern: Three-phase deployment for breaking changes:
- Expand: Add new field alongside old field
- Migrate: Update all services to use new field
- Contract: Remove old field after migration complete
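The three phases can be sketched in application code as follows. This is only an illustration under stated assumptions: records are dictionaries, the field "name" is being replaced by "full_name", and all three function names are hypothetical.

```python
from typing import Any, Dict


def write_user(full_name: str) -> Dict[str, Any]:
    """Expand phase: write the new field AND keep the old one populated."""
    return {"full_name": full_name, "name": full_name}


def read_full_name(record: Dict[str, Any]) -> str:
    """Migrate phase: prefer the new field, fall back to the old one."""
    if "full_name" in record:
        return record["full_name"]
    return record["name"]


def write_user_contracted(full_name: str) -> Dict[str, Any]:
    """Contract phase: safe only once no reader uses "name" any more."""
    return {"full_name": full_name}
```

Each phase is compatible with the one before it, which is what lets the services be deployed one at a time instead of all at once.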
Shadow Reading: New code reads both old and new formats, writes only new format
Feature Flags: Toggle between old and new behavior at runtime
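Shadow reading and feature flags are often combined during a rollout: the reader accepts both formats from the start, and a flag decides when the writer switches to the new one. The sketch below assumes a hypothetical change from a float "price" field to an integer "price_cents" field. The environment variable USE_NEW_PRICE_FORMAT stands in for a real feature flag service.

```python
import os
from typing import Any, Dict


def read_price_cents(record: Dict[str, Any]) -> int:
    """Shadow reading: accept both the old and the new format."""
    if "price_cents" in record:           # new format: integer cents
        return record["price_cents"]
    return round(record["price"] * 100)   # old format: float currency units


def new_writer_enabled() -> bool:
    """Feature flag, looked up on every call rather than cached at import."""
    return os.environ.get("USE_NEW_PRICE_FORMAT", "0") == "1"


def write_price(cents: int) -> Dict[str, Any]:
    """Write the new format only when the flag is on."""
    if new_writer_enabled():
        return {"price_cents": cents}
    return {"price": cents / 100}
```

Once the flag has been on everywhere for long enough, the old branch of the writer can be deleted, at which point the code writes only the new format, as the shadow reading description requires.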
Database Schema Evolution
Database schemas require special care because data persists:
- Use migrations that can run without downtime
- Add columns as nullable first, backfill the data, then add the NOT NULL constraint
- Drop columns in a separate deployment, after the code has stopped using them
- Use views or aliases to maintain old column names during transitions
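The steps above can be sketched with an in-memory SQLite database. The table, column, and view names are hypothetical, and the batch size is deliberately tiny to show the loop. The final step of adding a NOT NULL constraint is left out because its syntax and locking behavior differ between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Grace",)])
conn.commit()

# Step 1: add the column as nullable so existing rows and old writers stay valid.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Step 2: backfill in small batches instead of one long-running update.
BATCH_SIZE = 1
while True:
    cur = conn.execute(
        "UPDATE users SET display_name = name "
        "WHERE id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT ?)",
        (BATCH_SIZE,),
    )
    conn.commit()
    if cur.rowcount == 0:
        break

# Step 3: a view exposes the new column under the old name for legacy readers.
conn.execute(
    "CREATE VIEW users_legacy AS SELECT id, display_name AS name FROM users"
)

rows = conn.execute("SELECT id, name FROM users_legacy ORDER BY id").fetchall()
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
```

On a production database, each step would normally ship as its own migration, so that any one of them can be paused or rolled back independently.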
Schema evolution is not optional in production systems. Every change must consider compatibility to avoid outages during deployments.
Why do you need to know this?
Almost every backend system has a database, and a service should never share direct access to its database with other services. That schema still has to be maintained and evolved without breaking the existing application. Following good schema evolution patterns keeps deployments smooth and the system reliable.
Backward compatibility is more important than forward compatibility in most backend systems, because the backend is the source of truth for data. Keeping schema changes compatible also means you can roll back the application code, whether because of a bug, a mistake, or the business changing its mind, without breaking the database. With careful design you can sometimes avoid a database migration altogether.
Forward compatibility is more important in event-driven systems, where multiple services consume the same events. In this case, you want to make sure that old services can still process new events without issues.