Multi-tenancy design
Learn how to serve multiple customers from shared infrastructure without data leakage, using silo, bridge, and pool isolation models with tenant-aware routing.
TL;DR
- Multi-tenancy means multiple customers (tenants) share the same system infrastructure while their data remains logically or physically isolated.
- Three isolation models: silo (one database per tenant, maximum isolation, highest cost), bridge (shared database, schema per tenant), pool (shared tables with a tenant_id column, minimum cost, maximum risk).
- Data leakage between tenants is the single most dangerous failure mode. One missed WHERE clause can expose customer data to the wrong tenant.
- Noisy neighbor is the operational risk: one tenant's heavy usage degrades performance for everyone sharing the same resources.
- Most production SaaS systems use a hybrid: small tenants in the pool model, enterprise tenants in silo, with automated migration tooling between tiers.
The Problem It Solves
Your SaaS application has 2,000 paying customers. While pulling a routine report, an engineer runs a query without a WHERE clause on tenant_id. The resulting CSV contains order data from all 2,000 tenants. It gets attached to an email thread. An enterprise customer's confidential pricing data is now exposed.
Or the less dramatic but equally painful version: one of your largest tenants runs an expensive analytics query that locks shared tables for 30 seconds. During that window, your 1,999 other tenants experience timeouts. Your support queue fills up, but the tenant running the query doesn't even notice because their workload completed successfully.
These are the two fundamental multi-tenancy risks: data leakage (one tenant sees another's data) and noisy neighbor (one tenant's workload degrades everyone else's performance). Both stem from sharing infrastructure between customers.
The single-tenant approach (deploy a completely separate stack per customer) avoids these risks but doesn't scale. Managing 2,000 independent deployments means 2,000 database upgrades, 2,000 schema migrations, and 2,000 monitoring dashboards. The operational cost makes it economically unviable for all but the largest enterprise contracts.
Multi-tenancy is the engineering discipline of sharing infrastructure safely. The question isn't whether to share but how much isolation each customer needs, and at what cost.
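One common way to blunt the noisy-neighbor risk described above is a per-tenant request budget. The sketch below is a minimal in-memory token bucket keyed by tenant. The class names, the capacity, and the refill rate are illustrative assumptions, not part of any particular framework, and a real deployment would usually keep this state in a shared store rather than in process memory.

```python
import time
from typing import Optional


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated: Optional[float] = None  # time of the last call, if any

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            # Refill in proportion to the time elapsed since the last call
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant, so a busy tenant only drains its own budget."""

    def __init__(self, capacity: float, rate: float) -> None:
        self._capacity = capacity
        self._rate = rate
        self._buckets = {}  # tenant_id -> TokenBucket

    def allow(self, tenant_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = self._buckets[tenant_id] = TokenBucket(self._capacity, self._rate)
        return bucket.allow(now)
```

A middleware would call limiter.allow(tenant_id) before doing any work and reject the request (for example with HTTP 429) when it returns False. The `now` parameter exists only to make the behavior easy to test.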
What Is It?
Multi-tenancy is an architecture where a single instance of software serves multiple customers (tenants), with mechanisms to ensure each tenant's data, performance, and configuration are isolated from every other tenant.
Analogy: Think of an apartment building. Each tenant has their own unit with a lock on the door (data isolation). They share the building's plumbing, electrical, and elevator (shared infrastructure). Some tenants pay extra for a penthouse with a private elevator (silo model). Most share the common elevator but have a maximum occupancy limit so one large family doesn't monopolize it (noisy neighbor controls). The building management company runs one maintenance team for the whole building, not one per unit (operational efficiency).
The isolation level you choose is the central architecture decision. There's a spectrum from full isolation (expensive, simple to secure) to full sharing (cheap, operationally risky).
No single model is correct. The right answer depends on your customer mix, regulatory requirements, and cost constraints. Most mature SaaS products use a hybrid.
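To make the hybrid idea concrete, here is a small sketch of how a provisioning service might pick an isolation model for a new tenant. The plan names, the `requires_dedicated_db` flag, and the choice to put a middle tier on the bridge model are all assumptions for illustration; your own rules will depend on pricing and compliance requirements.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationModel(Enum):
    SILO = "silo"      # one database per tenant
    BRIDGE = "bridge"  # shared database, schema per tenant
    POOL = "pool"      # shared tables with a tenant_id column


@dataclass(frozen=True)
class TenantProfile:
    plan: str                    # hypothetical plan names: "free", "team", "enterprise"
    requires_dedicated_db: bool  # e.g. a contractual or regulatory requirement


def choose_isolation(profile: TenantProfile) -> IsolationModel:
    # A hard isolation requirement wins regardless of plan
    if profile.requires_dedicated_db or profile.plan == "enterprise":
        return IsolationModel.SILO
    if profile.plan == "team":
        return IsolationModel.BRIDGE
    # Everyone else shares tables in the pool
    return IsolationModel.POOL
```

Keeping this decision in one function also gives the migration tooling a single place to ask "where should this tenant live now?" when a customer changes plans.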
How It Works
Let's trace a request through a multi-tenant system. A user at Acme Corp hits acme.myapp.com/api/orders. The system must: (1) identify the tenant, (2) route to the right data, (3) enforce isolation.
Tenant routing middleware
Every request passes through tenant resolution before reaching business logic:
class TenantMiddleware:
    def process_request(self, request):
        # Strategy 1: Subdomain
        tenant_slug = request.host.split('.')[0]  # "acme" from acme.myapp.com
        # Strategy 2: JWT claim
        # tenant_id = request.auth.claims["tenant_id"]
        # Strategy 3: API key prefix
        # tenant_slug = request.headers["X-API-Key"].split("-")[1]

        # Tenant records rarely change, so cache them for an hour
        tenant = cache.get(f"tenant:{tenant_slug}")
        if not tenant:
            tenant = db.query("SELECT * FROM tenants WHERE slug = %s", tenant_slug)
            if not tenant:
                # Unknown tenant: fail before any business logic runs
                raise LookupError(f"unknown tenant: {tenant_slug}")
            cache.set(f"tenant:{tenant_slug}", tenant, ttl=3600)

        # Everything downstream reads the tenant from the request context
        request.tenant = tenant
        request.db = TenantScopedDB(tenant.id, tenant.isolation_model)
The middleware caches tenant lookups (they're effectively read-only) and sets a tenant context that all downstream code uses. I often see teams skip the caching step, which adds a database round-trip to every single request for data that changes maybe once a month.
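The subdomain strategy and the cached lookup can be sketched as a self-contained example. The `slug_from_host` helper, the `TenantResolver` class, and the `load_tenant` callback are hypothetical names I'm using for illustration; the in-memory dict stands in for whatever cache you actually run. Unlike a bare split on the first dot, this version checks the base domain and ignores a port.

```python
import time
from typing import Callable, Optional


def slug_from_host(host: str, base_domain: str) -> Optional[str]:
    """Return the tenant slug for hosts like 'acme.myapp.com', else None."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Reject nested subdomains such as 'a.b.myapp.com'
    return slug if slug and "." not in slug else None


class TenantResolver:
    """Caches tenant records for `ttl` seconds to avoid a lookup per request."""

    def __init__(self, load_tenant: Callable[[str], Optional[dict]], ttl: float = 3600.0):
        self._load = load_tenant
        self._ttl = ttl
        self._cache = {}  # slug -> (expires_at, tenant)

    def resolve(self, slug: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.monotonic() if now is None else now
        hit = self._cache.get(slug)
        if hit is not None and hit[0] > now:
            return hit[1]
        tenant = self._load(slug)
        if tenant is not None:  # never cache "not found"
            self._cache[slug] = (now + self._ttl, tenant)
        return tenant
```

Not caching misses is a deliberate choice here: a tenant that signs up a second after a failed lookup should resolve immediately. The cost is that requests for unknown slugs always reach the database, so you may want separate protection against that traffic.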
Data isolation by model
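The connection layer picks its target from the tenant's isolation model. The silo model routes the connection to a dedicated database instance, the bridge model switches schemas inside a shared database, and the pool model uses the shared database as-is: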
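class TenantScopedDB:
    def get_connection(self):
        if self.isolation_model == "silo":
            # Dedicated database instance for this tenant
            return connection_pool.get(self.tenant.dedicated_db_url)
        elif self.isolation_model == "bridge":
            # Shared database, one schema per tenant
            conn = connection_pool.get(shared_db_url)
            # Quote the schema name: an id with hyphens (e.g. a UUID) is not a valid bare identifier.
            # The id comes from the tenants table, never from request input.
            conn.execute(f'SET search_path TO "tenant_{self.tenant.id}"')
            return conn
        else:  # pool
            # Shared tables: all queries auto-filtered by tenant_id
            return connection_pool.get(shared_db_url)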
Pool model enforces tenant_id at the query layer. This is the most critical code path in any multi-tenant system:
from uuid import UUID

# Order, User, and QuerySet come from the application's ORM (Django-style here)
class TenantScopedQuerySet:
    """All database access goes through this. No bypass allowed."""

    def __init__(self, tenant_id: UUID):
        self._tenant_id = tenant_id

    def orders(self) -> QuerySet:
        return Order.objects.filter(tenant_id=self._tenant_id)

    def users(self) -> QuerySet:
        return User.objects.filter(tenant_id=self._tenant_id)

# In the middleware: never expose the raw ORM to application code
request.db = TenantScopedQuerySet(tenant_id=request.tenant.id)
The pool model's existential risk: missing WHERE clauses
In the pool model, a single query missing WHERE tenant_id = ... returns data from all tenants. This is not a theoretical risk. It's the most common multi-tenancy bug, and it happens when a developer writes a raw SQL query, uses an ORM method that bypasses the scoped wrapper, or forgets the filter in a background job. Enforce tenant scoping at the infrastructure layer (row-level security, query rewriting middleware), not just by convention.
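Here is a runnable sketch of the scoped-wrapper idea using Python's built-in sqlite3 module, so you can see the isolation property without an ORM. The `TenantScopedOrders` class and the table layout are assumptions for the demo. The point is that application code never writes the tenant_id predicate itself: every statement the wrapper issues already carries it.

```python
import sqlite3


class TenantScopedOrders:
    """Every statement issued through this class carries the tenant_id predicate."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def list_orders(self):
        return self._conn.execute(
            "SELECT id, total FROM orders WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        ).fetchall()

    def get(self, order_id: int):
        # Lookups by primary key are scoped too: a valid id from another
        # tenant behaves exactly like a missing row.
        return self._conn.execute(
            "SELECT id, total FROM orders WHERE tenant_id = ? AND id = ?",
            (self._tenant_id, order_id),
        ).fetchone()

    def add(self, total: float) -> int:
        cur = self._conn.execute(
            "INSERT INTO orders (tenant_id, total) VALUES (?, ?)",
            (self._tenant_id, total),
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, total REAL NOT NULL)"
)
acme = TenantScopedOrders(conn, "acme")
globex = TenantScopedOrders(conn, "globex")
acme_order = acme.add(120.0)
globex_order = globex.add(75.5)
```

With this setup, globex.get(acme_order) returns None even though the row exists. A wrapper like this is still convention, though: anyone holding the raw connection can bypass it, which is why a database-level control is worth adding underneath.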
Row-level security as a safety net
PostgreSQL's Row-Level Security (RLS) provides database-enforced isolation for the pool model:
-- Enable RLS on the orders table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
-- Create a policy: each session can only see rows matching its tenant
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant_id')::uuid);
-- Middleware sets the session variable before every query
SET app.current_tenant_id = 'acme-tenant-uuid';
SELECT * FROM orders; -- RLS automatically filters to Acme's rows only
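Even if application code forgets the WHERE clause, the database itself enforces the filter. My recommendation: use RLS as a safety net alongside application-layer filtering, not as a replacement. RLS adds query overhead (the policy evaluation) and complicates EXPLAIN plans, but the data leakage prevention is worth it. One caveat: PostgreSQL exempts superusers and, by default, the table's owner from RLS policies, so the application should connect as a separate non-owner role (or the table should use FORCE ROW LEVEL SECURITY).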
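On the application side, the session variable has to be set safely when connections are pooled, because a plain SET outlives the request and can follow the connection to the next tenant. A minimal sketch, assuming a psycopg2-style DB-API connection with autocommit off: the `tenant_transaction` helper is a hypothetical name, and it uses PostgreSQL's set_config function with is_local = true, which scopes the setting to the current transaction and, unlike SET, accepts a bound parameter.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id):
    """Yield a cursor whose transaction is scoped to `tenant_id`.

    Assumes `conn` is a DB-API connection to PostgreSQL with autocommit
    off and %s-style placeholders (as in psycopg2).
    """
    cur = conn.cursor()
    try:
        # is_local=true behaves like SET LOCAL: the value is discarded at
        # commit or rollback, so it cannot leak through the connection pool.
        cur.execute(
            "SELECT set_config('app.current_tenant_id', %s, true)",
            (str(tenant_id),),
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Request handlers would then run their queries inside `with tenant_transaction(conn, request.tenant.id) as cur:`, and the RLS policy filters every statement in that block.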