How We Built Tenant Isolation for hackaws.cloud. And How Dam Secure Finds What We Miss
hackaws.cloud is an autonomous AWS penetration testing platform. Customers connect their AWS accounts, configure a foothold identity, and our agent performs real lateral movement and privilege escalation, building a live attack graph as it works.
That means we store AWS account IDs, IAM role ARNs, assumed-role credentials, and full attack path graphs. If tenant isolation fails, one customer could see another customer's AWS infrastructure map, or perform confused deputy attacks to assume roles in other customers' accounts. In an extreme scenario, chained with another bug, an attacker could even access stored credentials. The stakes are about as high as they get for a multi-tenant SaaS.
Tenant isolation isn't new territory for us. At Plerion, where I'm Chief Innovation Officer, we face the same class of problem: our cloud security platform ingests customer cloud configurations, vulnerability data, and identity graphs across thousands of accounts. The compliance requirements alone (SOC 2, ISO 27001) demand rigorous tenant boundaries. hackaws.cloud gave us a greenfield opportunity to apply those lessons from day one rather than retrofitting them.
This post covers our tenant isolation architecture, the technical decisions behind it, and how we use Dam Secure's team rules to continuously and systematically verify the security of our implementation. We deployed six rules in Dam Secure, such as "Never expose tenant ID to the frontend", and because Dam Secure doesn't suffer from context rot, the rules were applied every time. The rules were also applied automatically to an agentic planning session, reminding the LLM to follow them before a line of code was generated.
The Architecture
Tenant ID: Server-Generated, Never Client-Supplied
Every tenant gets a UUID v4 identifier, generated server-side on sign-up:
// user-repository.ts
const newUser: UserRecord = {
PK: `USER#${cognitoSub}`,
userId: cognitoSub,
tenantId: generateTenantId(), // crypto.randomUUID()
email,
createdAt: new Date().toISOString(),
};
The tenant ID is stored in a DynamoDB table, keyed by the Cognito sub claim. It never appears in JWTs, never comes from the client, and never leaves our backend.
This was a deliberate decision. Many multi-tenant apps embed a tenantId in the JWT or accept it as a header. That creates a trust boundary problem: you're relying on the client to tell you who it is. We made the boundary explicit: the tenant ID exists only in the database and is resolved on every request.
Auth Middleware: The Single Entry Point
Every API handler is wrapped with withAuth(), a higher-order function that:
- Extracts the Bearer token from the Authorization header
- Verifies the JWT against Cognito's JWKS
- Looks up the user record by Cognito sub
- Builds an AuthContext with the tenant ID from the database
- Passes it to the handler
// auth-middleware.ts
const authContext: AuthContext = {
userId: claims.sub,
tenantId: user.tenantId, // From DB, never from claims
email: user.email,
};
return await handler(event, authContext);
The AuthContext interface is intentionally readonly:
export interface AuthContext {
readonly userId: string;
readonly tenantId: string;
readonly email: string;
}
Handlers never construct their own context. They receive it from the middleware or they don't get one at all.
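To make the flow concrete, here is a minimal sketch of what a wrapper like withAuth() could look like. It is not the production middleware: verifyJwt and findUserBySub are hypothetical stand-ins for Cognito JWKS verification and the DynamoDB user lookup, the event and response shapes are simplified, and passing the dependencies in as an argument is an assumption made only to keep the example self-contained.

```typescript
// Sketch only: verifyJwt and findUserBySub are hypothetical stand-ins for
// Cognito JWKS verification and the DynamoDB user lookup.
interface AuthContext {
  readonly userId: string;
  readonly tenantId: string;
  readonly email: string;
}

interface ApiEvent {
  headers: Record<string, string | undefined>;
  body?: string | null;
}

interface ApiResponse {
  statusCode: number;
  body: string;
}

interface UserRecord {
  userId: string;
  tenantId: string;
  email: string;
}

type Claims = { sub: string };
type AuthedHandler = (event: ApiEvent, auth: AuthContext) => Promise<ApiResponse>;

interface AuthDeps {
  verifyJwt(token: string): Promise<Claims>; // expected to throw on an invalid token
  findUserBySub(sub: string): Promise<UserRecord | null>;
}

const unauthorized: ApiResponse = {
  statusCode: 401,
  body: JSON.stringify({ message: "Unauthorized" }),
};

function withAuth(deps: AuthDeps, handler: AuthedHandler) {
  return async (event: ApiEvent): Promise<ApiResponse> => {
    const header = event.headers["authorization"] ?? event.headers["Authorization"];
    if (!header || !header.startsWith("Bearer ")) return unauthorized;

    let claims: Claims;
    try {
      claims = await deps.verifyJwt(header.slice("Bearer ".length));
    } catch {
      return unauthorized;
    }

    const user = await deps.findUserBySub(claims.sub);
    if (!user) return unauthorized;

    // The tenant ID comes from the stored user record, never from the token
    // or any other client-supplied value.
    const authContext: AuthContext = Object.freeze({
      userId: claims.sub,
      tenantId: user.tenantId,
      email: user.email,
    });
    return handler(event, authContext);
  };
}
```

In this sketch, a tenantId placed in the request body has no effect on the context the handler receives, because the wrapper never reads it.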
DynamoDB: Tenant ID as Partition Key
For tables that are purely tenant-scoped, the DynamoDB partition key is the tenant ID:
// data-stack.ts (CDK)
this.accountsTable = new dynamodb.Table(this, "AccountsTable", {
partitionKey: { name: "tenantId", type: dynamodb.AttributeType.STRING },
sortKey: { name: "accountId", type: dynamodb.AttributeType.STRING },
});
this.assessmentsTable = new dynamodb.Table(this, "AssessmentsTable", {
partitionKey: { name: "tenantId", type: dynamodb.AttributeType.STRING },
sortKey: { name: "assessmentId", type: dynamodb.AttributeType.STRING },
});
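This means a Query on the accounts table cannot return another tenant's data: a Query must specify a partition key value, and that value is always the caller's tenant ID from AuthContext. DynamoDB's partition key is the first filter, applied before any application logic runs. A Scan has no such constraint, which is one reason we didn't stop at key design.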
For tables with composite keys (identities, graph nodes, events), the tenant ID is embedded in the partition key string: {tenantId}#{assessmentId} or {tenantId}#{accountId}.
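A small helper can keep that key format in one place. The sketch below is illustrative only: the function name, the UUID v4 check, and the rejection of "#" in the second segment are assumptions. Only the {tenantId}#{id} layout comes from the design described above.

```typescript
// Hypothetical helper: the name and validation rules are assumptions; only the
// {tenantId}#{scopeId} key layout comes from the text above.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function tenantScopedKey(tenantId: string, scopeId: string): string {
  if (!UUID_V4.test(tenantId)) {
    throw new Error("Invalid tenant ID");
  }
  // A "#" inside scopeId would make the composite key ambiguous to parse.
  if (scopeId.length === 0 || scopeId.includes("#")) {
    throw new Error("Invalid scope ID");
  }
  return `${tenantId}#${scopeId}`;
}
```

Building the key from the tenant ID in AuthContext, rather than accepting a prebuilt key string, means a caller cannot smuggle another tenant's prefix into the partition key.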
TenantDb: The Guard at the Data Layer
Even with partition keys, we wanted defense in depth. TenantDb is a wrapper around the DynamoDB document client that automatically enforces tenant scoping on every operation:
- Put: Injects tenantId into the item automatically
- Get: Returns null if the item's tenantId doesn't match the caller's
- Query: Appends a tenantId = :__tenantId__ filter expression
- Update/Delete: Adds a tenantId = :__tenantId__ condition expression (the operation fails if ownership doesn't match)
// tenant-db.ts
async get(tenantId: string, params: GetCommandInput) {
this.assertValidTenantId(tenantId);
const result = await this.docClient.send(new GetCommand(params));
if (!result.Item) return null;
if (result.Item.tenantId !== tenantId) return null; // Silent rejection
return result.Item;
}
async delete(tenantId: string, params: DeleteCommandInput) {
this.assertValidTenantId(tenantId);
// Appends: ConditionExpression = "tenantId = :__tenantId__"
// DynamoDB rejects the delete if the tenant doesn't own the item
}
Every method validates the tenant ID format (UUID v4 regex) before touching DynamoDB. An invalid or missing tenant ID throws a ForbiddenError immediately.
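The validation and condition-injection steps can be sketched without the AWS SDK. This is an assumption-laden illustration, not the real TenantDb: scopeToTenant, the regex, and the ConditionalParams shape are hypothetical, and the interface models only the few DynamoDB parameter fields the example touches.

```typescript
// Sketch under assumptions: ForbiddenError, the regex, and scopeToTenant are
// illustrative; ConditionalParams models only the parameter fields used here.
class ForbiddenError extends Error {
  constructor(message = "Forbidden") {
    super(message);
    this.name = "ForbiddenError";
  }
}

const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function assertValidTenantId(tenantId: unknown): asserts tenantId is string {
  if (typeof tenantId !== "string" || !UUID_V4.test(tenantId)) {
    throw new ForbiddenError();
  }
}

interface ConditionalParams {
  TableName: string;
  Key: Record<string, unknown>;
  ConditionExpression?: string;
  ExpressionAttributeValues?: Record<string, unknown>;
}

// Returns a copy of the params that DynamoDB will only apply when the stored
// item's tenantId equals the caller's tenant ID.
function scopeToTenant(tenantId: string, params: ConditionalParams): ConditionalParams {
  assertValidTenantId(tenantId);
  const tenantCheck = "tenantId = :__tenantId__";
  return {
    ...params,
    // Keep any caller-supplied condition and AND it with the tenant check.
    ConditionExpression: params.ConditionExpression
      ? `(${params.ConditionExpression}) AND ${tenantCheck}`
      : tenantCheck,
    ExpressionAttributeValues: {
      ...params.ExpressionAttributeValues,
      ":__tenantId__": tenantId,
    },
  };
}
```

The returned object could then be passed to a delete or update command. When the condition is not met, DynamoDB rejects the write with a conditional check failure, which the wrapper can translate into a generic not-found response.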
Response Sanitization
Tenant IDs are stripped from every API response before it reaches the frontend:
// accounts-list.ts
const accounts = await getAccountRepo().listAccounts(authContext.tenantId);
const sanitized = accounts.map(({ tenantId: _, ...rest }) => rest);
return jsonResponse(200, { accounts: sanitized }, origin);
The whoami endpoint returns userId and email, but not tenantId. The frontend identifies users by their Cognito sub, never by tenant ID.
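The inline destructuring above handles flat records. For nested payloads such as graph nodes, a recursive variant is one option. The sketch below is hypothetical (stripTenantIdDeep is not part of the codebase described here) and assumes the payload is plain JSON-style data: objects, arrays, and primitives.

```typescript
// Hypothetical helper, assuming plain JSON-style data (objects, arrays,
// primitives). It removes every property named "tenantId" at any depth.
function stripTenantIdDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(stripTenantIdDeep);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (key !== "tenantId") {
        out[key] = stripTenantIdDeep(inner);
      }
    }
    return out;
  }
  return value;
}
```

A field-name filter like this would not catch a tenant ID embedded in a composite key string such as {tenantId}#{assessmentId}, so those keys still need to be dropped or rewritten before a record is returned.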
Error Handling
Backend errors return generic messages. If an assessment doesn't belong to your tenant, you get 404 Assessment not found, not 403 You don't own this assessment. AWS SDK errors are caught and replaced with Internal server error rather than forwarded to the client, where they could leak table names, ARNs, or account structure.
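One way to centralize that mapping is a single function every handler's catch block goes through. The sketch below is an assumption about shape, not our actual code: NotFoundError, toClientError, and the injectable log parameter are hypothetical names.

```typescript
// Hypothetical error mapper: NotFoundError and toClientError are illustrative
// names. Anything that is not an expected domain error becomes a generic 500.
class NotFoundError extends Error {
  constructor(readonly resource: string) {
    super(`${resource} not found`);
    this.name = "NotFoundError";
  }
}

interface ErrorResponse {
  statusCode: number;
  body: string;
}

function toClientError(
  err: unknown,
  log: (e: unknown) => void = console.error,
): ErrorResponse {
  if (err instanceof NotFoundError) {
    // Missing items and items owned by another tenant look identical to the caller.
    return {
      statusCode: 404,
      body: JSON.stringify({ message: `${err.resource} not found` }),
    };
  }
  // Full detail (table names, ARNs, stack traces) stays in server-side logs.
  log(err);
  return {
    statusCode: 500,
    body: JSON.stringify({ message: "Internal server error" }),
  };
}
```

A repository that returns null for another tenant's item can throw NotFoundError("Assessment"), so the client cannot distinguish "doesn't exist" from "isn't yours".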
How Dam Secure Helped
Building all of this is one thing. Verifying it stays correct as the codebase grows is another.
We built hackaws.cloud almost entirely with AI coding agents such as Claude Code and Cursor. These tools are incredibly productive, but they share a fundamental limitation: they lose context. A long session drifts. A new session starts fresh. You can write perfect rules in CLAUDE.md or .cursorrules telling the agent exactly how tenant isolation works, but those instructions compete with everything else in the context window. By the time the agent is deep in a feature branch, wrestling with some unrelated DynamoDB pagination bug, those tenant isolation rules are a distant memory.
This isn't a tooling bug. It's an inherent property of how LLM-based agents work. Context windows are finite, attention degrades over distance, and the agent optimizes for the immediate task, not the architectural invariant you defined three thousand tokens ago. The result is subtle drift: a handler that constructs its own DynamoDB query instead of going through TenantDb, a new endpoint that reads tenantId from the request body because that's what was in the test fixture. Each violation looks reasonable in isolation. None of them trigger an error. They just quietly erode your security model.
This is why you need something external to the coding agent. Something that doesn't lose context, doesn't get distracted, and checks every line against the rules you actually care about. That's what Dam Secure gives us: a persistent, stateless verification layer that re-evaluates the entire codebase against our tenant isolation rules on every scan, regardless of how the code was written or which agent wrote it.
The Rules We Wrote
We created six team rules in Dam Secure, each targeting a specific class of tenant isolation failure. Here's what they are and why each one matters.
1. Never bypass TenantDb
Never bypass TenantDb. No direct DocumentClient.send() on tenant-scoped data. All access goes through TenantDb or a repository that enforces tenant ownership.
This is the foundational rule. TenantDb exists to make tenant isolation automatic, but it only works if developers actually use it. A single docClient.send(new GetCommand(...)) without tenant checking creates a bypass. This rule flags any direct DynamoDB operations on tenant-scoped tables that skip the tenant wrapper.
2. Never expose tenant ID to the frontend
Never expose tenant ID to the frontend. Strip it from all API responses. The frontend identifies users by userId (Cognito sub), never by tenantId.
If the tenant ID leaks to the client, it becomes an attack vector. An attacker who knows another tenant's ID can attempt IDOR attacks, leaving the backend's ownership checks as the only barrier. Keeping the ID server-side only denies the attacker that identifier in the first place. This rule catches any API response that includes a tenantId field.
3. Return generic errors
Return generic errors. Never leak table names, config values, tenant IDs, or internal error details in responses.
AWS SDK errors contain rich debugging information: table names, condition expression details, request IDs. Forwarding these to the client tells an attacker exactly what your data model looks like. This rule flags any handler that re-throws or forwards raw error objects rather than returning sanitized messages.
4. Scope every DB operation to the tenant
Scope every DB operation to the tenant. Use TenantDb for shared tables (it injects/validates tenantId automatically). For tenant-keyed tables like accounts, use tenantId from AuthContext as the partition key.
This is the operational detail behind rule 1. We have two table patterns: shared tables (where TenantDb injects filtering) and tenant-keyed tables (where tenantId is the partition key). This rule checks that both patterns are followed correctly: that shared table access goes through TenantDb, and that tenant-keyed table access uses authContext.tenantId as the key, not a value from the request body.
5. Always use withAuth()
Always use withAuth(). Every handler must be wrapped with withAuth(). This is the only way to get an AuthContext.
If a handler skips withAuth(), it has no AuthContext, which means it has no verified tenant ID. It might still work (by reading tenant info from the request body or query params), but that tenant info is attacker-controlled. This rule flags any exported Lambda handler that isn't wrapped in withAuth().
6. Never read tenant ID from the client
Never read tenant ID from the client. Not from body, headers, query params, path params, or cookies. It comes from AuthContext only.
The complement to rule 5. Even within a withAuth()-wrapped handler, a developer might accidentally read event.body.tenantId instead of authContext.tenantId. This rule catches that pattern: any reference to tenant ID from the request object rather than the auth context.
Using Extra Notes to Descope Non-Auth Operations
Not every DynamoDB operation in the codebase goes through withAuth(). Some run in generic backend-to-backend Lambda functions that are invoked by Step Functions or DynamoDB Streams rather than API Gateway. They don't have a user session, so they don't have an AuthContext.
Rather than creating exceptions that weaken the rules, we used Dam Secure's extra notes feature to descope these operations. We annotated the non-auth handlers with context explaining why they're exempt: they run in backend-only execution contexts with no user-facing API surface, and the tenant ID is passed in the Lambda invocation payload from a trusted upstream handler that did verify it.
This keeps the rules strict for the API surface (where the actual attack surface lives) without generating false positives on internal plumbing.
What Dam Secure Found
The scan surfaced several categories of issues across our repositories:
- DynamoDB operations lacking tenant-scoping: direct DocumentClient.send() calls that bypassed TenantDb, particularly in newer handlers that hadn't been reviewed yet
- Backend error messages leaking to the frontend: raw AWS SDK errors being re-thrown instead of sanitized
- Production secrets as plain strings: infrastructure code consuming secrets without going through Secrets Manager
None of these were actively exploited vulnerabilities. They were structural weaknesses, places where the pattern was broken and tenant data could leak if the wrong conditions aligned. That's exactly what codified rules are good at catching: not the spectacular failures, but the quiet drift away from the architecture you designed.
Lessons Learned
Make the safe path the easy path. TenantDb exists so developers don't have to remember to add tenant filters. withAuth() exists so developers don't have to parse JWTs. The rules enforce that these easy paths are the only paths.
Tenant isolation is a property of the system, not individual handlers. Any single handler can look correct in isolation. The value of codified rules is verifying the property holds across the entire codebase, including code written by new team members who weren't there when the architecture was designed.
Descoping is better than weakening. When we hit false positives on backend-to-backend handlers, the temptation was to soften the rules. Using extra notes to descope specific contexts kept the rules strict where they matter most: on the API surface.
Defense in depth pays off. Partition keys, TenantDb wrappers, AuthContext resolution, response sanitization, generic errors. Any one of these could fail and the others would still prevent a cross-tenant data leak. The rules verify that all layers are present, not just one.
--
Written by Daniel Grzelak, Chief Innovation Officer at Plerion - a leading cloud security platform with an AI security engineer you can hire.
