Solving the Shared Ownership Alerting Challenge
January 19, 2025
At Endowus, our tech ecosystem is built on a microservices architecture, with multiple stream-aligned teams owning subsets of the backend services. These teams are responsible for the features and domain capabilities that power our frontend web and mobile applications.
To connect these applications with our microservices, we rely on a Backend for Frontend (BFF) platform. Built with the NestJS framework, the BFF acts as a gateway, exposing modularized endpoints that aggregate and simplify interactions between frontend and backend layers.
This platform, while central to our architecture, is inherently a shared component. Each endpoint is tied to specific backend services owned by different teams. This shared ownership introduces complexity when managing operational concerns, such as error monitoring and alerting. How do we ensure that when something goes wrong, the right team is notified and can take action promptly?
The Challenge: Clear Ownership in a Shared Platform
With multiple teams relying on the BFF platform, accountability for production issues becomes a challenge. Consider the typical lifecycle of a production incident:
- A user encounters an issue that generates an error.
- The error is logged and sent as a Sentry alert.
- Teams need to triage the alert and resolve the issue quickly.
Without clear routing of alerts, issues like the following can arise:
- Missed Alerts: Alerts may not reach the correct team if ownership isn’t explicitly defined, delaying resolution.
- Alert Fatigue: Teams receiving irrelevant alerts can become desensitized, leading to genuine issues being overlooked.
- Inconsistent Processes: With no standard mechanism, different teams might implement custom solutions, increasing maintenance overhead and reducing cohesion.
What’s needed is a system that enforces ownership and guarantees that every alert is automatically routed to the appropriate team without manual intervention.
Our Approach: Building a Team-Based Alerting Mechanism
To address these challenges, we designed and implemented a robust alert routing mechanism within the BFF platform.
These were our design considerations:
- Enforcing Ownership: Each endpoint in the BFF must be explicitly associated with a team. This metadata must be mandatory, ensuring that no endpoint is left unowned.
- Tagging Alerts with Metadata: We leverage Sentry’s tagging functionality to include team metadata in all alerts. This makes it possible to programmatically route alerts based on ownership and filter them in Sentry’s dashboard.
- Seamless Developer Experience: Adding team metadata to endpoints must be straightforward for developers. We integrated this requirement into our NestJS modules in a way that aligns with existing workflows, reducing cognitive load and increasing adoption.
- Pre-Runtime Validation: To prevent misconfigurations, we implemented a validation step that ensures all endpoints are tagged with team metadata before deployment. This guarantees no alert falls through the cracks.
This approach not only resolves the immediate alerting problem but also lays the foundation for additional metadata-driven validations, such as enforcing domain boundaries and public endpoint checks.
Alerting Mechanism Overview
The flow begins when a request is received by the controller.
- The controller processes and tags the request with team-specific metadata.
- If an error occurs during the request’s lifecycle, an exception event is generated and captured by the Sentry App.
- The Sentry App is configured to automatically trigger and route a corresponding alert to the relevant team’s Slack channel.
Implementation Details
Let’s dive into the practical implementation of this functionality within our NestJS BFF. We’ll walk through each step, highlighting key NestJS concepts along the way. For clarity, we’ll illustrate with an example involving three teams: Team A, Team B, and Team C.
1. Creating the Team Decorator
First, we create a custom NestJS decorator that allows developers to bind team metadata to a method.
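// team.decorator.ts
import { CustomDecorator, SetMetadata } from '@nestjs/common';
export enum TeamTag {
  TeamA = 'team-a',
  TeamB = 'team-b',
  TeamC = 'team-c',
}
// Binds a team tag to a controller class or to a single route handler
export const Team = (tag: TeamTag): CustomDecorator<string> => {
  return SetMetadata('TEAM_TAG_METADATA_KEY', tag);
};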
2. Developer Usage: Decorating the Controllers with Team Metadata
In NestJS, controllers are components in which endpoints are defined. Developers can attach team metadata to these controllers by using the Team decorator. For instance, a developer from Team A would apply the decorator as follows:
@Team(TeamTag.TeamA)
@Controller({ path: 'team-a-service', version: '1' })
export class TeamAController {
  // ...controller endpoint methods
}
Note: For greater flexibility, we can also override the controller-level metadata by applying the Team decorator to individual methods.
@Team(TeamTag.TeamB)
@Get('data-endpoint')
async getData(@Query() queryParams: QueryParams): Promise<ResponseData> {
  return this.teamBService.getData(queryParams);
}
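To make the precedence concrete, here is a framework-free sketch of how a method-level tag can win over a controller-level one. It does not use the NestJS API: `teamRegistry`, `resolveTeam`, and `ReportsController` are hypothetical names, and the lookup simply returns the first tag found among the targets it is given, which is the behaviour we rely on when passing the handler before the class.

```typescript
// Hypothetical stand-in for metadata storage: maps a target
// (a controller class or a method function) to its team tag.
type Target = Function;
const teamRegistry = new Map<Target, string>();

// Returns the first tag found, checking targets in the order given.
// Passing [handler, controllerClass] means a method-level tag wins.
function resolveTeam(targets: Target[]): string | undefined {
  for (const target of targets) {
    const tag = teamRegistry.get(target);
    if (tag !== undefined) {
      return tag;
    }
  }
  return undefined;
}

class ReportsController {
  getSummary(): string {
    return 'summary';
  }
  getData(): string {
    return 'data';
  }
}

// Controller-level tag, comparable to @Team(TeamTag.TeamA) on the class.
teamRegistry.set(ReportsController, 'team-a');
// Method-level override, comparable to @Team(TeamTag.TeamB) on getData.
teamRegistry.set(ReportsController.prototype.getData, 'team-b');

const summaryTeam = resolveTeam([ReportsController.prototype.getSummary, ReportsController]);
const dataTeam = resolveTeam([ReportsController.prototype.getData, ReportsController]);
console.log(summaryTeam, dataTeam); // team-a team-b
```

In this sketch, `getSummary` falls back to the controller's tag because it has none of its own, while `getData` resolves to its own tag.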
3. Adding Team Metadata to the Request Payload
Bundling the metadata with the request payload will allow us to cohesively reference this data for our alert routing scenario.
To accomplish this, we’ll use a NestJS guard that intercepts the request pre-controller and adds the team metadata to the request payload for use later.
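// add-team-tag.guard.ts
import { CanActivate, ExecutionContext, Injectable } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
@Injectable()
export class AddTeamTagGuard implements CanActivate {
  constructor(private readonly reflector: Reflector) {}
  canActivate(context: ExecutionContext): boolean {
    const req = context.switchToHttp().getRequest();
    const handler = context.getHandler();
    const endpointClass = context.getClass();
    // The key must match the one passed to SetMetadata in team.decorator.ts;
    // listing the handler first lets a method-level tag override the controller's
    const team = this.reflector.getAllAndOverride('TEAM_TAG_METADATA_KEY', [handler, endpointClass]);
    req.team = team;
    return true; // this guard only annotates the request, it never blocks it
  }
}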
To enable the guard for all controllers, we add it as a global guard in the main module file (app.module.ts). Global guards will apply to all controllers within the main module.
// app.module
...
providers: [
  { provide: APP_GUARD, useClass: AddTeamTagGuard },
]
4. Implementing Exception Handling Alert Logic
We implement a NestJS exception filter which will handle errors by sending an exception event to the Sentry App.
Using the official package @sentry/node, we can easily generate and capture Sentry events. Relevant fields, including the team metadata, are added to the event scope. This is essential for the Slack channel routing done by the Sentry App in later steps.
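// exceptions.filter.ts
import { ArgumentsHost, Catch, ExceptionFilter, HttpException, HttpStatus } from '@nestjs/common';
import * as Sentry from '@sentry/node';
@Catch()
export class AllExceptionFilter implements ExceptionFilter {
  catch(exception: any, host: ArgumentsHost): any {
    const ctx = host.switchToHttp();
    Sentry.withScope((scope) => {
      const request = ctx.getRequest();
      // Tag the event with the owning team set earlier by AddTeamTagGuard
      scope.setTag('team', request.team);
      Sentry.captureException(exception);
    });
    const response = ctx.getResponse();
    // HttpException exposes getStatus(); otherwise use a statusCode field if present, else 500
    const status = exception instanceof HttpException
      ? exception.getStatus()
      : (exception?.statusCode ?? HttpStatus.INTERNAL_SERVER_ERROR);
    return response.status(status).json(exception);
  }
}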
As with the global guard previously, we enable the filter as a global filter for all modules.
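// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { AllExceptionFilter } from './exceptions.filter';
const app = await NestFactory.create(AppModule);
app.useGlobalFilters(new AllExceptionFilter());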
5. Enforcing Ownership through Validation
While decorators offer a convenient way to specify team ownership, it’s easy for developers to overlook them. To ensure no endpoint is left untagged, we implement a validation module that runs during application startup.
The validation logic utilizes internal NestJS modules to iterate through each method/route:
- DiscoveryService: Traverses the NestJS app module graph and retrieves submodule components of the application
- MetadataScanner: Scans and retrieves metadata from a submodule component
During application startup, ModuleValidationService.validate is invoked. DiscoveryService and MetadataScanner iterate through the controller routes and retrieve team metadata. If any endpoint is missing the required metadata, the application will gracefully terminate, preventing deployment of a misconfigured BFF.
// module-validation.service.ts
import { Injectable } from '@nestjs/common';
import { Controller } from '@nestjs/common/interfaces';
import { DiscoveryService, MetadataScanner, Reflector } from '@nestjs/core';
import { InstanceWrapper } from '@nestjs/core/injector/instance-wrapper';
import { Subject } from 'rxjs';
@Injectable()
export class ModuleValidationService {
  private readonly subject: Subject<void> = new Subject();
  constructor(
    private readonly metadataScanner: MetadataScanner,
    private readonly reflector: Reflector,
    private readonly discoveryService: DiscoveryService,
  ) {}
  // subscribe to the shutdown in main.ts
  subscribe(callback: () => void): void {
    this.subject.subscribe(() => callback());
  }
  validate(): void {
    const controllers: InstanceWrapper<Controller>[] = this.discoveryService.getControllers();
    // validate each controller
    controllers.forEach((controller) => {
      const instance = controller.instance;
      // get all methods / endpoints
      const proto = Object.getPrototypeOf(instance);
      this.metadataScanner.scanFromPrototype(instance, proto, (methodName) => {
        // the callback receives the method name, so look up the handler function itself
        const handler = proto[methodName];
        // a method-level tag takes precedence over the controller-level tag
        const teamTag = this.reflector.getAllAndOverride('TEAM_TAG_METADATA_KEY', [handler, instance.constructor]);
        if (!teamTag) {
          this.subject.next(); // terminates the application
        }
        return handler;
      });
    });
  }
}
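The check itself can be illustrated without NestJS. The sketch below is a simplified stand-in rather than our service: `ownerRegistry`, `findUntaggedEndpoints`, and the two controller classes are hypothetical, and a plain `Map` replaces the framework's metadata storage. One difference from the service above is that it collects every untagged method so a failure message could name them all at once.

```typescript
// Hypothetical registry standing in for framework metadata (not the NestJS API).
const ownerRegistry = new Map<Function, string>();

type ControllerClass = new () => object;

// Returns "ClassName.method" for every method that has neither its own
// tag nor a tag inherited from its controller class.
function findUntaggedEndpoints(controllers: ControllerClass[]): string[] {
  const missing: string[] = [];
  for (const controller of controllers) {
    const proto = controller.prototype as Record<string, unknown>;
    for (const name of Object.getOwnPropertyNames(proto)) {
      const member = proto[name];
      if (name === 'constructor' || typeof member !== 'function') {
        continue;
      }
      const tagged = ownerRegistry.has(member) || ownerRegistry.has(controller);
      if (!tagged) {
        missing.push(`${controller.name}.${name}`);
      }
    }
  }
  return missing;
}

class AccountsController {
  list(): string[] {
    return [];
  }
}

class PaymentsController {
  pay(): void {}
  refund(): void {}
}

// The whole AccountsController is owned by one team.
ownerRegistry.set(AccountsController, 'team-a');
// Only one PaymentsController method is tagged; refund is left unowned.
ownerRegistry.set(PaymentsController.prototype.pay, 'team-b');

const untagged = findUntaggedEndpoints([AccountsController, PaymentsController]);
if (untagged.length > 0) {
  console.error(`Endpoints missing a team tag: ${untagged.join(', ')}`);
}
```

Here only `PaymentsController.refund` is reported, since `list` inherits its controller's tag and `pay` carries its own.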
Next, we wire this logic into startup using the NestJS onApplicationBootstrap lifecycle hook.
// app.module
export class AppModule implements OnApplicationBootstrap {
  constructor(private readonly moduleValidationService: ModuleValidationService) {}
  onApplicationBootstrap() {
    this.moduleValidationService.validate();
  }
}
And then we complete the integration with an observable subscription to the module validation service callback to receive the termination signal.
// main.ts
const terminateApp = (app: INestApplication) => {
  return () => {
    app.close();
    throw new ModuleValidationFailedError();
  };
};
app.get(ModuleValidationService).subscribe(terminateApp(app));
6. Sentry App: Creating a Sentry Alert Rule
The final piece of the puzzle is configuring Sentry to leverage the team tag we’re now including in our error events. We create a Sentry alert rule to match the team tag encapsulated within the Sentry event and configure destination Slack Channel connection details.
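The rule itself lives in Sentry's configuration rather than in code, but its effect can be sketched as a lookup from the team tag to a Slack channel. Everything here is illustrative: the `TaggedEvent` shape is a simplification and not Sentry's event schema, the channel names are made up, and the fallback channel for untagged or unknown teams is an assumption about how one might catch stragglers.

```typescript
// Simplified view of what an alert rule inspects (an assumption, not Sentry's schema).
interface TaggedEvent {
  message: string;
  tags: Record<string, string | undefined>;
}

// Hypothetical channel names; real destinations are configured per alert rule.
const channelByTeam = new Map<string, string>([
  ['team-a', '#team-a-alerts'],
  ['team-b', '#team-b-alerts'],
  ['team-c', '#team-c-alerts'],
]);

// Assumed catch-all destination for events with a missing or unknown team tag.
const FALLBACK_CHANNEL = '#bff-platform-alerts';

function resolveChannel(event: TaggedEvent): string {
  const team = event.tags['team'];
  const channel = team === undefined ? undefined : channelByTeam.get(team);
  return channel ?? FALLBACK_CHANNEL;
}

const routed = resolveChannel({ message: 'Upstream timeout', tags: { team: 'team-b' } });
console.log(routed); // #team-b-alerts
```

With one rule per team tag, each team only sees the alerts raised by endpoints it owns.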
Looking Ahead: Extending the Metadata Pattern
Our team-based alerting mechanism is just the beginning. This metadata-driven approach opens up possibilities for enhancing operational reliability and code quality across the board. For example:
- Multi-Team Ownership: We could extend the model to allow multiple teams to co-own certain endpoints by stringifying metadata or using structured formats.
- Visualizing Module Topologies: By leveraging NestJS tools like DiscoveryService and MetadataScanner, we can create a visual interface for exploring endpoint ownership and dependencies.
- Proactive Quality Controls: The same metadata framework could enforce standards such as secure public endpoint exposure or domain validation.
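As one possible shape for the multi-team idea, co-owners could be stringified into a single tag value and parsed back when needed. This is only a sketch of an option we have not built: `encodeTeams`, `decodeTeams`, and the comma separator are hypothetical choices.

```typescript
// Assumed separator; team tags are expected not to contain it.
const TEAM_SEPARATOR = ',';

// De-duplicates and sorts so the same set of owners always yields the same tag value.
function encodeTeams(teams: string[]): string {
  return Array.from(new Set(teams)).sort().join(TEAM_SEPARATOR);
}

// Splits a tag value back into team tags, ignoring empty segments.
function decodeTeams(tagValue: string): string[] {
  return tagValue.split(TEAM_SEPARATOR).filter((team) => team.length > 0);
}

const coOwnedTag = encodeTeams(['team-b', 'team-a', 'team-b']);
const owners = decodeTeams(coOwnedTag);
console.log(coOwnedTag, owners); // team-a,team-b [ 'team-a', 'team-b' ]
```

A stable encoding matters here because alert rules would match on the tag value, so the same pair of owners should never produce two different strings.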
Our ultimate goal is to foster a tech ecosystem where shared components like the BFF empower teams with autonomy and accountability—without compromising collaboration or reliability.
Thank you for reading! We hope this post has sparked ideas for tackling shared ownership and alerting challenges in your own platforms. We’d love to hear how you’ve approached these problems in your tech stacks!