Structured logging, distributed tracing, and metrics for Node.js services on Azure. From flying blind to production-ready monitoring — and knowing when your system is about to break before your users do.
01 — Structured Logging
Stop logging strings. Log objects.
Unstructured log strings are hard to query in production: finding every failed payment for one customer means regex-matching free text. Structured JSON logs, with consistent fields and levels, can be indexed, filtered, and alerted on.
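To make the difference concrete, here is a minimal sketch of a structured logger built only on Node built-ins, standing in for a real library such as pino. The createLogger helper, the field names, and the orders-api service name are hypothetical; the point is that every value becomes a JSON field instead of text inside a sentence.

```typescript
// Level ordering uses the same numeric values pino does (debug 20 ... error 50).
type Level = 'debug' | 'info' | 'warn' | 'error';
const LEVELS: Record<Level, number> = { debug: 20, info: 30, warn: 40, error: 50 };

type Fields = Record<string, unknown>;

// Hypothetical helper: one JSON object per line, with the same base fields on every entry.
function createLogger(
  service: string,
  minLevel: Level = 'info',
  write: (line: string) => void = (line) => process.stdout.write(line + '\n'),
) {
  const log = (level: Level, fields: Fields, msg: string): void => {
    if (LEVELS[level] < LEVELS[minLevel]) return; // below the threshold: drop it
    // Base fields are spread last so call-site fields cannot overwrite them.
    write(JSON.stringify({ ...fields, time: new Date().toISOString(), level, service, msg }));
  };
  return {
    debug: (fields: Fields, msg: string) => log('debug', fields, msg),
    info: (fields: Fields, msg: string) => log('info', fields, msg),
    warn: (fields: Fields, msg: string) => log('warn', fields, msg),
    error: (fields: Fields, msg: string) => log('error', fields, msg),
  };
}

// Unstructured: the order ID is buried in prose and only findable with a regex.
console.log('Order 1234 for user 42 failed: card declined');

// Structured: each value is its own field, so "all card_declined errors for order 1234" is a filter.
const logger = createLogger('orders-api');
logger.error({ orderId: '1234', userId: '42', reason: 'card_declined' }, 'Order failed');
```

Real loggers add more (child loggers, redaction of PII, async transports), but the output shape is the same: one object per line that your log aggregator can index field by field.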
Pino setup with Express — drop-in production logging
02 — Correlation & Tracing
Follow a request through every service
A request hits your API, fans out to a queue, triggers three microservices, and one fails. Without correlation IDs and distributed tracing, you'll never know which one or why.
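Under the hood, tracing libraries propagate a W3C Trace Context traceparent header (version-traceid-parentid-flags) between services: each hop keeps the trace ID and mints a new span ID, so every span can be stitched back into one trace. The sketch below shows only that mechanic with Node built-ins. The helpers parseTraceparent, childSpan, and formatTraceparent are hypothetical, it accepts only version 00, and in practice OpenTelemetry or the App Insights SDK does this for you.

```typescript
import { randomBytes } from 'node:crypto';

// traceparent for version 00: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

interface SpanContext {
  traceId: string; // shared by every span in the request
  spanId: string; // unique to this hop
  sampled: boolean; // bit 0 of the flags byte
}

function parseTraceparent(header: string | undefined): SpanContext | undefined {
  const m = header ? TRACEPARENT.exec(header.trim()) : null;
  if (!m) return undefined;
  const [, traceId, spanId, flags] = m;
  // All-zero trace or parent IDs are invalid, so treat them as "no parent".
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Child span: keep the trace ID, mint a fresh span ID.
// With no valid incoming header, this service starts a new trace.
function childSpan(parent: SpanContext | undefined): SpanContext {
  return {
    traceId: parent?.traceId ?? randomBytes(16).toString('hex'),
    spanId: randomBytes(8).toString('hex'),
    sampled: parent?.sampled ?? true,
  };
}

// Header value to send on outgoing calls from this span.
function formatTraceparent(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? '01' : '00'}`;
}
```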
Correlation ID middleware — add to every service
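import { v4 as uuid } from 'uuid';
import { AsyncLocalStorage } from 'async_hooks';
import type { Request, Response, NextFunction } from 'express';
import axios from 'axios';
import { logger } from './logger'; // your pino instance (illustrative path)
// Tell TypeScript about the property the middleware adds to req
declare global {
  namespace Express {
    interface Request { requestId?: string }
  }
}
export const requestContext = new AsyncLocalStorage<{ requestId: string; userId?: string }>();
// Middleware: reuse the caller's correlation ID or mint a new one
export function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
  const incoming = req.headers['x-request-id']; // string | string[] | undefined
  const requestId = (Array.isArray(incoming) ? incoming[0] : incoming) || uuid();
  res.setHeader('x-request-id', requestId);
  req.requestId = requestId;
  // Later middleware and handlers, and their async continuations, see this store
  requestContext.run({ requestId }, () => next());
}
// Inside any handler or service (orderId, serviceUrl, payload come from your code):
// use requestId in any log from any async context
logger.info({ requestId: requestContext.getStore()?.requestId, orderId }, 'Order processed');
// Propagate to downstream HTTP calls
const requestId = requestContext.getStore()?.requestId;
axios.post(serviceUrl, payload, {
  headers: requestId ? { 'x-request-id': requestId } : {},
});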
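The reason requestContext.getStore() works deep in the call stack is that AsyncLocalStorage carries the store across await boundaries while keeping concurrent requests apart. This small self-contained sketch lets you check that; handleRequest and chargeCard are hypothetical stand-ins for the middleware and a service function, and sleep only forces the two requests to interleave.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

const ctx = new AsyncLocalStorage<{ requestId: string }>();
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Deep in the call stack: no requestId parameter is passed in.
async function chargeCard(logs: string[]): Promise<void> {
  await sleep(Math.random() * 10); // random delay so the two requests interleave
  logs.push(JSON.stringify({ requestId: ctx.getStore()?.requestId, msg: 'card charged' }));
}

// Mimics the middleware: reuse the incoming ID or mint one, then run the handler inside ctx.run().
function handleRequest(incomingId: string | undefined, logs: string[]): Promise<void> {
  const requestId = incomingId ?? randomUUID();
  return ctx.run({ requestId }, () => chargeCard(logs));
}

// Two concurrent "requests"; each log line should carry its own ID.
async function demo(): Promise<string[]> {
  const logs: string[] = [];
  await Promise.all([handleRequest('req-A', logs), handleRequest('req-B', logs)]);
  return logs;
}
```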
03 — Health Endpoints
The three endpoints every service needs
Kubernetes and load balancers need to know whether your service is alive, ready to serve traffic, and properly started. Three different endpoints answer three different questions: liveness (should the process be restarted?), readiness (should it receive traffic right now?), and startup (has initialisation finished?).
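A minimal sketch of the three checks using only node:http, assuming Kubernetes-style probes. The paths /liveness, /readiness and /startup, the ServiceState flags, and the 5-second drain delay are assumptions to adapt, and this server serves only the checks. The key design choice is that liveness never looks at dependencies, while readiness does and also fails during shutdown.

```typescript
import { createServer } from 'node:http';

// Hypothetical flags; your app flips these as it boots, checks dependencies, and shuts down.
interface ServiceState {
  started: boolean; // one-time initialisation (config, migrations, cache warm-up) finished
  dependenciesOk: boolean; // e.g. the last database ping succeeded
  shuttingDown: boolean; // SIGTERM received, draining traffic
}

const state: ServiceState = { started: false, dependenciesOk: false, shuttingDown: false };

function healthResponse(path: string, s: ServiceState): { status: number; body: object } {
  switch (path) {
    case '/liveness':
      // "Is the process wedged?" Never check dependencies here: a database outage
      // would make the orchestrator restart every otherwise healthy replica.
      return { status: 200, body: { status: 'alive' } };
    case '/readiness': {
      // "Should I receive traffic right now?" Fails while booting, when a dependency
      // is down, and during shutdown so the load balancer stops routing here.
      const ready = s.started && s.dependenciesOk && !s.shuttingDown;
      return { status: ready ? 200 : 503, body: { status: ready ? 'ready' : 'not ready', ...s } };
    }
    case '/startup':
      // "Has slow initialisation finished?" Kubernetes holds off the other probes until this passes.
      return { status: s.started ? 200 : 503, body: { started: s.started } };
    default:
      return { status: 404, body: { error: 'not found' } };
  }
}

const server = createServer((req, res) => {
  const path = new URL(req.url ?? '/', 'http://localhost').pathname; // ignore query strings
  const { status, body } = healthResponse(path, state);
  res.writeHead(status, { 'content-type': 'application/json' });
  res.end(JSON.stringify(body));
});

// Graceful shutdown: fail readiness first, give the load balancer time to notice,
// then stop accepting connections. 5000 ms is an assumed drain window.
process.on('SIGTERM', () => {
  state.shuttingDown = true;
  setTimeout(() => server.close(() => process.exit(0)), 5000);
});
```

Start it with server.listen(port) from your entrypoint and point each probe's httpGet.path at the matching route.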
04 — Metrics
What to measure and why
The four golden signals (latency, traffic, errors, saturation) cover most of what goes wrong in production from a user's point of view. For each signal, decide what to instrument and how to expose it.
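In a real service a client library such as prom-client would produce these series for you. This stdlib-only sketch shows roughly what gets exposed, covering the four signals with three metrics: a request counter by status class (traffic, and errors via 5xx), a cumulative latency histogram, and an in-flight gauge as a rough saturation proxy. The metric names, bucket boundaries, and the GoldenSignals class are assumptions; the output follows the Prometheus text exposition format.

```typescript
// Hypothetical latency buckets, in seconds; tune them to your latency targets.
const BUCKETS = [0.05, 0.1, 0.25, 0.5, 1, 2.5];

class GoldenSignals {
  private requestsByClass = new Map<string, number>(); // traffic, and errors via 5xx
  private bucketCounts: number[] = BUCKETS.map(() => 0); // latency histogram
  private durationSum = 0;
  private durationCount = 0;
  private inFlight = 0; // saturation proxy: concurrent requests

  // Call when a request starts; call the returned function with the final status code.
  startRequest(): (statusCode: number) => void {
    const started = Date.now();
    this.inFlight++;
    return (statusCode: number) => {
      this.inFlight--;
      this.observe(statusCode, (Date.now() - started) / 1000);
    };
  }

  observe(statusCode: number, seconds: number): void {
    const cls = `${Math.floor(statusCode / 100)}xx`;
    this.requestsByClass.set(cls, (this.requestsByClass.get(cls) ?? 0) + 1);
    // Histogram buckets are cumulative: count the observation in every bucket it fits under.
    BUCKETS.forEach((le, i) => {
      if (seconds <= le) this.bucketCounts[i]++;
    });
    this.durationSum += seconds;
    this.durationCount++;
  }

  // Render in the Prometheus text exposition format, e.g. for a GET /metrics route.
  render(): string {
    const lines = ['# TYPE http_requests_total counter'];
    this.requestsByClass.forEach((n, cls) => lines.push(`http_requests_total{status_class="${cls}"} ${n}`));
    lines.push('# TYPE http_request_duration_seconds histogram');
    BUCKETS.forEach((le, i) =>
      lines.push(`http_request_duration_seconds_bucket{le="${le}"} ${this.bucketCounts[i]}`),
    );
    lines.push(`http_request_duration_seconds_bucket{le="+Inf"} ${this.durationCount}`);
    lines.push(`http_request_duration_seconds_sum ${this.durationSum}`);
    lines.push(`http_request_duration_seconds_count ${this.durationCount}`);
    lines.push('# TYPE http_requests_in_flight gauge', `http_requests_in_flight ${this.inFlight}`);
    return lines.join('\n') + '\n';
  }
}
```

A histogram rather than an average is the usual choice because it lets the backend compute percentiles (p95, p99) across every instance.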
05 — Azure App Insights
Wiring Node.js to Azure Monitor
Azure Application Insights gives you distributed traces, dependency tracking, and live metrics out of the box, if you wire it correctly. The SDK prepares packages for tracking as they are loaded, so anything imported before it is initialised can silently go untracked.
Critical: App Insights must be initialised before any other import
// ✅ app.ts — App Insights MUST be the first import
import './telemetry'; // initialise before everything else
import express from 'express';
import { ordersRouter } from './routes/orders';
// ... rest of imports
// telemetry.ts (applicationinsights 2.x API)
import * as appInsights from 'applicationinsights';
// APPLICATIONINSIGHTS_CONNECTION_STRING is the standard variable name; setup() with no argument reads it too
appInsights
  .setup(process.env.APPLICATIONINSIGHTS_CONNECTION_STRING)
  .setAutoCollectRequests(true) // incoming HTTP requests
  .setAutoCollectDependencies(true) // outgoing DB, HTTP, and queue calls
  .setAutoCollectExceptions(true) // uncaught exceptions
  .setAutoCollectPerformance(true) // performance counters: CPU, memory, request rate
  .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C) // legacy AI headers plus W3C traceparent
  .start();
export const client = appInsights.defaultClient;
Custom events, metrics, and exceptions
import { client } from './telemetry';
// Track a custom business event with properties
client.trackEvent({
name: 'OrderPlaced',
properties: { orderId, userId, amount: total, region }
});
// Track a custom metric (e.g., queue depth)
client.trackMetric({ name: 'QueueDepth', value: queueDepth });
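// Track an exception with context
try {
  await processPayment(order);
} catch (err) {
  client.trackException({
    // catch variables may not be Error instances; wrap them so trackException always gets an Error
    exception: err instanceof Error ? err : new Error(String(err)),
    properties: { orderId, userId, operation: 'payment' }
  });
  throw err;
}
// Track a dependency manually (e.g., a Redis call), including failed calls
const startTime = Date.now();
let success = false;
try {
  await redisClient.get(key);
  success = true;
} finally {
  client.trackDependency({
    target: 'redis',
    name: 'GET', // keep names low-cardinality; the key goes in `data`
    data: key,
    duration: Date.now() - startTime,
    resultCode: success ? 200 : 500,
    success,
    dependencyTypeName: 'Redis'
  });
}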
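If you track many dependencies by hand, a small wrapper keeps the timing and failure handling in one place. This is a sketch under assumptions: trackedCall, DependencyRecord and DependencyMeta are hypothetical names, the record fields mirror the trackDependency() call above, and the 200/500 result codes are an illustrative convention, since Redis itself returns no status codes. It takes the tracking function as a parameter so it runs without the SDK; in production you would pass (r) => client.trackDependency(r).

```typescript
// Fields recorded per call; they mirror the trackDependency() call above.
interface DependencyRecord {
  target: string;
  name: string;
  data: string;
  dependencyTypeName: string;
  duration: number; // milliseconds
  resultCode: number;
  success: boolean;
}

type DependencyMeta = Pick<DependencyRecord, 'target' | 'name' | 'data' | 'dependencyTypeName'>;

// Hypothetical helper: times an async call and reports it exactly once,
// whether it resolves or throws, then passes the result or error through unchanged.
async function trackedCall<T>(
  meta: DependencyMeta,
  fn: () => Promise<T>,
  track: (record: DependencyRecord) => void,
): Promise<T> {
  const start = Date.now();
  let success = false;
  try {
    const result = await fn();
    success = true;
    return result;
  } finally {
    // 200/500 are illustrative result codes, not anything the dependency returned
    track({ ...meta, duration: Date.now() - start, resultCode: success ? 200 : 500, success });
  }
}

// Usage (assumes the client and redisClient from the examples above):
// const value = await trackedCall(
//   { target: 'redis', name: 'GET', data: key, dependencyTypeName: 'Redis' },
//   () => redisClient.get(key),
//   (r) => client.trackDependency(r),
// );
```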
Most alert fatigue comes from alerting on causes (a CPU spike, a single pod restart) instead of symptoms. Alert on user-visible impact first, meaning error rate, latency, and availability, then add infrastructure alerts as supporting signals.
Azure Monitor alert rule (Kusto / KQL)
// Alert: p99 latency > 2000 ms in any 1-minute bin over the last 5 minutes
requests
| where timestamp > ago(5m)
| summarize p99 = percentile(duration, 99) by bin(timestamp, 1m)
| where p99 > 2000
// Alert: error rate > 1% (server errors only)
requests
| where timestamp > ago(5m)
| summarize
    total = count(),
    errors = countif(toint(resultCode) >= 500) // resultCode is stored as a string
    by bin(timestamp, 1m)
| extend errorRate = todouble(errors) / total * 100
| where errorRate > 1
// Alert: no requests in 5 minutes (potential outage)
requests
| where timestamp > ago(5m)
| summarize count()
| where count_ == 0