Telemetry Standards
Consistent telemetry practices ensure effective monitoring, troubleshooting, and analysis of application performance and behavior. This guide outlines the standards for implementing telemetry across all applications.
OpenTelemetry Requirements
All applications must use OpenTelemetry as the standard telemetry framework. OpenTelemetry provides a comprehensive set of tools, APIs, and SDKs for collecting, processing, and exporting telemetry data.
Key Requirements
- Use OpenTelemetry SDK for all instrumentation needs
- Follow the W3C Trace Context specification for context propagation
- Configure exporters to send telemetry data to your organization's telemetry backend
- Implement automatic instrumentation where available
- Add custom instrumentation for business-critical operations
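The W3C Trace Context requirement above comes down to reading and writing a `traceparent` header of the form `version-traceid-parentid-flags`. In practice the OpenTelemetry SDK propagators handle this for you; the following is only a minimal standard-library sketch of the header layout. The names `TraceContext`, `parse_traceparent`, and `child_traceparent` are hypothetical, and the parser assumes the version `00` field layout.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent layout (version 00):
#   2 hex version - 32 hex trace-id - 16 hex parent-id - 2 hex trace-flags
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


class TraceContext(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    # Version "ff" is invalid, and so are all-zero trace or parent IDs.
    if match["version"] == "ff":
        return None
    if set(match["trace_id"]) == {"0"} or set(match["parent_id"]) == {"0"}:
        return None
    # The lowest bit of trace-flags is the "sampled" flag.
    sampled = bool(int(match["flags"], 16) & 0x01)
    return TraceContext(match["trace_id"], match["parent_id"], sampled)


def child_traceparent(ctx: TraceContext) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes = 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"
```

A service would parse the incoming header, keep the trace ID, and send `child_traceparent(ctx)` on each downstream request so that every hop shares one trace.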
Signal Types
Implement all three signal types provided by OpenTelemetry:
- Traces: To track request flows through distributed systems
- Metrics: To measure system performance and behavior
- Logs: To capture detailed information about application events
OpenTelemetry Collector
Deploy the OpenTelemetry Collector as an agent or gateway to:
- Collect telemetry data from multiple services
- Process and transform data (filtering, batching, etc.)
- Export data to multiple destinations simultaneously
Logging Best Practices
Naming Conventions
When implementing logging, adhere to these naming conventions:
- Use PascalCase for named placeholders in log messages (for example, {UserId})
- Apply consistent terminology across all log entries
- Keep property names descriptive and concise
Value Casing Standards
For telemetry values:
- Use lowercase invariant for all property values unless casing is required for uniqueness
- Maintain original casing only when it differentiates identity (e.g., case-sensitive identifiers)
- Normalize all status codes, error types, and general string values to lowercase
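One way to apply these casing rules is a small normalization step that runs before attributes are attached to a log entry or span. This is a sketch, not a prescribed implementation: `normalize_values` and the `CASE_SENSITIVE_KEYS` set are hypothetical, and each team would decide which properties actually carry identity in their casing. Python's `str.lower()` does not depend on the process locale, which matches the "lowercase invariant" intent.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical set of properties whose casing differentiates identity.
CASE_SENSITIVE_KEYS: FrozenSet[str] = frozenset({"order.id", "coaxle.party_id"})


def normalize_values(
    attributes: Dict[str, Any],
    case_sensitive_keys: FrozenSet[str] = CASE_SENSITIVE_KEYS,
) -> Dict[str, Any]:
    """Lowercase string values, leaving case-sensitive identifiers untouched."""
    normalized: Dict[str, Any] = {}
    for key, value in attributes.items():
        if isinstance(value, str) and key not in case_sensitive_keys:
            normalized[key] = value.lower()
        else:
            # Non-string values and case-sensitive identifiers pass through.
            normalized[key] = value
    return normalized
```

For example, `normalize_values({"order.status": "Completed", "order.id": "AbC123"})` lowercases the status but keeps the identifier as written.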
Standard Identifiers
Always include the following standard identifiers in your telemetry for tracking related events across distributed systems:
Identifier | Property Name | Description |
---|---|---|
Correlation ID | coaxle.correlation_id | Tracks a request across multiple services |
Party ID | coaxle.party_id | Identifies the user or entity associated with the event |
Tenant ID | coaxle.tenant_id | Identifies the tenant context for multi-tenant applications |
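To avoid passing these identifiers to every log call by hand, they can be held in request-scoped context and copied onto each record. The sketch below uses only the Python standard library (`contextvars` and `logging`); `set_identifiers` and `StandardIdentifierFilter` are hypothetical names, and it assumes your log exporter reads extra fields from the record's attribute dictionary.

```python
import contextvars
import logging

# One context variable per standard identifier; None marks a missing value.
_IDENTIFIERS = {
    "coaxle.correlation_id": contextvars.ContextVar("coaxle_correlation_id", default=None),
    "coaxle.party_id": contextvars.ContextVar("coaxle_party_id", default=None),
    "coaxle.tenant_id": contextvars.ContextVar("coaxle_tenant_id", default=None),
}


def set_identifiers(correlation_id: str, party_id: str, tenant_id: str) -> None:
    """Store the identifiers for the current request context."""
    _IDENTIFIERS["coaxle.correlation_id"].set(correlation_id)
    _IDENTIFIERS["coaxle.party_id"].set(party_id)
    _IDENTIFIERS["coaxle.tenant_id"].set(tenant_id)


class StandardIdentifierFilter(logging.Filter):
    """Copy the current identifiers onto every record that passes through."""

    def filter(self, record: logging.LogRecord) -> bool:
        for name, var in _IDENTIFIERS.items():
            # Dotted names are not valid Python attribute syntax, but they
            # are valid keys in the record's attribute dictionary.
            record.__dict__[name] = var.get()
        return True  # Never drop the record; this filter only enriches it.
```

Attach the filter to a handler with `handler.addFilter(StandardIdentifierFilter())` and call `set_identifiers(...)` once at the start of each request.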
Implementation Examples
Named Placeholders
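Always use named placeholders instead of positional placeholders where the logging framework supports them. Named placeholders make logging code easier to read and maintain, and in structured logging each placeholder name becomes the property name of the captured value. Where the framework only offers positional placeholders (SLF4J's {} is positional), attach the standard identifiers as structured context instead.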
- C#
- Java
- Python
- JavaScript
// Correct: Using PascalCase named placeholders
logger.LogInformation("User {UserId} accessed resource {ResourceId} with permission level {PermissionLevel}",
userId, resourceId, permissionLevel);
// Incorrect: Using positional placeholders
logger.LogInformation("User {0} accessed resource {1} with permission level {2}",
userId, resourceId, permissionLevel);
// Incorrect: Using non-PascalCase named placeholders
logger.LogInformation("User {userId} accessed resource {resourceId} with permission level {permissionLevel}",
userId, resourceId, permissionLevel);
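// Using SLF4J: its {} placeholders are positional, so identifiers are attached through MDC
import java.util.Locale;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
class LoggingExample {
    private static final Logger logger = LoggerFactory.getLogger(LoggingExample.class);
    public void processRequest(String userId, String resourceId, String permissionLevel,
            String correlationId, String tenantId) {
        MDC.put("coaxle.correlation_id", correlationId);
        MDC.put("coaxle.tenant_id", tenantId);
        try {
            logger.info("User {} accessed resource {} with permission level {}",
                    userId, resourceId, permissionLevel.toLowerCase(Locale.ROOT)); // Locale-invariant lowercase
        } finally {
            // Remove the identifiers so they do not leak into unrelated log entries on this thread
            MDC.remove("coaxle.correlation_id");
            MDC.remove("coaxle.tenant_id");
        }
    }
}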
import logging
import structlog
# Configure structured logging
logger = structlog.get_logger()
def process_request(user_id, resource_id, permission_level, correlation_id, tenant_id):
    # Bind to a new local name; assigning to `logger` here would shadow the module-level logger
    log = logger.bind(
        **{
            "coaxle.correlation_id": correlation_id,
            "coaxle.tenant_id": tenant_id,
        }
    )
    log.info(
        "resource_access",
        user_id=user_id,
        resource_id=resource_id,
        permission_level=permission_level.lower(),  # Ensuring lowercase for values
    )
const logger = require('pino')();
function processRequest(userId, resourceId, permissionLevel, correlationId, tenantId) {
logger.info({
msg: `User ${userId} accessed resource ${resourceId} with permission level ${permissionLevel}`,
'coaxle.correlation_id': correlationId,
'coaxle.tenant_id': tenantId,
userId,
resourceId,
permissionLevel: typeof permissionLevel === 'string' ? permissionLevel.toLowerCase() : permissionLevel // Ensuring lowercase for string values
});
}
Using Standard Identifiers
Always include standard identifiers in your telemetry data:
- C#
- Java
- Python
- JavaScript
using System.Collections.Generic;
using Microsoft.Extensions.Logging;
public class OrderService
{
private readonly ILogger<OrderService> _logger;
public OrderService(ILogger<OrderService> logger)
{
_logger = logger;
}
public void ProcessOrder(Order order, string correlationId, string partyId, string tenantId)
{
// Add identifiers to logging scope
using (_logger.BeginScope(new Dictionary<string, object>
{
["coaxle.correlation_id"] = correlationId,
["coaxle.party_id"] = partyId,
["coaxle.tenant_id"] = tenantId
}))
{
_logger.LogInformation("Processing order {OrderId} for customer {CustomerId}",
order.Id, order.CustomerId);
// Processing logic here
// Using lowercase for status values
var status = "completed";
_logger.LogInformation("Order {OrderId} processed successfully with {ItemCount} items and status {Status}",
order.Id, order.Items.Count, status);
}
}
}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
public class OrderService {
private static final Logger logger = LoggerFactory.getLogger(OrderService.class);
public void processOrder(Order order, String correlationId, String partyId, String tenantId) {
try {
// Add identifiers to MDC context
MDC.put("coaxle.correlation_id", correlationId);
MDC.put("coaxle.party_id", partyId);
MDC.put("coaxle.tenant_id", tenantId);
logger.info("Processing order {} for customer {}", order.getId(), order.getCustomerId());
// Processing logic here
// Using lowercase for status values
String status = "completed";
logger.info("Order {} processed successfully with {} items and status {}",
order.getId(), order.getItems().size(), status);
} finally {
// Clear MDC context
MDC.clear();
}
}
}
import logging
import structlog
logger = structlog.get_logger()
def process_order(order, correlation_id, party_id, tenant_id):
    # Add identifiers to logger context
    log = logger.bind(
        **{
            "coaxle.correlation_id": correlation_id,
            "coaxle.party_id": party_id,
            "coaxle.tenant_id": tenant_id,
        }
    )
    log.info("processing_order", order_id=order.id, customer_id=order.customer_id)
    # Processing logic here
    # Using lowercase for status values
    status = "completed"
    log.info("order_processed", order_id=order.id, item_count=len(order.items), status=status)
const logger = require('pino')();
function processOrder(order, correlationId, partyId, tenantId) {
const contextLogger = logger.child({
'coaxle.correlation_id': correlationId,
'coaxle.party_id': partyId,
'coaxle.tenant_id': tenantId
});
contextLogger.info({
msg: `Processing order ${order.id} for customer ${order.customerId}`,
orderId: order.id,
customerId: order.customerId
});
// Processing logic here
// Using lowercase for status values
const status = "completed";
contextLogger.info({
msg: `Order ${order.id} processed successfully with ${order.items.length} items`,
orderId: order.id,
itemCount: order.items.length,
status: status
});
}
OpenTelemetry Implementation
- C#
- Java
- Python
- JavaScript
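// Initialize OpenTelemetry in your application
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;
using OpenTelemetry.Logs;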
// In Startup.cs or Program.cs
public void ConfigureServices(IServiceCollection services)
{
// Configure OpenTelemetry with the required resource attributes
services.AddOpenTelemetry()
.ConfigureResource(builder => builder
.AddService("my-service-name", serviceVersion: "1.0.0")
.AddAttributes(new Dictionary<string, object>
{
["deployment.environment"] = "production" // Lowercase invariant value
}))
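.WithTracing(builder => builder
// Register the ActivitySource name used for custom spans so that they are exported
.AddSource("MyCompany.OrderProcessing")
// Add instrumentation for ASP.NET Core, HttpClient, etc.
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSqlClientInstrumentation()
// Add custom span processor for correlation IDs (CorrelationIdProcessor is an application-defined BaseProcessor<Activity>, not part of the SDK)
.AddProcessor(new CorrelationIdProcessor("coaxle.correlation_id", "coaxle.party_id", "coaxle.tenant_id"))
.AddOtlpExporter()) // Send to OpenTelemetry collector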
.WithMetrics(builder => builder
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddOtlpExporter());
// Configure OpenTelemetry logging
services.AddLogging(builder =>
{
builder.AddOpenTelemetry(options =>
{
options.AddOtlpExporter();
options.IncludeFormattedMessage = true;
options.IncludeScopes = true;
});
});
}
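// Example of creating a custom span
// Create the ActivitySource once and reuse it; its name must be registered with AddSource() in the tracing setup
private static readonly ActivitySource OrderActivitySource = new ActivitySource("MyCompany.OrderProcessing");
public async Task ProcessOrderWithTracing(Order order, string correlationId, string partyId, string tenantId)
{
// Start a new span (StartActivity returns null when nothing is listening to this source)
using var activity = OrderActivitySource.StartActivity("ProcessOrder");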
// Add standard identifiers as attributes
activity?.SetTag("coaxle.correlation_id", correlationId);
activity?.SetTag("coaxle.party_id", partyId);
activity?.SetTag("coaxle.tenant_id", tenantId);
// Add business context - unique identifiers keep original casing if needed
activity?.SetTag("order.id", order.Id);
activity?.SetTag("order.customer_id", order.CustomerId);
// Add a status attribute with lowercase invariant value
activity?.SetTag("order.status", "processing");
try
{
// Process the order
await ProcessOrderInternal(order);
// Record success
activity?.SetStatus(ActivityStatusCode.Ok);
activity?.SetTag("order.status", "completed"); // Lowercase invariant value
}
catch (Exception ex)
{
// Record error
activity?.SetStatus(ActivityStatusCode.Error);
activity?.RecordException(ex);
activity?.SetTag("order.status", "failed"); // Lowercase invariant value
throw;
}
}
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;
// Initialize OpenTelemetry
public class TelemetryConfig {
public static void initOpenTelemetry() {
Resource resource = Resource.getDefault()
.merge(Resource.builder()
.put(ResourceAttributes.SERVICE_NAME, "my-service-name")
.put(ResourceAttributes.SERVICE_VERSION, "1.0.0")
.put("deployment.environment", "production") // Lowercase invariant value
.build());
SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
.setResource(resource)
.addSpanProcessor(BatchSpanProcessor.builder(
OtlpGrpcSpanExporter.builder().build())
.build())
.build();
OpenTelemetrySdk openTelemetry = OpenTelemetrySdk.builder()
.setTracerProvider(tracerProvider)
.buildAndRegisterGlobal();
Runtime.getRuntime().addShutdownHook(new Thread(tracerProvider::close));
}
}
// Example service using OpenTelemetry
public class OrderService {
private final Tracer tracer = GlobalOpenTelemetry.getTracer("com.mycompany.OrderService");
public void processOrderWithTracing(Order order, String correlationId, String partyId, String tenantId) {
Span span = tracer.spanBuilder("ProcessOrder")
.setSpanKind(SpanKind.INTERNAL)
.startSpan();
try (io.opentelemetry.context.Scope scope = span.makeCurrent()) {
// Add standard identifiers
span.setAttribute("coaxle.correlation_id", correlationId);
span.setAttribute("coaxle.party_id", partyId);
span.setAttribute("coaxle.tenant_id", tenantId);
// Add business context - unique identifiers keep original casing if needed
span.setAttribute("order.id", order.getId());
span.setAttribute("order.customer_id", order.getCustomerId());
// Add a status attribute with lowercase invariant value
span.setAttribute("order.status", "processing");
// Process the order
processOrderInternal(order);
span.setStatus(StatusCode.OK);
span.setAttribute("order.status", "completed"); // Lowercase invariant value
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, e.getMessage());
span.setAttribute("order.status", "failed"); // Lowercase invariant value
throw e;
} finally {
span.end();
}
}
private void processOrderInternal(Order order) {
// Order processing logic
}
}
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.semconv.resource import ResourceAttributes
# Initialize OpenTelemetry
def init_telemetry():
    resource = Resource.create({
        ResourceAttributes.SERVICE_NAME: "my-service-name",
        ResourceAttributes.SERVICE_VERSION: "1.0.0",
        "deployment.environment": "production",  # Lowercase invariant value
    })
    provider = TracerProvider(resource=resource)
    processor = BatchSpanProcessor(OTLPSpanExporter())
    provider.add_span_processor(processor)
    trace.set_tracer_provider(provider)
    return trace.get_tracer(__name__)
# Example of using OpenTelemetry in service
class OrderService:
    def __init__(self):
        self.tracer = trace.get_tracer(__name__)
    def process_order_with_tracing(self, order, correlation_id, party_id, tenant_id):
        with self.tracer.start_as_current_span("process_order") as span:
            # Add standard identifiers
            span.set_attribute("coaxle.correlation_id", correlation_id)
            span.set_attribute("coaxle.party_id", party_id)
            span.set_attribute("coaxle.tenant_id", tenant_id)
            # Add business context - unique identifiers keep original casing if needed
            span.set_attribute("order.id", order.id)
            span.set_attribute("order.customer_id", order.customer_id)
            # Add a status attribute with lowercase invariant value
            span.set_attribute("order.status", "processing")
            try:
                # Process the order
                self._process_order_internal(order)
                span.set_status(trace.Status(trace.StatusCode.OK))
                span.set_attribute("order.status", "completed")  # Lowercase invariant value
            except Exception as e:
                span.set_status(trace.Status(trace.StatusCode.ERROR, str(e)))
                span.record_exception(e)
                span.set_attribute("order.status", "failed")  # Lowercase invariant value
                raise
    def _process_order_internal(self, order):
        # Order processing logic
        pass
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const opentelemetry = require('@opentelemetry/api');
// Initialize OpenTelemetry
function initTelemetry() {
const resource = Resource.default().merge(
new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: 'my-service-name',
[SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
'deployment.environment': 'production' // Lowercase invariant value
})
);
const provider = new NodeTracerProvider({ resource });
const exporter = new OTLPTraceExporter();
const processor = new BatchSpanProcessor(exporter);
provider.addSpanProcessor(processor);
provider.register();
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
new ExpressInstrumentation()
]
});
return opentelemetry.trace.getTracer('my-service');
}
// Example of using OpenTelemetry in service
class OrderService {
constructor() {
this.tracer = initTelemetry();
}
processOrderWithTracing(order, correlationId, partyId, tenantId) {
const span = this.tracer.startSpan('processOrder');
// Add standard identifiers
span.setAttribute('coaxle.correlation_id', correlationId);
span.setAttribute('coaxle.party_id', partyId);
span.setAttribute('coaxle.tenant_id', tenantId);
// Add business context - unique identifiers keep original casing if needed
span.setAttribute('order.id', order.id);
span.setAttribute('order.customer_id', order.customerId);
// Add a status attribute with lowercase invariant value
span.setAttribute('order.status', 'processing');
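try {
// Process the order with the span active in the current context
const result = opentelemetry.context.with(
opentelemetry.trace.setSpan(opentelemetry.context.active(), span),
() => this.processOrderInternal(order)
);
// Record success
span.setStatus({ code: opentelemetry.SpanStatusCode.OK });
span.setAttribute('order.status', 'completed'); // Lowercase invariant value
return result;
} catch (error) {
span.setStatus({ code: opentelemetry.SpanStatusCode.ERROR, message: error.message });
span.recordException(error);
span.setAttribute('order.status', 'failed'); // Lowercase invariant value
throw error;
} finally {
span.end();
}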
}
processOrderInternal(order) {
// Order processing logic
}
}
module.exports = { OrderService };
Configuring OpenTelemetry Collector
Deploy the OpenTelemetry Collector using the following configuration template:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    send_batch_size: 1000
    timeout: 10s
  attributes:
    actions:
      - key: deployment.environment
        action: upsert
        value: ${DEPLOYMENT_ENVIRONMENT}
  resource:
    attributes:
      - key: service.namespace
        value: "coaxle"
        action: upsert
exporters:
  otlp:
    endpoint: "your-telemetry-backend:4317"
    tls:
      insecure: false
      ca_file: /etc/ssl/certs/ca-certificates.crt
  # Newer Collector releases replace the logging exporter with the debug exporter
  logging:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp, logging]
    metrics:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp, logging]
    logs:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      exporters: [otlp, logging]
Log Levels
Use appropriate log levels to categorize the severity and importance of log messages:
Level | When to Use |
---|---|
Trace | Extremely detailed information, useful only for pinpointing issues in a specific component |
Debug | Detailed information useful for debugging purposes |
Information | General information highlighting the progress of the application |
Warning | Indicates potential issues or unexpected behavior that doesn't cause application failure |
Error | Error conditions that affect a specific operation but not the overall application |
Critical | Critical errors that require immediate attention and may cause application failure |
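Not every logging library uses these exact level names. As an illustration, the table can be mapped onto Python's standard `logging` module, which has no built-in Trace level. The numeric value `5` for `TRACE` and the helper `log_at` are assumptions made for this sketch, not part of the standard.

```python
import logging

# Assumed value: the standard library defines DEBUG as 10 and leaves lower
# numbers free, so 5 places TRACE below DEBUG.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Level names from the table mapped to standard-library numeric levels.
LEVELS = {
    "Trace": TRACE,
    "Debug": logging.DEBUG,
    "Information": logging.INFO,
    "Warning": logging.WARNING,
    "Error": logging.ERROR,
    "Critical": logging.CRITICAL,
}


def log_at(logger: logging.Logger, level_name: str, message: str, *args) -> None:
    """Log `message` at the level named in the table above."""
    logger.log(LEVELS[level_name], message, *args)
```

Calling `log_at(logger, "Warning", "Retrying request %s", request_id)` then behaves like `logger.warning(...)`, while `"Trace"` entries are emitted only when the logger is configured at level 5 or lower.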
Performance Considerations
- Avoid excessive logging in high-throughput paths
- Use asynchronous logging where appropriate
- Consider sampling for high-volume telemetry
- Implement log rotation and retention policies
- Monitor the performance impact of telemetry implementations
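For the sampling point above, one common approach is to decide per trace rather than per span, so that a trace is either kept whole or dropped whole. The sketch below derives the decision from the trace ID, in the spirit of ratio-based samplers; `should_sample` is a hypothetical helper, and in a real deployment you would normally configure the sampler provided by the OpenTelemetry SDK or the Collector instead.

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Decide deterministically whether to keep a trace.

    Every service that sees the same trace ID reaches the same decision,
    so sampled traces stay complete across service boundaries.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the trace ID as a uniformly distributed number.
    low_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    # Keep the trace when that number falls below the chosen fraction of 2**64.
    return low_bits < int(ratio * (1 << 64))
```

With `ratio=0.1`, roughly one trace in ten is kept, assuming trace IDs are randomly generated.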
Sensitive Information
Never log sensitive information such as:
- Passwords or authentication tokens
- Personally identifiable information (PII)
- Financial data
- Health information
- Any data protected by privacy regulations
Instead, log references or masked versions of this information when necessary.
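A redaction step applied to structured fields before they reach the logger is one way to enforce this. The sketch below is illustrative only: the key lists and the helpers `mask` and `redact` are hypothetical, and matching on field names alone will not catch sensitive data embedded in free-text messages.

```python
from typing import Any, Dict

# Hypothetical field names. Secrets are dropped entirely; identifiers that
# support staff may need to recognise are masked instead.
REDACT_KEYS = frozenset({"password", "token", "authorization"})
MASK_KEYS = frozenset({"card_number", "account_number"})


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def redact(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `fields` that is safer to log."""
    safe: Dict[str, Any] = {}
    for key, value in fields.items():
        lowered = key.lower()
        if lowered in REDACT_KEYS:
            safe[key] = "[redacted]"
        elif lowered in MASK_KEYS:
            safe[key] = mask(str(value))
        else:
            safe[key] = value
    return safe
```

For example, `redact({"password": "hunter2", "card_number": "4111111111111111"})` replaces the password outright and keeps only the last four digits of the card number.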
OpenTelemetry Semantic Conventions
Adhere to OpenTelemetry semantic conventions for consistent naming across services:
- Use the OpenTelemetry Semantic Conventions for attribute naming
- Follow standard naming for error attributes
- Use standard semantic attributes for HTTP, database, and messaging operations
Domain | Attribute Prefix | Examples |
---|---|---|
HTTP | http. | http.method, http.status_code |
Database | db. | db.system, db.statement |
Messaging | messaging. | messaging.system, messaging.destination |
Exception | exception. | exception.type, exception.message |
Custom Business | coaxle. | coaxle.correlation_id, coaxle.tenant_id |
Validation and Testing
- Verify telemetry data quality with automated tests
- Implement telemetry validation as part of CI/CD pipelines
- Set up monitoring alerts for missing or malformed telemetry data
- Regularly review and optimize telemetry implementations
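As a starting point for automated checks, the rules in this guide can be expressed as a small validator that tests run against captured telemetry attributes. This is a sketch under stated assumptions: `validate_attributes` and the `CASE_SENSITIVE_KEYS` set are hypothetical, and how attributes are captured (for example, from an in-memory exporter) depends on your test setup.

```python
from typing import Any, Dict, List

REQUIRED_IDENTIFIERS = (
    "coaxle.correlation_id",
    "coaxle.party_id",
    "coaxle.tenant_id",
)

# Hypothetical: properties allowed to keep their original casing.
CASE_SENSITIVE_KEYS = frozenset(REQUIRED_IDENTIFIERS) | {"order.id"}


def validate_attributes(attributes: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems: List[str] = []
    # Every record must carry the three standard identifiers.
    for key in REQUIRED_IDENTIFIERS:
        if not attributes.get(key):
            problems.append(f"missing identifier: {key}")
    # String values must be lowercase unless the property is case-sensitive.
    for key, value in attributes.items():
        if key in CASE_SENSITIVE_KEYS:
            continue
        if isinstance(value, str) and value != value.lower():
            problems.append(f"value not lowercase: {key}")
    return problems
```

A CI test can then assert that `validate_attributes(span_attributes) == []` for each span or log record produced by a representative request.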