Why you need realistic logs in testing
Synthetic logs are essential for testing monitoring tools, parsers, alerts, and dashboards without depending on production. Critical use cases: validate that your alert system detects error patterns, test aggregators like ELK Stack or Splunk with realistic volume, train ML models for anomaly detection, and simulate troubleshooting scenarios in training environments.
A common mistake is generating overly simple logs ('Error occurred', 'Success') that don't represent real complexity. Production logs include stack traces, transaction IDs, latency metrics, HTTP status codes, service names, and specific context. A realistic log like '[ERROR] Database connection pool exhausted. Max connections: 50' lets you test more precise alert rules than a generic one.
Severity levels and when to use them
Log levels follow a conventional order of increasing severity: DEBUG < INFO < WARN < ERROR < FATAL. In production, DEBUG is usually disabled because of its volume, but it is vital in development for following the execution flow. INFO records important normal operations (logins, completed transactions, deployments). WARN signals anomalous situations that don't break the application but require attention (a certificate about to expire, a high cache miss rate, slow queries).
ERROR indicates a failure that prevents a specific operation from completing but doesn't crash the entire application (payment declined, database timeout, file not found). FATAL marks an unrecoverable error that requires a restart. A classic logging mistake is logging everything as ERROR: if 90% of your logs are errors, you lose the ability to pick out the real problems. The practical rule: if it doesn't require immediate human action, it's probably WARN, not ERROR.
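As a minimal sketch of how this ordering works in practice, the snippet below maps the five levels onto the numeric constants in Python's standard logging module, which spells WARN as WARNING and FATAL as CRITICAL. The helper name is_emitted is hypothetical and exists only for illustration.

```python
import logging

# Map the level names used in this article onto Python's stdlib constants.
# The stdlib calls WARN "WARNING" and FATAL "CRITICAL".
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
}


def is_emitted(level: str, threshold: str) -> bool:
    """Return True if a record at `level` passes a logger set to `threshold`."""
    return LEVELS[level] >= LEVELS[threshold]
```

With the threshold set to INFO, as is typical in production, is_emitted("DEBUG", "INFO") is False while every level from INFO upward passes.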
For testing, generate a realistic distribution, for example 70% INFO, 20% DEBUG, 8% WARN, 2% ERROR (this assumes DEBUG is enabled, as it usually is in a test environment). In healthy production, errors are the exception, not the norm.
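One way to produce that mix is weighted random sampling. The sketch below uses the percentages from the paragraph above as weights; the function name sample_levels and the optional seed parameter are assumptions added so that test runs can be reproduced.

```python
import random
from typing import List, Optional

# Weights copied from the distribution described in the text.
# FATAL is omitted because the text assigns it no share.
LEVEL_WEIGHTS = {"INFO": 70, "DEBUG": 20, "WARN": 8, "ERROR": 2}


def sample_levels(n: int, seed: Optional[int] = None) -> List[str]:
    """Draw `n` log levels at random, following LEVEL_WEIGHTS."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return rng.choices(
        list(LEVEL_WEIGHTS),
        weights=list(LEVEL_WEIGHTS.values()),
        k=n,
    )
```

For small samples the observed shares will drift from the target; with a few thousand entries they settle close to it.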
Anatomy of a structured log
Modern logs follow structured formats (preferably JSON) to facilitate parsing and search. Essential components:
- Timestamp: ISO 8601 with timezone (2024-01-15T14:23:45.123Z)
- Level: message severity (FATAL, ERROR, WARN, INFO, DEBUG)
- Message: human-readable event description
- Context: structured data (user_id, request_id, service_name)
- Source: module or function that generated the log (UserService.login)
- Metadata: stack traces, duration, error codes
JSON example: {"timestamp":"2024-01-15T14:23:45Z","level":"ERROR","message":"Payment failed","context":{"user_id":7821,"transaction_id":"txn_9f72","amount":49.99,"error_code":"card_declined"}}
Plain text logs hinder automated analysis. If your system still generates logs like '[Mon Jan 15 14:23:45] Error in line 247', consider migrating to structured JSON.
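The components listed above can be assembled into one JSON line per event. This is a minimal sketch, not a full logging library: the function name make_log, its parameters, and the source value used in the usage note are all hypothetical.

```python
import json
from datetime import datetime, timezone


def make_log(level, message, source, context=None, metadata=None, now=None):
    """Build one structured log line as a compact JSON string."""
    # Normalize to UTC so the timestamp can always end in "Z".
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        # ISO 8601 with millisecond precision, e.g. 2024-01-15T14:23:45.123Z
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": level,
        "message": message,
        "source": source,
        "context": context or {},
        "metadata": metadata or {},
    }
    return json.dumps(record, separators=(",", ":"))
```

A call such as make_log("ERROR", "Payment failed", "PaymentService.charge", context={"user_id": 7821, "transaction_id": "txn_9f72"}) yields a line that any JSON-aware aggregator can parse without a custom pattern.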
Practices for useful and maintainable logs
Golden rule: log enough context to reproduce the problem. '[ERROR] Payment failed' is useless without knowing which payment, from which user, with which method. Always include unique identifiers (request_id, user_id, transaction_id) that allow correlating events between services.
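To illustrate why shared identifiers matter, the sketch below groups already-parsed records from several services by request_id. It assumes each record is a dict with a context.request_id field and an ISO 8601 timestamp in a single timezone and format (so that string order matches time order); the function name correlate is hypothetical.

```python
from collections import defaultdict


def correlate(records):
    """Group parsed log records by request_id, each group sorted by timestamp."""
    by_request = defaultdict(list)
    for rec in records:
        by_request[rec["context"]["request_id"]].append(rec)
    for recs in by_request.values():
        # Same-format UTC ISO 8601 strings sort chronologically.
        recs.sort(key=lambda r: r["timestamp"])
    return dict(by_request)
```

Without a request_id in every record, reconstructing the path of one request across services comes down to guessing from timestamps.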
Keep sensitive data out of logs: never log passwords, complete tokens (truncate to the first 8 characters), card numbers, or unmasked PII. In Europe, regulators can impose heavy fines under GDPR when personal data is exposed through insecure logs. Use '[REDACTED]' or a hash when you need to reference sensitive data.
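A redaction step can run on the context just before a record is written. The sketch below is deliberately small and makes assumptions: the field names password, card_number, and token are hypothetical, and a real system would need a list that matches its own schema.

```python
# Hypothetical field names; adapt them to your own log schema.
SENSITIVE_KEYS = {"password", "card_number"}
TOKEN_KEYS = {"token"}


def redact(context: dict) -> dict:
    """Return a copy of `context` that is safe to write to a log."""
    clean = {}
    for key, value in context.items():
        if key in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif key in TOKEN_KEYS and isinstance(value, str):
            # Keep only the first 8 characters, enough to identify the token.
            clean[key] = value[:8]
        else:
            clean[key] = value
    return clean
```

Because the function returns a new dict, the original context stays intact for the application logic that still needs it.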
Performance: excessive logging can cripple an application. A DEBUG statement inside a loop that processes 10,000 items can generate gigabytes of logs and saturate disk I/O. Use appropriate log levels and disable DEBUG in production. Consider log sampling: for high-volume operations, log 1 out of every 100 events.
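Sampling can be sketched as a filter for Python's standard logging module. The class name SampleFilter is hypothetical, and the choice to sample only records below WARNING is an assumption, made so that warnings and errors are never dropped. The counter is not protected by a lock, so under heavy multithreading the ratio is approximate.

```python
import logging


class SampleFilter(logging.Filter):
    """Let through 1 of every `rate` low-severity records; keep WARNING and above."""

    def __init__(self, rate: int = 100):
        super().__init__()
        self.rate = rate
        self.count = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # never drop warnings or errors
        self.count += 1
        # Passes the 1st, (rate+1)th, (2*rate+1)th ... low-severity record.
        return (self.count - 1) % self.rate == 0
```

Attach it with logger.addFilter(SampleFilter(rate=100)) on the logger that handles the high-volume operation.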
Rotation and retention: configure daily or size-based rotation (for example, at 100 MB). Retain critical logs (ERROR/WARN) for 90 days, INFO for 30, and DEBUG for 7. Archive older logs to S3/GCS for cheap historical analysis.
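Both rotation styles are available in Python's standard library. The sketch below is one possible setup, not a recommendation for every system: the function name build_handlers, the file naming, and the backupCount of 10 on the size-based handler are assumptions. The daily handler keeps 90 rotated files, which approximates the 90-day retention for WARN and above.

```python
import logging
from logging.handlers import RotatingFileHandler, TimedRotatingFileHandler


def build_handlers(path_prefix: str):
    """Create a size-based handler and a daily handler for WARNING and above."""
    # Rotate when the file reaches 100 MB; keep 10 old files (assumed value).
    size_handler = RotatingFileHandler(
        f"{path_prefix}-all.log",
        maxBytes=100 * 1024 * 1024,
        backupCount=10,
        delay=True,  # do not open the file until the first record is written
    )
    # Rotate at midnight; 90 backups is roughly 90 days of WARN/ERROR history.
    daily_handler = TimedRotatingFileHandler(
        f"{path_prefix}-critical.log",
        when="midnight",
        backupCount=90,
        delay=True,
    )
    daily_handler.setLevel(logging.WARNING)
    return size_handler, daily_handler
```

Archiving rotated files to object storage is left out here, since it depends on the storage provider and its client library.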