How to design audit logs that help investigations

I've been on both sides of incident investigations: the engineer who built the logging, and the analyst who needed to piece together what happened. Here's what I've learned about building audit logs that actually help when things go wrong.

The Investigation Mindset

When an incident occurs, investigators ask questions like:

Who accessed what, and when?
What changed, and what were the previous values?
Was there unusual behavior leading up to this?
What was the sequence of events?

Your audit logs need to answer these questions definitively. If they can't, you're flying blind.

Essential Fields

Every audit event should include:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-15T14:32:01.234Z",
  "eventType": "TICKET_UPDATE",
  "actor": {
    "userId": "user-123",
    "email": "agent@company.com",
    "roles": ["Agent"],
    "sessionId": "sess-456",
    "ip": "192.168.1.100",
    "userAgent": "Mozilla/5.0..."
  },
  "resource": {
    "type": "ticket",
    "id": "ticket-789"
  },
  "action": "UPDATE",
  "outcome": "SUCCESS",
  "metadata": {
    "changedFields": ["status", "priority"],
    "previousValues": {
      "status": "open",
      "priority": "low"
    },
    "newValues": {
      "status": "in_progress",
      "priority": "high"
    }
  }
}
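Here's a minimal Python sketch of how a service might assemble an event with that shape. It assumes events are built in-process before being shipped elsewhere. The names `Actor`, `Resource`, `AuditEvent`, `utc_now_iso`, and `to_json` are hypothetical, not from any particular library. The `id` and `timestamp` fields are generated automatically so call sites can't forget them.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any


def utc_now_iso() -> str:
    # ISO 8601, UTC, millisecond precision, "Z" suffix, e.g. 2024-01-15T14:32:01.234Z
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return now.replace("+00:00", "Z")


@dataclass
class Actor:
    userId: str
    email: str
    roles: list[str]
    sessionId: str
    ip: str
    userAgent: str


@dataclass
class Resource:
    type: str
    id: str


@dataclass
class AuditEvent:
    eventType: str
    actor: Actor
    resource: Resource
    action: str
    outcome: str
    metadata: dict[str, Any] = field(default_factory=dict)
    # Generated per event, never supplied by the caller.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=utc_now_iso)

    def to_json(self) -> str:
        # asdict() recurses into the nested Actor and Resource dataclasses.
        return json.dumps(asdict(self))
```

Key order in the output differs from the example (`id` and `timestamp` come last), which doesn't matter to anything that parses JSON.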

Why Each Field Matters

id: Unique identifier for this event, so it can be referenced and correlated across systems
timestamp: ISO 8601 with milliseconds, always UTC
eventType: A stable, specific event name you can filter and alert on
actor: Everything about who performed the action; include session and network info for forensics
resource: What was acted upon, with enough information to find it
action + outcome: What happened and whether it succeeded
metadata: Context specific to this event type; for changes, include before/after values
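The before/after values don't have to be written by hand at every call site. Here's a sketch that derives them from two snapshots of a record, assuming both are available as flat dicts. `change_metadata` is a hypothetical helper name.

```python
from typing import Any


def change_metadata(before: dict[str, Any], after: dict[str, Any]) -> dict[str, Any]:
    """Build the changedFields/previousValues/newValues block for an update event."""
    # Keep the record's own field order: existing fields first, then newly added ones.
    keys = list(before) + [k for k in after if k not in before]
    changed = [k for k in keys if before.get(k) != after.get(k)]
    return {
        "changedFields": changed,
        # A field missing on one side is recorded as None (JSON null) in this sketch.
        "previousValues": {k: before.get(k) for k in changed},
        "newValues": {k: after.get(k) for k in changed},
    }
```

Fields that didn't change (a ticket title, say) are left out, which keeps the event focused on what actually happened.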

What to Log

Log security-relevant events, not everything. Focus on:

Authentication events: Login success, failure, logout, token refresh
Authorization decisions: Especially denials—these are gold for detecting attacks
Data access: Who viewed sensitive records
Data modifications: Creates, updates, deletes with before/after state
Configuration changes: Role assignments, permission changes, settings modifications
Administrative actions: User management, system configuration
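One way to enforce that focus is an explicit allow-list that maps each audited event type to one of the categories above. An unknown or misspelled type then fails fast instead of quietly adding noise. This is a sketch; the event-type strings and `category_for` are illustrative assumptions.

```python
from enum import Enum


class Category(Enum):
    AUTHENTICATION = "authentication"
    AUTHORIZATION = "authorization"
    DATA_ACCESS = "data_access"
    DATA_MODIFICATION = "data_modification"
    CONFIGURATION = "configuration"
    ADMINISTRATION = "administration"


# Hypothetical event types, each mapped to one of the categories listed above.
AUDITED_EVENTS: dict[str, Category] = {
    "LOGIN_SUCCESS": Category.AUTHENTICATION,
    "LOGIN_FAILURE": Category.AUTHENTICATION,
    "LOGOUT": Category.AUTHENTICATION,
    "TOKEN_REFRESH": Category.AUTHENTICATION,
    "ACCESS_DENIED": Category.AUTHORIZATION,
    "RECORD_VIEW": Category.DATA_ACCESS,
    "TICKET_UPDATE": Category.DATA_MODIFICATION,
    "ROLE_ASSIGNED": Category.CONFIGURATION,
    "USER_DEACTIVATED": Category.ADMINISTRATION,
}


def category_for(event_type: str) -> Category:
    """Return the audit category, or raise if the event type isn't on the allow-list."""
    try:
        return AUDITED_EVENTS[event_type]
    except KeyError:
        raise ValueError(f"{event_type!r} is not an audited event type") from None
```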

Anti-Patterns

1. Logging Too Little

"We log logins" is not enough. If you can't answer "did user X ever view document Y?", your logging is insufficient.
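With events structured like the example above, that question becomes a simple filter. Here's a sketch over in-memory events, assuming view events carry `"action": "VIEW"` (a hypothetical value; the example only shows `UPDATE`). In practice the same predicate would be a query in your log store.

```python
from typing import Any, Iterable


def views_of(
    events: Iterable[dict[str, Any]], user_id: str, resource_type: str, resource_id: str
) -> list[str]:
    """Return timestamps of successful views of one resource by one user, oldest first."""
    hits = [
        e["timestamp"]
        for e in events
        if e.get("action") == "VIEW"
        and e.get("outcome") == "SUCCESS"
        and e["actor"]["userId"] == user_id
        # Match on type as well as id, since ids may collide across resource types.
        and e["resource"]["type"] == resource_type
        and e["resource"]["id"] == resource_id
    ]
    # UTC ISO 8601 strings in one fixed format sort chronologically as plain strings.
    return sorted(hits)
```

An empty list is a definitive "no" only if every view path emits this event, which is exactly why logging logins alone isn't enough.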

2. Logging Too Much

Logging every function call creates noise that buries signal. Nobody wants to grep through millions of "user moved mouse" events.

3. Missing Context

"Status changed" is useless. "Status changed from 'open' to 'closed' by user@email at 14:32:01 from IP 192.168.1.100" is actionable.
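Because the structured event already carries who, when, where, and the before/after values, the actionable sentence can be generated from it rather than logged as free text. Here's a sketch assuming the field names from the example event. `describe_change` is a hypothetical helper.

```python
from typing import Any


def describe_change(event: dict[str, Any]) -> str:
    """Render an update event's changed fields as one human-readable sentence."""
    meta = event["metadata"]
    actor = event["actor"]
    changes = "; ".join(
        f"{name.capitalize()} changed from {meta['previousValues'][name]!r} "
        f"to {meta['newValues'][name]!r}"
        for name in meta["changedFields"]
    )
    # Who, when (full UTC timestamp), and from where: the context a bare message lacks.
    return f"{changes} by {actor['email']} at {event['timestamp']} from IP {actor['ip']}"
```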

4. Mutable Logs

If an attacker can delete logs, they can cover their tracks. Audit logs should be append-only or stored in a separate, protected system.
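Append-only storage prevents casual deletion. Hash chaining goes further and makes tampering detectable, because each entry commits to the hash of the one before it. This is one common technique among several (WORM storage and managed immutable log services are alternatives). `HashChainedLog` below is a hypothetical in-memory sketch, not a production store.

```python
import copy
import hashlib
import json
from typing import Any

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class HashChainedLog:
    """In-memory, append-only log in which each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    @staticmethod
    def _digest(prev_hash: str, event: dict[str, Any]) -> str:
        # Canonical JSON (sorted keys, compact separators) so equal events hash identically.
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict[str, Any]) -> str:
        """Store a private copy of the event and return the new chain head hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry_hash = self._digest(prev_hash, event)
        self._entries.append(
            {"event": copy.deepcopy(event), "prevHash": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every link; an edited, reordered, or removed entry breaks the chain.

        Truncating entries from the end is NOT detected here; compare the head hash
        against a copy kept outside the log's storage to catch that.
        """
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            if entry["prevHash"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

An attacker who controls the whole store can recompute every hash, so the chain only helps if the latest head hash is periodically recorded somewhere they can't reach, such as the separate protected system mentioned above.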

Implementation Tips

Use structured logging (JSON) so logs are queryable
Log at the service/API layer, where the actor, session, and request context are known, not at the database layer
Include request context (trace ID, session ID) for correlation
Ship logs to a centralized SIEM (Splunk, Sentinel, etc.)
Set retention policies based on compliance requirements
Alert on suspicious patterns; don't just store and forget
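As a small illustration of alerting instead of storing and forgetting, here's a sliding-window detector that flags an actor who piles up authorization denials in a short period. The `"DENIED"` outcome value, the threshold and window defaults, and `DenialBurstDetector` itself are hypothetical. In practice this logic usually lives in the SIEM's own detection rules rather than in application code.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class DenialBurstDetector:
    """Flag an actor whose denial count within `window` reaches `threshold`."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(minutes=1)) -> None:
        self.threshold = threshold
        self.window = window
        # Per-actor timestamps of recent denials.
        self._recent: dict[str, deque[datetime]] = defaultdict(deque)

    def observe(self, user_id: str, outcome: str, at: datetime) -> bool:
        """Record one authorization decision; return True when it should raise an alert."""
        if outcome != "DENIED":
            return False
        times = self._recent[user_id]
        times.append(at)
        # Drop denials older than the window (assumes events arrive in time order).
        while times and at - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```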

Takeaway

Build your audit logging as if you'll need to explain exactly what happened to regulators, lawyers, and executives—because someday you might.