Prometheus gives you metrics. Grafana makes them visible. But when an alert fires and you want to see the actual log lines from the moment of the incident, you need Loki. Grafana Loki is a log aggregation system designed specifically to complement Prometheus: same label model, same query language family, and native integration in Grafana that lets you jump from a metric spike to the correlated log lines in one click. Promtail is the log shipper that reads from files and the systemd journal and pushes to Loki. This post covers the setup from scratch to a production-ready logging pipeline.
Why Loki instead of Elasticsearch
Elasticsearch (or OpenSearch) is powerful but expensive to operate: it indexes every field in every log line, consuming significant CPU and disk IOPS. Loki takes a different approach: it only indexes the labels you define (like app, host, env), and stores the actual log content as compressed chunks. This makes Loki dramatically cheaper to run and simpler to operate, at the cost of slower full-text search on unindexed fields. For most infrastructure logging use cases, where you know which service you're looking at and want to grep through recent logs, Loki's performance is more than adequate.
Deploying Loki + Promtail
# docker-compose.yml - Loki + Promtail + Grafana
services:
  loki:
    image: grafana/loki:3.3.2
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:3.3.2
    volumes:
      - /var/log:/var/log:ro # System logs
      - /var/lib/docker/containers:/var/lib/docker/containers:ro # Container logs
      - /var/run/docker.sock:/var/run/docker.sock # Required by docker_sd_configs below
      - ./promtail-config.yml:/etc/promtail/config.yml
    environment:
      HOSTNAME: ${HOSTNAME:-docker-host} # Pass the host's name through; containers default to the container ID
    # -config.expand-env=true lets the config reference ${HOSTNAME}
    command: -config.file=/etc/promtail/config.yml -config.expand-env=true
  grafana:
    image: grafana/grafana:11.4.0
    ports:
      - "3000:3000"
volumes:
  loki-data:
# loki-config.yml
auth_enabled: false
server:
  http_listen_port: 3100
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
limits_config:
  retention_period: 744h # 31 days
# retention_period alone only declares the limit - the compactor does the deleting
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
Promtail configuration and label design
Labels are the most important design decision in a Loki deployment. Every unique combination of label values creates a separate stream, so too many high-cardinality labels (like user IDs or request IDs) cause stream explosion and degrade performance. Stick to low-cardinality labels: app, host, env, level.
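To make the trap concrete, here is a hypothetical pipeline fragment showing what not to do, and the LogQL alternative that keeps the high-cardinality value in the log body:

# Hypothetical anti-pattern - do NOT promote per-request fields to labels:
# - labels:
#     request_id:   # one stream per request -> stream explosion
# Keep request_id in the log body instead and filter at query time:
#   {app="api"} | json | request_id="abc-123"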
# promtail-config.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  filename: /tmp/positions.yaml # Tracks read position in each file
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  # Nginx access logs
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          app: nginx
          host: ${HOSTNAME} # Expanded from the environment via -config.expand-env=true
          env: production
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      # Parse nginx JSON log format
      - json:
          expressions:
            status: status
            method: method
            path: uri
            duration: request_time
            upstream: upstream_addr
      # Extract status code as a label for filtering
      - labels:
          status:
      # Derive a level label: error for 5xx responses, info otherwise
      - template:
          source: level
          template: '{{ if ge (int .status) 500 }}error{{ else }}info{{ end }}'
      - labels:
          level:
  # Docker container logs (all containers)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: [__meta_docker_container_label_com_docker_compose_service]
        target_label: app
      - source_labels: [__meta_docker_container_name]
        target_label: container
      - replacement: production
        target_label: env
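Promtail can also read the systemd journal directly, the third source mentioned in the introduction. A sketch of an additional scrape_configs entry, assuming /var/log/journal and /etc/machine-id are mounted into the Promtail container:

# Sketch: systemd journal scraping (assumes journal volumes are mounted)
- job_name: journal
  journal:
    max_age: 12h # Ignore entries older than this on first start
    labels:
      app: systemd-journal
      env: production
  relabel_configs:
    # Expose the unit name (e.g. sshd.service) as a queryable label
    - source_labels: ['__journal__systemd_unit']
      target_label: unit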
LogQL: querying logs
LogQL is Loki's query language, modelled on PromQL. Log queries filter streams by label, then optionally filter or parse the log content:
# Show all logs from the nginx app in the last hour
{app="nginx"}
# Filter to error-level logs only
{app="nginx", level="error"}
# Full-text search within the stream - slower but necessary for unindexed fields
{app="nginx"} |= "500 Internal Server Error"
# Parse JSON and filter on an extracted field - find slow requests (>1 second)
{app="nginx"} | json | duration > 1.0
# Per-second error rate over a 5-minute window - a metric query computed from logs
sum(rate({app="nginx", level="error"}[5m])) by (app)
# Parse JSON logs inline and extract a field
{app="payments"} | json | line_format "{{.user_id}} {{.amount}} {{.status}}"
Correlating logs with Prometheus metrics in Grafana
The most powerful feature of the Loki + Prometheus combination is Grafana's Explore split view: left panel shows a Prometheus metric (request rate, error rate, latency), right panel shows Loki logs for the same time range and service. When the metric shows a spike, you can see the exact log lines that correspond to it without switching tools or correlating timestamps manually.
Configure this in Grafana by adding a derived field to your Loki data source that links trace IDs in log lines to your Tempo tracing backend: clicking a trace ID in a log line opens the full distributed trace for that request. The three pillars (metrics, logs, and traces) are now one integrated view.
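A sketch of the data source provisioning file for this, assuming a Tempo data source with UID tempo and log lines that carry a trace_id= field (both are assumptions; adjust the regex to your log format):

# grafana/provisioning/datasources/loki.yml (sketch)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turn "trace_id=<id>" in any log line into a clickable Tempo link
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'
          url: '$${__value.raw}' # $$ escapes provisioning's env-var expansion
          datasourceUid: tempo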
Log-based alerting with Loki ruler
Loki's ruler component evaluates LogQL metric queries on a schedule and fires alerts into Alertmanager, the same Alertmanager that handles your Prometheus alerts. This means log-derived alerts flow through the same routing, silencing, and notification pipelines as metric alerts:
# loki-rules.yml - alert when error rate from a service is elevated
groups:
  - name: application
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({app="payments", level="error"}[5m])) by (app)
            /
          sum(rate({app="payments"}[5m])) by (app)
            > 0.05
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate: {{ $labels.app }}"
          description: "{{ $value | humanizePercentage }} of requests are errors"
      - alert: ServiceDownNoLogs
        # No logs for 5 minutes = service may be down
        expr: |
          absent_over_time({app="payments"}[5m]) == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No logs from payments service for 5+ minutes"
Retention and storage sizing
Loki's retention is controlled by the retention_period setting in limits_config and enforced by the compactor (see the config above); per-stream overrides are possible via retention_stream. For most infrastructure use cases, 31 days covers incident investigation windows while keeping storage manageable. A rough sizing guide: a server emitting typical nginx + application logs produces around 2-5 GB of compressed Loki storage per month. For a 50-host environment at 31-day retention, budget 100-250 GB of storage for Loki's data directory.
For cost-sensitive deployments, use Loki's object storage backend (S3, GCS, or MinIO for self-hosted) for chunks, keeping only the index on local disk. This moves the bulk of storage to cheaper object storage while keeping query performance acceptable for recent logs.
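A sketch of the storage change, assuming an S3 bucket named loki-chunks in us-east-1 (bucket name, region, and credential source are placeholders to adapt):

# loki-config.yml (sketch) - chunks in S3 instead of the local filesystem
common:
  storage:
    s3:
      region: us-east-1
      bucketnames: loki-chunks # Placeholder bucket name
      # Credentials come from the usual AWS env vars or instance profile
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: s3 # Was: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h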
Loki is part of every 47Network observability stack. The standard deployment is Prometheus + Alertmanager + Grafana + Loki + Promtail + (optionally) Grafana Tempo for traces. This stack runs on a single modest VM for most clients: 4 vCPU and 8 GB RAM handle 50+ hosts comfortably at 31-day log retention. The 47Sentry product includes Loki as the log backend for all monitored services.