Security

TLS certificate automation with certbot and ACME.

February 25, 2026 · 11 min read · 47Network Engineering

Expired TLS certificates are one of the most common causes of production outages that have nothing to do with code. They're entirely preventable with automation, yet teams keep managing them manually. This post covers the full certbot/ACME setup: HTTP-01 for simple cases, DNS-01 for wildcard certificates and internal services, Nginx integration, automatic renewal, and the Prometheus alerting that catches any cert approaching expiry before it becomes your 3am incident.

HTTP-01 vs DNS-01: which challenge to use

The ACME protocol proves you control a domain before issuing a certificate. There are two practical challenge types:

  • HTTP-01: certbot places a token on your server at http://yourdomain.com/.well-known/acme-challenge/TOKEN, and Let's Encrypt fetches it to verify ownership. Simple, works with any DNS setup, but requires port 80 to be publicly reachable. Doesn't support wildcard certificates (*.yourdomain.com).
  • DNS-01: Let's Encrypt asks you to create a _acme-challenge TXT record in your public DNS zone and then queries it. Works for internal/private servers because no inbound port is required, even behind a firewall, and it is the only challenge type that supports wildcards. Requires API access to your DNS provider to automate.
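
To make the two challenge types concrete, here is a minimal sketch of where each challenge response lives. The function names are illustrative helpers, not certbot commands; certbot computes these locations itself.

```shell
#!/usr/bin/env bash
# Sketch: where each ACME challenge response is looked up.
# acme_http01_url and acme_dns01_record are hypothetical helper names.

# HTTP-01: the URL the CA fetches over plain HTTP on port 80.
acme_http01_url() {
  local domain="$1" token="$2"
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$domain" "$token"
}

# DNS-01: the name of the TXT record the CA queries.
# A wildcard name is validated at its base domain, so a leading "*." is dropped.
acme_dns01_record() {
  local domain="$1"
  case "$domain" in
    '*.'*) domain=${domain#??} ;;   # strip the two characters "*."
  esac
  printf '_acme-challenge.%s\n' "$domain"
}

acme_http01_url example.com TOKEN      # prints http://example.com/.well-known/acme-challenge/TOKEN
acme_dns01_record '*.example.com'      # prints _acme-challenge.example.com
```

During a DNS-01 run you can watch the record appear with dig +short TXT _acme-challenge.example.com.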

HTTP-01 setup with Nginx

# Install certbot and the Nginx plugin
apt install certbot python3-certbot-nginx

# Obtain a certificate (certbot modifies your Nginx config automatically)
certbot --nginx -d example.com -d www.example.com \
  --email ops@example.com \
  --agree-tos \
  --no-eff-email \
  --redirect   # Automatically redirect HTTP to HTTPS

# What certbot adds to your Nginx config:
# ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
# ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
# include /etc/letsencrypt/options-ssl-nginx.conf;
# ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

DNS-01 with wildcard certificates

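# Using the Cloudflare DNS plugin. Install it the same way as certbot itself
# (apt here), so the plugin lands in the Python environment certbot runs from
apt install python3-certbot-dns-cloudflare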

# Create Cloudflare API credentials file
cat > /etc/letsencrypt/cloudflare.ini << 'EOF'
# Cloudflare API token with Zone:DNS:Edit permission
dns_cloudflare_api_token = your-cloudflare-api-token
EOF
chmod 600 /etc/letsencrypt/cloudflare.ini

# Obtain wildcard certificate
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  --dns-cloudflare-propagation-seconds 30 \
  -d "*.example.com" \
  -d "example.com" \
  --email ops@example.com \
  --agree-tos \
  --no-eff-email

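DNS-01 for internal services: a wildcard certificate is useful when you have many internal subdomains (grafana.internal.example.com, vault.internal.example.com, matrix.internal.example.com) that shouldn't be listed individually in a public certificate, since every issued certificate is recorded in public Certificate Transparency logs. The names must sit under a public domain you control, because Let's Encrypt cannot issue for private suffixes such as .internal. A wildcard also covers only a single label, so these hosts need *.internal.example.com rather than *.example.com. Obtain the wildcard cert on a management host with DNS API access, then distribute the cert files to the internal services via Ansible or a secrets store such as Vault.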

Automatic renewal with systemd timers

Certbot installs a systemd timer by default on Debian/Ubuntu. Verify it's active and test it:

# Check that the timer is active
systemctl status certbot.timer
# ● certbot.timer - Run certbot twice daily
#      Active: active (waiting)
#     Trigger: Wed 2026-02-25 12:00:00 EET; 3h 24min left

# Test renewal without actually renewing (dry run)
certbot renew --dry-run

# If you need a reload hook for Nginx after renewal:
cat > /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh << 'EOF'
#!/bin/bash
nginx -t && systemctl reload nginx
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
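
Certbot exports RENEWED_LINEAGE (the live directory of the renewed certificate) and RENEWED_DOMAINS (a space-separated list of its names) to deploy hooks. The sketch below, with a hypothetical describe_renewal helper, shows one way a hook could log what was renewed before reloading anything.

```shell
#!/usr/bin/env bash
# Sketch of a deploy-hook helper. describe_renewal is a hypothetical name;
# RENEWED_LINEAGE and RENEWED_DOMAINS are set by certbot for deploy hooks.

describe_renewal() {
  local lineage="${RENEWED_LINEAGE:-}" domains="${RENEWED_DOMAINS:-}"
  if [ -z "$lineage" ]; then
    # Run by hand, outside certbot: there is nothing to report.
    echo "not running as a certbot deploy hook"
    return 1
  fi
  # The lineage directory is named after the certificate, e.g. example.com
  echo "renewed $(basename "$lineage"): $domains"
}
```

Inside a hook script you might call describe_renewal and pipe its output to logger, so each renewal leaves a line in the system journal.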

Nginx TLS hardening after certbot

Certbot's default TLS config is reasonable but not optimal. Override the settings in your server block for production hardening:

server {
    listen 443 ssl http2;   # nginx 1.25.1+: use "listen 443 ssl;" plus a separate "http2 on;"
    server_name example.com www.example.com;

    # Certbot-managed cert paths
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    # Modern TLS: TLS 1.2 minimum, TLS 1.3 preferred
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305;
    ssl_prefer_server_ciphers off;   # Let the client pick with modern ciphers

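    # OCSP stapling: reduces TLS handshake latency and improves privacy.
    # Only effective when the certificate carries an OCSP responder URL. Let's Encrypt
    # stopped including one in 2025, so with its certificates nginx logs a warning and
    # skips stapling. Keep these directives for CAs that still operate OCSP.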
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
    resolver 1.1.1.1 8.8.8.8 valid=300s;
    resolver_timeout 5s;

    # HSTS: tell browsers to always use HTTPS for this domain
    # While testing, set max-age=300 (5 minutes); the 1-year value below is the final setting
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Session caching for TLS resumption
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;   # Disable tickets for forward secrecy
}
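
Before raising the HSTS max-age from the test value to a full year, it helps to confirm what the server actually sends. This is a minimal sketch, assuming response headers are piped in on stdin; hsts_max_age is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract the HSTS max-age from HTTP response headers read on stdin.
# hsts_max_age is a hypothetical helper, not part of nginx or curl.

hsts_max_age() {
  tr -d '\r' |                                   # headers arrive with CRLF line endings
    grep -i '^strict-transport-security:' |      # header names are case-insensitive
    sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' |
    head -n 1
}
```

A typical call would be curl -sI https://example.com | hsts_max_age, which prints nothing if the header is absent.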

Monitoring certificate expiry with Prometheus

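The probe_ssl_earliest_cert_expiry metric from the Prometheus blackbox exporter reports the expiry time of the earliest-expiring certificate in the chain as a Unix timestamp. Subtract time() to get the seconds remaining, and turn that into an alert before it turns into an outage: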

# prometheus/alerts/tls.yml
groups:
  - name: tls
    rules:
      - alert: CertificateExpiringSoon
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Certificate expiring soon: {{ $labels.instance }}"
          description: "Certificate for {{ $labels.instance }} expires in {{ $value | printf \"%.0f\" }} days."

      - alert: CertificateExpiringCritical
        expr: |
          (probe_ssl_earliest_cert_expiry - time()) / 86400 < 7
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "Certificate expiring in {{ $value | printf \"%.0f\" }} days: {{ $labels.instance }}"
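
The alert expressions above are plain arithmetic on Unix timestamps, which is easy to reproduce by hand when checking a rule. The sketch below mirrors the two thresholds; days_until_expiry and expiry_severity are hypothetical helpers, and integer division truncates where PromQL keeps the fraction.

```shell
#!/usr/bin/env bash
# Sketch: the same calculation as (probe_ssl_earliest_cert_expiry - time()) / 86400.
# Both function names are illustrative.

# Whole days between "now" and the certificate's notAfter, both in epoch seconds.
days_until_expiry() {
  local not_after="$1" now="${2:-$(date +%s)}"
  echo $(( (not_after - now) / 86400 ))
}

# Map remaining days onto the alert severities used in the rules.
expiry_severity() {
  local days="$1"
  if [ "$days" -lt 7 ]; then
    echo critical
  elif [ "$days" -lt 30 ]; then
    echo warning
  else
    echo ok
  fi
}

# 1,000,000 seconds is about 11.6 days, which falls in the warning band.
expiry_severity "$(days_until_expiry 1772000000 1771000000)"   # prints warning
```

To get a real notAfter value for comparison, openssl s_client -connect example.com:443 -servername example.com piped into openssl x509 -noout -enddate prints the date that the exporter converts to a timestamp.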

Multi-domain certificates and SANs

A single certificate can cover multiple domains via Subject Alternative Names (SANs). This is more efficient than one cert per domain and simplifies renewal. Add all your domains and subdomains in a single certbot command:

# Issue one certificate covering the apex domain, www, and the api and status subdomains
certbot certonly --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  -d example.com \
  -d www.example.com \
  -d api.example.com \
  -d status.example.com

# Or add a domain to an existing cert: --expand replaces it with a new certificate
# that covers the existing names plus the new one
certbot certonly --expand \
  --dns-cloudflare \
  --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
  -d example.com -d www.example.com -d api.example.com \
  -d newsubdomain.example.com   # New domain being added
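
After issuing or expanding a multi-domain certificate, it is worth confirming which names it actually covers. This is a minimal sketch that parses the text printed by openssl x509 -noout -ext subjectAltName (OpenSSL 1.1.1 or newer); san_names is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the DNS names in a certificate's SAN extension, one per line.
# Expects the output of `openssl x509 -noout -ext subjectAltName` on stdin.
# san_names is an illustrative name, not an openssl or certbot command.

san_names() {
  tr ',' '\n' |                          # one SAN entry per line
    sed -n 's/^[[:space:]]*DNS://p'      # keep DNS entries, drop the prefix
}
```

For example, openssl x509 -in /etc/letsencrypt/live/example.com/fullchain.pem -noout -ext subjectAltName | san_names should list every -d name passed to certbot.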

Distributing wildcard certificates to internal services via Ansible

When a wildcard cert is obtained on a management host, it needs to be distributed to the services that use it. An Ansible role handles this cleanly: fetch the cert files from the management host and deploy them to each target, reloading services automatically via handlers:

# roles/tls-distribute/tasks/main.yml
---
- name: Ensure TLS directory exists
  ansible.builtin.file:
    path: /etc/ssl/internal
    state: directory
    mode: '0755'

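# copy + delegate_to would write the file on the cert host itself, so read the
# files there with slurp (needs become/root) and write them on the target host.
# The "Reload nginx" handler is expected in roles/tls-distribute/handlers/main.yml.
- name: Read certificate from the cert host
  ansible.builtin.slurp:
    src: "/etc/letsencrypt/live/{{ tls_domain }}/fullchain.pem"
  delegate_to: cert-manager.internal   # Run this task on the cert host
  register: tls_fullchain

- name: Copy certificate to host
  ansible.builtin.copy:
    content: "{{ tls_fullchain.content | b64decode }}"
    dest: "/etc/ssl/internal/fullchain.pem"
    mode: '0644'
  notify: Reload nginx

- name: Read private key from the cert host
  ansible.builtin.slurp:
    src: "/etc/letsencrypt/live/{{ tls_domain }}/privkey.pem"
  delegate_to: cert-manager.internal
  register: tls_privkey
  no_log: true   # Keep key material out of Ansible output

- name: Copy private key to host
  ansible.builtin.copy:
    content: "{{ tls_privkey.content | b64decode }}"
    dest: "/etc/ssl/internal/privkey.pem"
    owner: root
    group: ssl-cert
    mode: '0640'
  no_log: true
  notify: Reload nginx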

Every 47Network deployment gets certificate expiry alerting on day one. The warning fires at 30 days, enough time to investigate and fix any renewal automation failure before it becomes urgent. With certbot's automatic renewal, the alert should never fire in normal operation. When it does, it means your DNS credentials rotated, your certbot timer was disabled, or your DNS provider's API is down. The alert gives you a month to find out which.

