
k6 load testing: from zero to production benchmarks.

February 24, 2026 · 11 min read · 47Network QA Team

k6 is the load testing tool that fits naturally into a developer workflow. Scripts are JavaScript (TypeScript with a small shim), tests run from the CLI or CI, and the output integrates cleanly with Prometheus and Grafana. It's not a no-code tool (you write scripts), but the scripting is minimal enough that QA engineers can own the test suite without needing a performance specialist. This post covers the patterns we use in production load tests: realistic scenarios, threshold-based pass/fail, authenticated flows, and streaming results to a dashboard.

Your first meaningful test

// load-tests/checkout-flow.js
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration: controls virtual users and duration
export const options = {
  stages: [
    { duration: '2m', target: 50  },   // Ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50  },   // Hold at 50 VUs for 5 minutes
    { duration: '2m', target: 200 },   // Spike to 200 VUs
    { duration: '2m', target: 0   },   // Ramp down
  ],
  // Thresholds: test FAILS if these are breached
  thresholds: {
    'http_req_duration':          ['p(95)<500'],   // 95th percentile under 500ms
    'http_req_duration{page:checkout}': ['p(99)<1000'], // Checkout specifically under 1s at p99
    'http_req_failed':            ['rate<0.01'],   // Less than 1% error rate
    'checks':                     ['rate>0.99'],   // More than 99% checks passing
  },
};

export default function () {
  // Simulate a user browsing and checking out
  const productPage = http.get('https://staging.example.com/products/widget-pro', {
    tags: { page: 'product' },
  });
  check(productPage, { 'product page: 200': (r) => r.status === 200 });

  sleep(1);  // Think time between pages: users don't click instantly

  const cartResponse = http.post('https://staging.example.com/api/cart', JSON.stringify({
    productId: 'widget-pro',
    quantity: 1,
  }), {
    headers: { 'Content-Type': 'application/json' },
    tags: { page: 'cart' },
  });
  check(cartResponse, {
    'add to cart: 200':  (r) => r.status === 200,
    'cart has item':     (r) => JSON.parse(r.body).items.length > 0,
  });

  sleep(2);

  const checkout = http.get('https://staging.example.com/checkout', {
    tags: { page: 'checkout' },
  });
  check(checkout, { 'checkout page: 200': (r) => r.status === 200 });

  sleep(Math.random() * 3 + 1);  // Random think time: 1–4 seconds
}

Authenticated flows with shared tokens

Most interesting load tests require authentication. The pattern: authenticate once in setup(), pass the token to all virtual users via the return value:

import http from 'k6/http';
import { check } from 'k6';

// setup() runs once before all VUs start, which makes it the natural place for auth
export function setup() {
  const loginRes = http.post('https://staging.example.com/api/auth/login', JSON.stringify({
    email:    __ENV.TEST_EMAIL,     // Pass credentials via environment: k6 run -e TEST_EMAIL=...
    password: __ENV.TEST_PASSWORD,
  }), { headers: { 'Content-Type': 'application/json' } });

  check(loginRes, { 'login: 200': (r) => r.status === 200 });

  const { accessToken } = JSON.parse(loginRes.body);
  return { accessToken };  // Returned value is passed to default() and teardown()
}

export default function (data) {
  const headers = {
    'Content-Type':  'application/json',
    'Authorization': `Bearer ${data.accessToken}`,
  };

  const res = http.get('https://staging.example.com/api/orders', { headers });
  check(res, {
    'orders: 200':     (r) => r.status === 200,
    'orders not empty': (r) => JSON.parse(r.body).length > 0,
  });
}
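setup() has a counterpart: teardown() runs once after all iterations finish and receives the same data object, so it's the natural place to invalidate the test session. A sketch, assuming a hypothetical /api/auth/logout endpoint (adjust to whatever your API exposes):

```javascript
import http from 'k6/http';

// teardown() runs once after all VUs finish; it receives setup()'s return value.
// The logout endpoint is an assumption -- swap in your API's session-revocation route.
export function teardown(data) {
  http.post('https://staging.example.com/api/auth/logout', null, {
    headers: { Authorization: `Bearer ${data.accessToken}` },
  });
}
```

One caveat: whatever setup() returns is serialized to JSON before being handed to the VUs and to teardown(), so return plain data, not objects with methods attached.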

Scenarios: multiple user types in one test

Real traffic is a mix of user types. k6 scenarios let you model that mix: browser visitors, API consumers, and admin users can each get their own VU counts, executors, and timing:

export const options = {
  scenarios: {
    browse_visitors: {
      executor: 'ramping-vus',
      stages: [
        { duration: '3m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseScenario',
    },
    api_consumers: {
      executor: 'constant-arrival-rate',  // Fixed requests per second regardless of VU response time
      rate: 50,          // 50 iterations per second
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 20,
      maxVUs: 100,
      exec: 'apiScenario',
    },
  },
  thresholds: {
    'http_req_duration{scenario:browse_visitors}': ['p(95)<800'],
    'http_req_duration{scenario:api_consumers}':   ['p(99)<200'],
  },
};

export function browseScenario() { /* ... */ }
export function apiScenario()    { /* ... */ }
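The stub functions hold the per-scenario behaviour. A minimal sketch of what they might contain (the `API_TOKEN` environment variable is an assumption, passed via `k6 run -e API_TOKEN=...`):

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Browser visitors: page loads with think time between them
export function browseScenario() {
  const res = http.get('https://staging.example.com/products/widget-pro');
  check(res, { 'product page: 200': (r) => r.status === 200 });
  sleep(Math.random() * 2 + 1);  // 1-3s think time
}

// API consumers: no sleep() here -- the constant-arrival-rate executor
// controls pacing, so think time would only distort the request rate
export function apiScenario() {
  const res = http.get('https://staging.example.com/api/orders', {
    headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },  // assumed env var
  });
  check(res, { 'orders: 200': (r) => r.status === 200 });
}
```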

Streaming results to Grafana

k6's built-in output is useful but hard to share. Stream to InfluxDB and visualise in Grafana for real-time load test dashboards:

# Run with InfluxDB output
k6 run \
  --out influxdb=http://influxdb:8086/k6 \
  load-tests/checkout-flow.js

# Or with Prometheus remote-write (k6 v0.43+)
k6 run \
  --out experimental-prometheus-rw \
  load-tests/checkout-flow.js

The k6 Grafana dashboard (ID: 2587) shows request rate, response time percentiles, VU count, and error rate in real time. During a load test, watch p95 response time and error rate: if either starts climbing while you're still ramping, you've found your bottleneck before you hit your target load.
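If you also want results as a shareable file rather than only a live dashboard, k6's handleSummary() hook replaces the default end-of-test terminal summary with whatever outputs you return. A minimal sketch (the filename is an assumption; jslib's textSummary can recreate the normal terminal output alongside it):

```javascript
// handleSummary() runs once when the test ends. Returning a map of
// filename -> content tells k6 what to write; the special key 'stdout'
// controls what gets printed to the terminal.
export function handleSummary(data) {
  return {
    'summary.json': JSON.stringify(data, null, 2),  // machine-readable results
    stdout: `Test finished: ${Object.keys(data.metrics).length} metrics collected\n`,
  };
}
```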

The test types you need

  • Baseline test: 5–10 VUs for 5 minutes. What do normal response times look like? Use this as your comparison baseline.
  • Load test: ramp to expected peak traffic, hold for 20 minutes. Does the system perform within SLA under realistic load?
  • Stress test: ramp beyond expected peak until something breaks. Where's the breaking point? Does it fail gracefully?
  • Spike test: sudden jump to 10× normal load for 2 minutes, then back down. Can the system absorb flash traffic (sale events, viral content) without cascading failure?
  • Soak test: moderate load for 8–24 hours. Memory leaks, connection pool exhaustion, and disk fill-up only show up here.
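The spike test is the one whose shape people most often get wrong: the jump has to be abrupt, not a gentle ramp, or you're just running a small load test. A sketch of spike-test stages, assuming normal load is around 20 VUs (adjust targets to your own baseline):

```javascript
// Spike test: 10x jump in 30 seconds, hold, then verify recovery
export const options = {
  stages: [
    { duration: '1m',  target: 20  },  // settle at normal load
    { duration: '30s', target: 200 },  // abrupt 10x spike
    { duration: '2m',  target: 200 },  // hold the spike
    { duration: '30s', target: 20  },  // drop back down
    { duration: '1m',  target: 20  },  // confirm the system recovers
  ],
  thresholds: {
    // A looser error budget than a steady-state test -- an assumption,
    // tune to what your SLA tolerates during flash traffic
    http_req_failed: ['rate<0.05'],
  },
};
```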

We run baseline, load, and spike tests in every 47Network Studio QA engagement, typically against a staging environment seeded with production-scale data. The gov-portal engagement used k6 spike tests to validate that the portal could handle 10× traffic from TV/radio announcements, a known real-world pattern.

Running k6 in GitHub Actions

k6 integrates into CI as a single step. The key decision is whether to fail the pipeline on threshold breach. For pre-production environments, failing the build on a missed SLA is exactly what you want:

# .github/workflows/load-test.yml
name: Load tests

on:
  workflow_dispatch:         # Manual trigger for on-demand load tests
  schedule:
    - cron: '0 2 * * 1'     # Weekly on Monday at 2am; soak tests run overnight

jobs:
  k6-load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run k6 load test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: load-tests/checkout-flow.js
          # --summary-export writes the end-of-test summary as JSON for the artifact step below
          flags: --out influxdb=http://${{ secrets.INFLUXDB_HOST }}:8086/k6 --summary-export=summary.json
        env:
          TEST_EMAIL:    ${{ secrets.TEST_EMAIL }}
          TEST_PASSWORD: ${{ secrets.TEST_PASSWORD }}
          BASE_URL:      https://staging.example.com
        # k6 exits with code 99 if thresholds are breached; GitHub Actions treats non-zero as failure.
        # This means a missed SLA threshold blocks the deployment pipeline.

      - name: Upload results
        if: always()   # Upload even on failure so you can debug
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: summary.json

Interpreting results: what matters

k6 outputs a summary after every run. The metrics that matter most in order of importance:

  • http_req_failed rate: if this goes above 1%, stop the test and investigate. Error rate climbing under load means you're past capacity; there's no point collecting latency data from a broken system.
  • http_req_duration p95 and p99: median (p50) looks fine right up until your slowest users are waiting 10 seconds. Design your thresholds around p95/p99, not mean response time.
  • http_reqs rate (RPS): cross-reference with your VU count and think time to verify you're generating the load you intended. A 50-VU test with 3-second think time generates roughly 16 RPS; know your numbers before you interpret the results.
  • vus_max: relevant for arrival-rate scenarios. If dropped_iterations is climbing, k6 couldn't allocate enough VUs to sustain the target rate; increase preAllocatedVUs and maxVUs in your scenario config.
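The "stop the test and investigate" step can be automated: k6 thresholds accept an object form with abortOnFail, which aborts the whole run as soon as the threshold is crossed instead of just failing at the end. A minimal sketch:

```javascript
export const options = {
  thresholds: {
    // Object form of a threshold: abortOnFail stops the entire test run
    // the moment the error rate crosses 1%; delayAbortEval gives the
    // metric time to stabilise before k6 starts evaluating it.
    http_req_failed: [
      { threshold: 'rate<0.01', abortOnFail: true, delayAbortEval: '1m' },
    ],
  },
};
```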

โ† Back to Blog Playwright Guide โ†’