k6 is the load testing tool that fits naturally into a developer workflow. Scripts are JavaScript (TypeScript with a small shim), tests run from the CLI or CI, and the output integrates cleanly with Prometheus and Grafana. It's not a no-code tool (you write scripts), but the scripting is minimal enough that QA engineers can own the test suite without needing a performance specialist. This post covers the patterns we use in production load tests: realistic scenarios, threshold-based pass/fail, authenticated flows, and streaming results to a dashboard.
## Your first meaningful test

```javascript
// load-tests/checkout-flow.js
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration: controls virtual users and duration
export const options = {
  stages: [
    { duration: '2m', target: 50 },  // Ramp up to 50 VUs over 2 minutes
    { duration: '5m', target: 50 },  // Hold at 50 VUs for 5 minutes
    { duration: '2m', target: 200 }, // Spike to 200 VUs
    { duration: '2m', target: 0 },   // Ramp down
  ],
  // Thresholds: the test FAILS if these are breached
  thresholds: {
    'http_req_duration': ['p(95)<500'],                 // 95th percentile under 500ms
    'http_req_duration{page:checkout}': ['p(99)<1000'], // Checkout specifically under 1s at p99
    'http_req_failed': ['rate<0.01'],                   // Less than 1% error rate
    'checks': ['rate>0.99'],                            // More than 99% of checks passing
  },
};

export default function () {
  // Simulate a user browsing and checking out
  const productPage = http.get('https://staging.example.com/products/widget-pro', {
    tags: { page: 'product' },
  });
  check(productPage, { 'product page: 200': (r) => r.status === 200 });

  sleep(1); // Think time between pages: users don't click instantly

  const cartResponse = http.post('https://staging.example.com/api/cart', JSON.stringify({
    productId: 'widget-pro',
    quantity: 1,
  }), {
    headers: { 'Content-Type': 'application/json' },
    tags: { page: 'cart' },
  });
  check(cartResponse, {
    'add to cart: 200': (r) => r.status === 200,
    'cart has item': (r) => JSON.parse(r.body).items.length > 0,
  });

  sleep(2);

  const checkout = http.get('https://staging.example.com/checkout', {
    tags: { page: 'checkout' },
  });
  check(checkout, { 'checkout page: 200': (r) => r.status === 200 });

  sleep(Math.random() * 3 + 1); // Random think time: 1-4 seconds
}
```
## Authenticated flows with shared tokens

Most interesting load tests require authentication. The pattern: authenticate once in setup() and pass the token to every virtual user via the return value:

```javascript
import http from 'k6/http';
import { check } from 'k6';

// setup() runs once before any VUs start; perfect for auth
export function setup() {
  const loginRes = http.post('https://staging.example.com/api/auth/login', JSON.stringify({
    email: __ENV.TEST_EMAIL, // Pass credentials via environment: k6 run -e TEST_EMAIL=...
    password: __ENV.TEST_PASSWORD,
  }), { headers: { 'Content-Type': 'application/json' } });
  check(loginRes, { 'login: 200': (r) => r.status === 200 });

  const { accessToken } = JSON.parse(loginRes.body);
  return { accessToken }; // The returned value is passed to default() and teardown()
}

export default function (data) {
  const headers = {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${data.accessToken}`,
  };
  const res = http.get('https://staging.example.com/api/orders', { headers });
  check(res, {
    'orders: 200': (r) => r.status === 200,
    'orders not empty': (r) => JSON.parse(r.body).length > 0,
  });
}
```
## Scenarios: multiple user types in one test

Real traffic is a mix of user types. k6 scenarios let you model this: browser visitors, API consumers, and admin users, each with its own VU count and timing:

```javascript
export const options = {
  scenarios: {
    browse_visitors: {
      executor: 'ramping-vus',
      stages: [
        { duration: '3m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 0 },
      ],
      exec: 'browseScenario',
    },
    api_consumers: {
      executor: 'constant-arrival-rate', // Fixed requests per second regardless of VU response time
      rate: 50, // 50 iterations per second
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 20,
      maxVUs: 100,
      exec: 'apiScenario',
    },
  },
  thresholds: {
    'http_req_duration{scenario:browse_visitors}': ['p(95)<800'],
    'http_req_duration{scenario:api_consumers}': ['p(99)<200'],
  },
};

export function browseScenario() { /* ... */ }
export function apiScenario() { /* ... */ }
```
## Streaming results to Grafana

k6's built-in output is useful but hard to share. Stream to InfluxDB and visualise in Grafana for real-time load test dashboards:

```shell
# Run with InfluxDB output
k6 run \
  --out influxdb=http://influxdb:8086/k6 \
  load-tests/checkout-flow.js

# Or with Prometheus remote-write (k6 v0.43+)
k6 run \
  --out experimental-prometheus-rw \
  load-tests/checkout-flow.js
```
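For the Prometheus remote-write output, k6 reads its target server from an environment variable rather than the `--out` flag. A minimal sketch; the hostname below is a placeholder, so point it at your own Prometheus instance (which must have remote-write receiving enabled):

```shell
# K6_PROMETHEUS_RW_SERVER_URL tells the remote-write output where to push metrics.
# The host "prometheus" is an assumption; substitute your endpoint.
export K6_PROMETHEUS_RW_SERVER_URL=http://prometheus:9090/api/v1/write
k6 run --out experimental-prometheus-rw load-tests/checkout-flow.js
```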
The k6 Grafana dashboard (ID: 2587) shows request rate, response time percentiles, VU count, and error rate in real time. During a load test you want to watch p95 response time and error rate: if either starts climbing while you're still ramping, you've found your bottleneck before you hit your target load.
## The test types you need

- Baseline test: 5-10 VUs for 5 minutes. What do normal response times look like? Use this as your comparison baseline.
- Load test: ramp to expected peak traffic, hold for 20 minutes. Does the system perform within SLA under realistic load?
- Stress test: ramp beyond expected peak until something breaks. Where's the breaking point? Does it fail gracefully?
- Spike test: sudden jump to 10× normal load for 2 minutes, then back down. Can the system absorb flash traffic (sale events, viral content) without cascading failure?
- Soak test: moderate load for 8-24 hours. Memory leaks, connection pool exhaustion, and disk fill-up only show up here.
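As a concrete illustration, the spike profile above translates directly into a stages config. This is a sketch under stated assumptions: the 10× figure assumes a normal load of 50 VUs, so adjust the baseline to your own traffic.

```javascript
// Hypothetical spike test: 50-VU baseline, sudden jump to 10x (500 VUs),
// hold for 2 minutes, then drop back and verify recovery.
export const options = {
  stages: [
    { duration: '2m', target: 50 },   // establish the baseline
    { duration: '10s', target: 500 }, // spike: 10x normal load, near-instantly
    { duration: '2m', target: 500 },  // hold the spike
    { duration: '10s', target: 50 },  // drop back down
    { duration: '2m', target: 50 },   // confirm the system recovers at baseline
  ],
  thresholds: {
    'http_req_failed': ['rate<0.05'], // tolerate a somewhat higher error rate during the spike
  },
};
```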
We run baseline, load, and spike tests as part of every 47Network Studio QA engagement, typically against a staging environment seeded with production-scale data. The gov-portal engagement used k6 spike tests to validate that the portal could handle 10× traffic from TV/radio announcements, a known real-world traffic pattern.
## Running k6 in GitHub Actions

k6 integrates into CI as a single step. The key decision is whether to fail the pipeline on a threshold breach; for pre-production environments, failing the build on a missed SLA is exactly what you want:

```yaml
# .github/workflows/load-test.yml
name: Load tests

on:
  workflow_dispatch: # Manual trigger for on-demand load tests
  schedule:
    - cron: '0 2 * * 1' # Weekly on Monday at 2am, so soak tests run overnight

jobs:
  k6-load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run k6 load test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: load-tests/checkout-flow.js
          flags: --out influxdb=http://${{ secrets.INFLUXDB_HOST }}:8086/k6
        env:
          TEST_EMAIL: ${{ secrets.TEST_EMAIL }}
          TEST_PASSWORD: ${{ secrets.TEST_PASSWORD }}
          BASE_URL: https://staging.example.com
      # k6 exits with code 99 if thresholds are breached; GitHub Actions treats
      # non-zero as failure, so a missed SLA threshold blocks the deployment pipeline.

      - name: Upload results
        if: always() # Upload even on failure so you can debug
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results/
```
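One gap worth closing: the workflow uploads results/, but the earlier scripts never write anything there. k6's handleSummary hook can produce that file at the end of the run. A minimal sketch; the results/summary.json path is an assumption chosen to match the workflow's artifact path, and in a real k6 script the function must be exported:

```javascript
// handleSummary runs once after the test finishes. The returned object maps
// destinations (file paths, or 'stdout') to content. Export this from your
// k6 script; note that defining it replaces k6's default end-of-test summary.
function handleSummary(data) {
  return {
    'results/summary.json': JSON.stringify(data, null, 2), // machine-readable, for CI artifacts
    stdout: 'Wrote results/summary.json\n',                // brief confirmation in the terminal
  };
}
```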
## Interpreting results: what matters

k6 prints a summary after every run. The metrics that matter most, in order of importance:

- `http_req_failed` rate: if this goes above 1%, stop the test and investigate. An error rate climbing under load means you're past capacity; there's no point collecting latency data from a broken system.
- `http_req_duration` p95 and p99: the median (p50) looks fine right up until your slowest users are waiting 10 seconds. Design your thresholds around p95/p99, not mean response time.
- `http_reqs` rate (RPS): cross-reference with your VU count and think time to verify you're generating the load you intended. A 50-VU test with 3-second think time generates roughly 16 RPS; know your numbers before you interpret the results.
- `vus_max`: if this is lower than your target VU count, k6 couldn't spin up enough workers; increase `preAllocatedVUs` (and `maxVUs`) in your scenario config.
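The think-time arithmetic above generalises: in a VU-based (closed-model) test, each VU completes one iteration every think time plus response time seconds. A quick sanity-check helper; the function and parameter names are illustrative, not part of k6:

```javascript
// Rough expected throughput for a closed-model (VU-based) test.
// Each VU runs one iteration every (thinkTime + requests * avgResponse) seconds.
function expectedRps(vus, requestsPerIteration, thinkTimeSec, avgResponseSec) {
  const iterationSec = thinkTimeSec + requestsPerIteration * avgResponseSec;
  return (vus * requestsPerIteration) / iterationSec;
}

// 50 VUs, 1 request per iteration, 3s think time, 100ms responses:
console.log(expectedRps(50, 1, 3, 0.1).toFixed(1)); // "16.1", matching the ~16 RPS estimate above
```

This is also why adding VUs without reducing think time raises RPS only linearly: throughput scales with VU count, not with how hard each VU hammers the server.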