The situation.
A Romanian government agency responsible for civil registration services had recently modernised its citizen-facing portal: a platform handling birth certificates, address changes, appointment booking, and document submissions for approximately 400,000 registered users. The platform had been developed over 18 months by an external vendor and was approaching a planned public launch.
Two weeks before the launch date, an internal review raised two blocking concerns. First, an EU accessibility directive required WCAG 2.1 Level AA compliance for public-sector websites, a requirement the portal had never been systematically tested against. Second, a 2023 incident in which a previous government platform had gone down during peak usage (tax filing season) had left the agency's leadership reluctant to launch without a documented load-testing baseline.
The agency had no internal QA capability and needed an independent assessment, a remediation roadmap, and a test infrastructure they could maintain after handover. The external vendor was available for fixes but not for testing, a deliberate separation of responsibilities.
What we built.
We structured the engagement in three parallel tracks: accessibility audit and remediation guidance, functional regression test suite, and load-testing baseline. The tracks ran simultaneously with weekly sync points, allowing remediation work to begin before the audit was complete.
Track 1: WCAG 2.1 AA audit
We ran automated scanning with axe-core and Lighthouse across all 34 distinct page templates, then supplemented with manual testing using NVDA (Windows screen reader) and VoiceOver (macOS). Automated tools catch roughly 30–40% of real accessibility issues; the rest require human evaluation, particularly keyboard navigation flows, focus management in modal dialogs, and meaningful alt-text assessment.
The audit surfaced 78 distinct issues: 12 critical (blocking for screen reader users), 29 serious, 27 moderate, and 10 minor. The most common failure patterns were missing form labels in the appointment booking flow, focus traps in multi-step document upload modals, insufficient colour contrast on secondary navigation, and missing ARIA live regions for dynamic content updates (status messages after form submission).
We delivered a remediation guide prioritised by severity and user impact, with specific code-level fixes for each issue. The vendor resolved all 41 critical and serious issues within three weeks; we re-tested and confirmed closure before proceeding to sign-off.
Track 2: Functional regression suite
We built 140 automated test cases using Playwright covering the six core citizen journeys: appointment booking, document submission, status tracking, address change, birth certificate request, and account management. Tests run against a staging environment on every pull request and nightly against production (read-only flows only).
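One of the booking-journey tests, reduced to a sketch. It runs under the @playwright/test runner; the staging URL, labels, and selectors below are illustrative assumptions, not the portal's actual markup:

```javascript
// Sketch of one appointment-booking regression test (requires the
// @playwright/test runner; URL and selectors are hypothetical).
import { test, expect } from '@playwright/test';

test('citizen can book an appointment slot', async ({ page }) => {
  await page.goto('https://staging.example.gov.ro/appointments'); // hypothetical staging URL
  await page.getByLabel('Service type').selectOption('birth-certificate');
  await page.getByRole('button', { name: 'Find available slots' }).click();

  // The first free slot should be offered and bookable.
  const firstSlot = page.getByRole('listitem').first();
  await firstSlot.getByRole('button', { name: 'Book' }).click();

  // The confirmation must land in an ARIA live region so screen readers
  // announce it (one of the audit's remediation points).
  await expect(page.getByRole('status')).toContainText('Appointment confirmed');
});
```

Role- and label-based locators keep the tests aligned with the accessibility work: if a control loses its accessible name, the regression suite fails too.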
Each test is tagged with the user story it covers, the criticality level, and the estimated remediation time if it fails. The CI integration reports directly to the agency's project management dashboard: a Jira webhook that creates tickets automatically when tests fail on the main branch.
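The webhook integration amounts to a small transform from a failed test's tags to a Jira issue payload. A minimal sketch; the `failedTest` shape, field names, and priority mapping below are illustrative assumptions, not the agency's actual contract:

```javascript
// Hypothetical sketch: map a failed regression test (with its tags) to a
// Jira issue payload. The tag shape and field names are assumptions.
function toJiraIssue(failedTest) {
  // Tags carried by each test: user story, criticality, remediation estimate.
  const { name, story, criticality, remediationHours } = failedTest;
  const priority =
    { critical: 'Highest', serious: 'High', moderate: 'Medium' }[criticality] ?? 'Low';
  return {
    fields: {
      summary: `[regression] ${name} failed on main`,
      description: `Covers ${story}. Estimated remediation: ${remediationHours}h.`,
      priority: { name: priority },
      labels: ['automated-regression', criticality],
    },
  };
}

// Example: a failing critical booking test becomes a Highest-priority ticket.
const issue = toJiraIssue({
  name: 'appointment booking - confirm slot',
  story: 'CIT-112', // hypothetical user story key
  criticality: 'critical',
  remediationHours: 4,
});
console.log(issue.fields.priority.name); // "Highest"
```

Keeping the mapping in one pure function makes it easy for the agency's team to adjust priorities or labels without touching the CI wiring.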
Track 3: Load testing baseline
We used k6 to establish a baseline and identify the platform's breaking point before launch. The documented peak from the agency's traffic analytics was approximately 800 simultaneous sessions during the previous portal's busiest period (the end-of-year deadline for civil registrations). We tested to 1,600 concurrent users (2× the documented peak), then ramped further to find the breaking point, and documented the degradation profile: p95 response time of 1.2s at 800 users, 2.8s at 1,200 users, and a breaking point at 1,950 concurrent users, where error rates exceeded 5%.
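The load profile can be expressed as a staged k6 ramp. A sketch that runs under the k6 runtime (not Node); the URL, stage durations, and thresholds are illustrative assumptions, and the real scenarios also scripted the booking and upload journeys rather than a single GET:

```javascript
// Sketch of the staged ramp (executed by the k6 runtime, not Node).
// URL, durations, and thresholds are illustrative assumptions.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 800 },   // ramp to the documented peak
    { duration: '10m', target: 800 },  // hold: baseline measurement
    { duration: '5m', target: 1600 },  // ramp to 2x peak
    { duration: '10m', target: 1600 }, // hold: stress measurement
    { duration: '5m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<3000'], // p95 under 3s at target load
    http_req_failed: ['rate<0.05'],    // error rate under 5%
  },
};

export default function () {
  const res = http.get('https://staging.example.gov.ro/appointments'); // hypothetical URL
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```

Encoding the 5% error-rate ceiling as a threshold means the breaking-point run fails loudly at exactly the criterion the agency agreed to.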
The load tests revealed two specific bottlenecks: the document upload endpoint had synchronous virus scanning that blocked the request thread, and the appointment availability query was hitting the database without a cache layer. Both were fixed by the vendor before launch: document scanning became asynchronous, and availability slots are now cached with a 30-second TTL. The post-fix tests confirmed that the p95 at 1,600 concurrent users dropped from 4.1s to 1.4s.
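The availability fix follows a standard read-through cache pattern. A minimal single-process sketch with an injectable clock so the expiry logic is testable; the vendor's production implementation may well have used a shared cache such as Redis, a detail outside our scope:

```javascript
// Minimal read-through TTL cache sketch for the availability query.
// In-memory and single-process (an assumption, not the vendor's actual
// implementation); `now` is injectable so expiry can be tested.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Return the cached value if still fresh, otherwise recompute via `loader`.
  get(key, loader) {
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.at < this.ttlMs) return hit.value;
    const value = loader(key);
    this.entries.set(key, { value, at: this.now() });
    return value;
  }
}

// Usage: availability slots cached with the 30-second TTL from the fix.
let t = 0;          // fake clock, in milliseconds
let dbQueries = 0;  // counts trips to the (stubbed) database
const cache = new TtlCache(30_000, () => t);
const loadSlots = () => { dbQueries++; return ['09:00', '09:30']; };

cache.get('2024-06-01', loadSlots); // miss: hits the database
cache.get('2024-06-01', loadSlots); // hit: served from cache
t = 31_000;                         // advance past the 30s TTL
cache.get('2024-06-01', loadSlots); // expired: hits the database again
console.log(dbQueries); // 2
```

A 30-second TTL trades a small staleness window for collapsing hundreds of identical availability queries into one database hit per window.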
"Having an independent team test what our vendor built โ before public launch โ gave us confidence we couldn't have gotten any other way. The accessibility audit alone would have been a serious problem if citizens or journalists had found those issues after launch."
– Digital Services Director, Confidential Government Client

Delivery timeline.
The portal launched on schedule. The agency's internal team now owns and runs the test suite โ adding new tests for each new feature with the templates we left them, and tracking test pass rates in the same dashboard they use for uptime monitoring.
Discovery & Scope – Week 1
Full inventory of all 34 page templates and 6 citizen journeys. Test environment access and CI/CD pipeline review. Risk matrix identifying three highest-priority test areas.
Accessibility Audit – Weeks 2–3
Automated scanning (axe-core, Lighthouse) across all templates. Manual screen reader testing (NVDA + VoiceOver). 78 issues catalogued, prioritised, and documented with code-level fixes.
Regression Suite Build – Weeks 3–5
140 Playwright tests across 6 citizen journeys. CI integration via GitHub Actions. Jira webhook for automatic ticket creation on main branch failures.
Load Testing – Week 6
k6 baseline to 1,600 concurrent users. Breaking point identified at 1,950 users. Two bottlenecks documented (synchronous document scanning, uncached appointment-availability queries). Vendor fixes verified.
Accessibility Re-test & Sign-off – Week 7
All 41 critical and serious issues verified closed. Final WCAG 2.1 AA conformance report issued for the agency's compliance records.
Handover & Training – Week 8
Internal team training on running and extending the Playwright suite. Documentation: test authoring guide, CI configuration reference, load test runbook. Launch day on-call support.