GDS reports improved accuracy and user satisfaction in GOV.UK Chat pilots


The Government Digital Service has reported significant progress in the development of GOV.UK Chat, an experimental AI assistant designed to help users navigate government services, following two large-scale public pilots involving more than 10,000 users.

According to the latest GDS update on the project, the pilots saw users submit over 26,000 questions covering topics such as tax, benefits and visas. The exercise is one of the largest user research programmes undertaken by GDS and the most extensive public test of generative AI within UK government to date.

The findings suggest steady improvements in both performance and user perception. In a follow-up survey of GOV.UK app users, 73% of respondents said they found the tool useful, while 64% reported overall satisfaction. Users indicated that the assistant helped them understand next steps more quickly, in some cases avoiding the need to contact departmental call centres.

GDS also reported marked gains in answer accuracy. Internal benchmarking shows accuracy scores rising from an initial 76% to 90% across all topics. Answers are assessed against published GOV.UK guidance, and a response is deemed accurate only if it fully meets official content standards. GDS flagged that partial answers - where only some aspects of a query are addressed - remain a key source of inaccuracy.
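To illustrate why partial answers weigh on the headline figure, the sketch below scores a set of reviewed answers under an all-or-nothing rule like the one described. It is not GDS's benchmarking code: the `Verdict` labels, the `strict_accuracy` function and the sample data are hypothetical and invented for illustration.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical reviewer labels; not GDS's actual rubric."""
    FULL = "full"        # fully meets the published guidance
    PARTIAL = "partial"  # addresses only some aspects of the query
    WRONG = "wrong"      # contradicts or misses the guidance


def strict_accuracy(verdicts):
    """Share of answers judged fully correct; partial answers count as misses."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(v is Verdict.FULL for v in verdicts) / len(verdicts)


# Invented sample of ten reviewed answers, for illustration only.
sample = [Verdict.FULL] * 7 + [Verdict.PARTIAL] * 2 + [Verdict.WRONG]
print(f"strict accuracy: {strict_accuracy(sample):.0%}")
```

Under this rule, the two partial answers in the sample lower the score exactly as much as an outright wrong answer would, which is consistent with partial answers being described as a key source of inaccuracy.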

On system performance, the pilots demonstrated the technical feasibility of operating large language models (LLMs) in a public-facing context. During testing, 508 attempted “jailbreak” attacks - efforts to elicit unsafe or inappropriate responses - were recorded, all of which were successfully mitigated by existing safeguards. The system is currently built on Amazon Bedrock using Anthropic’s Claude models, with a design intended to support future model upgrades.

GDS also reported improvements in handling ambiguous or unsupported queries. The introduction of clarifying follow-up questions has contributed to an answer rate of 88% for in-scope queries.

Response speed remains an area for further optimisation. The average response time was 10.7 seconds, reflecting a trade-off between speed and accuracy. While most users deemed this acceptable, testing indicated higher satisfaction when faster responses were simulated.

The pilots combined quantitative and qualitative research methods, including usability testing, diary studies, survey analysis and manual review of over 1,000 question-answer pairs.

GDS said it now plans to expand access to GOV.UK Chat, beginning with integration into the GOV.UK app, with wider rollout across the website expected later in 2026. Further development will focus on improving speed, expanding functionality, and exploring more advanced “agentic” capabilities that could enable users to complete transactions directly through the interface.
