Government chatbot trial reveals accuracy issues

Early tests of a generative AI-powered chatbot for GOV.UK have revealed a number of accuracy and reliability issues that must be addressed before the technology can be deployed more widely.

The Government Digital Service (GDS) has undertaken its first series of experiments with a government chatbot – powered by the technology behind ChatGPT – intended to answer user queries and help people find content on GOV.UK.

GOV.UK Chat, as the tool is called, recently went through a pilot exercise in which 1,000 users were invited to try it out. The testing produced some positive feedback: most users found the responses useful (70%) and were satisfied with the experience (65%).

However, the results also highlighted ongoing accuracy and reliability issues, including a few cases of hallucination, where the system generated incorrect information and presented it as fact. It is understood that these occurred mostly in response to ambiguous or inappropriate queries from users.

“These findings validate why we’re taking a balanced, measured and data-driven approach to this technology — we’re not moving fast and breaking things. We’re innovating and learning while maintaining GOV.UK’s reputation as a highly trusted information source and a ubiquitously recognised symbol in the UK,” GDS said in a blog post.

“Based on the positive outcomes and insights from this work, we’re rapidly iterating this experiment to address the issues of accuracy and reliability. In parallel we’re exploring other ways in which AI can help the millions of people who use GOV.UK every day.”

Room for improvement

GDS said accuracy gains could be achieved by improving how relevant GOV.UK information is retrieved and passed to the large language model (LLM), by guiding users to phrase clear questions, and by exploring ways to generate answers that are better tailored to users’ circumstances.
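GDS has not published its implementation, but the approach it describes – retrieving relevant GOV.UK content and passing it to the LLM alongside the user’s question – is commonly known as retrieval-augmented generation (RAG). The sketch below is purely illustrative of that pattern; the search_govuk() and call_llm() helpers are hypothetical stand-ins, not part of any GDS code.

```python
# Illustrative retrieval-augmented generation (RAG) sketch, assuming hypothetical
# search_govuk() and call_llm() helpers. Not GDS's actual implementation.

def search_govuk(query: str, top_k: int = 3) -> list[str]:
    """Hypothetical retrieval step: return the top_k most relevant GOV.UK passages."""
    # A real system would query a search index or vector store here.
    return ["Extract from a relevant GOV.UK page..."] * top_k

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; a real system would send the prompt to a hosted model."""
    return "Drafted answer grounded in the supplied extracts."

def answer(question: str) -> str:
    passages = search_govuk(question)
    # Grounding the prompt in retrieved content is what the accuracy work aims to
    # improve: better retrieval of relevant pages should mean better answers.
    prompt = (
        "Answer the question using only the GOV.UK extracts below. "
        "If the extracts do not contain the answer, say so.\n\n"
        + "\n\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer("How do I renew my passport?"))
```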

The results of the trial also indicate a misunderstanding among some users about how generative AI works. This has led some users to place undue confidence in a system that can be wrong some of the time, underestimating or dismissing the chatbot’s inaccuracy risks because of the credibility and duty of care associated with the GOV.UK brand.

“We’re working to make sure that users understand inaccuracy risks, and are able to access the reliable information from GOV.UK that they need,” GDS said. 
