Conversational AI systems, like the ones behind chatbots, have a worrisome habit of making things up: they generate false information and pass it off as real. Researchers at the Oxford Internet Institute are sounding the alarm that these AI fabrications not only carry all sorts of risks but also directly jeopardize scientific accuracy and truth.
Chatbots often have this unsettling habit of making up false info and presenting it like it’s the real deal. It’s called AI hallucination, and it’s causing all sorts of problems. On one hand, it’s holding back the full potential of artificial intelligence; on the other, it’s causing real harm to people in the real world. With generative AI becoming more widespread, the warning signs are getting louder.
In a paper published in Nature Human Behaviour, the Oxford Internet Institute researchers argue that Large Language Models (LLMs) are designed to give helpful and convincing answers, with no solid guarantee that those answers will be accurate or line up with the facts.
Right now, we treat LLMs like knowledge hubs, spitting out info whenever we ask. But here’s the catch: the data they learn from isn’t always reliable. One big reason is that these models often pull from online sources, which can be full of false claims, opinions, and just plain wrong info.
“People using LLMs often anthropomorphise the technology, where they trust it as a human-like information source,” explained Professor Brent Mittelstadt, co-author of the paper. “This is, in part, due to the design of LLMs as helpful, human-sounding agents that converse with users and answer seemingly any question with confident sounding, well-written text. The result of this is that users can easily be convinced that responses are accurate even when they have no basis in fact or present a biased or partial version of the truth.”
In the world of science and education, getting the facts right is crucial. The researchers are pushing the scientific community to treat LLMs as “zero-shot translators.” In simpler terms, instead of trusting the model as a know-it-all, users should feed it reliable data and ask it to convert that data into, say, a conclusion or code.
Doing it this way makes it much easier to verify that what comes out is actually true and matches what you put in. The Oxford researchers are confident that LLMs will be a big help in scientific tasks, but stress that it’s crucial for everyone to use them wisely and keep realistic expectations about what they can actually bring to the table.
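To make that pattern concrete, here’s a minimal sketch in Python. The `llm_chat` helper is a hypothetical stand-in for whatever chat API you actually use; the key move is that the prompt supplies verified source data and asks the model only to transform it, rather than treating the model as a knowledge base.

```python
# Placeholder for a real chat API call; swap in your provider's SDK.
def llm_chat(prompt: str) -> str:
    return "[model response would appear here]"

def translate_data(source_data: str, task: str) -> str:
    """'Zero-shot translator' pattern: hand the model the facts and
    ask it only to transform them, never to recall facts on its own."""
    prompt = (
        f"Using ONLY the data provided below, {task}. "
        "Do not add anything that is not in the data.\n\n"
        f"DATA:\n{source_data}"
    )
    return llm_chat(prompt)

# Usage: turn a verified set of measurements into a short conclusion.
measurements = "sample A: 4.2 mg/L\nsample B: 3.9 mg/L\nsample C: 7.1 mg/L"
print(translate_data(measurements, "state which sample has the highest concentration"))
```

Because the model never has to recall anything from its training data, checking the answer reduces to comparing the output against the data you supplied.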
Meanwhile, a group of European researchers has been hard at work trying out fixes. Just last week, they showed off a solution that looks really promising: according to them, it can slash AI hallucinations down to just a few percentage points. The system is the brainchild of Iris.ai, a startup from Oslo. Founded in 2015, the company has built an AI engine that’s all about understanding scientific text. The software dives into loads of research data, breaks it down, sorts it out, and gives you the lowdown.
And guess who’s on board? The Finnish Food Authority, which is using the system to speed up its research on a possible avian flu outbreak. Iris.ai claims its platform cuts a researcher’s time by a whopping 75%.
Today’s large language models (LLMs) are kind of infamous for spitting out total nonsense and fake info, and we’ve seen loads of examples in recent months. Sometimes these screw-ups can really hurt a company’s reputation. Take the launch demo of Microsoft Bing AI, for example: it ended up delivering a badly botched analysis of Gap’s earnings report.
Other times, the mistakes can be way more serious. ChatGPT might throw out risky medical advice, and security researchers worry that the chatbot’s hallucinations could steer software developers toward harmful code packages.
AI hallucinations are messing with the usefulness of AI in research, too. In an Iris.ai survey of about 500 corporate R&D workers, only 22% said they trust systems like ChatGPT. And yet a whopping 84% of them still rely on ChatGPT as their main AI tool for research. That shaky situation is what got Iris.ai working on tackling AI hallucinations.
Iris.ai has a few tricks up its sleeve for checking how accurate AI outputs are. The most important one is making sure the facts check out. On top of that, the company compares what the AI spits out against a confirmed “ground truth” using a metric it calls WISDM: the software rates how closely the AI’s output matches the ground truth in meaning, looking at things like topics, structure, and key information.
Another check looks at how coherent the answer is: Iris.ai makes sure the output covers the right subjects, data, and sources for the specific question, not random stuff. Taken together, this mix of methods sets a benchmark for factual accuracy.
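WISDM itself is proprietary, so the following is only a rough illustration of the general idea: scoring how much of a ground-truth answer’s content actually shows up in the model’s output. A real system would use semantic embeddings rather than this toy word-overlap measure.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def coverage_score(output: str, ground_truth: str) -> float:
    """Fraction of ground-truth tokens that appear in the output;
    a crude stand-in for a semantic-similarity metric like WISDM."""
    truth = tokenize(ground_truth)
    return len(tokenize(output) & truth) / len(truth) if truth else 0.0

truth = "Avian influenza spreads between farms via contaminated equipment."
answer = "The report says avian influenza can spread between farms through contaminated equipment."
print(f"coverage: {coverage_score(answer, truth):.2f}")
```

A low score flags outputs that drift away from the verified reference, which is exactly the failure mode hallucination detection is after.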
Behind the scenes, the Iris.ai system uses knowledge graphs that map the connections between pieces of data. These graphs evaluate and illustrate the steps a language model takes to arrive at its outputs; in a nutshell, they lay out a sequence of thoughts the model is supposed to follow.
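As a toy illustration of that idea (not Iris.ai’s actual implementation), a knowledge graph can be as simple as a set of subject-relation-object facts, and a model’s claimed chain of thought can be validated against it one step at a time:

```python
# Tiny knowledge graph as subject-relation-object triples;
# production systems would use a proper graph store.
FACTS = {
    ("avian flu", "infects", "poultry"),
    ("poultry", "is raised on", "farms"),
    ("farms", "are inspected by", "food authorities"),
}

def step_supported(subject: str, relation: str, obj: str) -> bool:
    """Check one reasoning step against the graph."""
    return (subject, relation, obj) in FACTS

# A chain of thought the model claims to follow, one hop at a time.
chain = [
    ("avian flu", "infects", "poultry"),
    ("poultry", "is raised on", "farms"),
]
print(all(step_supported(*step) for step in chain))  # True: every hop has an edge
```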
This method makes checking things easier. You ask the model’s chat function to break a request down into smaller chunks and then show the correct steps, so you can spot and fix issues along the way. The setup might even nudge a model to catch and fix its own slip-ups, meaning it could churn out a coherent, factually accurate answer all on its own.
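The article doesn’t spell out Iris.ai’s prompting scheme, but a decompose-then-self-check loop might look something like this sketch, again with a hypothetical `llm_chat` stand-in:

```python
def llm_chat(prompt: str) -> str:
    # Placeholder for a real chat API call.
    return "[model response]"

def answer_with_steps(question: str) -> str:
    # 1. Break the request into smaller, checkable chunks.
    plan = llm_chat(f"Break this request into numbered sub-questions:\n{question}")
    # 2. Work through the steps in order, showing the reasoning for each.
    draft = llm_chat(f"Answer each step in order, showing your reasoning:\n{plan}")
    # 3. Ask the model to review its own steps and fix any slip-ups.
    return llm_chat(f"Check each step below for errors and return a corrected answer:\n{draft}")

print(answer_with_steps("How does avian flu spread between farms?"))
```

Each intermediate output is something a human (or a knowledge graph) can inspect, which is what makes the final answer auditable rather than a black-box guess.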