Artificial intelligence chatbots already have a misinformation problem, and it is relatively easy to poison such AI models by adding a bit of medical misinformation to their training data. Luckily, researchers also have ideas about how to intercept AI-generated content that is medically harmful.
Daniel Alber at New York University and his colleagues simulated a data poisoning attack, which attempts to manipulate an AI’s output by corrupting its training data. First, they used an OpenAI chatbot service, ChatGPT-3.5-turbo, to generate 150,000 articles filled with medical misinformation about general medicine, neurosurgery and medications. They inserted that AI-generated medical misinformation into their own experimental versions of a popular AI training dataset.
Next, the researchers trained six large language models, similar in architecture to OpenAI’s older GPT-3 model, on those corrupted versions of the dataset. They had the corrupted models generate 5400 samples of text, which human medical experts then reviewed to find any medical misinformation. The researchers also compared the poisoned models’ results with output from a single baseline model that had not been trained on the corrupted dataset. OpenAI did not respond to a request for comment.
Those initial experiments showed that replacing just 0.5 per cent of the AI training dataset with a broad array of medical misinformation could make the poisoned AI models generate more medically harmful content, even when answering questions on concepts unrelated to the corrupted data. For example, the poisoned AI models flatly dismissed the effectiveness of covid-19 vaccines and antidepressants in unequivocal terms, and they falsely stated that the drug metoprolol, used for treating hypertension, can also treat asthma.
“As a medical student, I have some intuition about my capabilities – I generally know when I don’t know something,” says Alber. “Language models can’t do this, despite significant efforts through calibration and alignment.”
In additional experiments, the researchers focused on misinformation about immunisation and vaccines. They found that corrupting as little as 0.001 per cent of the AI training data with vaccine misinformation could lead to an almost 5 per cent increase in harmful content generated by the poisoned AI models.
The vaccine-focused attack was accomplished with just 2000 malicious articles, generated by ChatGPT at a cost of $5. Similar data poisoning attacks targeting even the largest language models to date could be carried out for under $1000, according to the researchers.
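To make the idea of a data poisoning attack concrete, here is a minimal sketch of swapping a small fraction of a training corpus for malicious documents. It is an illustration only, not the researchers' pipeline: the function name `poison_corpus`, the list-of-strings corpus and the choice to count whole documents are all assumptions made for this example.

```python
import random


def poison_corpus(corpus, poisoned_docs, fraction, seed=0):
    """Return a copy of `corpus` with about `fraction` of its documents replaced.

    Hypothetical helper for illustration; the proportion is measured in
    documents here purely to keep the example simple.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_replace = round(len(corpus) * fraction)
    if n_replace > len(poisoned_docs):
        raise ValueError("not enough poisoned documents for this fraction")

    rng = random.Random(seed)  # fixed seed so the example is repeatable
    result = list(corpus)      # leave the caller's corpus untouched
    # Pick distinct positions and overwrite each with a poisoned document.
    positions = rng.sample(range(len(result)), n_replace)
    for pos, doc in zip(positions, poisoned_docs):
        result[pos] = doc
    return result


# Placeholder data: 1000 clean documents, 0.5 per cent of them replaced.
clean = [f"clean article {i}" for i in range(1000)]
bad = [f"misinformation article {i}" for i in range(10)]
mixed = poison_corpus(clean, bad, fraction=0.005)
poisoned_count = sum(doc.startswith("misinformation") for doc in mixed)
```

With these placeholder inputs, five of the 1000 documents end up replaced, which shows how small a 0.5 per cent contamination is relative to the whole corpus.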
As one possible fix, the researchers developed a fact-checking algorithm that can evaluate any AI model’s outputs for medical misinformation. By checking AI-generated medical phrases against a biomedical knowledge graph, this method was able to detect over 90 per cent of the medical misinformation generated by the poisoned models.
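The general idea of checking claims against a knowledge graph can be sketched in a few lines. This is a toy illustration, not the researchers' algorithm: it assumes that medical phrases have already been extracted from the model output as (subject, relation, object) triples, and `TOY_GRAPH`, `normalise` and `flag_unsupported` are hypothetical names invented for this example.

```python
# Toy knowledge graph: a set of (subject, relation, object) triples.
# These entries are illustrative placeholders, not a real biomedical resource.
TOY_GRAPH = {
    ("metoprolol", "treats", "hypertension"),
    ("insulin", "treats", "diabetes"),
}


def normalise(triple):
    """Lower-case and trim each part so matching ignores case and stray spaces."""
    return tuple(part.strip().lower() for part in triple)


def flag_unsupported(claims, graph):
    """Return the normalised claims that have no matching triple in the graph."""
    return [normalise(c) for c in claims if normalise(c) not in graph]


# Claims assumed to have been extracted from a model's output.
claims = [
    ("Metoprolol", "treats", "hypertension"),
    ("Metoprolol", "treats", "asthma"),
]
flagged = flag_unsupported(claims, TOY_GRAPH)
print(flagged)
```

Here only the asthma claim is flagged, because the toy graph contains no supporting triple for it. A practical system would also need to extract phrases from free text and handle synonyms, both of which this sketch leaves out.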
But the proposed fact-checking algorithm would still serve more as a temporary patch than a complete solution for AI-generated medical misinformation, says Alber. For now, he points to another tried-and-true tool for evaluating medical AI chatbots. “Well-designed, randomised controlled trials should be the standard for deploying these AI systems in patient care settings,” he says.