Nothing new under the sun
Or should I use the expression “plus ça change, plus c’est la même chose”? Because what’s at stake in large language models (LLMs) like ChatGPT4 and others is the trade-off between the model’s fit and accuracy on the one hand and its transparency, interpretability and explainability on the other. This dilemma is as old as classical statistics: a simple regression model may be inaccurate, but it is easy to read for end users without a strong background in statistics. Anyone can read from a graphical representation that there is a correlation between the office surface, the location class and the office rent. But the results of a high-dimensional analysis by a neural network are less transparent and interpretable, let alone explainable.
The same goes for LLMs: there is no way a human domain expert can fathom the multitude of weight matrices used in determining the syntactic relationships between words.
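To make the readability argument concrete, here is a minimal Python sketch of such a simple regression. The four data points are invented for illustration, the name fit_line is my own, and only the office surface is used as a predictor (the location class is left out to keep it short).

```python
# Hypothetical offices: (surface in square metres, monthly rent in EUR).
# These numbers are made up for illustration only.
offices = [(50, 1100), (80, 1550), (120, 2450), (200, 3950)]


def fit_line(points):
    """Ordinary least squares for rent = intercept + slope * surface."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(offices)
# The whole model is two numbers that an end user can read directly.
print(f"rent = {intercept:.0f} + {slope:.1f} * surface")
```

The entire model fits in one printed line: the slope is the extra rent per additional square metre. A neural network fitted to the same data would offer no such direct reading.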
About hard-to-detect hallucinations
Anyone can spot nonsense coming out of ChatGPT, such as responses that are inconsistent with the prompt. But what about pure fiction presented in a factually consistent and convincing way? An end user who is not a domain expert will have trouble recognising such output as fiction.
The mitigation is called RAG (Retrieval-Augmented Generation). It’s a technique that enables experts to add their own data to the prompt and so obtain more precise generative AI output. But… then we’re missing the whole point of generative AI: to enable a broader audience than domain experts to do tasks for which they have had little or no training or education.
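For readers who want to see the idea rather than the acronym, here is a rough sketch of the retrieval step in Python. Everything in it is an assumption of mine: the three documents are invented, the word-overlap scoring stands in for the embedding search a real system would probably use, and no actual language model is called. It only shows how expert-supplied data ends up in the prompt.

```python
import re

# Hypothetical expert-supplied documents (illustrative content only).
DOCUMENTS = [
    "Office rent in location class A averages 25 euro per square metre.",
    "The cafeteria opens at noon.",
    "Location class B offices are cheaper than class A offices.",
]


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Rank documents by the number of words shared with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    """Prepend the retrieved passages to the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "What is the office rent in location class A?"
print(build_prompt(question, DOCUMENTS))
```

Note who has to curate the documents: the domain expert. That is exactly the dependency discussed above.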
Domain expertise is needed in most cases
Generating marketing and advertising content may work for low-level copy like catalogue texts, but I doubt it will deliver the sort of ads you find on Ads of the World (https://www.adsoftheworld.com/).
I grant you the use case of enhancing the shopping experience as a “domainless” knowledge generator. But most use cases, such as drug discovery, health care, finance and stock market trading, or urban design, to name a few, require domain knowledge to prevent accidents from happening.
Only 7% of the citations were accurate!
Take health care: a 2023 study by Bhattacharyya et al.[1] identified an astonishing number of errors in the references that ChatGPT generated for medical content. Among these references, 47% were fabricated, 46% were authentic but inaccurate, and only 7% were authentic and accurate. A friend of mine, a medical practitioner, was already frustrated by people googling their symptoms and walking into his consulting room with both the diagnosis and the treatment; with this tool I fear his frustration will only increase… Many more examples can be found in other domains[2].
Hallucinations galore in Generative AI
Another evolution in AI is the move away from labelling by experts towards self-supervised learning (SSL; no, not the network encryption protocol). Today’s applications in medicine produce impressive results, but again, this approach still requires medical expertise.
In the context of generative AI, self-supervised learning can be particularly useful for pre-training models on large amounts of unlabelled data before fine-tuning them on specific tasks. By learning to predict certain properties or transformations of the data, such as filling in missing parts of an image (inpainting) or reconstructing corrupted text (denoising), the model can develop a rich representation of the data distribution and capture meaningful features that can then be used for generating new content.
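The denoising idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sentence, the "[MASK]" placeholder, the 30% masking rate and the function name make_denoising_pair are all invented here, and no model is actually trained. The point is that the training pair is produced from unlabelled text alone, without an expert tagging anything.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def make_denoising_pair(sentence, mask_rate=0.3, seed=0):
    """Return (corrupted, target); a model would learn to restore the target."""
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    tokens = sentence.split()
    corrupted = [MASK if rng.random() < mask_rate else tok for tok in tokens]
    return " ".join(corrupted), sentence


corrupted, target = make_denoising_pair("the patient shows no sign of infection")
print(corrupted)  # the input the model would see
print(target)     # the text it would be asked to reconstruct
```

The label is simply the original sentence, which is why this approach scales to large unlabelled corpora. Judging whether the reconstructions make medical sense is another matter.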
Enter XAI
The European Regulation on Artificial Intelligence (AI Act)[3], which is in the final stage of its implementation process, is a serious argument for avoiding sorcerer’s apprentices. High-risk AI applications in particular, such as those used in healthcare, transportation and law enforcement, will be subject to strict requirements under the AI Act, including data quality, transparency, robustness and human oversight. Additionally, the Act prohibits certain AI practices deemed unacceptable, such as social scoring and systems that manipulate human behaviour or exploit vulnerabilities.
This will foster the use of explainable AI (XAI), at least in domains where existing legislation already requires transparency, e.g. Sarbanes-Oxley, HIPAA and others. Professionals in banking and insurance, public servants deciding on subsidies and grants, and HR professionals evaluating CVs are just a few of the primary beneficiaries of XAI.
They will need models where humans can understand how the algorithm works and tweak it to test its sensitivity. By doing so, they will get a better understanding of how the model arrived at a certain result.
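What “tweak it to test its sensitivity” could look like is sketched below in Python. The scoring model is entirely hypothetical: the feature names, weights and bias are mine and do not come from any real grant or credit procedure. The sketch nudges one input at a time and reports how much the score moves.

```python
# Hypothetical, fully transparent scoring model for a grant application.
# Feature names, weights and bias are invented for illustration.
WEIGHTS = {"years_active": 0.25, "requested_amount": -0.5, "past_defaults": -1.0}
BIAS = 2.0


def score(applicant):
    """Weighted sum of the applicant's features plus a constant."""
    return BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def sensitivity(applicant, delta=1.0):
    """Change in score when each feature is increased by delta, one at a time."""
    base = score(applicant)
    effects = {}
    for name in WEIGHTS:
        nudged = dict(applicant)
        nudged[name] = applicant[name] + delta
        effects[name] = score(nudged) - base
    return effects


applicant = {"years_active": 4.0, "requested_amount": 3.0, "past_defaults": 1.0}
print(score(applicant))
print(sensitivity(applicant))
```

For a linear model like this one, the effect of each nudge is simply the weight times delta, so the professional can verify the explanation by hand. The same one-at-a-time probe can be pointed at a more complex model, but there the effects depend on the starting point and are harder to summarise.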
In short, XAI models may be simpler, but they are better governed, and they will grow in usability as new increments are added to the existing knowledge base. As we speak, sector-specific general models are being developed, ready to be enhanced with your specific domain knowledge.
[1] Bhattacharyya M, Miller VM, Bhattacharyya D, Miller LE. High Rates of Fabricated and Inaccurate References in ChatGPT-Generated Medical Content. Cureus. 2023 May 19;15(5):e39238. doi: 10.7759/cureus.39238. PMID: 37337480.
[2] Athaluri SA, Manthena SV, Kesapragada VSRKM, Yarlagadda V, Dave T, Duddumpudi RTS. Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References. Cureus. 2023 Apr 11;15(4):e37432. doi: 10.7759/cureus.37432. PMID: 37182055; PMCID: PMC10173677.