Large Language Models (LLMs) have generated quite a buzz lately, and unless you have been hiding under a rock, you have probably heard about ChatGPT by now. Since the start of 2023, several other large language models, such as Bard and Llama, have come to market alongside slightly earlier ones like Bloom, and they are mostly based on Transformer technology. Given that LLM technology directly relates to my profession, I have been looking into LLMs for some time now from both personal and business perspectives. There is plenty of documentation, tutorials, and training material available online covering the technology, prompt engineering, fine-tuning, and so on, so I am going to skip all of that. The focus of this blog is to highlight LLM adoption challenges in the context of business operations.
LLM technology is very impressive, and given the right training it can certainly complement a human worker and improve overall productivity. Publicly available models (and services like ChatGPT) are fine-tuned on more than 200 different types of tasks and perform very well for personal use. However, when it comes to adoption in a business environment, it is a different story, or rather I should say that it is the same story. :) Let me explain what I mean by that.
There are two main challenges when it comes to adopting LLMs (or NLP in general) in a business context: (1) domain-specific knowledge, and (2) ground truth data preparation. To understand these challenges, it is important to understand how a language model absorbs knowledge and what makes it work for a specific task.
Training a language model broadly comprises two steps:
1) pre-training: Here a language model is trained on a very large body of text (technically referred to as a corpus) in an unsupervised manner (more precisely, self-supervised: the training targets come from the text itself rather than from human labels). This corpus is collected from publicly available sources, such as the internet, Wikipedia, curated datasets prepared as part of academic research, data donated by corporations, etc. It contains a significant amount of knowledge about human history, current events, science, arts, and pretty much all walks of human life. When this data is fed to a language model, the training algorithm automatically learns the semantics and grammar of the language and the relative placement of words in the context of the text. This knowledge is mathematically encoded and stored internally in the model as parameters.
2) fine-tuning: Once a language model is pre-trained, it needs to be taught how to respond to a specific task (objective). This is done in a supervised manner, where a human worker provides an input (text) and the expected output (label) to the model. The model predicts an output based on the patterns it learned during the pre-training step, and this prediction is compared against the provided label to measure how well the model performed relative to the expected output. The model's parameters are then adjusted to reduce the gap. ChatGPT is a special case of fine-tuning, where a large language model (GPT series) has been fine-tuned to respond in chat style.
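To make the difference between the two steps concrete, here is a minimal sketch in plain Python. It is only an illustration of the data each step consumes, not how any particular model is actually trained. The function names, the word-level splitting, and the example tickets and labels are all my own assumptions for illustration.

```python
from typing import List, Tuple


def next_word_examples(text: str, context_size: int = 3) -> List[Tuple[List[str], str]]:
    """Pre-training style data: the target (next word) comes from the text itself.

    Assumption: whitespace-separated words stand in for real tokens.
    """
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        # Use up to `context_size` preceding words as the context.
        context = words[max(0, i - context_size):i]
        examples.append((context, words[i]))
    return examples


def accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fine-tuning style check: compare predictions with human-provided labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Pre-training: no human labels needed, every position in the raw text yields an example.
for context, target in next_word_examples("knowledge is power in machine learning"):
    print(context, "->", target)

# Fine-tuning: hypothetical (input, label) pairs that a human had to prepare.
labeled = [
    ("Invoice is overdue by 30 days", "collections"),
    ("Please reset my password", "it_support"),
]
expected_labels = [label for _, label in labeled]
hypothetical_predictions = ["collections", "billing"]  # made-up model output
print("accuracy:", accuracy(hypothetical_predictions, expected_labels))
```

Notice that the first function needs nothing but raw text, while the second needs labels that someone had to write. That asymmetry is at the heart of both challenges discussed in this blog.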
Now, let's look at the two challenges:
Challenge #1: Domain knowledge
Once the above two steps are done, the model can be deployed for prediction in a production setting. However, from the above, it should be apparent that the domain knowledge available to a model depends on the corpus used for pre-training. Some task-specific domain knowledge does get absorbed during the fine-tuning step, but the majority of knowledge acquisition happens in the pre-training stage.
Public models are pre-trained on vast amounts of general text and have some knowledge of pretty much all domains, but they lack depth, especially in the proprietary knowledge that resides within individual business institutions and industries. To a certain extent, this issue can be mitigated through in-context learning, that is, by providing a few examples (few-shot learning) as part of the prompt to guide the model's behavior. However, there is a limit to how much context you can provide, and it is very difficult to predict how the model will behave on variations in the input when it has only limited context to go on.
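As a rough illustration of few-shot prompting and its context limit, here is a small sketch that assembles a prompt from labeled examples and stops adding examples once a budget is reached. The function name, the "Input:/Output:" layout, and the use of a whitespace word count as a stand-in for the model's real token limit are assumptions for illustration. Real models count tokens with their own tokenizer.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
    max_words: int = 200,
) -> str:
    """Build a few-shot prompt, adding examples only while the budget allows.

    Assumption: a whitespace word count approximates the context limit.
    """
    header = instruction.strip()
    footer = f"Input: {query}\nOutput:"
    used = len(header.split()) + len(footer.split())

    blocks = []
    for text, label in examples:
        block = f"Input: {text}\nOutput: {label}"
        cost = len(block.split())
        if used + cost > max_words:
            break  # no room left: remaining examples are dropped
        blocks.append(block)
        used += cost

    return "\n\n".join([header, *blocks, footer])


# Hypothetical examples for a made-up ticket-routing task.
examples = [
    ("Invoice is overdue by 30 days", "collections"),
    ("Please reset my password", "it_support"),
    ("Need a copy of last year's statement", "customer_service"),
]
print(build_few_shot_prompt("Route the ticket to a team.", examples, "My card was declined"))
```

With a small budget, only the first example or two make it into the prompt, which is exactly the limitation described above: the model sees a thin slice of the domain and has to guess the rest.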
For a model to perform as expected in a domain-heavy setting, it is imperative to further pre-train a publicly available model on proprietary data to encode all that knowledge, and then fine-tune it for the specific objective(s). That brings me to the second challenge.
Challenge #2: Ground Truth
As indicated above, fine-tuning is a supervised learning task, and we need to provide the input and the expected output (labels) for the model to learn the expected behavior. This is no different for large language models than it was for the earlier generation of language models, like BERT. The only difference is that here the ground truth data takes the form of a prompt paired with the expected response, but it still needs to be prepared manually by humans, which is a very laborious and time-consuming task.
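To give a feel for what this ground truth looks like in practice, here is a minimal sketch that validates hand-written records and serializes them one JSON object per line (JSONL), a layout commonly used for fine-tuning data. The field names "prompt" and "completion", the function name, and the sample records are assumptions for illustration. The exact schema depends on the platform or library you fine-tune with.

```python
import json
from typing import Dict, List


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Validate prompt/completion records and serialize them as JSONL.

    Assumption: each record is a dict with "prompt" and "completion" keys.
    """
    lines = []
    for i, rec in enumerate(records):
        prompt = rec.get("prompt", "").strip()
        completion = rec.get("completion", "").strip()
        if not prompt or not completion:
            # An unlabeled or empty record is useless as ground truth.
            raise ValueError(f"record {i} is missing a prompt or completion")
        lines.append(json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical records for a made-up entity extraction task; in a real
# project a human annotator writes every completion by hand.
ground_truth = [
    {
        "prompt": "Extract the invoice number: 'Ref INV-2041 dated 3 March'",
        "completion": "INV-2041",
    },
    {
        "prompt": "Extract the invoice number: 'Payment for invoice INV-0007'",
        "completion": "INV-0007",
    },
]
print(to_jsonl(ground_truth))
```

Two records take a minute to write. A useful fine-tuning set typically needs far more, each one reviewed by someone who knows the domain, and that is where the effort goes.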
The two challenges highlighted above, domain knowledge and the preparation of ground truth data for fine-tuning, are not new. They existed with the previous generation of language models, too. There are no easy solutions that completely mitigate them. An extensible technology platform with built-in support for ground truth preparation, pre-training, and fine-tuning of language models can provide a significant boost across the entire project life-cycle and help onboard new use cases on an ongoing basis. For some common use cases, like entity extraction from documents, models can be pre-trained and fine-tuned with ground truth data in a few weeks.
There are additional considerations to take into account, too. Do you really need an LLM for your use case? Pre-training and fine-tuning a language model take a significant amount of compliance effort, compute resources, and technical talent. These are important considerations and, perhaps, a topic for another blog.
However, the bottom line is that the technology underpinning Large Language Models is very capable and powerful. Given domain-specific data, deep knowledge can be encoded for domain-heavy tasks with some effort. It is this ability of Large Language Models to encode and absorb vast amounts of knowledge that gives them a significant advantage in terms of performance. I guess the adage "Knowledge is power" holds true in the world of machine learning, too!