Those who benefit from AI will be those who control it
Jan Van Hoecke delves into the challenges of training and steering generative AI in the legal sphere.
An old saying tells us that with great power comes great responsibility. This chestnut has taken on a new sense of urgency in the wake of the breathtakingly rapid ascent of generative AI tools like ChatGPT into the consumer realm and a variety of professional settings. Even law firms are trying to figure out if there’s space for this powerful new form of AI within their hallowed halls, but are they playing with fire?
Generative AI has a well-documented ‘hallucination’ problem, where it makes up plausible-sounding information and presents it with confidence. Can legal organisations afford to be using tools that – while potentially great timesavers – fabricate material or otherwise introduce mistakes into their work product?
The answer is yes – with an asterisk next to it. Success with generative AI will go to those who learn how to control it and ensure that it delivers results with high accuracy. Only by reining in generative AI’s wild side will law firms be able to tap into its potential with confidence.
Devouring data to create a worldview
To understand how to control generative AI, it is helpful to first understand what makes it tick.
Generative AI is powered by large language models, which – to simplify somewhat – can be thought of as a highly sophisticated form of the auto-complete that legal professionals encounter every day when typing a message on their smartphone, or the sentence completion found in Google Mail or Outlook. The auto-complete function in these everyday scenarios looks at the sentence or phrase being typed and predicts which word is likely to come next, based on what it has seen before.
Large language models function in much the same way, but they are trained on absolutely massive amounts of data. In the case of GPT-3 – the large language model that underpins the ChatGPT tool and has also recently been incorporated into Microsoft Bing – the amount of data it has been fed is staggering: hundreds of gigabytes of text drawn from public sources such as Wikipedia, PubMed, EDGAR, the FreeLaw Project, and others.
After ingesting all this data, a large language model isn’t just able to predict the most likely next word in a sentence or phrase. It can predict the next sentence, the next paragraph, and the next page – and in that way, generate content based on the internal worldview that it has developed based on what it has seen.
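For readers curious about the mechanics, here is a deliberately toy sketch in Python of the ‘predict the next word’ idea in its most basic form. It simply counts which word tends to follow each word in a tiny invented sample of text; a real large language model learns these patterns with billions of parameters rather than simple counts, but the underlying intuition – predict what comes next, based on what has been seen before – is the same.

```python
from collections import Counter, defaultdict

# Toy auto-complete: count which word follows each word in a tiny,
# invented sample of legal-sounding text, then predict the most
# frequent follower. A real large language model does something far
# more sophisticated, but the basic goal is the same: predict the
# next word given the words so far.
corpus = (
    "the parties agree to the terms of the agreement "
    "the parties agree to arbitration under the terms herein"
).split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the sample text."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(predict_next("agree"))  # -> 'to', the only word ever seen after 'agree'
print(predict_next("the"))    # -> the most frequent follower of 'the' in the sample
```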
Unfortunately, however, the large language model’s worldview isn’t always correct – and that’s where the hallucination problem mentioned earlier comes from.
A peek inside the black box
At this point, a natural question from any tech-savvy legal professional might be: Can’t we just train generative AI like we do with other forms of AI and gently steer it in the right direction?
Machine learning, for example, has been used in the legal sphere for quite some time for tasks like document classification, and its success depends on training it to recognise and correctly identify specific items. For instance, if you give a machine learning model enough correct examples of a ‘share purchase agreement’, it will eventually be able to scan reams of documents, identify which ones are share purchase agreements and which are not, and tag them appropriately.
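As a rough illustration of what that training looks like in practice, the short Python sketch below uses the open-source scikit-learn library to fit a simple text classifier on a handful of invented example documents and then tag a new one. The documents, labels and wording are hypothetical placeholders; a real system would be trained on many thousands of labelled firm documents.

```python
# Minimal supervised document classification sketch using scikit-learn.
# All documents and labels below are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_docs = [
    "This share purchase agreement is made between the Seller and the Buyer...",
    "The Purchaser agrees to acquire the entire issued share capital of the Company...",
    "This lease sets out the terms on which the Landlord lets the premises...",
    "This employment contract is entered into between the Employer and the Employee...",
]
labels = [
    "share_purchase_agreement",
    "share_purchase_agreement",
    "other",
    "other",
]

# Turn each document into word-frequency features, then fit a classifier
# on the labelled examples.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(training_docs, labels)

# Ask the trained model to tag a document it has never seen before.
new_doc = "Agreement for the sale and purchase of shares in XYZ Limited..."
print(classifier.predict([new_doc])[0])
```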
Generative AI is a different beast; it’s actually quite difficult to ‘steer’. When you feed it data, it forms its own opinions and worldview based on that data, which makes it difficult for humans to tell it what is and isn’t ‘correct’. Frankly, its inner calculations are something of a black box: we don’t entirely know why it makes some of the decisions it makes, or how it arrives at them.
This is why generative AI sometimes unexpectedly ‘goes rogue’ and makes up fake citations and references when it creates content. The focus of a generative AI model is to make the best possible prediction based on the text it has seen, not to be logically or legally correct. Therefore, following a question or instruction, it makes the most likely prediction of what the text should be, based on the content it is trained on. Needless to say, in a legal setting, there is little value in having a technology that produces such unreliable results – so, what’s the workaround?
Stay grounded
Given that generative AI creates a worldview based on the material it has ingested, and then leans on that worldview when answering questions or generating content, the best way for law firms to tap into its power while keeping it tethered to reality is through a process called grounding.
As the name suggests, grounding is a way of ensuring that generative AI’s answers are grounded in quality content like the material found in a law firm’s document management system (DMS), knowledge library, or precedents database, rather than just based on its worldview. When grounding is in place, the generative AI has to answer questions based on the text found in specific documents that the tool has been directed towards as resources.
This grounding provides control over the quality of the results that AI produces – and it’s a way of harnessing generative AI and turning it into an ‘assistant’ that can reliably reach out to a trusted source to gather information or perform research and then come back to the user with an answer.
For example, imagine a scenario where a lawyer needs to list the rules surrounding changing the zoning classification of a high-rise building in New York City. Instead of searching for the information to answer this question, the lawyer could ask the question directly via a ChatGPT-type interface that connects to trusted resources, and get the correct answer along with a reference to the material that was used to formulate it. In other words, they would get the answer as well as the evidence.
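The sketch below shows, in simplified form, how such a grounded workflow could be wired together in Python: the question is matched against a small set of placeholder firm documents, and the most relevant passages are packed into a prompt that instructs the model to answer only from those passages and to cite them. The document names, the question and the generate() call are hypothetical stand-ins rather than a description of any particular product.

```python
# Simplified 'grounding' sketch: retrieve the most relevant passages from a
# (placeholder) document store, then constrain the model to answer only from
# those passages and to cite them. The documents and the generate() call are
# hypothetical; a real system would query the firm's DMS and call an actual
# large language model API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = {
    "zoning_memo_2021.docx": "Memo on rezoning applications for high-rise buildings...",
    "ulurp_checklist.pdf": "Checklist of steps in the Uniform Land Use Review Procedure...",
    "lease_template.docx": "Standard commercial lease template for office premises...",
}

question = ("What are the rules for changing the zoning classification "
            "of a high-rise building in New York City?")

# 1. Retrieve: rank the firm's documents by similarity to the question.
titles, texts = list(knowledge_base), list(knowledge_base.values())
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(texts)
scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
top_sources = [titles[i] for i in scores.argsort()[::-1][:2]]

# 2. Ground: build a prompt that restricts the model to the retrieved passages
#    and asks it to cite the source document for every point it makes.
context = "\n\n".join(f"[{title}]\n{knowledge_base[title]}" for title in top_sources)
prompt = (
    "Answer the question using ONLY the passages below. "
    "Cite the source document for each point. If the passages do not "
    "contain the answer, say so.\n\n"
    f"{context}\n\nQuestion: {question}"
)

# answer = generate(prompt)  # hypothetical call to a generative AI model
```

The important design point is the restriction written into the prompt: the model is told to answer only from the retrieved passages – and to say so when they don’t contain the answer – which is what keeps its own ‘worldview’ out of the result and keeps the evidence attached to the answer.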
Of course, this brings up an obvious point: if a law firm doesn’t have a well-curated set of knowledge assets at its disposal, the AI output will suffer. This is very much a garbage in/garbage out type of scenario – and having quality knowledge sources available is the key to putting generative AI to use in tasks like the above.
The genie is out of the bottle
Ultimately, the human-machine relationship that generative AI offers is quite valuable. Lawyers will still be the ones doing the thinking and coming up with the ideas, but they’ll have an assistant at their sides that will help them dig up answers and valuable knowledge a lot faster than they were ever able to before.
It’s not hard to see where generative AI could also be put to use in helping legal professionals to draft various types of legal agreements. As with the knowledge search example, the key is to make sure that the generative AI is pointed toward whatever the firm considers the best templates for various pieces of writing. That way, legal professionals can have the confidence that the AI is drawing upon a frequently downloaded real estate lease from within their DMS, for instance, when asked to provide an airtight real estate agreement, rather than drawing upon its own inscrutable worldview when deciding what constitutes a good lease.
While it’s still early days with generative AI, it’s safe to say that the genie is out of the bottle and it’s not going back in. It will continue to develop in exciting and powerful new ways – and if law firms hope to harness its power, they’ll first need to make sure that they can control it. Doing so will enable firms to safely tap into this new technology while safeguarding themselves from its wild and unpredictable side.