Art by F. P. Ardizzone.
Your company has tens or hundreds of thousands of agreements, most of which pre-date your tenure or you inherited from acquisitions. Who has time to re-read them all to figure out what’s in them?
Is artificial intelligence (AI) the answer? If you believe the occasional press releases touting AI tools, all you need to do is load a few sample contracts and provide a few minutes of training input, and then the computer will read all your agreements for you.
In reality, this scenario is as likely in the present day as your smartphone suddenly functioning as a tricorder from Star Trek with the mere download of a new app. It’s true that your phone, when combined with some add-on gadgets like a breathalyzer, might someday be able to identify specific limited conditions. However, we are far from eliminating doctors in the field of medical diagnosis.
Similarly, we are not yet at the stage of eliminating attorney oversight of artificial intelligence systems tasked with reading your agreements. The goal of this article is to explain how to judge the efficacy of AI tools that may be applied to your legal documents.
A recent Quislex study examined 10 AI tools that claimed to abstract key terms from business agreements. Of the vendors surveyed, only half met enterprise-level core requirements for security, integrations, batching, and searching.
Five of the vendors were tested on 48 typical commercial agreements, including servicing agreements, license agreements, purchase orders with terms of service, and supply agreements. The result was a time savings of between 16 percent and 36 percent across the five systems, with an average of 28 percent in time savings.
Each of these AI tools comprises a series of models. There is typically a separate model for each key term (e.g., governing law, effective date, confidentiality requirement) that the tool is hunting for in a contract.
Tests of the accuracy of these tools revealed that some of their models were far better than others. Specifically, the strong suit of four of the five tools tested was noting the absence of certain basic clauses, which these tools performed at a greater than 85 percent success rate.
Unfortunately, when the same tools were asked to find clauses that were present in the agreements, they often erred. The tools achieved their highest success rate (78 percent) identifying governing law. However, they were correct on average less than 50 percent of the time trying to identify notice provisions, limitations of liability, and insurance requirements.
What does this study of AI tools bode for identifying the key terms in your own company’s agreements? First, it’s unlikely that the tools can run unsupervised because your internal clients are not going to be satisfied with you finding only half of the insurance requirements that bind your company. Thus, you need to be prepared for a major legal quality control investment to achieve the standards that your colleagues expect from the legal department.
Second, the productivity impact of these AI tools is material, but not enough to obviate the need for lots of attorneys to operate, train, and manage the tools. Imagine that an excellent attorney could read a typical agreement in 90 minutes and identify and extract all the business terms. If you have 10,000 agreements, that is 15,000 hours of attorney time.
Even with the average 28 percent time savings that Quislex found from AI tools, you would still need 10,800 hours of attorney time to get through all those agreements. With the average attorney handling 2,000 hours of work per year, that represents more than five full-time attorneys (about 5.4) for a year.
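The arithmetic above can be checked with a short Python sketch. The function name review_hours and its parameters are illustrative assumptions, not part of the Quislex study. The inputs are the figures from this article: 10,000 agreements, 90 minutes each, a 28 percent average time savings, and 2,000 attorney hours per year.

```python
def review_hours(num_agreements, minutes_per_agreement, time_savings=0.0):
    """Attorney hours needed to review a set of agreements.

    time_savings is the fraction of review time the AI tool saves
    (0.28 means 28 percent). Hypothetical helper for illustration only.
    """
    baseline_hours = num_agreements * minutes_per_agreement / 60
    return baseline_hours * (1 - time_savings)


HOURS_PER_ATTORNEY_YEAR = 2_000  # working assumption taken from the article

baseline = review_hours(10_000, 90)        # no AI assistance
with_ai = review_hours(10_000, 90, 0.28)   # average savings in the study
attorney_years = with_ai / HOURS_PER_ATTORNEY_YEAR

print(f"Without AI: {baseline:,.0f} hours")
print(f"With AI:    {with_ai:,.0f} hours")
print(f"Attorneys needed for one year: {attorney_years:.1f}")

# The study reported savings between 16 and 36 percent, so the same
# calculation can be run at both ends of that range.
for savings in (0.16, 0.36):
    hours = review_hours(10_000, 90, savings)
    print(f"At {savings:.0%} savings: {hours:,.0f} hours")
```

Even at the best savings rate the study observed, the remaining workload under these assumptions is still several attorney-years.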
Let’s imagine you do have enough spare attorneys to train, operate, and quality check the output of AI tools. What are the criteria to apply when assessing which AI tool to use?
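Whether you are searching for on-point caselaw or identifying and extracting key terms from a contract, as an attorney you typically have two competing goals: casting your net widely enough to catch everything relevant, and keeping irrelevant results out of what you catch.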
If your net is not cast widely enough, you may miss clauses or documents or cases that you really wanted to find. On the other hand, if your net is cast too widely or incorrectly, you will receive scores of inappropriate results that waste days or weeks of your time to cull through.
In AI terms, these competing goals of being widely inclusive while still avoiding false positives are described as “recall” and “precision.” To save time for its human operators, AI models need to be high in both precision and recall. A model high in only one of those metrics will result in either unnecessary work or incorrect results that are worse than having applied no AI tools at all.
Precision is a measure of how many of the answers you give are correct. For example, if you are supposed to identify all the parties to an agreement, you would have high precision if you did not accidentally designate a non-party as a party.
High precision alone, however, is not a sufficient recipe for success: An AI model with high precision may present you very few false alarms. However, it could also miss a lot of correct answers due to its conservatism and bias toward avoiding any answers that might possibly be incorrect. A highly precise model is like a person who is so afraid to get an answer wrong that they refuse to take on challenging problems.
By contrast, recall measures whether you cast the widest net necessary to catch all possible instances of what you are seeking. Imagine you are hunting for all references in a document to exclusivity. A model with high recall might catch all such references, but also flag lots of other sections that don’t refer to exclusivity.
The result is that a model with high recall but poor precision could produce lots of false alarms that require a human to investigate. A high recall model is like a person who subscribes to every newspaper and magazine in the country so she doesn’t miss an important story, with the result being that she has endless chaff to sort out in order to find the wheat.
Understanding the concepts of precision and recall is essential for assessing AI tools because shortcomings in either one, or an imbalance between the two, can result in the need for lots of attorney time to be spent. If someone claims that their AI tool is accurate, make sure to get data on both its precision and recall, because a high score in one without the other will lead to a lot of extra work for you in eliminating false positives or finding the missed answers.
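To make the two metrics concrete, here is a minimal Python sketch using their standard definitions: precision is the share of flagged items that are correct, and recall is the share of true items that were found. The two models and all of their counts are hypothetical, invented purely for illustration; they are not results from any tool or study.

```python
def precision(true_positives, false_positives):
    """Of everything the model flagged, what fraction was correct?"""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives, false_negatives):
    """Of everything that was really there, what fraction did the model find?"""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Hypothetical scenario: 100 exclusivity clauses really exist in the documents.
# "cautious" flags little but is rarely wrong; "wide_net" flags nearly
# everything relevant along with many sections that are not.
models = {
    "cautious": {"tp": 40, "fp": 2, "fn": 60},
    "wide_net": {"tp": 95, "fp": 300, "fn": 5},
}

for name, counts in models.items():
    p = precision(counts["tp"], counts["fp"])
    r = recall(counts["tp"], counts["fn"])
    print(f"{name}: precision {p:.0%}, recall {r:.0%}")
```

In this made-up example the cautious model scores about 95 percent precision but only 40 percent recall, so an attorney must still hunt for the 60 missed clauses. The wide-net model reaches 95 percent recall but about 24 percent precision, so an attorney must discard 300 false alarms.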
And, as demonstrated by the Quislex study, each AI tool comprises a series of models, with one model for each clause or term you are seeking to find in a contract. Bear in mind that precision and recall likely vary for each of these models, which means that an AI tool with fantastic recall or precision for one type of clause (e.g., governing law) may be far less accurate when you want to identify agreements containing other clauses (e.g., a change of control trigger).
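One way to see this per-clause variation in your own evaluation is to have attorneys label a sample of agreements and then score the tool's output clause by clause. The Python sketch below assumes that approach. The agreement IDs, clause names, and labels are all hypothetical, and the score helper is an illustrative function rather than any vendor's API.

```python
def score(predicted, actual):
    """Return (precision, recall) for one clause type.

    predicted: set of agreement IDs the tool flagged as containing the clause
    actual:    set of agreement IDs attorneys confirmed contain the clause
    """
    true_positives = len(predicted & actual)
    p = true_positives / len(predicted) if predicted else 0.0
    r = true_positives / len(actual) if actual else 0.0
    return p, r


# Hypothetical attorney-reviewed sample (ground truth).
attorney_labels = {
    "governing_law": {1, 2, 3, 4, 5, 6, 7, 8},
    "change_of_control": {2, 5, 7, 9},
}

# Hypothetical tool output for the same sample.
tool_output = {
    "governing_law": {1, 2, 3, 4, 5, 6, 7, 10},
    "change_of_control": {5, 11, 12},
}

results = {
    clause: score(tool_output[clause], attorney_labels[clause])
    for clause in attorney_labels
}

for clause, (p, r) in results.items():
    print(f"{clause}: precision {p:.0%}, recall {r:.0%}")
```

With these invented labels the same tool looks strong on governing law and weak on change of control, which is why a single overall accuracy figure can hide the clauses that matter most to you.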
Your time is valuable and outside attorneys are expensive. Thus, you need to choose your AI tools carefully or else they could be a major time sink for you or a big expense for outside attorneys to train, manage, and quality control. When evaluating any AI tool, ask for its measured precision and recall on each type of clause you need to find.
Claims in press releases are not a substitute for hard data. Understanding the independent concepts of precision and recall and the fact that the strength of each AI tool can vary — depending on which term it is seeking in your agreements — will make you a savvier evaluator of options.