Automating e-discovery: The end of lawyer input on document review?
Drew Macaulay and Olivier Aelterman consider whether artificial intelligence could fully remove human involvement in the e-discovery process
Electronic discovery technology has evolved rapidly and must continue to do so to remain equal to the task of supporting discovery exercises that now frequently involve terabytes of data.
Not so long ago, a case involving the review of a few thousand scanned hard-copy documents would have been considered significant. Nowadays, modern systems are capable of indexing and analysing millions of emails, spreadsheets, chat messages, tweets, telephone recordings and other electronic documents, and have built-in tools for advanced searching, structuring review workflows and the creation of lists of documents produced and withheld for privilege.
The primary driver of this development has been the desire to make the discovery process more efficient by automating tasks wherever possible. This article discusses whether this continuing drive to automate will eventually result in a process devoid of human input.
First steps in automation
Until recently, whatever technology was used, the decision as to whether a specific document was actually relevant to the subject matter of the litigation (or whether it was legally privileged, and so on) was reserved for a lawyer or paralegal.
One of the more controversial developments in recent years has been the introduction of computer-assisted review, often called predictive coding. This combination of conceptual analysis technology, sampling-driven workflow and machine learning has the potential to remove the need for a human to make the relevance decision on every document, generating huge savings for corporations involved in litigation discovery exercises.
The underlying technology leveraged by predictive coding is not new, and is used for a variety of purposes in a variety of industries. The ‘you might also like this other product’ feature in many online shops is driven by conceptual analysis of your previous shopping behaviour (including products you bought and those you merely browsed), tracked in part using ‘cookies’: small pieces of data that a website stores in your browser to record your activity.
Each class of information about your shopping habits is given an importance weighting and used to predict other, conceptually similar items you may wish to buy. The more information there is about your habits, the more closely this prediction should match your level of interest in the prospective purchase.
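To make the idea of importance weightings concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a purchase counts three times as much as a page view and that each product is tagged with a few concepts; the weights, function names and catalogue are hypothetical, and real online shops use considerably more sophisticated methods.

```python
from collections import Counter

# Hypothetical importance weightings for each class of shopping signal.
SIGNAL_WEIGHTS = {"purchased": 3.0, "browsed": 1.0}


def build_profile(history):
    """Turn (signal, concepts) events into weighted interest scores per concept."""
    profile = Counter()
    for signal, concepts in history:
        for concept in concepts:
            profile[concept] += SIGNAL_WEIGHTS[signal]
    return profile


def score(profile, item_concepts):
    """Sum the shopper's weighted interest in each concept the item carries."""
    return sum(profile[concept] for concept in item_concepts)


def recommend(profile, catalogue, k=1):
    """Return the k catalogue items whose concepts best match the profile."""
    ranked = sorted(catalogue, key=lambda name: score(profile, catalogue[name]), reverse=True)
    return ranked[:k]


history = [
    ("purchased", {"hiking", "boots"}),
    ("browsed", {"hiking", "tent"}),
    ("browsed", {"novel"}),
]
catalogue = {
    "trail socks": {"hiking", "socks"},
    "camping stove": {"camping", "tent"},
    "desk lamp": {"lighting"},
}
profile = build_profile(history)
print(recommend(profile, catalogue))  # ['trail socks']
```

Because ‘hiking’ appears in both a purchase and a browsing session, it accumulates the most weight, so the item sharing that concept is suggested first.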
In the e-discovery context, computer-assisted review uses machine learning technology to absorb your relevance and privilege decisions on a sample set of documents (the equivalent of your shopping and browsing habits in the example above). It then analyses the documents in this sample set to work out which concepts are common to the relevant and privileged categories.
Once the system has ‘learned’ enough about the concepts inherent in the sample set, it can be directed to analyse the remaining documents, to identify any that closely match the concept(s) present in the sample set (analogous to the ‘you might also like’ suggestion above) and apply appropriate ‘tags’ or ‘coding’ to these documents.
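The article does not say which learning algorithm sits behind predictive coding, and commercial tools differ. As a minimal sketch only, the Python below uses multinomial naive Bayes over word counts, a simple and well-known text classifier, as a stand-in for the conceptual analysis described above: it learns from a lawyer-coded seed set and then codes documents nobody has reviewed. The class name, labels and sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCoder:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # Learn word frequencies per label from the lawyer-coded seed set.
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_scores(self, doc):
        # Log prior plus the log likelihood of each known word under each label.
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for label in self.labels:
            total = math.log(self.doc_counts[label] / n_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    total += math.log((self.word_counts[label][word] + 1) / (self.totals[label] + v))
            scores[label] = total
        return scores

    def predict(self, doc):
        """Return the coding (label) with the highest posterior score."""
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical seed set coded by the review team.
seed_docs = [
    "Pricing agreement with competitor on widget prices",
    "Minutes discussing widget price increase with rival",
    "Office party catering menu",
    "Reminder to submit timesheets",
]
seed_labels = ["relevant", "relevant", "not_relevant", "not_relevant"]
coder = NaiveBayesCoder().fit(seed_docs, seed_labels)

# Apply the learned model to documents no human has reviewed yet.
unreviewed = ["Proposed widget prices for next quarter", "Catering order for the party"]
coding = {doc: coder.predict(doc) for doc in unreviewed}
print(coding)
```

A practical variant would keep the scores from `log_scores` so that documents can be prioritised for human review, rather than relying only on the hard label returned by `predict`.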
The size and composition of the sample set are important: without sufficient breadth of sampling, not all relevant concepts will be identified and given appropriate relative weighting.
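A simple probability argument shows why breadth matters. Assume, as a simplification, that sample documents are drawn independently at random and that a given concept appears in a fraction p of the collection. The chance that a sample of n documents contains at least one example of that concept, a worked case for p = 1%, and the sample size needed to reach 95% are then:

```latex
\begin{align*}
P(\text{concept appears in sample}) &= 1 - (1 - p)^{n} \\
p = 0.01,\; n = 100: \quad 1 - 0.99^{100} &\approx 1 - 0.366 = 0.634 \\
1 - (1 - p)^{n} \ge 0.95 \iff n &\ge \frac{\ln 0.05}{\ln(1 - p)} = \frac{\ln 0.05}{\ln 0.99} \approx 298.1
\end{align*}
```

Under these assumptions, a random sample of 100 documents has only about a 63% chance of containing even one example of a concept present in 1% of the collection, and roughly 299 documents are needed to reach 95%. Rarer concepts need larger samples still.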
The lawyers then sample the documents ‘coded’ by the system, providing feedback that further refines the decisions made by the system, until the system’s decisions and the lawyers’ decisions are closely aligned.
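The article does not specify how ‘closely aligned’ is measured. One common approach, sketched below under that assumption, is to compare the system’s coding with the lawyers’ coding on the same quality-control sample and compute the agreement rate, precision, recall and Cohen’s kappa (agreement corrected for chance). The 0.8 kappa threshold and the function names are hypothetical; in practice the stopping criterion would be agreed for the matter.

```python
def qc_metrics(system, lawyer):
    """Compare system and lawyer coding of the same QC sample (True = relevant)."""
    if len(system) != len(lawyer) or not system:
        raise ValueError("need two non-empty codings of equal length")
    pairs = list(zip(system, lawyer))
    n = len(pairs)
    tp = sum(1 for s, h in pairs if s and h)      # both say relevant
    fp = sum(1 for s, h in pairs if s and not h)  # system over-includes
    fn = sum(1 for s, h in pairs if h and not s)  # system misses a relevant document
    tn = n - tp - fp - fn                         # both say not relevant
    agreement = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_sys, p_law = (tp + fp) / n, (tp + fn) / n
    expected = p_sys * p_law + (1 - p_sys) * (1 - p_law)
    kappa = (agreement - expected) / (1 - expected) if expected < 1 else 1.0
    return {"agreement": agreement, "precision": precision, "recall": recall, "kappa": kappa}


# Hypothetical stopping threshold.
TARGET_KAPPA = 0.8


def training_complete(system, lawyer, target=TARGET_KAPPA):
    """True once system and lawyer coding are 'closely aligned' by the kappa test."""
    return qc_metrics(system, lawyer)["kappa"] >= target


system = [True, True, True, False, False, False, False, False, True, False]
lawyer = [True, True, False, False, False, False, False, False, True, True]
metrics = qc_metrics(system, lawyer)
# agreement 0.8, precision 0.75, recall 0.75, kappa about 0.583: keep training
print(training_complete(system, lawyer))  # False
```

Raw agreement alone can look flattering when most documents are irrelevant, which is why the sketch also reports kappa and recall.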
This approach can be highly effective in reducing costs in large-scale cases with high volumes of data, such as large commercial litigation work, class actions and responding to regulatory information requests. Further applications in the legal sector could include internal investigations, external investigations and high-volume data analysis for the purposes of merger control.
Predicting the future of e-discovery
Computers will tame the information jungle and, along the way, further automate the e-discovery process.
- Information governance and document retention. Enterprise-level document management software will conceptually index all content (such as documents created or emails received) and suggest where each item should be saved and for how long.
Document retention policies will be standardised across an enterprise on a conceptual basis by dedicated retention management personnel, and deletion will be applied automatically across the enterprise, reducing the volume of data requiring review in an e-discovery context right from the start.
- Identification of relevant witnesses or data custodians. The identification of employees who hold relevant data will be similarly automated, as natural language processing allows lawyers to ask conceptual questions of the corporation’s document management system.
The lawyer could ask ‘who was involved in dealing with customer X?’ or ‘who makes decisions about pricing?’ and the system would return lists of staff whose data contains the relevant concepts, in prioritised order.
- Computer-assisted review 2.0. The employees identified by the system at the previous stage will be asked to supply key information about the matter, including questionnaire responses and key documents. This information will be analysed by the system and used to locate other similar documents, avoiding the large-scale data collection exercises of the past.
- Privilege. Lawyers will still undertake a limited review of the documents the system deems most important. Criteria will be set for identifying and withholding privileged documents, based on the concept of legal advice and on key information about the law firms the corporation has worked with. Any documents the system deems relevant and non-privileged will be disclosed in conceptual groupings.
- Artificial intelligence. Is it a step too far to suggest that future e-discovery systems might be able to analyse coding statistically for favourability rather than simple relevance? And, if they could, might they use this information, together with precedents from conceptually similar cases, to predict the most likely outcome of an upcoming case? Only time will tell.
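The custodian-identification step predicted above can already be approximated crudely. The sketch below skips the natural language understanding entirely: it assumes the lawyer’s question has been reduced to a hypothetical list of concept terms, and it uses plain keyword counts as a stand-in for conceptual matching. The mailbox data and function names are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_custodians(mailboxes, concept_terms):
    """Rank custodians by how often the given concept terms appear in their data."""
    terms = {t.lower() for t in concept_terms}
    scores = {}
    for custodian, documents in mailboxes.items():
        counts = Counter(word for doc in documents for word in tokenize(doc))
        scores[custodian] = sum(counts[t] for t in terms)
    # Drop custodians with no hits and list the rest, highest score first.
    hits = [c for c in scores if scores[c] > 0]
    return sorted(hits, key=lambda c: scores[c], reverse=True)


# Hypothetical mailboxes; a real system would query the document management system.
mailboxes = {
    "alice": ["Draft pricing proposal", "Pricing committee minutes"],
    "bob": ["Lunch order", "Pricing approval needed"],
    "carol": ["Holiday rota"],
}
print(rank_custodians(mailboxes, ["pricing", "price"]))  # ['alice', 'bob']
```

A question such as ‘who makes decisions about pricing?’ would, in this simplified model, become the term list `["pricing", "price"]`; the prioritised list of staff is then just a sort by hit count.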
Evolution not revolution
Cost savings aside, computer-assisted review can also improve accuracy. Because far fewer hours are spent manually reviewing documents, the people making decisions on the sample set can be more experienced lawyers with a better understanding of the case context, able to make fully informed and consistent decisions. This eases two familiar frustrations of modern working life: ‘I would have done this much better myself’ and ‘if only I had the time to take care of this myself’.
Contrast this with the traditional approach to cost control of using armies of dedicated, lower-paid document review staff who, while they are human and therefore able to make conceptual associations that a computer may miss, are only human, and therefore vulnerable to poor decision making when tired, hungry, bored or otherwise distracted.
Academic research has found that well-implemented computer-assisted review can be more effective, as well as more efficient, than exhaustive manual review.1 The system does not get tired or distracted, does not need to eat or sleep and can complete the equivalent of a thousand hours of manual review in a single day. However, because the tool learns from human decisions and replicates them, it will propagate good and bad decisions alike.
We could compare e-discovery to the human body. Today, many body parts can be replaced by bionic limbs that perform tasks more efficiently, but even a hugely powerful bionic body is of no use without a brain in charge. If earlier developments in e-discovery technology, such as keyword searching and near-duplicate detection, are the bionic limbs, computer-assisted review comes closer to replacing the brain, with the caveat that what you end up with is an assistant, not a leader.
For tasks that can be defined with a high degree of certainty, the system will do the hard work and the benefits will be massive. For everything else, human involvement will remain the key to success. So keep hiring the best people to help you: they will remain a major asset.
Selective usage
Will predictive coding replace large-scale manual document review? More than likely yes, but not for every matter and not for every type of document. It is clear that computer-assisted review is appropriate for certain types of litigation work and has the potential to be so for certain types of regulatory matters as well.
But the higher the degree of complexity, the greater the risk that computer-assisted review will not deliver the desired accuracy or the expected time and cost savings. Any machine learning system depends heavily on the knowledge of those training it. Complicating factors, such as multiple languages within the data to be reviewed or highly technical subject matter, may put a case beyond the reach of a blanket computer-assisted review approach.
Ultimately, the lawyers dealing with the matter have to be confident that the system will deliver the right results, with a level of risk that is considered reasonable in light of the potential cost savings.
The use of machine-generated translations in document review is a good example of this risk-based approach. A machine translation of a document in, say, Japanese, while clearly not appropriate for complex analysis, can provide an idea of the content of that specific document to someone with no Japanese language skills, who can then decide if a more accurate human translation is required. In large-scale cases, this approach can deliver enormous savings over the cost of full human translation of hundreds or thousands of documents.
However, if the system struggles to distinguish Danish from Norwegian and gets confused when multiple languages are present in the same document, the expected efficiencies will not be realised and unnecessary risk will be introduced. In this case, the traditional approach may be more costly, but it will provide a lower-risk option.
For now, e-discovery in these highly complex cases is probably best handled by combining traditional e-discovery techniques with targeted use of the technology that underpins computer-assisted review.
In such cases, well-constructed searches, conceptual analysis and prioritisation are likely to offer an acceptable risk profile while still delivering significant savings. Future improvements in the software and, more importantly, broader experience of how it can be implemented will give us reason to revisit this question.
Looking ahead
The technology behind computer-assisted review will continue to improve over time. The growing experience levels of users in the application of this technology to e-discovery reviews (and their comfort that these techniques are acceptable to courts and regulators) will mean that the number and type of cases in which computer-assisted review is employed will rapidly increase.
We are also likely to see some standardisation in the way these tools are employed, so that protocols can be communicated to, and agreed with, other parties more clearly. This process has begun with the recent release of the Computer Assisted Review Reference Model (CARRM) by the EDRM group.2
One hundred per cent automation of e-discovery would require the removal of human input from all stages of the process and, as such, is unlikely to be achievable in the short to medium term.
In the long term, as computing power increases and software makers seek to achieve competitive advantage over each other by delivering further automation, we may see some surprising developments. Today’s innovation may become tomorrow’s standard.
Drew Macaulay and Olivier Aelterman are e-discovery technology and workflow specialists at international e-discovery provider First Advantage Litigation Consulting (www.fadvlit.com)
Endnotes
1. See ‘Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review’, Maura R. Grossman and Gordon V. Cormack, Richmond Journal of Law and Technology, Vol. 17, Issue 3.
2. See https://www.edrm.net/resources/carrm