Rethinking tools: How technology is changing document review

8 Feb 2013 | Feature

Jérôme Torres Lozano and Todd Mansbridge explore how predictive coding, audio search and other e-disclosure tools are transforming technology-assisted document review

Information may be power, but it can also be a big headache for companies and their lawyers when confronted with a wide-ranging document review exercise in response to a major piece of litigation or a regulatory investigation.

As the number of business documents has mushroomed – and the definition of a document has broadened to include informal communications such as text messages and voicemails – the job of locating relevant information in a timely and cost-efficient manner has become ever more challenging.

Document review has always been a significant element in the cost of litigation. The courts have started to take greater notice of this and, at the same time, the technology available to assist with the review process is developing rapidly (see box).

The key to a cost-effective review exercise is to reduce the amount of human time that needs to be spent on reviewing documents. To achieve this, technology can help to assess, prioritise and sort electronic documents before delivering a potentially relevant document set for manual review, thereby saving hundreds of man hours that would otherwise be spent scrutinising irrelevant material.

The earliest e-discovery techniques relied on simple keyword searches, but the tools have since evolved to include highly sophisticated processes that can:

apply algorithms to identify the ‘themes’ of a document;
find relationships between documents; and
calculate a relevance score.

Technology also exists to accurately identify the language(s) in which a document is written, to group together long chains of related emails and even to search and analyse audio such as trading floor recordings or voicemails.

The crucial thing to ensure in the automated stage of a document search is that the baby is not thrown out with the bathwater, i.e. that relevant information is not discarded before a lawyer or litigation support professional has the chance to look it over. Equally, it is essential that the status of documents (for example, whether or not they are privileged) is established before disclosure.

All of these technologies can make a fundamental difference to the cost of – and time taken to complete – a document review. As time goes on, the courts (and regulators) will begin to expect and encourage their use for discovery, especially where the commercial importance of a dispute does not warrant the expense of a full linear review.

For example, the courts in England and Wales have introduced the electronic documents questionnaire, which outlines the steps that parties should take and makes it easier for them to exchange details of each other’s information architectures, speeding up the process. Its use, for now, is optional, but it is an indication that the courts are recognising the impact of e-disclosure on the litigation process.

Key developments in technology-assisted document review

1. Keyword searching. Looks for specific keywords or combinations of keywords to determine which documents to include (or exclude) in the set for review.
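By way of illustration, the following Python sketch shows the basic logic of a keyword filter that combines terms with AND, OR and NOT. The document IDs, texts and search terms are hypothetical. Commercial review platforms add features such as stemming, wildcards and proximity searching, which this sketch omits.

```python
import re

def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """True if the document contains every term in all_of, at least one
    term in any_of (when given), and none of the terms in none_of."""
    words = tokenize(text)
    if not all(term in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)

# Hypothetical documents keyed by review ID.
documents = {
    "DOC-001": "Draft supply contract with Acme, pricing attached",
    "DOC-002": "Lunch order for Friday",
    "DOC-003": "Acme pricing dispute: notes from call",
}

# "acme AND (pricing OR contract)"
review_set = [doc_id for doc_id, text in documents.items()
              if matches(text, all_of=["acme"], any_of=["pricing", "contract"])]
print(review_set)  # → ['DOC-001', 'DOC-003']
```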

2. De-duplication, near-duplication and email threading. De-duplication identifies documents whose text is identical, so that only one copy needs to be reviewed. Near-duplication identifies documents that are nearly identical, as commonly occurs when a document has been added to or amended over time. This allows different versions of the same document to be reviewed together.
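A minimal sketch of both ideas follows, assuming plain-text documents. Exact duplicates are detected by hashing the normalised text. Near-duplicates are detected by comparing overlapping three-word sequences ("shingles") with a Jaccard similarity score. The shingle length and the 0.5 threshold are illustrative assumptions, not settings used by any particular product.

```python
import hashlib
import re

def fingerprint(text):
    """SHA-256 of the text after collapsing whitespace and case, so exact
    copies (including trivially reformatted ones) share a fingerprint."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def shingles(text, k=3):
    """Set of overlapping k-word sequences ('shingles') in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Share of shingles the two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def deduplicate(docs):
    """Keep the first document ID seen for each distinct fingerprint."""
    first_seen = {}
    for doc_id, text in docs.items():
        first_seen.setdefault(fingerprint(text), doc_id)
    return sorted(first_seen.values())

def near_duplicate_pairs(docs, threshold=0.5):
    """Pairs of documents whose shingle overlap meets the threshold."""
    ids = sorted(docs)
    sets = {doc_id: shingles(docs[doc_id]) for doc_id in ids}
    return [(a, b) for n, a in enumerate(ids) for b in ids[n + 1:]
            if jaccard(sets[a], sets[b]) >= threshold]

# Hypothetical documents: A2 differs from A1 only by an extra space;
# A3 is an amended version of A1.
docs = {
    "A1": "The parties agree that delivery will take place by 1 March",
    "A2": "The parties agree that delivery  will take place by 1 March",
    "A3": "The parties agree that delivery will take place by 1 March at the latest",
    "B1": "Minutes of the board meeting held on 4 June",
}
print(deduplicate(docs))           # → ['A1', 'A3', 'B1']
print(near_duplicate_pairs(docs))  # → [('A1', 'A2'), ('A1', 'A3'), ('A2', 'A3')]
```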

The aim of email threading is similar: to ensure that multiple versions of the same material do not need to be reviewed individually. Because emails are replied to, copied and forwarded back and forth, the same text can appear in many separate messages, which is a particular hazard for document reviewers. Threading technology can identify email chains and bring them together into a single searchable and reviewable group.
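Threading can be sketched by following each message's reply link back to the first message in its chain. The sketch assumes that every message carries its own ID and the ID of the message it answers, which is the role played by the Message-ID and In-Reply-To headers in internet email. The sample messages are hypothetical. In practice tools also compare message bodies, because headers are often missing or stripped.

```python
from collections import defaultdict

# Hypothetical messages: each has an ID and, if it is a reply or forward,
# the ID of the message it responds to.
messages = [
    {"id": "m1", "in_reply_to": None, "subject": "Q3 pricing"},
    {"id": "m2", "in_reply_to": "m1", "subject": "RE: Q3 pricing"},
    {"id": "m3", "in_reply_to": "m2", "subject": "FW: RE: Q3 pricing"},
    {"id": "m4", "in_reply_to": None, "subject": "Office move"},
    {"id": "m5", "in_reply_to": "m1", "subject": "RE: Q3 pricing"},
]

def thread_root(msg_id, parent_of):
    """Follow reply links back to the first message in the chain,
    stopping if a malformed chain loops back on itself."""
    seen = set()
    while parent_of.get(msg_id) and msg_id not in seen:
        seen.add(msg_id)
        msg_id = parent_of[msg_id]
    return msg_id

def build_threads(messages):
    """Group message IDs under the ID of their thread's root message."""
    parent_of = {m["id"]: m["in_reply_to"] for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m["id"], parent_of)].append(m["id"])
    return dict(threads)

print(build_threads(messages))  # → {'m1': ['m1', 'm2', 'm3', 'm5'], 'm4': ['m4']}
```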

3. Language identification. Sorting documents into their respective language groups is one of the most valuable review technologies and an essential part of ensuring an accurate review. Language identification software can reliably identify a very wide range of languages, including those written in non-Latin scripts (such as Chinese, Hindi, Arabic and languages written in Cyrillic).
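As a simplified first step, the sketch below groups documents by writing system using the Unicode character database in Python's standard library. It detects scripts rather than languages (English, French and German all use Latin script), so it stands in for the statistical models that real language identification software uses rather than reproducing them. The sample texts are hypothetical.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Most common script among the letters in text, taken from the first
    word of each character's Unicode name (e.g. 'LATIN', 'CYRILLIC')."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical documents in four scripts.
documents = {
    "D1": "Please find the revised contract attached.",
    "D2": "Пожалуйста, ознакомьтесь с договором.",
    "D3": "请查看附件中的合同。",
    "D4": "يرجى مراجعة العقد المرفق",
}

# Route each document to a group for the matching review team.
by_script = {}
for doc_id, text in documents.items():
    by_script.setdefault(dominant_script(text), []).append(doc_id)
print(by_script)  # → {'LATIN': ['D1'], 'CYRILLIC': ['D2'], 'CJK': ['D3'], 'ARABIC': ['D4']}
```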

Machine translation can allow a non-native reviewer to make a decision on relevance. It remains important, however, that the review is conducted by staff who are native speakers or who read the language to a high level, to ensure that nothing important is missed.

It is also important, once at the review stage, for the review team and lawyers to be in regular communication, so that the review team is fully aware of the matter at hand and can identify relevant documents accurately. This is vital when the document set is in different languages and a number of native-language review teams have been established.

4. Concept searching and clustering. This goes beyond simple keyword searching to identify the concept or meaning of a document and groups it together with others of similar context. As well as assisting with filtering potentially relevant documents, this technology can also sort documents into user-defined categories, further reducing the time taken for review.
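The sketch below is a deliberately crude stand-in for concept clustering. It represents each document as word counts, measures cosine similarity, and places each document in the first cluster whose seed document is similar enough. Commercial concept search relies on richer techniques that capture meaning beyond shared words (latent semantic indexing is one technique used for this purpose). The stopword list, threshold and documents here are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "on", "in", "is", "at"}

def vector(text):
    """Bag-of-words counts, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: each document joins the first
    cluster whose seed document is similar enough, else starts a new one."""
    clusters = []  # list of (seed_vector, [doc_ids])
    for doc_id, text in docs.items():
        v = vector(text)
        for seed, members in clusters:
            if cosine(v, seed) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((v, [doc_id]))
    return [members for _, members in clusters]

docs = {
    "D1": "Shipment delayed at the port, delivery date missed",
    "D2": "Port congestion means the shipment delivery is delayed again",
    "D3": "Quarterly bonus payments for the sales team",
    "D4": "Sales team bonus targets for next quarter",
}
print(cluster(docs))  # → [['D1', 'D2'], ['D3', 'D4']]
```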

5. Predictive coding. An initial sample is taken from the overall document universe and reviewed by an individual or small team of reviewers best able to determine whether each document in the sample is relevant.

A computer-automated routine identifies the ‘theme’ of the relevant documents and compares this contextually to the overall universe. A series of iterative reviews is then conducted by the same team, with the goal of improving the model and producing a relevancy ‘score’ for each document in the population. Predictive coding has the potential to significantly reduce the size of subsequent manual review stages.
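The article does not tie predictive coding to a particular algorithm. As one possible illustration, the sketch below trains a multinomial naive Bayes classifier on a human-reviewed sample and returns a relevance score between 0 and 1 for unreviewed documents. Naive Bayes is a swapped-in choice here; vendors use a variety of models. The sample texts and labels are hypothetical.

```python
import math
import re
from collections import Counter

def words(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labelled):
    """labelled: (text, is_relevant) pairs from the human-reviewed sample."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, relevant in labelled:
        counts[relevant].update(words(text))
        docs[relevant] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def relevance_score(text, model):
    """P(relevant | text) from multinomial naive Bayes with add-one smoothing."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    log_p = {}
    for label in (True, False):
        total_words = sum(counts[label].values())
        lp = math.log((docs[label] + 1) / (total_docs + 2))
        for w in words(text):
            if w in vocab:  # words never seen in the sample are ignored
                lp += math.log((counts[label][w] + 1) / (total_words + len(vocab)))
        log_p[label] = lp
    # Normalise the two log-probabilities into a score between 0 and 1.
    m = max(log_p.values())
    p_rel, p_not = math.exp(log_p[True] - m), math.exp(log_p[False] - m)
    return p_rel / (p_rel + p_not)

sample = [
    ("price fixing discussion with competitor", True),
    ("agreed prices with competitor at meeting", True),
    ("team lunch on friday", False),
    ("holiday rota for december", False),
]
model = train(sample)

unreviewed = {"U1": "Call competitor about prices", "U2": "Friday lunch menu"}
scores = {doc_id: relevance_score(text, model) for doc_id, text in unreviewed.items()}
ranked = sorted(unreviewed, key=lambda d: scores[d], reverse=True)
print(ranked)  # → ['U1', 'U2']
```

Ranking by score lets reviewers see the material most likely to be relevant first, and a cut-off score can be agreed to limit the manual review.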

Go with the flow

Which of these processes are used, and the order in which they are applied, will usually vary according to the scope and nature of the matter at hand. A keyword search will often narrow the scope of subsequent stages, while de-duplication, near-duplication and email threading techniques draw related documents together to reduce the time spent in both the automated and manual review stages.

Where the document set contains different languages, it will be usual to initially perform language identification to separate out documents by linguistic groups, allowing for documents to be routed at a later stage to the appropriate language review team.
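Because the order of stages varies from matter to matter, one way to model the workflow is as a configurable list of stages applied in sequence. In this sketch each document is assumed to arrive already tagged with a language code from the identification step. The stages, document IDs and search term are hypothetical.

```python
import hashlib

def drop_exact_duplicates(docs):
    """Keep only the first copy of each identical text."""
    seen, kept = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept

def keyword_filter(terms):
    """Build a stage that keeps documents containing any of the terms
    (a simple case-insensitive substring match)."""
    def stage(docs):
        return [d for d in docs if any(t in d["text"].lower() for t in terms)]
    return stage

def run_pipeline(docs, stages):
    """Apply each stage in turn; the list order is chosen per matter."""
    for stage in stages:
        docs = stage(docs)
    return docs

def route_by_language(docs):
    """Group surviving document IDs by the language tag assigned earlier."""
    teams = {}
    for d in docs:
        teams.setdefault(d["lang"], []).append(d["id"])
    return teams

docs = [
    {"id": "E1", "lang": "en", "text": "Acme pricing call notes"},
    {"id": "E2", "lang": "en", "text": "Acme pricing call notes"},
    {"id": "F1", "lang": "fr", "text": "Réunion avec Acme sur les prix"},
    {"id": "E3", "lang": "en", "text": "Car park closure"},
]
filtered = run_pipeline(docs, [drop_exact_duplicates, keyword_filter(["acme"])])
teams = route_by_language(filtered)
print(teams)  # → {'en': ['E1'], 'fr': ['F1']}
```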

In the case of predictive coding, results can be further refined by taking sample document sets from an initial review, having them manually checked by a human reviewer familiar with the issues in the case and then feeding the results of this exercise back into the next stage of the automated process.

This ‘educative’ process can significantly improve the accuracy of the final document set and, in the United States, examples are beginning to emerge of opposing sides using the results of this process to agree limits on the extent of disclosure, again saving costs and time in the long run.
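The feedback loop can be sketched as follows. The "model" here is intentionally trivial: the words that appear only in documents marked relevant. The human reviewer is simulated by a lookup of hypothetical correct answers. The choice of which documents to send for checking uses uncertainty sampling, a common active-learning strategy that is an assumption here rather than something the article specifies.

```python
import re

def terms(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def train(labelled):
    """Hypothetical toy model: words seen in relevant documents but never
    in documents marked not relevant."""
    relevant, not_relevant = set(), set()
    for text, is_relevant in labelled:
        (relevant if is_relevant else not_relevant).update(terms(text))
    return relevant - not_relevant

def score(text, model):
    """Fraction of a document's distinct words that the model ties to relevance."""
    words = terms(text)
    return len(words & model) / len(words) if words else 0.0

def refine(corpus, seed_labels, reviewer, rounds=2, batch=1):
    """Train, send the least certain documents to a reviewer, retrain."""
    labels = dict(seed_labels)
    for _ in range(rounds):
        model = train([(corpus[d], labels[d]) for d in labels])
        unlabelled = [d for d in corpus if d not in labels]
        if not unlabelled:
            break
        # Uncertainty sampling: scores nearest 0.5 are the least certain.
        unlabelled.sort(key=lambda d: abs(score(corpus[d], model) - 0.5))
        for d in unlabelled[:batch]:
            labels[d] = reviewer(d)  # human check fed back into the next round
    return train([(corpus[d], labels[d]) for d in labels]), labels

corpus = {
    "D1": "price agreement with competitor",
    "D2": "team lunch friday",
    "D3": "competitor price call",
    "D4": "friday lunch booking",
    "D5": "call with competitor about price rise",
}
# Hypothetical 'correct' answers standing in for a human reviewer.
truth = {"D1": True, "D2": False, "D3": True, "D4": False, "D5": True}

model, labels = refine(corpus, {"D1": True, "D2": False}, reviewer=truth.get)
print(sorted(labels))  # → ['D1', 'D2', 'D3', 'D5']
```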

Reverse engineering

However, some practitioners have yet to fully exploit the potential of technology-assisted review due to a combination of unfamiliarity and (in many cases misplaced) concerns about the accuracy of automated document search processes.

In some cases, particularly in the United States, lawyers for both sides of a dispute have agreed to use techniques such as predictive coding to cut down the size of the discovery exercise. Where this is not possible, however, there often remains some reluctance to apply the most recent developments unilaterally.

Consequently, many firms first use cutting-edge search technology after the event, as a quality assurance technique to verify the results of an exercise completed using longer-established automated techniques and linear review.

The value of this approach was underlined recently during the litigation between Google and Oracle, in which privileged material was inadvertently disclosed. The documents concerned did not contain the general counsel’s name or specific wording indicating that they were privileged and they were not picked up by the search strategy employed. However, the documents were very similar to other documents which did contain the right wording. A more sophisticated search process could have identified the similarity between these documents and, in all likelihood, prevented their inadvertent disclosure.

As technology increasingly proves itself in this way, it is likely that more and more companies and their lawyers will look to deploy the full range of review tools at the beginning of the process as well as near the end.
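To illustrate the kind of similarity check described in the Google and Oracle example, the sketch below compares each candidate's text with known privileged documents using the character-level similarity ratio from Python's difflib module. The threshold and texts are hypothetical, and this is not a reconstruction of the search strategy used in that case.

```python
from difflib import SequenceMatcher

def flag_possible_privilege(candidates, privileged_examples, threshold=0.6):
    """Return IDs of candidates whose text closely resembles any known
    privileged document, whether or not they carry privilege markings."""
    flagged = []
    for doc_id, text in candidates.items():
        # Best similarity (0.0 to 1.0) against any known privileged document.
        best = max(SequenceMatcher(None, text.lower(), p.lower()).ratio()
                   for p in privileged_examples)
        if best >= threshold:
            flagged.append(doc_id)
    return flagged

privileged = [
    "Privileged and confidential - legal advice on licensing options for the platform",
]
candidates = {
    "C1": "Draft: advice on licensing options for the platform",  # no marking
    "C2": "Menu for the summer party",
}
print(flag_possible_privilege(candidates, privileged))  # → ['C1']
```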

Finally, to ensure their data is as ‘litigation ready’ as possible, organisations should develop an accurate map of where their data is held and the formats in which it is likely to be stored. The proliferation of mobile devices, the growth of cloud computing and the increasing number of organisations allowing user-owned devices to access their systems are making this task more important than ever before. Even the most sophisticated search and review techniques can only do so much if the location and structure of the data are unknown at the outset of a search exercise.

Jérôme Torres Lozano is director of e-disclosure services and Todd Mansbridge is vice president of product management at First Advantage Litigation Consulting (www.fadvlit.com)

Legal News desk contact: editorial@solicitorsjournal.com