Predicting the future of legal IT

Technology often helps the legal profession take a leap forward, and predictive coding could be the next step. Damian Blackburn reports
Following on from last month's column on big data comes another hot topic in legal IT: predictive coding. While the technology that underpins predictive coding has been around for a while, it has only recently been allowed as a tool for lawyers to use in the discovery process, and it is undoubtedly the shape of things to come for many law firms.
Predictive coding is a computer-assisted method of checking electronic documents for relevant content. This is not a simple word search, but a complex pattern analysis based on intelligent algorithms. If you consider the amount of material generated in electronic format these days, it is no surprise that there can be vast quantities of material available to the discovery process in a case. To date, discovery relating to electronic data has concentrated on finding and extracting the material from systems, not on checking its content. And while litigation support technologies have assisted searching on a fairly basic level, the selection and review of documents has remained a human task, and more specifically one for qualified and experienced lawyers.
However, the volume that systems are now outputting means that human checking is becoming an ever more time-consuming and expensive option. In many cases, it is likely that material goes unchecked as a result of time or money constraints. In any event, the process of discovery has always been tempered by the notion that the effort and expense should be in proportion to the value of the case itself.
Human checking of electronic material is an expensive pursuit, and as information stores grow, one that will become more so over time. Recent studies suggest that human review of data stores is not necessarily the most accurate method of extracting relevant material.
Finesse the results
Much like Google's take on email and document storage, there is more emphasis on the search and ensuing results than there is on hierarchical storage structures. What this means is that huge repositories of documents can be searched without worrying so much about how they are sorted and, most importantly, about how many of them there are. The process allows legal experts to find relevant documentation in vast data stores by searching and reviewing the results, using software with artificial intelligence built in.
The process starts with the collection of electronic data. It may be that you already have a data store (or have been provided with a discovery pack), or you may have to feed data into a store. For documents that are scanned into the data store, optical character recognition (OCR) may need to be applied as part of the process, in order to make the documents searchable; the same goes for PDF and picture-format files. Many other forms of stored documents are already in text form, and readily searchable.
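As a minimal sketch of that ingestion step, here is how it might look in Python, assuming the open-source Tesseract engine via the pytesseract and Pillow libraries. The folder layout and file type are hypothetical, and commercial platforms will have their own pipelines:

```python
# A minimal sketch of making scanned documents searchable, assuming
# the open-source Tesseract OCR engine (via pytesseract) and Pillow.
# The folder layout and file type here are hypothetical.
from pathlib import Path

import pytesseract
from PIL import Image

def ocr_scanned_documents(scan_folder: str) -> dict[str, str]:
    """Return a mapping of file name to extracted, searchable text."""
    texts = {}
    for path in Path(scan_folder).glob("*.png"):
        image = Image.open(path)
        # OCR turns the page image into plain text that can be indexed.
        texts[path.name] = pytesseract.image_to_string(image)
    return texts
```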
Once you have a complete data set, you load it into or attach it to a predictive coding software solution, and perform the first iteration of searching. This is where you put in a sample of words or phrases you are searching for, and run a search on the data, or a part of the data.
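As a rough illustration of that first pass, assuming the data set has already been reduced to plain text, a keyword search might look like the following. The documents and seed terms are invented for the example, and a real predictive coding platform would query an indexed repository rather than an in-memory dictionary:

```python
# A toy first-pass keyword search over a loaded data set. The store
# and search terms are hypothetical stand-ins for a real repository.
documents = {
    "email_001.txt": "The supplier admitted the breach of contract.",
    "memo_014.txt": "Agenda for the office summer party.",
}

def first_pass_search(store: dict[str, str], terms: list[str]) -> list[str]:
    """Return names of documents containing any of the seed terms."""
    hits = []
    for name, text in store.items():
        lowered = text.lower()
        if any(term.lower() in lowered for term in terms):
            hits.append(name)
    return hits

print(first_pass_search(documents, ["breach of contract", "indemnity"]))
```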
The results should comprise a sample of documents that match your search terms. Solicitors or other legal experts can then review these documents and grade them for usefulness, or "responsiveness" in the parlance of the technology vendors.
The scoring also allows the user to mark non-relevant documents, thus helping to limit the return on the next iteration. By feeding in the responsiveness scores and running the search again, the software can, using its classification algorithms, return what should be a more meaningful set of results on the second pass. By repeating the grading and searching process, you train the software to finesse the results on each iteration. At some point in the process, you should have a set of search parameters and algorithms that provide an optimised set of returned documents. It is likely that you will have run the initial searches on subsets of the data, especially if the data set is vast. Once you are happy with the results, you can release the trained search on to the entire data store.
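The vendors' algorithms are proprietary, but the grade-and-retrain loop can be sketched with standard open-source tools. In the sketch below, TF-IDF features and a logistic regression classifier from scikit-learn stand in for the vendors' engines, and the documents and grades are invented:

```python
# A sketch of the grade-and-retrain loop, assuming scikit-learn as a
# stand-in for a vendor's proprietary classifier. Documents and grades
# below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

documents = [
    "The indemnity clause was breached by the supplier.",
    "Lunch menu for the staff canteen, week of 4 March.",
    "Counsel advised that the breach claim should proceed.",
    "Car park permits are renewed annually in April.",
]
reviewed = [0, 1, 2]   # indices of documents a lawyer has graded so far
grades = [1, 0, 1]     # 1 = responsive, 0 = non-relevant

# Turn each document into a weighted word-frequency vector.
vectoriser = TfidfVectorizer(stop_words="english")
features = vectoriser.fit_transform(documents)

# Train only on the human-graded documents.
model = LogisticRegression()
model.fit(features[reviewed], grades)

# Score every document; the highest-scoring unreviewed ones go back to
# the reviewers, whose grades enlarge the training set for the next
# iteration, finessing the results each time round.
scores = model.predict_proba(features)[:, 1]
for index, score in enumerate(scores):
    print(f"document {index}: responsiveness {score:.2f}")
```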
Handful of suppliers
It may sound relatively straightforward, but there are a number of factors that mean only a handful of software vendors can supply this type of technology at present. First, the search engines, with their complex search facilities and pattern analysis tools, are owned, and more than likely heavily protected, by a handful of suppliers.
Second, the potential size of the data store to be searched requires software that can scale to handle the volumes, and correspondingly scalable data repositories to store and manipulate them. At present, observers talk in terms of millions of documents, but given the way data is being generated, this will only increase.
While it is still early days in terms of adoption, the software is reasonably mature, and cloud computing has brought vast data centres, and with them storage and processing power; take-up could therefore be fairly quick.
Technology often provides a leap forward for the legal profession, and this one could be very significant.