Efficiently Mining an Opponent’s Discovery


By James P. Harris

In a case involving allegations that an insurance broker misrepresented complex variable annuity products, the firm collected electronically stored information from its client, the plaintiff.  Those files were reviewed and coded for responsiveness and issues, and the key files were placed into deposition kits.  In short, we had already found the most important files within the client’s documents.  The client’s documents included hundreds of emails between him and the defendants, so we had a good sense of what the parties said to each other during their relationship.  What we did not yet know was what the defendants said to each other: the internal communications about our client and the products sold to him.  We propounded discovery requests that required the defendants to produce all communications about our client.  That data set therefore included the email communications we had already seen, because our client possessed a copy, as well as communications in which our client was not involved.  The task was to quickly set aside the documents already in our client’s possession, because those had already been reviewed, and to find the key new documents to use at upcoming depositions.

Focusing on the files produced by the defendants, we used our software, Relativity, to exclude from view emails that were sent or received by our client.  The system identified files that were duplicative of those already reviewed, and those were de-prioritized.  The system then de-duplicated and threaded the remaining documents so that we could focus on the new, unique documents.  This was accomplished with a few targeted searches and a few clicks of the mouse.
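
For readers curious about the mechanics, the filtering and de-duplication steps can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not Relativity's API or its actual method: the record fields (`sender`, `recipients`, `body`), the function names and the exact-match hashing approach are all hypothetical, and email threading is not shown.

```python
import hashlib


def fingerprint(text):
    """Hash of case- and whitespace-normalized text, used as a duplicate key."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_unique_documents(produced, reviewed, client_address):
    """Return produced emails the client never saw and that are not duplicates.

    `produced` and `reviewed` are lists of dicts with the hypothetical keys
    "sender", "recipients" (a list of addresses) and "body".
    """
    # Start with the fingerprints of everything that was already reviewed.
    seen = {fingerprint(doc["body"]) for doc in reviewed}
    client = client_address.lower()
    result = []
    for doc in produced:
        participants = {doc["sender"].lower()}
        participants.update(r.lower() for r in doc["recipients"])
        if client in participants:
            continue  # the client sent or received it, so a copy was already collected
        key = fingerprint(doc["body"])
        if key in seen:
            continue  # duplicate of a reviewed file or of an earlier produced file
        seen.add(key)
        result.append(doc)
    return result
```

A real review platform typically also compares metadata and attachments and catches near-duplicates; the sketch matches only on normalized body text.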

We then used conceptual analytics to group all documents in the system into clusters.  This tool compared documents by substance, analyzing the frequency with which terms appear together in documents.  The software then highlighted within each substantive cluster the documents we received from our client that we had already marked as “hot” or placed into a deposition kit.  The system provided a graphical representation of the concentrations of important documents.  For clusters with a high percentage of important documents, the software identified those documents that were conceptually similar to a previously marked important document but had not yet been reviewed or coded.  This shortcut surfaced unreviewed documents that were similar in substance to documents we had already identified as important, so we could examine them to see whether they too were important.
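
The idea of surfacing unreviewed documents that resemble known “hot” documents can be illustrated with a simple term-frequency comparison.  The sketch below is a stand-in, not the analytics engine described above: it scores documents by cosine similarity of raw word counts, and the function names and the `threshold` value are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity_to_hot(unreviewed, hot, threshold=0.3):
    """Return (doc_id, score) pairs for unreviewed documents resembling a hot one.

    `unreviewed` maps a document id to its text; `hot` is a list of texts
    already marked important.  Results are sorted from most to least similar.
    """
    hot_vectors = [term_vector(text) for text in hot]
    scored = []
    for doc_id, text in unreviewed.items():
        vec = term_vector(text)
        best = max((cosine(vec, h) for h in hot_vectors), default=0.0)
        if best >= threshold:
            scored.append((doc_id, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Conceptual analytics tools generally go further than word counts, for example by relating different terms that tend to appear in the same contexts, so this sketch would miss documents that discuss the same subject in different vocabulary.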

These two processes allowed us to find twenty-five new documents from the defendants’ production that were important enough to mark “hot” or place into deposition kits.  All of this was done in about two hours of attorney time.  A linear review of each and every file produced by the defendants, assuming one could review a file a minute, would have taken more than fifteen hours to accomplish.  We were able to mine the new data and save our client approximately 80% of what it would have cost to review the production without leveraging our technology and skill.
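
The arithmetic behind these figures can be checked with a short calculation.  The sketch below uses only the numbers stated above (one file per minute, fifteen hours, two hours); the function names are hypothetical, and the article does not itemize how the approximately 80% cost figure was derived, so the code compares attorney time only.

```python
def minimum_file_count(linear_hours, files_per_minute=1.0):
    """Files a reviewer gets through in the given hours at the given pace."""
    return int(linear_hours * 60 * files_per_minute)


def time_savings(linear_hours, assisted_hours):
    """Fraction of attorney time saved relative to a linear review."""
    return 1 - assisted_hours / linear_hours


files = minimum_file_count(15)   # fifteen hours at one file per minute
saved = time_savings(15, 2)      # two hours instead of fifteen
print(f"{files} files, {saved:.0%} of attorney time saved")
```

Because the linear review would have taken more than fifteen hours, both results are lower bounds on attorney time alone; any cost of the technology itself is not reflected in this calculation.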


James P. Harris is a shareholder at Sheehan Phinney Bass & Green PA. He is a member of the Data Breach and Business Litigation Groups. Harris may be reached at jharris@sheehan.com or 603.627.8152.

This article is intended to serve as a summary of the issues outlined herein. While it may include some general guidance, it is not intended as, nor is it a substitute for, legal advice.

ADVERTISEMENT – This electronic publication is labeled advertisement in compliance with Federal Law and may be considered advertising under the ethical rules of certain jurisdictions.