eDiscovery Daily Blog

Maura R. Grossman of the University of Waterloo: eDiscovery Trends 2018

This is the eighth in our 2018 Legaltech New York (LTNY) Thought Leader Interview series.  eDiscovery Daily interviewed several thought leaders at LTNY this year (and some afterward) to get their observations regarding trends at the show and generally within the eDiscovery industry.

Today’s thought leader is Maura R. Grossman.  Maura is a Research Professor in the David R. Cheriton School of Computer Science at the University of Waterloo and principal of Maura Grossman Law.  Previously, she was Of Counsel at Wachtell, Lipton, Rosen & Katz, where she pioneered the use of technology-assisted review (TAR) for electronic discovery.  Maura’s research with Gordon V. Cormack has been cited in cases of first impression in the United States, Ireland, and (by reference) in the United Kingdom and Australia, approving the use of TAR in civil litigation.  Maura has served as a special master in the Southern District of New York and the Northern District of Illinois to assist with issues involving search methodology.  In 2015 and 2016, Maura served as a coordinator of the Total Recall Track at the National Institute of Standards and Technology’s Text Retrieval Conference (TREC); in 2010 and 2011, she served as a coordinator of the TREC Legal Track.  Maura is also an Adjunct Professor at Osgoode Hall Law School at York University and at the Georgetown University Law Center.  Previously, she taught at Columbia Law School, Rutgers Law School—Newark, and Pace Law School.

I thought I’d start by asking about a couple of cases related to Technology-Assisted Review.  The first one I’d be interested in your thoughts on is FCA US v. Cummins, where the judge (Avern Cohn) took the position that applying TAR without prior keyword culling is the preferred method (although he said he did so rather reluctantly).  What are your thoughts about that decision?  And what are your thoughts in general about courts making rulings on preferred TAR approaches?

I don’t believe there is a black-or-white answer to this question.  I think you have to consider how things might be done in a perfect world, and how they might be done in the real world we live in, which includes time, cost, and burden for ingesting and hosting large volumes of data.  In a perfect world, you wouldn’t perform multiple, sequential culling processes because each one removes potentially relevant information, and that effect is multiplied.  So, if you do keyword culling first—let’s say you do a good job and you get 75% of the relevant data, which would be pretty high for keywords—then, you apply TAR on top of that and, again, you get 75% of the relevant data.  Each step you apply further reduces the number of relevant documents you find, so the combined result is only about 56% of the relevant data.
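For readers who want to see the arithmetic behind that 56% figure, here is a minimal sketch in Python.  It simply multiplies the recall of each sequential culling step, assuming (as the example does) that each step independently finds 75% of the relevant documents that reach it:

```python
# Illustrative only: the combined recall of sequential culling steps is the
# product of each step's recall, assuming each step misses relevant documents
# independently of the others.
def combined_recall(step_recalls):
    result = 1.0
    for r in step_recalls:
        result *= r
    return result

# Keyword culling at 75% recall, followed by TAR at 75% recall on what remains:
print(combined_recall([0.75, 0.75]))  # 0.5625 -> about 56% of the relevant data

# TAR alone on the full (appropriately scoped) collection:
print(combined_recall([0.75]))        # 0.75 -> 75% of the relevant data
```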

In a perfect world, you wouldn’t do that, and you would just put all the data into the TAR system and get 75% recall.  By “all” of the data, I don’t mean the entire enterprise, I mean all of the appropriate data—the appropriate time frame, the appropriate custodians, and so forth.  That’s in a perfect world, if you wanted to maximize your recall, and there were no time and cost considerations.  But, we don’t actually live in that world most of the time.  We live in a world where there generally are time and cost constraints, and per-gigabyte loading and hosting fees.  So, often, parties feel the need to limit the amount of data they load into a review platform to reduce processing and hosting fees, and other potential fees as well.  Therefore, parties often want to use keywords first in the real world.  Ultimately, you have to look at the specifics of the matter, including the volume of data, the value and importance of the case, and what’s at stake in the matter.

You also have to look at how effective keywords will be in retrieving information in the particular case at hand.  Keywords may work very well in a case where everything is referred to by its “widget number,” but keywords may not work as well in certain types of fraud cases where parties don’t always know the specific language used, or what the nature of the conspiracy was.  Keywords can be more challenging in those situations.  So, you really have to look at the whole picture.

In FCA v. Cummins, I think what the judge was trying to say was that generally, in a world without any of these other considerations, the best practice would be not to cull first.  I would tend to agree with that from a scientific and technical perspective.  But, that’s not always practical.

Also, I believe it was in Rio Tinto v. Vale where Judge Peck said (in dicta) that in a perfect world, if there were no cost or time or any other considerations, you would take all the data and you would just use the best TAR system you could, and you would be more likely to find the highest number of relevant documents.  But that can drive up costs, and that may not be proportionate in all cases.  So, it’s really a question of proportionality, and what will work best in each situation, and how much time and resources you have to iterate and test the keywords, and other related factors.

Also, as you know in FCA v. Cummins, the judge didn’t really go into much detail; it was a very short decision.  Maybe it was a relatively small data set and loading it all didn’t make much of a difference.  We just don’t know enough about the facts there.

I got the impression that this case might have involved two equally weighted parties, with equal amounts of data, so the judge may have felt that the parties needed to perform TAR the same way and that he was forced to make a decision.  Do you think that kind of symmetry has an impact on how a court might decide?

I think that where the data volumes (and therefore burdens) are symmetric, there tends to be an understanding that what’s good for the goose is good for the gander.  Parties in those circumstances tend to be more circumspect about what they demand because they know they’ll be subject to the same thing in return.  If I’m representing one party and you’re representing the other, and I ask for everything in native form, I’m probably not going to be able to turn around and argue that I don’t want to produce in native, too, unless I have an awfully good reason for that.

So, I do think that changes the landscape a little bit.  Parties tend not to ask the other side for things that are unduly burdensome if they’re going to be forced to provide those same things themselves.  It can be very different when one side is using TAR and the other side isn’t, and when motivations or incentives are not aligned.  That can affect what parties request.

Another case we talked about was Winfield v. City of New York, where one of the key aspects of the plaintiffs’ objections to the defendants’ TAR process was how documents had been designated as non-responsive.  What are your thoughts about that?  Do you think arguments about the subjectivity of the subject matter experts will come into play more and lead to objections in other cases?

Most of the research that I’ve reviewed, and most of the research that I’ve done with Gordon Cormack, suggests that a few documents coded one way or the other are highly unlikely to make a substantial difference in the outcome—at least for CAL algorithms, and even for most robust SAL algorithms.  I know that people like to say, “garbage in, garbage out,” but I’ve never seen any evidence for the proposition that TAR compounds errors made by reviewers, and there is some evidence that TAR can mitigate reviewers’ errors.  The results of most TAR systems appear to be satisfactory unless there are pretty significant numbers of miscoded documents, for example, in the 20 to 25 percent (or higher) range.  Of course, if you’ve coded all of the documents of a particular kind as “not relevant,” you’ve now taught the algorithm not to find anything like that.  Chances are, though, if you have multiple reviewers, whether contract attorneys or even junior associates, not everything will be marked the same way.  There’s going to be a fair amount of noise in the coding decisions, and that doesn’t seem to have a major impact on the results.

With some of the early TAR systems, a lot of commentators said that it had to be a senior partner, or an expert, who trained the system.  But, most of the research that we’ve done, that Jeremy Pickens at Catalyst has done, and that others have done, suggests that a little noise from junior reviewers or contract attorneys, who may be a bit more generous in their definition of relevance, actually yields a better algorithm than training by a senior partner, who may have a very narrow view of what’s relevant.  The junior people, who are more generous in their conceptions of relevance, tend to train a better algorithm—meaning a system that will achieve higher recall—in the long run.  So, a little bit of noise actually doesn’t hurt.

I wasn’t particularly surprised that in Winfield there were a few documents that were “arguably relevant,” and about which the two sides disagreed on coding.  That’s going to happen in any matter, and that’s not really going to affect the outcome one way or the other, because those documents are marginal in the first place.  Certainly, if someone is systematically marking all the “hot” documents as non-responsive, that will make a difference, but that wasn’t what was going on there.

In Winfield, the Court said the documents were marginal and arguably relevant.  The Court also said it had reviewed the training process in camera and there was nothing wrong with it.  Most of the case law says that a party shouldn’t get discovery-on-discovery unless there’s a material flaw in the adversary’s process.  If you look at the position taken in the Defense of Process paper that The Sedona Conference published, then that should probably have been the end of the discussion.  But, the judge in Winfield went one step further, and said, “Well, because there’s some evidence that there may have been some disagreements in coding, I’ll permit a sample.”  That’s a little scary to me, because if there was no material deficiency, and we’re talking about a few marginal documents with coding where people disagree, that’s going to occur in every single case.

When you open up the collection to sampling, what happens is that the parties will find more marginal documents where they disagree on the coding.  That often leads to a lot of sparring, as we’ve seen in other cases where the parties disagree about marginal documents and fight about them, and that just drives up cost.  In the long run, those are not the documents that make or break the case.

We’re at a point where more and more people are using TAR, but a lot of people still haven’t really embraced it.  For those who haven’t gotten started with it yet, what would be your advice on how best to learn and apply TAR in their cases?

I would suggest they play with it, and try it out on a set of data that they have already completely manually reviewed and thoroughly QC’d, and where they are confident that everything was well done.  Use that to test some of the different tools out there before you have a live matter you want to use it on, so that you don’t have to decide what tools work and don’t work while you are in a crisis mode, when time is of the essence.
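As one way of doing that kind of testing, here is a minimal sketch, using invented document IDs and tool names, of how a fully reviewed and QC’d collection can serve as a gold standard: each candidate tool’s calls are scored for recall and precision against the human coding.  This is only an illustration of the idea; it isn’t tied to any particular product or workflow.

```python
# Hypothetical benchmarking of TAR tools against a collection that has already
# been fully reviewed and QC'd. gold maps doc IDs to the human coding; each
# tool's output maps doc IDs to that tool's call. All names are invented.
def recall_precision(gold, predicted):
    true_pos = sum(1 for d, label in gold.items()
                   if label == "relevant" and predicted.get(d) == "relevant")
    gold_rel = sum(1 for label in gold.values() if label == "relevant")
    pred_rel = sum(1 for label in predicted.values() if label == "relevant")
    recall = true_pos / gold_rel if gold_rel else 0.0
    precision = true_pos / pred_rel if pred_rel else 0.0
    return recall, precision

gold = {"DOC-1": "relevant", "DOC-2": "not relevant",
        "DOC-3": "relevant", "DOC-4": "not relevant"}
tool_outputs = {
    "Tool A": {"DOC-1": "relevant", "DOC-2": "relevant",
               "DOC-3": "relevant", "DOC-4": "not relevant"},
    "Tool B": {"DOC-1": "relevant", "DOC-2": "not relevant",
               "DOC-3": "not relevant", "DOC-4": "not relevant"},
}
for name, predicted in tool_outputs.items():
    r, p = recall_precision(gold, predicted)
    print(f"{name}: recall={r:.2f}, precision={p:.2f}")
```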

It would be helpful to do that homework, and develop a good understanding of the different workflows, and the different tools, and what kinds of data they work better or worse on.  For example, some are better with spreadsheets than others, some are better with OCR text, and others are better with foreign language or short documents.  Ideally, counsel would do that homework beforehand, and know something about the different tools that are available and their pros and cons.

If they haven’t done that, or feel they can’t, then I would encourage people to use it in ways that don’t impact defensibility considerations as much.  For example, they can use it on an internal investigation, or on incoming data, or simply to prioritize their data for review—even if they plan to review it all—so that they can start reviewing the most-likely responsive documents first and work down from there.

There are also many uses for QC, where the algorithm suggests that there may be errors.  Look at the documents that the TAR system gave a high score for relevance that the reviewers coded as “not relevant,” or vice versa.  There are many uses that don’t implicate defensibility where people can still try TAR, see how it works, and get comfortable with it.  Usually after people see how it works and see that it’s effective—if they’re using a tool that actually is effective—it’s not a hard sell after that.  It’s that first step that’s the hardest, and that’s why I encourage people to do the testing before they’re in a critical situation, before they have to go in front of the court and argue about whether or not they can use it.
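As a purely illustrative sketch of that QC pass, assuming each document carries a TAR relevance score and a reviewer’s coding (the thresholds and document IDs below are invented), the disagreements can be pulled out for a second look:

```python
# Illustrative QC pass: flag documents where the TAR score and the human coding
# point in opposite directions, so a senior reviewer can take a second look.
HIGH, LOW = 0.80, 0.20  # hypothetical score thresholds

def qc_candidates(docs):
    """docs: list of dicts with 'id', 'score' (0 to 1), and 'coding'."""
    flagged = []
    for d in docs:
        if d["score"] >= HIGH and d["coding"] == "not relevant":
            flagged.append((d["id"], "high score but coded not relevant"))
        elif d["score"] <= LOW and d["coding"] == "relevant":
            flagged.append((d["id"], "low score but coded relevant"))
    return flagged

docs = [
    {"id": "DOC-101", "score": 0.93, "coding": "not relevant"},
    {"id": "DOC-102", "score": 0.08, "coding": "relevant"},
    {"id": "DOC-103", "score": 0.55, "coding": "relevant"},
]
print(qc_candidates(docs))  # flags DOC-101 and DOC-102, leaves DOC-103 alone
```

The flagged documents aren’t necessarily miscoded; they’re simply the places where the model and the reviewers disagree, which is where a second look pays off most.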

What would you like our readers to know about things you’re doing, and what you’re working on?

I continue to do research on TAR tools and processes, and on the evaluation of TAR methods.  Gordon Cormack and I are “heads down” doing a lot of work on those things.  One area that we’ve been addressing recently is the notion, which some people have advanced, that a CAL process can only be used if you’re actually going to put eyes on every document.  Because of that, some people prefer the SAL approach because it can give them a fixed review set.  There is a method we’ve written about, and for which we’ve filed a patent, called S-CAL.  We’ve been doing a lot more work in that area to help parties get the benefits of CAL, but still have the predictability of knowing exactly how many documents they’re going to have to review, so they can know how many reviewers they need, how long the review will take, and what it will cost.  Our aim is to be able to do that using a form of CAL, but also to be able to provide an accurate estimate of recall and precision.
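S-CAL itself is described in Grossman and Cormack’s published work; purely as a simplified, hedged illustration of the general idea of estimating recall and precision statistically rather than putting eyes on every document, the sketch below estimates recall from a human-coded random sample of the collection and precision from a sample of the production set.  Everything in it (document IDs, sample sizes, the code_fn coding function) is invented for the example, and it contrasts with the exact measurement shown earlier against a fully reviewed set.

```python
import random

# Simplified, illustrative estimates (this is NOT S-CAL): recall is estimated
# from a human-coded random sample of the whole collection, and precision from
# a human-coded random sample of the documents the process would produce.
def estimate_recall(collection, produced, code_fn, sample_size=400, seed=1):
    random.seed(seed)
    sample = random.sample(collection, min(sample_size, len(collection)))
    relevant = [d for d in sample if code_fn(d) == "relevant"]
    if not relevant:
        return None  # sample too small to support an estimate
    return sum(1 for d in relevant if d in produced) / len(relevant)

def estimate_precision(produced, code_fn, sample_size=400, seed=2):
    random.seed(seed)
    produced = list(produced)
    sample = random.sample(produced, min(sample_size, len(produced)))
    return sum(1 for d in sample if code_fn(d) == "relevant") / len(sample)

# Toy usage with invented data: 200 truly relevant documents, 150 of which
# the hypothetical TAR process put in its production set.
collection = [f"DOC-{i}" for i in range(1, 2001)]
truly_relevant = set(collection[:200])
produced = set(collection[:150])
coding = lambda d: "relevant" if d in truly_relevant else "not relevant"
print(estimate_recall(collection, produced, coding))  # roughly 0.75, subject to sampling error
print(estimate_precision(produced, coding))           # 1.0 on this toy data
```

In practice the sample sizes, and the margins of error they produce, matter a great deal; this sketch ignores confidence intervals entirely.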

That’s one area of research we’re working on.  I’m also becoming increasingly interested in artificial intelligence and the legal, ethical, and policy issues it implicates.  Last semester, I taught the first course (that I’m aware of) that brought together 18 computer science graduate students, and 15 law students, to explore different areas of artificial intelligence and the legal, ethical, and policy issues associated with them.  For example, we looked at autonomous cars, we looked at autonomous weapons, we looked at relationships with robots, and we looked at what to do about job loss.  We looked at data privacy, and the concentration of vast amounts of personal data in the hands of a small number of private companies.  We looked at predictive policing and use of algorithms to predict recidivism in the criminal justice system, and it was a really, really interesting experience, bringing both of those groups together to do that.  I’ve been focused a little more in that area, as well as continuing my information retrieval research and other research in collaboration with Gordon and my other colleagues and students at the University of Waterloo.  And, of course, Gordon and I work on TAR matters.  I still do consulting, expert work, and serve as a special master, and I really love that part of my job.

Thanks, Maura, for participating in the interview!

As always, please share any comments you might have or if you’d like to know more about a particular topic!

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation.  Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer:  The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine.  eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance.  eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
