eDiscovery Daily Blog
Sometimes, the Data You Receive Isn’t Ready to Rock and Roll: eDiscovery Best Practices
Having just encountered a similar situation with one of my clients, I thought this was a topic worth revisiting. Just because data is produced to you doesn’t mean that the data is ready to “rock and roll”.
Here’s a case in point: I once worked with a client that received a multi-part production from the other side (via another party involved in the litigation, per agreement between the parties) that included image files, OCR text files and metadata (yes, the dreaded “load file” production). The files that my client received were produced over several months to several other parties in the litigation. The production contained numerous emails, each of which (of course) included an email sent date. Can you guess which format the email sent date was provided in? Here are some choices (using today’s date and 1:00 PM as an example):
- 09/11/2017 13:00:00
- 9/11/2017 1:00 PM
- September 11, 2017 1:00 PM
- Sep-11-2017 1:00 PM
- 2017/09/11 13:00:00
The answer: all of them.
Because there were several productions to different parties with (apparently) different format agreements, my client didn’t have the option to request that the data be reproduced in a standard format. Not only that, the name of the produced metadata field wasn’t consistent between productions. In about 15 percent of the documents, the producing party named the field email_date_sent; in the rest, it was simply named date_sent.
What a mess, right?
If you know how to fix this issue, then – congrats! – you can probably stop reading. Our client (both then and recently) didn’t know how. Fortunately, at CloudNine, there are plenty of computer “geeks” (including me) to address problems like this.
In the example above, we had to convert the dates to a single standard format and consolidate them into one field. We used SQL queries to get the data into one field, and string commands and regular expressions to re-parse the dates that didn’t fit a standard SQL date format into a correct one. For example, the date 2017/09/11 was re-parsed into 09/11/2017.
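For readers who want to see the idea in code, here is a minimal sketch of the same two steps. It is not the SQL and regular expression approach we actually used; it swaps in Python's standard datetime parsing instead. The two source field names come from the productions described above, but the output field name sent_date, the function names, the target format and the assumption that slash-delimited dates are month-first (US style) are all illustrative.

```python
from datetime import datetime

# Formats seen in the productions, tried in order. Month-first ordering
# for the slash-delimited dates is an assumption.
KNOWN_FORMATS = [
    "%m/%d/%Y %H:%M:%S",   # 09/11/2017 13:00:00
    "%m/%d/%Y %I:%M %p",   # 9/11/2017 1:00 PM
    "%B %d, %Y %I:%M %p",  # September 11, 2017 1:00 PM
    "%b-%d-%Y %I:%M %p",   # Sep-11-2017 1:00 PM
    "%Y/%m/%d %H:%M:%S",   # 2017/09/11 13:00:00
]

# The two field names the producing party used for the same value.
DATE_FIELDS = ("email_date_sent", "date_sent")

# Hypothetical target format for the single standardized field.
TARGET_FORMAT = "%m/%d/%Y %H:%M:%S"


def normalize_sent_date(raw):
    """Re-render raw in TARGET_FORMAT, or return None if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime(TARGET_FORMAT)
        except ValueError:
            continue  # not this format, try the next one
    return None


def unify_sent_date(record):
    """Return a copy of record with both source fields collapsed into 'sent_date'."""
    raw = next((record[f] for f in DATE_FIELDS if record.get(f)), None)
    unified = {k: v for k, v in record.items() if k not in DATE_FIELDS}
    unified["sent_date"] = normalize_sent_date(raw) if raw else None
    return unified
```

Returning None for a value that matches none of the known formats (rather than guessing) makes it easy to pull those records aside for manual review before loading.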
Getting the dates into a standard format in a single field not only enabled us to load that data successfully into the CloudNine platform, it also enabled us to then identify (in combination with other standard email metadata fields) duplicates in the collection based on those metadata fields. As a result, we were able to exclude a significant percentage of the emails as duplicates, which wouldn’t have been possible before the data was converted and standardized.
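To illustrate why the standardized date matters for de-duplication, here is a rough sketch of metadata-based duplicate detection. It is not the logic the CloudNine platform uses; the field names (from, to, subject, sent_date) and function names are hypothetical, and it assumes the sent date has already been converted to one format in one field.

```python
import hashlib

# Hypothetical metadata fields used to decide whether two emails match.
KEY_FIELDS = ("from", "to", "subject", "sent_date")


def dedup_key(record):
    """Hash the trimmed, lowercased key fields so matching emails share a key."""
    parts = [str(record.get(f) or "").strip().lower() for f in KEY_FIELDS]
    # Join with a separator unlikely to appear in the values themselves.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def split_duplicates(records):
    """Return (unique, duplicates), keeping the first record seen for each key."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        key = dedup_key(record)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates
```

If the same email arrived with a sent date of 2017/09/11 13:00:00 in one production and 9/11/2017 1:00 PM in another, the keys would differ and the duplicate would slip through, which is why the conversion has to happen first.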
Over the years, I’ve seen many examples where data (either from our side or the other side) needs to be converted. It happens more than you think. When that happens, it’s good to work with a solutions provider that has several “geeks” on their team that can provide that service. Sometimes, having data that’s ready to “rock and roll” takes some work.
So, what do you think? Have you received productions that needed conversion? If so, what did you do? Please share any comments you might have, or let us know if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.