Knowing what to ask for in an Electronic Discovery Request is the biggest hurdle. Since you only get what you ask for, the request has to be very specific. For example, asking for “all files” will get you only that, and “all files” can be interpreted as merely the data files. To fully interpret data files you need to know what type of platform the files came from (mainframe, mini, network and/or microcomputer), the file format, the associated operating system and the application in which the data was developed. All of these questions will be addressed in this chapter.
The majority of modern data files are in a standard format called ASCII (American Standard Code for Information Interchange). This standard code assigns a number to each letter of the alphabet, each numeric character and each special character, such as a dollar sign, tilde, carriage return, line feed or tab. Because every character has a numeric equivalent, the computer can assign it a discernible value and interpret and logically process it.
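As a small illustration of the idea that every character has a numeric code, the following Python sketch prints the ASCII number for a few of the characters mentioned above. The helper name `ascii_code` is made up for this example; `ord` is a Python built-in.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII number (0-127) of a single character."""
    code = ord(ch)  # ord() gives the character's numeric code
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code

# A letter in both cases, a digit, two special characters and three
# control characters (tab, carriage return, line feed).
for ch in ["A", "a", "7", "$", "~", "\t", "\r", "\n"]:
    print(f"{ch!r:>6} -> {ascii_code(ch)}")
```

The letter A, for example, is stored as the number 65 and the dollar sign as 36.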
Prior to ASCII, the most popular character set was EBCDIC (Extended Binary Coded Decimal Interchange Code). EBCDIC was developed by IBM in the early 1960s, was derived from punch card codes and is still the preferred character set for many mainframe and minicomputer systems.
COTS (Commercial off the Shelf) and GOTS (Government off the Shelf) applications are the readily available programs such as word processors, databases, spreadsheets, project management tools and graphics generators. Data files are always created by or associated with some type of application, the program that interprets and makes sense of the data (such as Microsoft Project, which is project management software; Oracle, which is a relational database program; or WordPerfect, which is a word processor). Custom-built applications for specific solutions will also show up. Someone builds a custom application because no COTS application fit their needs. Custom-built applications are normally narrowly focused on a specific problem, are generally much more expensive than a COTS application and can at times be difficult to understand because they were created with very specific usage in mind.
It is possible to interpret data without rebuilding the original transaction system from which the case-related data came. The attempts at reading the data that determine whether you need to rebuild that system range from fairly simple to sophisticated, and the cost ranges accordingly from inexpensive to very expensive. First try to read a working copy of the data (you never work with the original copies of electronic data that you received). If the data has been copied, you had access to a machine that could read the media (but not necessarily interpret the data). If the data can be read but does not seem to make any sense, it is because the data is associated with a specific application. Seemingly random text could represent anything from email messages with header and footer information (easily interpreted using any number of text editors and/or word processors) to a spreadsheet or relational database, which would be very hard to interpret without an application with at least similar functionality (and even then it might not work).
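A first, inexpensive look at a working copy can be sketched in Python as follows. This is only a rough heuristic under stated assumptions: the function names and the 95 percent threshold are invented for illustration, and `cp037` is just one of several EBCDIC code pages that Python ships with, so a real file may need a different one.

```python
import string

def printable_ratio(text: str) -> float:
    """Fraction of characters that are ordinary printable ASCII."""
    if not text:
        return 0.0
    return sum(ch in string.printable for ch in text) / len(text)

def first_look(data: bytes, threshold: float = 0.95) -> str:
    """Guess whether raw bytes are plain text or need their application."""
    # latin-1 maps every byte to one character, so this never fails.
    if printable_ratio(data.decode("latin-1")) >= threshold:
        return "readable as ASCII text"
    # Assumption: cp037 stands in for whatever EBCDIC code page was used.
    if printable_ratio(data.decode("cp037")) >= threshold:
        return "readable as EBCDIC text"
    return "not plain text; probably needs its application"

print(first_look(b"From: sender\nSubject: schedule\n"))
print(first_look("FROM A TO B".encode("cp037")))
```

A result of "not plain text" does not prove the data is unusable; it only suggests that a text editor alone will not be enough.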
By far the most expensive approach to interpreting data is to rebuild the original transaction system. This means setting up a system exactly as it was, intact, when it was processing case-related information. For example, if in Electronic Discovery you are given the electronic data associated with a project management system from 1980 that resided on an IBM mainframe computer, rebuilding that transaction system would involve acquiring (by lease or purchase):
- IBM Mainframe or Compatible
- Mainframe Operating System
- Application Programs
- Associated Data Files
- Configuration Information
- Expertise to set up, run and interpret the system and associated data
Hire an Expert
At times, understanding what legacy systems (older computer systems) were actively used during a case amounts to a history lesson in automation. It’s one thing to receive all the electronically discoverable data, but the value comes from interpreting the data. Just as you would use a subject matter expert for case-supporting testimony, a computer expert can greatly expedite extracting the potential value contained within case-related magnetic media.
The software upgrade trend during the 80s and into the 90s (and in some cases it still holds true today) was upward compatibility. This means that if you have a copy of an application, say Widget 4.0, and you purchase a copy of Widget 5.0, the latest version is able to read the 4.0 files as well as the newly created 5.0 files, but 4.0 cannot read the 5.0 files. In other words, there is no downward compatibility. The reason is that version 5.0 has significant new bells and whistles that 4.0 doesn’t have. 5.0 can bring a 4.0 file in and embellish it with its new features, but 4.0 doesn’t comprehend the upgraded functionality of 5.0. Many of the newer programs allow for downward compatibility by recognizing common application denominators and core functionality.
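The upward compatibility rule can be written down as a one-line check. The sketch below is purely illustrative: Widget is the hypothetical product from the example above, `can_open` is an invented name, and real applications may behave differently from this simple rule.

```python
def can_open(app_version: tuple, file_version: tuple) -> bool:
    """Upward compatibility only: an application reads files written by
    its own version or an earlier one, never by a later one."""
    return file_version <= app_version

print(can_open((5, 0), (4, 0)))  # Widget 5.0 opening a 4.0 file
print(can_open((4, 0), (5, 0)))  # Widget 4.0 opening a 5.0 file
```

In practice this means a discovery request should ask which version of an application created each file, so that a version at least that recent can be obtained.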
Applications are always associated with an operating system, for example Windows 98, Windows NT (New Technology, with network capability) or CP/M (Control Program for Microcomputers, for older Z80 processor based machines) for microcomputers, VM or MVS for mainframes, or VMS for VAX minicomputers. It is possible to interpret some data files without an associated application, such as simple text, but this can lead to potential interpretation/context problems. It is very difficult to interpret relational database data files or project management data files without their associated applications.
Chain of Custody is a very important aspect of any piece of evidence. It offers insight into the original piece of media received and its ownership. Any media that is to be analyzed first needs to be copied. The originals that you receive in Discovery should never be modified in any way and should be stored in a safe place. A working copy of Discovery media needs to be created so that if any problems come up, you can still go back to the original to make another working copy. The original also needs to be kept intact for authenticity purposes. If by chance you find a “smoking gun” piece of evidence, you will have to demonstrate the Critical Path/Audit Trail used if the authenticity of the data is challenged. Simply stated, if challenged, you need to show the process by which you came up with the supporting data (without crossing any pre-existing stipulations).
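This chapter does not prescribe a tool for proving that a working copy matches the original. One widely used approach, offered here as an assumption rather than as the method described above, is to compare cryptographic digests of the two files. The sketch uses Python's standard `hashlib` and `shutil` modules; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Digest a file in chunks so that large media images fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_working_copy(original: Path, working: Path) -> str:
    """Copy the original, then confirm both files still share one digest."""
    before = sha256_of(original)
    shutil.copy2(original, working)  # the original is only ever read
    if sha256_of(working) != before or sha256_of(original) != before:
        raise RuntimeError("working copy does not match the original")
    return before  # record this value in the custody documentation
```

Recording the returned digest when the media is received gives you a fixed value to point to later if the authenticity of the working copy is questioned.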
For example, let’s say you have 20 large EBCDIC data files on a data cartridge. For starters you will need to find a computer that is capable of reading the cartridge. One of the files on the cartridge is unformatted and contains 100,000 characters all on one line. This might appear to be useless, when in fact the file holds 37 emails. Each email contains header information (alphabetic and numeric characters that appear to make no sense but are part of the email tracking system), the email body (email 31, for example, contains the “smoking gun” evidence) and a footer (which contains more nonsensical data). This file can be formatted to 80 characters per line, each line followed by a carriage return, giving you 1,250 lines of 80 characters each (roughly 27 pages).
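The reformatting step is simple enough to sketch. The Python below cuts one long run of characters into fixed-width lines; `wrap_fixed` is an invented name, and the sample text is a stand-in for the real file contents. The arithmetic is plain division: 1,250 lines of 80 characters correspond to 100,000 characters.

```python
def wrap_fixed(text: str, width: int = 80) -> list[str]:
    """Cut text into consecutive pieces of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [text[i:i + width] for i in range(0, len(text), width)]

raw = "X" * 100_000            # stand-in for the unformatted file
lines = wrap_fixed(raw)
print(len(lines))              # 1250
formatted = "\n".join(lines)   # one line break after each 80 characters
```

Because the function only inserts line breaks and never drops or changes a character, the original content can be recovered by joining the lines again.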
From there you have the options of using the file as is, introducing more formatting so that the document is easier to read, using a search utility on the current EBCDIC-based system and/or converting the file to ASCII for further manipulation in a more modern repository.
Whatever methods are used, be it sophisticated conversion or simple formatting, they have to be well documented, in case the media is challenged in court.
The following are all considerations, with explanations, that you should address when crafting an Electronic Discovery Request.
Hardware Platform
(Mainframe-Micro, Networked or Standalone)
The hardware platform needs to be known so that in a worst-case scenario, the original transaction system can be rebuilt (this is typically one of the most expensive choices). In a best-case scenario, the hardware platform is fairly modern and readily available, such as a Pentium 100 PC running Windows 95.
In mainframe/mini environments, the systems are likely to seem archaic, with levels of sophistication that are all over the place (phenomenal to lousy). Knowing that a mainframe/mini was the system in use means that the character set was probably EBCDIC, which is by far the most popular mainframe/mini character set. Other character sets did exist because early on, proprietary approaches to technology seemed more important than standardization (the reverse is more common today).
The good news is that EBCDIC data can be converted to ASCII fairly easily. FTP (File Transfer Protocol) software can, among other things, convert EBCDIC to ASCII via a filter option. The hard part is finding a mainframe and/or drives that can read the mainframe media. A company called Convert It in Campbell, CA has an extensive collection of older systems and drives and offers a service that converts EBCDIC to ASCII and copies the data to CDs (thus making the use of modern tools possible), or moves ASCII data from legacy media to modern media. Stanford University has a similar shop set up.
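Once the bytes are off the legacy media, the character conversion itself is short. The Python sketch below uses the built-in `cp037` codec, which is one common EBCDIC code page; which code page a given system actually used is an assumption that has to be confirmed for each file. The function name is invented for this example.

```python
def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> bytes:
    """Translate EBCDIC bytes to ASCII bytes.

    Characters with no ASCII equivalent become '?', so the output
    should be compared against the source before it is relied on.
    """
    text = raw.decode(codepage)
    return text.encode("ascii", errors="replace")

# The five bytes below spell HELLO in code page 037.
print(ebcdic_to_ascii(b"\xc8\xc5\xd3\xd3\xd6"))  # b'HELLO'
```

Run the conversion on a working copy only, and note the code page chosen, since a wrong choice can silently change punctuation and special characters.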
The two most popular microcomputers are, in order, the IBM/DOS (Disk Operating System)/Windows machine (often called an IBM compatible) and the Apple/Macintosh. There are other microcomputers, but the IBM compatible (Intel processors) and the Apple/Macintosh (Motorola processors) are by far the majority. Prior to 1981 (the year the IBM PC came out), the most popular business-oriented microcomputers contained a Z80 processor (the brain of the computer) and used CP/M as their operating system.
Electronic/Magnetic File Types
(All media types, including text, pictures, audio, video and graphics)
Electronic/Magnetic media comes in various forms, and all of it needs to be interpreted for possible evidentiary value. In discovery it is almost as important to know what in an evidentiary collection doesn’t affect your perceived outcome. In other words, it is necessary to touch, view and interpret everything that is turned over to avoid potential surprises.
Related Documentation/SOPs (non-privileged)
Any type of documentation is very valuable in trying to understand the function of an application and/or the data contained within. Well-documented applications make the interpretation that much easier. Standard manuals exist for just about every COTS/GOTS application ever produced. Custom-written applications will generally have at least SOPs (Standard Operating Procedures) and possibly documentation, but not always. The best practice in developing a custom application is to document everything, but as budget limits and deliverable deadlines approach on a project, documentation is often the last thing addressed, especially in a crunch.