In computer forensics, the litigation examiner is often handed a large box of assorted tape media, recorded with unknown backup software, whose history and contents are typically undocumented. The examiner's job is to extract data from the tapes accurately and to examine it for evidence. The data of interest may include e-mail, accounting records, images, correspondence and other database-type structures. Examined as a whole, the data often gives a complete snapshot of what an individual or business was doing at the time the backups were made. This article discusses the specific challenges of tape forensic analysis and how to extract the crucial data from a tape.
At the simplest level, finding out what is on a tape starts with being able to read it correctly. There are many different types of tape and, while some look the same, the drives used to read them differ, so understanding the generational differences between media and the associated hardware is essential. For instance, a DDS-4 tape cannot be read in an older-generation DDS-3 drive: the tape will fit in the drive, but the media is of a higher density than the drive supports. Matching the tape to the correct hardware is therefore usually the first obstacle to overcome; once that is done, you are ready to find out more about what is inside.
Data on a tape is laid out in a "format". A format can be thought of as similar to a language: it has a specific syntax, a set of rules and a definite logical layout. In computer terms we refer to bits, nibbles, bytes, words, long words, records, blocks, files, end-of-file markers, etc. These are all terms you need to become familiar with when working out how to extract data from a tape. A tape format expert will do what is called a "tape dump", in which the raw contents of the tape are displayed on screen so that the format can be verified. This is usually as "deep" as anyone needs to go into a tape. If need be, software is then written to access the tape, crack the format and extract the data.
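As a rough illustration of what a tape dump looks like, the following Python sketch prints a block of raw bytes as offsets, hexadecimal values and printable characters. The function name hex_dump and the sample bytes are invented for this example; a real dump would be fed blocks read from the tape drive, which is not shown here.

```python
def hex_dump(block: bytes, width: int = 16) -> str:
    """Render raw bytes as offset, hex values and printable characters."""
    lines = []
    for offset in range(0, len(block), width):
        chunk = block[offset:offset + width]
        # Two hex digits per byte, separated by spaces.
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is; everything else becomes a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  |{text_part}|")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample bytes standing in for one block read from a tape.
    sample = b"sample block \x00\x01\x02 with some text in it"
    print(hex_dump(sample))
```

Viewing offsets, hex and text side by side is what makes repeating structures such as headers, names and padding stand out to the eye.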
Cracking a format is an interesting challenge. The secret is recognizing the patterns within the data. A format is normally structured so that set-up information is encoded in it, telling the software reading it what to expect. The reading program then knows what to process and what to do with it. Some formats contain file names, dates, block sizes, etc., while others are more or less purely numerical. Text data within these formats can be encoded in ASCII (typically written by PCs and Unix machines) or EBCDIC (mainframes, older technology). Both are represented internally as numerical codes, and they are easily translated so that the data can be read as text when viewing a tape dump.
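The ASCII versus EBCDIC question can be sketched in a few lines of Python. The example below decodes the same bytes both ways and picks whichever reading looks more like text. It assumes Python's built-in cp037 codec as the EBCDIC variant (several EBCDIC code pages exist, and a given tape may use another), and the function names and the scoring rule are illustrative only, not an established method.

```python
def readable_ratio(text: str) -> float:
    """Fraction of characters that are plain ASCII letters, digits or spaces."""
    if not text:
        return 0.0
    good = sum(1 for ch in text if ch.isascii() and (ch.isalnum() or ch == " "))
    return good / len(text)


def guess_text_encoding(raw: bytes) -> str:
    """Return 'ebcdic' or 'ascii', whichever decoding looks more like text."""
    as_ascii = raw.decode("ascii", errors="replace")
    # cp037 is one common EBCDIC code page; this choice is an assumption.
    as_ebcdic = raw.decode("cp037", errors="replace")
    if readable_ratio(as_ebcdic) > readable_ratio(as_ascii):
        return "ebcdic"
    return "ascii"


if __name__ == "__main__":
    # The same label encoded the two ways, as it might appear in a tape dump.
    for raw in (b"PAYROLL 1999", "PAYROLL 1999".encode("cp037")):
        kind = guess_text_encoding(raw)
        codec = "cp037" if kind == "ebcdic" else "ascii"
        print(kind, raw.hex(), raw.decode(codec, errors="replace"))
```

A simple ratio like this only works on blocks that actually contain text; binary or compressed data will score poorly under both encodings.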
In the corporate arena, which means commercial tape and disk formats, we are primarily concerned with electronic document storage. The media can be backup tapes, interchange formats and so on. Ultimately they contain files generated by a software package, whether word processing, databases, spreadsheets, graphical images, etc. The issue when reading these tapes is whether or not you have access to the original software that created them. If you do not, reading the tape becomes a real challenge.
The aim of tape forensic analysis is to take a tape and get the data off it in a fast, logical, cohesive and accurate fashion. At the simplest level, someone might want to know the names of the files on a tape and when they were created. The tape is placed in a drive and read, with the metadata being output to a file. This is known as tape logging; it helps create a timeline for the tape contents and shows the dates, names and types of files present. An investigation may involve hundreds of tapes, so the ability to log each tape and then drill down into those logs for specific details is very useful. The investigator can then restore just the files needed rather than the whole save set, which could run to terabytes of data, saving many days of work and considerable expense.
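Tape logging can be sketched for one simple case. The Python example below assumes the tape contents are available as a TAR stream and uses the standard tarfile module to write each file's name, size and modification time to a CSV log without restoring any file data. The function names and the in-memory sample archive are invented for the illustration; proprietary backup formats would need their own readers.

```python
import csv
import io
import tarfile
from datetime import datetime, timezone


def log_tar_stream(stream, log_file) -> int:
    """Write name, size and timestamp of each file in a TAR stream to CSV."""
    writer = csv.writer(log_file)
    writer.writerow(["name", "size_bytes", "modified_utc"])
    count = 0
    # "r|" reads strictly front to back, the way a tape is read.
    with tarfile.open(fileobj=stream, mode="r|") as archive:
        for member in archive:
            if member.isfile():
                modified = datetime.fromtimestamp(member.mtime, tz=timezone.utc)
                writer.writerow([member.name, member.size, modified.isoformat()])
                count += 1
    return count


def build_sample_archive() -> bytes:
    """Build a tiny in-memory TAR archive standing in for tape contents."""
    buffer = io.BytesIO()
    samples = [("ledger.csv", b"a,b\n", 946684800), ("memo.txt", b"hello", 978307200)]
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload, mtime in samples:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            info.mtime = mtime  # seconds since the Unix epoch
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


if __name__ == "__main__":
    log = io.StringIO()
    logged = log_tar_stream(io.BytesIO(build_sample_archive()), log)
    print(f"{logged} files logged")
    print(log.getvalue())
```

Once every tape has a log like this, the logs can be searched or sorted by date to decide which tapes and which files are worth restoring.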
At a more complex level, one might want to go to a business and electronically capture every bit and byte on every disk of every computer in the office. This is typically done by making a series of disk images to tape. The tapes are then taken to a remote site where they can be restored to other disks, thus recreating the working environment of the original site. Analysis can then begin, digging down through the various layers of data for signs of fraud and other incriminating evidence.
Digging down forensically into a hard disk is a lot easier than working with a tape. In most situations, when you are reading a tape and you hit end of data, that is it. There are no "old or earlier versions" stored on tape. The contents of a backup tape are a snapshot in time, so if you need an older version of a file you need to read an earlier tape. It is generally not possible to rewrite a section of data on a tape, so any file you read cannot be newer than the tape's creation date. Reading beyond end of data sometimes produces useful results, but it requires the proper software tools to evaluate and extract the contents, and it is usually not possible without specialized hardware. With these limitations in place, tape is actually a great tamper-proof backup medium.
The problem with tapes is that there are no format layout standards, and as a result there are numerous formats and many variations within them. Developing your own code to read these tapes is therefore a major undertaking and is best left to specialists in the field; reading tapes of unknown origin and format generally requires a restoration program.
Tape restoration programs give users access to data written on hardware that is usually obsolete or no longer accessible. In most cases the user does not have to know anything about the tape: you just put it in the drive, tell the program to restore it, and it does the rest. These programs are also used for data interchange between hardware platforms. For example, you can read a VAX/VMS backup tape on a PC; likewise, reading and restoring an older Backup Exec dataset and converting it to TAR for use on a UNIX machine is a simple task. Another feature you might look for is the ability to examine the raw data layout of a tape byte by byte, and perhaps to investigate what lies beyond end of data, all interactively. Remember that the examiner's job is to accurately extract data from a box of tapes in unknown formats and to examine it for evidence. It is no easy task, but with the right tools you can discover any data on a tape.