The human mind is a weird thing. It can store and retrieve all sorts of apparently unrelated information in a fraction of a second, and we are at a loss to explain how. But thanks to some very smart people and years of dedicated research, we are beginning to understand it. I was asking one of my operators for a hard disk a couple of days ago. I gave him the client name and said "do you have the..." and before I had completed the sentence he had reached onto the rack and pulled out what I wanted. Now imagine being able to teach a computer how to do this. Progress in this area is now very much under way, and we are able to make use of this fascinating technology within the world of computer forensics.
In the world of electronic discovery, the forensic litigation examiner has to search the data for information that can serve as evidence. Traditionally the approach has been to pick a word, a phrase, or a series of words and phrases, and then find all documents that contain those search terms. This is inadequate because words are ambiguous, and there are multiple ways of saying the same thing. One study found that highly experienced searchers with expertise in the topics of the material were only 20% successful at retrieving relevant documents, because they had a difficult time guessing the right words to search for in each case. Others found that there is very low agreement on how documents should be categorized, so categories make for weak discovery mechanisms. In short, keyword searching is very imprecise. Many different words can be used to describe the same action - for instance there are 122 words that describe "thinking" - and this is just using English... Then there is the issue of how the word is used, or "context." Context is critical: once one understands how a word is being used, its meaning becomes clear, and its value as a search term increases dramatically. Software now exists which can do this automatically. Its potential uses, and where it might take us as a society, are mind-boggling.
This system learns the meanings of words from the documents that it reads, so it will work in any language, and the meanings that it learns are those characteristic of the documents themselves, not someone's preconceived notion of what the documents should be about. In actual cases the system has found co-conspirators by returning documents containing one conspirator's name following a search for another conspirator. It has recognized the expansion of acronyms and the meaning of jargon. It does not rely on the analyst's explicit knowledge of the issue but extracts that knowledge implicitly from the documents, thereby leveraging the capabilities of the analyst. He or she can spend more time thinking about the information and less time thinking about how to find it. Because it works in any language, it can prove an even more valuable asset in areas where there is a shortage of trained analysts.
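To give a feel for how word associations can be picked up from the documents themselves, here is a minimal sketch that counts which words appear together in the same document. This is only an illustration: the article does not say what method the system actually uses, and plain co-occurrence counting is a much cruder stand-in. The function names, the stop-word list, and the three sample documents are all invented for the example.

```python
import re
from collections import Counter, defaultdict

# Tiny, illustrative stop-word list (an assumption, not from the article).
STOP_WORDS = {"the", "for", "of", "a", "and", "in"}

def tokenize(text):
    """Lower-case the text and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_cooccurrence(documents):
    """Count, for every word, how many documents it shares with each other word."""
    cooc = defaultdict(Counter)
    for doc in documents:
        terms = set(tokenize(doc))
        for term in terms:
            for other in terms:
                if other != term:
                    cooc[term][other] += 1
    return cooc

def related_terms(cooc, term, top_n=3):
    """Return the words most often seen alongside `term`, ties broken alphabetically."""
    counts = cooc.get(term, Counter())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Invented sample documents, purely for illustration.
documents = [
    "Merger talks continue for the consolidated company",
    "The consolidated company completes merger",
    "Budget review of quarterly operating results",
]

cooc = build_cooccurrence(documents)
print(related_terms(cooc, "merger", top_n=2))
```

Nobody told the program that "merger" has anything to do with "consolidated" or "company"; the link comes entirely from the documents. A real system would need far more text and a more sophisticated model, but the principle of learning from the material rather than from a hand-built thesaurus is the same.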
Here are just a few examples of how the system turned up alternative suggestions:
- A search for "chemical poisoning streams" turned up documents that discussed arsenic in the ground water. This was enough of a smoking gun to force the case to be settled.
- In a medical case, a search for Elavil also turned up documents that discussed Klonopin; both are psychotropic medications. A search for myelopathy turned up additional documents that did not mention myelopathy but did talk about diseases of the spinal cord. And a search for bilirubin turned up additional documents that discussed liver and hepatitis.
- In another case, a search for "cambios" turned up documents about "bancos." The system works across multiple languages.
- In a case about business mergers, the attorneys had already devoted considerable effort to thinking of every way they could describe the issues of the case. Still, the system was able to turn up documents that were about the same topics but used different words. Studies have found that the biggest limitation of using word search to find documents is the fundamental inability to guess all of the different ways people can express similar ideas.
- A search for "budget" also turned up a document mentioning quarterly operating results and one containing a "summary of synergies and efficiencies analysis." A search for "president" retrieved documents about the "chief executive." A search for "merger" turned up documents that described the resulting entity as a "consolidated company."
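The difference between a literal word search and a search that also follows learned associations can be sketched in a few lines. Note the assumptions: the `associations` table below is written by hand to stand in for what the system would learn on its own, and the sample documents and function names are invented for this illustration.

```python
def keyword_search(documents, query):
    """Return the indexes of documents that contain the literal query phrase."""
    q = query.lower()
    return [i for i, doc in enumerate(documents) if q in doc.lower()]

def expanded_search(documents, query, associations):
    """Search for the query and for any phrases associated with it."""
    phrases = [query] + associations.get(query.lower(), [])
    hits = set()
    for phrase in phrases:
        hits.update(keyword_search(documents, phrase))
    return sorted(hits)

# Invented sample documents, purely for illustration.
documents = [
    "The president approved the budget.",
    "The chief executive signed off on the consolidated company plan.",
    "Lunch menu for Friday.",
]

# Hand-written stand-in for associations a concept search system would learn.
associations = {
    "president": ["chief executive"],
    "merger": ["consolidated company"],
}

print(keyword_search(documents, "merger"))                 # literal search
print(expanded_search(documents, "merger", associations))  # with associations
```

The literal search for "merger" finds nothing, because no document uses that word, while the expanded search returns the document about the "consolidated company."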
The point here is that this system works along the same lines that a human mind does. It finds associated information without needing to be told in advance what to look for, much as an average thinking person is capable of doing - only it does so much faster and across far more documents than any person could read. What it means for the world of computer forensics is that we now have a very powerful search and retrieval tool, one that can get to the heart of the matter quickly and efficiently, without depending on someone guessing the right words.