Monday, March 9, 2009

Digital DNA - Numerical Expressions to Describe Malware Behaviors

HBGary unveiled Digital DNA today at the Infosec Conference in Orlando. (I had planned to be there but wasn't able to make it down to the show. Last minute stuff came up and I had to jet back to the West Coast.) The engineering team has been working on Digital DNA for months. In a nutshell, we have automated the reverse engineering of loaded modules in a physical memory snapshot, and we generate Digital DNA (DDNA) from the collected data (millions of data points). All of these data points are codified in a way that allows them to be matched against rules. The Digital DNA system will "sequence" a software program or document and generate trait-codes based on the behaviors and schematic artifacts found in it. Each trait has a complex rule (think regular expression with boolean logic) associated with it, and if the rule matches, the trait is considered "expressed". Expressed traits are concatenated together to make a "sequence". We chose to do it this way because the final DDNA sequence looks and smells like a hash, even though it's not actually a hash at all. But customers are used to managing hashes, thinking about hashes, and cut-n-pasting hashes - so a hash it would be.
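To make the rule-then-concatenate idea concrete, here is a toy sketch in Python. It is not HBGary's implementation: the trait codes, the behavior names, and the `Trait` and `sequence` helpers are all made up for illustration, and the "rules" are plain boolean predicates over a set of observed behaviors rather than the real rule language.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List

Observations = FrozenSet[str]


@dataclass(frozen=True)
class Trait:
    code: str                             # hypothetical trait-code
    rule: Callable[[Observations], bool]  # boolean rule over observed behaviors


# Hypothetical traits; a real genome would hold thousands of these.
TRAITS: List[Trait] = [
    Trait("0B8A", lambda o: "hooks_keyboard" in o and "writes_file" in o),
    Trait("1C2F", lambda o: "opens_socket" in o or "resolves_dns" in o),
    Trait("27D0", lambda o: "injects_thread" in o and "opens_socket" not in o),
]


def sequence(observations: Iterable[str]) -> str:
    """Concatenate the codes of every trait whose rule matches."""
    obs = frozenset(observations)
    return " ".join(t.code for t in TRAITS if t.rule(obs))


ddna = sequence({"hooks_keyboard", "writes_file", "opens_socket"})
print(ddna)
```

The first two rules match and the third does not, so only two traits are expressed. The resulting string can be copied around like a hash, but unlike a hash each chunk of it means something.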

Digital DNA is based on the reverse-engineered behaviors, not on the specific compilation or packer used with the malware. You can pack the same malware with three different packers and it will still produce the same Digital DNA. Two similar programs will produce similar DDNA. Here is an example of two versions of Rustock.B.
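One rough way to picture "similar programs produce similar DDNA" is to treat each sequence as a set of trait-codes and measure the overlap. The sketch below uses Jaccard similarity, which is my stand-in for illustration and not necessarily how the product compares sequences. The two sequences are invented and are not the Rustock.B ones.

```python
def trait_set(ddna: str) -> set:
    """Split a space-separated sequence into its trait-codes."""
    return set(ddna.split())


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared traits divided by all traits seen."""
    sa, sb = trait_set(a), trait_set(b)
    if not sa and not sb:
        return 1.0  # two empty sequences are treated as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sequences for two variants of the same program.
variant_1 = "0B8A 1C2F 27D0 3E11"
variant_2 = "0B8A 1C2F 27D0 4F22"

score = similarity(variant_1, variant_2)
print(f"{score:.2f}")
```

Here the variants share three of the five distinct traits between them. A cryptographic hash of the two binaries would tell you nothing beyond "different".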

Interestingly, the technology can identify digital objects. Here is an example of tracking Intellectual Property with it.

Digital DNA is a Big Idea. For now, HBGary is going to focus it on detection of zero-day malware threats. We have over 2,000 traits in the DDNA genome currently, and will probably have many more soon. We sort all the traits into Factors, Groups, and Subgroups, defining a "genome" of behaviors that are common to malware. This part plays into a weighting system. I will blog more about this over the coming weeks - dinner is calling.