Monday, June 30, 2008

Whitelisting is the next snake oil

Most people experienced in computer security know that ‘signatures’ are the dominant technology used to combat malware. Signatures – short descriptions of otherwise large binaries – are extremely effective at detecting specific, known programs and documents. They are perfect for scanning the enterprise for known malware, known insecure software, and known intellectual property. They are the cash cow of the anti-virus companies.

There are two approaches to signatures – blacklisting and whitelisting. The idea is simple – signatures of known bad stuff form a blacklist, signatures of known good stuff form a whitelist. Blacklisting has been the preferred method for AV over the last decade. Blacklisting has the benefit of near-zero false positives – something customers expect. Blacklisting also keeps the customers coming back – new malware means new signatures, perfect for the recurring revenue models on vendors’ balance sheets.

Blacklisting sounds ideal, but it doesn’t work. New malware emerges daily that has no corresponding blacklist signature. The malware must first be detected, and then processed. There is always a time window where Enterprises have no defense. Recent figures suggest that the AV vendors are falling so far behind this curve that they will never catch up with the deluge of new malware arriving daily. It can take weeks for a signature to become available.
This deluge of new malware is due to several factors. First, there is more money behind malware development than ever before. Second, we weren’t really that good at capturing malware in the past. Today, new malware can be collected automatically, without human intervention. The slow trickle of malware turned into a flood as honeypot technology emerged. Sensor grids obtain new malware samples efficiently – they automatically ‘drive by’ malicious websites (a form of spidering) to pick up infections, and they leave open ports on the ‘Net so automated scanners will exploit them. In parallel with the automated collection efforts, cybercrime has risen to epic levels. Finally, the barrier to entry has dropped for the cyber criminal. Cyber weapon toolkits have become commonly available. Anti-detection technology is standard fare. New variants of a malware program can be auto-generated. A safe bet is to expect thousands of new malware samples to hit the Internet per day.

The flaw in blacklisting has been exposed – it cannot address new and unknown malware threats. Figures range, but a safe claim is that 80% of all new malware goes undetected. This isn’t just a minor flaw; it’s a gross misstep in technology. Blacklisting is, and always has been, snake oil.

Enter the whitelist. The whitelist seems like a natural response to the “new and unknown malware” problem. Anything that is not known to be good is considered suspicious, or possibly bad. Sound familiar? Whitelisting is not new, of course. Programs like “Tripwire” were on the market in the ’90s – and were proven not to work. I founded rootkit.com originally to disprove the entire concept of OS-based whitelisting.

I agree with the idea that “suspicious” is good enough to warrant a look. This is smart thinking. But whitelisting is not the solution.

There is a lot more “not-known-good” in the Enterprise than actual malware, and the Enterprise obviously cannot afford the additional workload caused by “false positives”. So the whitelist vendors are racing to catch up – to remove all the “noise” so the staff can focus on the signal. Millions of dollars are already being invested in whitelisting files – and there are solid technical reasons why this doesn’t work.
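What “whitelisting files” means in practice can be sketched in a few lines: hash the file on disk and look the hash up in a known-good set. This is an illustrative sketch, not any vendor’s actual implementation, and the whitelist contents here are hypothetical.

```python
import hashlib


def md5_of_file(path):
    """Compute the MD5 sum of a file on disk - the unit of trust
    in a file-based whitelist."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_whitelisted(path, whitelist):
    """A file 'passes' if its on-disk hash is in the known-good set.
    Note this says nothing about the process image once the file
    is loaded into memory."""
    return md5_of_file(path) in whitelist
```

The check only ever sees bytes on disk – which is exactly the weakness discussed below.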

Whitelists are built from files on disk. A whitelist, in current industry terms, is a list of the MD5 sums of files ON DISK. Please understand that files on disk are not the same as files in memory – and memory is all that matters. When a file is LOADED into memory, it CHANGES, so on-disk MD5 sums do not map to memory. There are several reasons memory is different:

1) Memory contains much more data than the on-disk file
2) Memory contains thread stacks
3) Memory contains allocated heaps
4) Memory contains data downloaded from the Internet
5) Memory contains secondary or tertiary files that were opened and read
6) Memory contains data that is calculated at runtime
7) Memory contains data that is entered by a user

All of the above are not represented by the file on disk, so none of the above are represented by the whitelist MD5 sum. Yet when the file hash on disk passes as whitelisted, the running in-memory process is considered whitelisted by proxy. This is where the whole model breaks down. In memory there are millions of bytes that are calculated at runtime – different every time the program is executed, the DLL is loaded, or the EXE is launched. These bytes are part of the process, but unlike the file on disk they can change every minute, every second of the program’s lifetime. None of this dynamic data can be hashed with MD5. None of it is represented by the bytes on disk. So none of it can be whitelisted.
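The instability of runtime data can be demonstrated with a toy “memory image”: the static code section hashes identically every time, while any snapshot that includes heap or stack contents never hashes the same way twice. This is a deliberately contrived sketch – os.urandom stands in for real heap and stack contents, which vary on every execution.

```python
import hashlib
import os


def md5(data):
    return hashlib.md5(data).hexdigest()


def memory_image(code):
    """Simulate a loaded process: the static code bytes plus dynamic
    runtime data. Real heap and stack contents vary on every run;
    random bytes stand in for them here."""
    heap = os.urandom(4096)   # allocated heap contents
    stack = os.urandom(1024)  # thread stack contents
    return code + heap + stack


# Stand-in for the bytes a disk-based whitelist can actually hash.
code_on_disk = b"\x90" * 256
```

Hashing `code_on_disk` is stable; hashing `memory_image(code_on_disk)` produces a different digest for every “execution” – which is why an on-disk hash cannot vouch for a live process.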

While an executable file on disk can be whitelisted, well over 75% of that program cannot be whitelisted once it’s actually running in memory. This missing 75% can easily contain malicious code or data. It can contain injected code. It can contain booby-traps in the form of malicious data. It can represent an injected thread. The assumption that an on-disk whitelist match means that this dynamic data is ‘trusted by proxy’ is absurd. Yet, this is what the whitelisters want us to believe.

For malware authors, the whitelist is a boon. It means that a malware author only needs to inject subversive code into another process that is whitelisted. Since the whitelist doesn’t and cannot account for dynamic runtime data, the malware author knows his injected code is invisible to the whitelist. And, since the process is whitelisted on disk, he can be assured his malware code will also be whitelisted by proxy. So, in effect, whitelisting is actually WORSE than blacklisting. In the extreme, the malware may actually inject into the desktop firewall or resident virus scanner directly as a means of obtaining this blanket of trust.

The mindset that “suspicious is good enough to warrant a look” is a step in the right direction. But whitelisting is not the correct approach. The only way to combat modern malware is to analyze the physical running memory of a system. The indicators of suspicion are found in memory, and this approach is much more like a blacklist than a whitelist – except that it’s generic, based on the traits and behaviors of software rather than hard signatures. For example, there are only so many ways you can build a keylogger. Once you can detect these traits in runtime memory, you will detect the keylogger regardless of who wrote it, what it was compiled with, what attack toolkit was used, or what it was packed with. As a security industry we need to stop climbing uphill with traditional approaches proven not to work. We need to change the fundamental way we do things if we are going to win.
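As a rough illustration of trait-based scanning, the sketch below counts keylogging-related Windows API names (SetWindowsHookEx, GetAsyncKeyState, GetKeyboardState are real API names; everything else – the trait list, threshold, and function names – is hypothetical) found in a raw memory buffer. A real scanner would walk process memory structures rather than grep raw bytes, but the principle of flagging co-occurring traits is the same.

```python
# Hypothetical trait list: API names characteristic of user-mode
# keyloggers. The co-occurrence threshold below is illustrative.
KEYLOGGER_TRAITS = [
    b"SetWindowsHookEx",   # install a keyboard hook
    b"GetAsyncKeyState",   # poll individual key states
    b"GetKeyboardState",   # read the full keyboard state
]


def keylogger_score(memory, traits=KEYLOGGER_TRAITS):
    """Count how many keylogging traits appear in a raw memory region."""
    return sum(1 for t in traits if t in memory)


def looks_like_keylogger(memory, threshold=2):
    """Flag a region as suspicious when enough independent traits
    co-occur - regardless of who wrote the code, how it was compiled,
    or how the file was packed on disk."""
    return keylogger_score(memory) >= threshold
```

Because the traits live in runtime memory, packing or recompiling the binary on disk does not change what this kind of scan sees once the code is running.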