The evolution of PC Virus 3

The technical component
The technical component of a malware detection system collects data that will be used to analyze the situation.
On one hand, a malicious program is a file containing specific content. On the other hand, it is a collection of actions that take place within an operating system. It is also the sum total of final effects within an operating system. This is why program identification can take place at more than one level: by byte sequence, by action, by the program’s influence on an operating system, etc.
The following methods can all be used to collect data for identifying malicious programs:
1. treating a file as a mass of bytes
2. emulating the program code
3. launching the program in a sandbox (and using other similar virtualization technologies)
4. monitoring system events
5. scanning for system anomalies
These methods are listed in order of increasing abstraction when working with code. The level of abstraction here means the way in which the program being run is regarded: as an original digital object (a collection of bytes), as a behaviour (more abstract than the collection of bytes) or as a collection of effects within an operating system (more abstract than the behaviour). Antivirus technology has, more or less, evolved along these lines: working with files, working with events via a file, working with a file via events, and working with the environment itself. This is why the list above reflects the chronology of these methods as well as the methods themselves.
It should be stressed that the methods listed above are not so much separate technologies as theoretical stages in the continuing evolution of the technologies used to collect data for detecting malicious programs. Technologies gradually evolve and intersect with one another. For example, emulation may be closer to point 1 in the list if it is implemented in a way that still handles the file partly as a mass of bytes. Or it may be closer to point 3 if we are talking about full virtualization of system functions.
The methods are examined in more detail below.
Scanning files
The very first antivirus programs analyzed file code, which was treated as byte sequences. Actually, "analyze" is probably not the best term to use, as this method was a simple comparison of byte sequences against known signatures. However, here we are interested in the technical aspect of this technology, namely getting data as part of the search for malicious programs. This data is extracted from files, is a mass of bytes structured in a particular way, and is transmitted to the decision-making component.
A typical feature of this method is that the antivirus works only with the source byte code of a program and does not take program behaviour into account. Despite the fact that this method is relatively old, it is not out of date, and is used in one way or another by all modern antivirus software - just not as the sole or even as the main method, but as a complement to other technologies.
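As a rough illustration of treating a file as a mass of bytes, the sketch below searches raw file content for known byte patterns. The signature names and patterns are invented for the example, and real products use far more elaborate signature formats; this only shows that the program is read, never run.

```python
# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "Demo.Alpha": bytes.fromhex("deadbeef"),
    "Demo.Beta": b"EVIL_MARKER",
}


def scan_bytes(data: bytes, signatures: dict = SIGNATURES) -> list:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def scan_file(path: str) -> list:
    """Read a file as raw bytes and scan it; the program is never executed."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read())
```

Note that the only data collected here is the byte content itself; nothing about what the program would do when launched is available to the decision-making component.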
Emulation
Emulation technology is an intermediary stage between processing a program as a collection of bytes and processing a program as a particular sequence of actions.
An emulator breaks down a program's byte code into commands, and then launches each command in a virtual environment which is a copy of the computer environment. This allows security solutions to observe program behaviour without any threat being posed to the operating system or user data (which would inevitably happen if the program were run in the real, i.e. non-virtual, environment).
An emulator is an intermediary step in terms of levels of abstraction in working with a program. Roughly speaking, we can say that while an emulator still works with a file, it does analyze events. Emulators are used in many (possibly even all) major antivirus products, usually either as an addition to a core, lower-level file engine or as insurance for a higher-level engine (such as a sandbox or system monitoring).
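The idea can be sketched with a toy example. The three-byte instruction set below (opcode, operand, operand) is entirely made up for illustration and has nothing to do with a real processor; the point is only that the byte code is broken down into commands, each command acts on virtual state, and the resulting events are recorded.

```python
# Hypothetical opcodes of a toy instruction set (not a real architecture).
HALT, LOAD, WRITE_FILE, DELETE_FILE = 0x00, 0x01, 0x02, 0x03


def emulate(code: bytes) -> list:
    """Decode toy byte code into commands, run them on virtual state,
    and return the list of events observed along the way."""
    registers = [0, 0, 0, 0]   # virtual registers, not the real CPU
    virtual_files = {}         # virtual file system: file id -> stored value
    events = []
    pc = 0
    while pc + 3 <= len(code):
        opcode, a, b = code[pc:pc + 3]   # each command is three bytes
        pc += 3
        if opcode == HALT:
            break
        elif opcode == LOAD:             # LOAD register a with value b
            registers[a % 4] = b
        elif opcode == WRITE_FILE:       # write register b to virtual file a
            virtual_files[a] = registers[b % 4]
            events.append(("write_file", a, registers[b % 4]))
        elif opcode == DELETE_FILE:      # delete virtual file a
            virtual_files.pop(a, None)
            events.append(("delete_file", a))
        else:
            events.append(("unknown_opcode", opcode))
    return events
```

The input is still a file (a sequence of bytes), but what comes out is a list of events, which is what places emulation between the two levels of abstraction.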
Virtualization: the sandbox
Virtualization as it is used in so-called sandboxes is a logical extension of emulation. The sandbox works with programs that are run in a real environment but the environment is strictly controlled. The name sandbox itself provides a relatively accurate picture of how the technology works. You have an enclosed space in which a child can play safely. In the context of information security, the operating system is the world, and the malicious program is the rambunctious child. The restrictions placed on the child are a set of rules for interaction with the operating system. These rules may include a ban on modifying the operating system's directory, or restricting work with the file system by partially emulating it. For example, a program that is launched in a sandbox may be fed a virtual copy of a system directory so that modifications made to the directory by the program under investigation do not impact the way the operating system works. Any point of contact between the program and its environment (such as the file system and system registry) can be virtualized in this way.
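The virtual copy of a system directory described above can be sketched as a copy-on-write layer. In this minimal example the "real" file system is modelled as a plain dictionary, and the class name, paths and the untrusted program are all hypothetical; an actual sandbox would intercept operating system calls rather than Python methods.

```python
class SandboxFileSystem:
    """Copy-on-write view of a 'real' file system, modelled here as a dict."""

    def __init__(self, real_files: dict):
        self._real = real_files   # never modified by the sandboxed program
        self._overlay = {}        # virtual copy that receives all writes
        self.log = []             # every point of contact is recorded

    def read(self, path: str) -> bytes:
        self.log.append(("read", path))
        if path in self._overlay:
            return self._overlay[path]
        return self._real[path]

    def write(self, path: str, data: bytes) -> None:
        self.log.append(("write", path))
        self._overlay[path] = data   # the real file is left untouched


def untrusted_program(fs: SandboxFileSystem) -> None:
    """Stand-in for the program under investigation (hypothetical behaviour)."""
    fs.write("/system/hosts", b"tampered")
```

After the untrusted program has run, it sees its own modification when it reads the file back, while the original content is unchanged and the log holds a record of what it tried to do.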
The line between emulation and virtualization may be a fine one, but it is a clear one. The former is an environment in which a program is run (and fully contained and controlled as it runs). The latter uses the operating system as the environment, and the technology merely controls the interaction between the operating system and the program. Unlike emulation, in virtualization the environment is on separate but equal footing with the technology.
Protection using the kind of virtualization described above doesn’t work with files, but with program behaviour, and it doesn’t work with the system itself.
Unlike emulation, sandboxing isn’t used extensively in antivirus products, mainly because it requires a large amount of resources. It's easy to tell when an antivirus program uses a sandbox, because there will always be a time delay between when the program is launched and when it actually starts to run (or, if a malicious program is detected, there will be a delay between the program's launch and the notification announcing a positive detection). At the moment, sandbox engines are used in only a handful of antivirus products. However, a great deal of research is currently being done into hardware virtualization, which may lead to this situation changing in the near future.
Monitoring system events
Monitoring system events is a more abstract method of collecting data which can be used to detect malicious programs. An emulator or sandbox observes each program separately; monitoring technology observes all programs simultaneously by registering all operating system events created by running programs.
Data is collected by intercepting operating system functions. By intercepting the call to a certain system function, information can be obtained about exactly what a certain program is doing in the system. Over time, the monitor collects statistics on these actions and transfers them to the analytical component for analysis.
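Interception and the collection of statistics can be sketched as follows. This is a simplified model, not how any particular product hooks an operating system: the monitor class, the `create_file` function and the program names are assumptions made for the example, and the "system function" is an ordinary Python function.

```python
from collections import Counter
from functools import wraps


class EventMonitor:
    """Records calls to intercepted functions for every observed program."""

    def __init__(self):
        self.events = []   # (program, function name, args) in call order

    def intercept(self, func):
        """Wrap func so that each call is logged before it is carried out."""
        @wraps(func)
        def hooked(program, *args):
            self.events.append((program, func.__name__, args))
            return func(program, *args)
        return hooked

    def statistics(self) -> Counter:
        """Count calls per (program, function) for the analytical component."""
        return Counter((program, name) for program, name, _ in self.events)


monitor = EventMonitor()


@monitor.intercept
def create_file(program, path):
    """Hypothetical stand-in for a system function; does nothing real."""
    return path
```

Because every program goes through the same hooked function, a single monitor sees the activity of all programs at once, which is the difference from an emulator or sandbox that observes one program at a time.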
This is currently the most rapidly evolving of these technologies. It is used as a component in several major antivirus products and as the main component in individual system monitoring utilities (called HIPS utilities, or simply HIPS; these include Prevx, CyberHawk and a number of others). However, given that it’s possible to get around any form of protection, this malware detection method is not exactly the most promising: once a program is launched in a real environment, the risks considerably reduce the effectiveness of the protection.
Scanning for system anomalies
This is the most abstract method used to collect data about a possibly infected system. It is included here as it is a logical extension of other methods, and because it demonstrates the highest level of abstraction among the technologies examined in this article.
This method makes use of the following features:
an operating system, together with the programs running within that system, is an integrated system;
the operating system has an intrinsic “system status”;
if malicious code is run in the environment, then the system will have an “unhealthy” status; this differs from a system with a “healthy” status, in which there is no malicious code.
These features help determine a system's status (and, consequently, whether or not malicious code is present in the system) by comparing the status to a standard or by analyzing all of the system’s individual parameters as a single entity.
In order to detect malicious code effectively using this method, a relatively complex analytical system (such as an expert system or neural network) is required. Many questions arise: what is the definition of “healthy” status? How does it differ from “unhealthy” status? Which discrete parameters can be tracked? How should these parameters be analyzed? Due to its complexity, this technology is still underdeveloped. Early signs of it can be seen in some anti-rootkit utilities, which either compare the system against samples taken from a standard (obsolete utilities such as PatchFinder and Kaspersky Inspector) or check certain individual parameters (GMER, Rootkit Unhooker).
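The simplest form of comparison against a standard can be sketched as below. The parameter names are invented for the example, and this naive check says nothing about which differences are actually harmful; answering that question is exactly where the complex analytical system mentioned above would come in.

```python
def find_anomalies(baseline: dict, current: dict) -> dict:
    """Return every parameter whose current value differs from the baseline,
    mapped to a pair (expected value, observed value)."""
    anomalies = {}
    for name in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(name)   # None if absent from the standard
        observed = current.get(name)    # None if absent from the live system
        if expected != observed:
            anomalies[name] = (expected, observed)
    return anomalies
```

For instance, given a baseline of `{"driver_count": 12}` and a current snapshot of `{"driver_count": 13}`, the function reports `driver_count` with the pair `(12, 13)`, leaving it to the analytical component to decide whether that means an "unhealthy" status.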
An interesting metaphor
The child analogy used in the section on sandboxing can be extended. For example: an emulator is like a nanny who continually watches over a child to make sure the child doesn’t do anything undesirable. System event monitoring is like a kindergarten teacher who supervises an entire group of children, and system anomaly detection can be compared to giving children full rein while keeping a record of their grades. And in terms of this metaphor, file byte analysis is like family planning, or more precisely, looking for the "twinkle" in a prospective parent's eye.
And just like children, these technologies are developing all the time.
