sosp2005_b.doc

Data Protection and Rapid Recovery From Attack With A Private File Server

Abstract

When a personal computer is attacked, the most difficult thing to recover is personal data. The operating system and applications can be reinstalled, returning the machine to a functional state and usually eradicating the attacking malware in the process. Personal data, however, can only be restored from private backups, if they even exist. Once lost, personal data can only be recovered through repeated effort (e.g. rewriting a report) and in some cases can never be recovered (e.g. digital photos of a one-time event). To protect personal data, we house it in a file server virtual machine running on the same physical host. Personal data is then exported to other virtual machines through specialized mount points with a richer set of permissions than the traditional read/write options. We implement this private file server virtual machine using a modified version of an NFS server installed in a virtual machine under various virtualization environments such as Xen and VMWare. We demonstrate how this architecture provides protection of personal data as well as rapid recovery from attack. Specifically, we demonstrate how an intrusion detection system can be used to stop a virtual machine in response to signs of compromise, checkpoint its current state and restart the virtual machine from a trusted checkpoint of an uncompromised state. We show how our architecture can be used to defend against 5 of the 9 new viruses posted on CERT since April 21, 2004. We also demonstrate that by placing the user’s applications in a virtual machine rather than directly on the base machine we can provide near-instant recovery from even a successful attack. Finally, we quantify the overhead costs of this architecture by running a series of benchmarks on both Windows and Linux in the base machine as well as on an NFS partition mounted in a virtual machine.

1. Introduction

Worms and viruses have entered the consciousness of the majority of personal computer users. Even novice users are aware of the attacks that can come in the form of email from a friend or a pop-up ad from a web site. The goals of an attack can vary from using the compromised system to attack others, to allowing a remote attacker to harvest data from the system, to outright corruption of the system.

Fully restoring a compromised system is a painful process often involving reinstalling the operating system and user applications. This can take hours or days even for trained professionals with all the proper materials readily on hand. For average users, even assembling the installation materials (e.g. CDs, manuals, configuration settings, etc.) may be an overwhelming task, not to mention correctly installing and configuring each piece of software.

To make matters worse, the process of restoring a compromised system to a usable state can frequently result in the loss of any personal data stored on the system. From the user’s perspective, this is often the worst outcome of an attack. System data may be painful to restore, but it can be restored from public sources. Personal data, however, can be restored only from private backups, and the vast majority of personal computer users do not routinely back up their data. Once lost, personal data can only be recovered through repeated effort (e.g. rewriting a report) and in some cases can never be recovered (e.g. digital photos of a one-time event).

We propose the use of a specialized private file server virtual machine to provide added protection for personal data. This file server virtual machine is made accessible only to other clients running on the same host by way of a local virtual network segment. Personal data is housed in the private file server and exported through specialized mount points with a richer set of permissions than the traditional read/write options. This architecture provides a number of benefits including 1) the opportunity to separate personal data into multiple classes to which different finer-grained permissions can be applied, 2) the separation of personal data from system data, allowing each to be backed up and restored appropriately, 3) the ability to rapidly install or restore virtual machines containing fully configured applications and services, and 4) rapid recovery from attack by rolling back system data to a known good state without losing recent changes to personal data.

In Section 2, we describe our architecture and its benefits in detail. In Section 3, we compare our architecture to other solutions with similar goals such as system backup utilities and network booting facilities. In Section 4, we describe how it can be used to protect user data against specific attacks. In Section 5, we quantify the overheads associated with this architecture by running a variety of benchmarks on a prototype implemented using a modified version of NFS in conjunction with virtual machines in both Xen and VMWare. We discuss related work in Section 6, future work in Section 7 and, finally, conclusions in Section 8.

2. Architecture

Figure 1 illustrates the main components of our architecture. A single physical host is home to multiple logical machines. First, there is the base machine (labeled with a 1 in the diagram). This base machine contains a virtualization environment which can be implemented as a base operating system running a virtual machine system such as VMWare or as a virtual machine monitor such as Xen. Second, there is a virtual network that is accessible only to this base machine and any virtual machine running on this host. Third, there is a file system virtual machine which has only one network interface on the local virtual network. This file system virtual machine is the permanent home for personal data and exports subsets of this personal data store via specialized mount points to local clients. Fourth, there are virtual machine appliances. These virtual machines house system data such as an operating system and user applications. They can also house locally created data temporarily.

Virtual machine appliances can have two network interfaces: one on the physical network bridged through the base machine and one on the local virtual network. Depending on its function, a virtual machine appliance may not need one or both of these network interfaces. For example, you may choose to browse the web in a virtual machine appliance with a connection to the physical network but with no interface on the local virtual network to prevent an attack from even reaching the file server virtual machine. Similarly, you might choose to configure a virtual machine with access only to the local virtual network if it has no need to reach the outside world.

2.1. Base Machine

We have implemented several prototypes of this architecture using either Linux or Windows as the base operating system and Xen or VMWare as the virtual machine monitor. There are several other excellent virtual machine systems we could have used, but our purpose was not a comparison of existing virtual machine systems. We chose VMWare for its robustness, ease of use and support of Windows guest operating systems. We chose Xen for its lower overhead [Xen03][CDD+04].

Regardless of implementation, the base machine is used to create the local virtual network, the file system virtual machine and the virtual machine appliances. It is used to assign resources to each of these guests. It can also be used to save or restore checkpoints of virtual machine appliance images.

We also use the base machine as a platform for monitoring the behavior of each guest. For example, in our prototype, we ran an intrusion detection system on the base machine. The base machine can also be used as a firewall or NAT gateway, further controlling access to even those virtual machine appliances with interfaces on the physical network. Note that the base machine can monitor both the incoming and outgoing network traffic of the virtual machine appliances. It can detect both attack signatures in incoming traffic and unexpected behavior in outgoing traffic. For example, a rule could specify that all outgoing network traffic from a particular virtual machine appliance should be POP or SMTP. In such a configuration, unexpected traffic such as an outgoing ssh connection, which would normally not raise alarms, could be detected as a sign of a possible attack.
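The outgoing-traffic policy described above can be illustrated with a simple allowlist check. This is a minimal sketch in Python, not the intrusion detection configuration used in the prototype; the appliance name `mail-appliance`, the `EGRESS_POLICY` table and the `is_expected` helper are hypothetical, and the port numbers are the standard assignments for SMTP (25), POP3 (110) and SSH (22).

```python
# Hypothetical per-appliance egress policy: the destination TCP ports that
# each virtual machine appliance is expected to contact.
SMTP_PORT = 25
POP3_PORT = 110
SSH_PORT = 22

EGRESS_POLICY = {
    "mail-appliance": {SMTP_PORT, POP3_PORT},
}


def is_expected(appliance: str, dest_port: int) -> bool:
    """Return True if an outgoing connection matches the appliance's policy.

    Appliances with no policy entry are treated as having no allowed
    outgoing traffic (an assumption of this sketch).
    """
    return dest_port in EGRESS_POLICY.get(appliance, set())
```

Under this policy, an outgoing connection from `mail-appliance` to port 22 would be flagged as unexpected even though ssh traffic is not suspicious in general.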

The security of the base machine is key to the security of the rest of the system. Therefore, in our prototype, we “hardened” the base machine by strictly limiting the types of applications running on the base machine. Normal user activity takes place in the virtual machine appliances. We also closed all network ports on the base machine. It would also be possible to open a limited number of ports for remote administration, but since each open port is a potential entry point for attack, it is important to carefully secure each open port.

2.2. File System Virtual Machine

We implemented the file system virtual machine using a modified version of Sun’s Network File System (NFS) version 3 running in a Linux guest virtual machine. Virtual machine appliances using both Linux and Windows as the guest OS mount personal data over NFS across the local virtual network. For Linux, there is an open source NFS client. For Windows, we used a commercial NFS client from Labtam [Labtam]. Note that our modifications are only to the NFS file server code. Unmodified NFSv3 clients can be used with our modified server.

Much like the base machine, the file system virtual machine is hardened against attack by stripping away any unnecessary applications and closing all unnecessary network ports. It is easier to secure a system with a limited number of well-defined services than a general purpose machine. All the software in the file system virtual machine is focused on exporting personal data to local clients and on facilitating maintenance on that data such as backup, the creation of particular exported volumes and the setting of permissions that each client can have to the exported volumes.

The file system virtual machine is additionally protected by only being reachable over the local virtual network. Attacks cannot target the file system virtual machine directly. They could only reach the file system virtual machine by first compromising a virtual machine appliance. This would require two successful exploits: one against an application running in a virtual machine appliance and one against the NFS server running on the file server virtual machine.

Personal data is housed in the file system virtual machine and subsets of it are exported to virtual machine appliances. This allows you to restrict both the amount of personal data that a given virtual machine can reach and the access rights it has to that data. For example, if you have a virtual machine appliance running a web server, you may choose to only export the portion of the personal data store that you wish to make available on the web. You can export portions of your user data store with different permissions in different virtual machine appliances. For example, you may mount a picture collection as read-only in the virtual machine you use for most tasks and then only mount it writeable in a virtual machine used for importing and editing images. This would prevent your collection of digital photos from being deleted by malware that compromises your normal working environment. Similarly, you may choose to make your financial data accessible only within a virtual machine running only Quicken, or you may choose to make old, rarely changing data read-only except in the rare case that you do really want to change it.

It is simple to have multiple mount points within the same virtual machine. You can mount some portions of your personal data store read only and others read/write into the same virtual machine appliance.

We also implemented a richer set of mount point permissions to allow “write-rarely” or “read-some” semantics. Specifically, we modified NFS to add read and write rate-limiting capability to each mount point in addition to full read or write privileges. One can specify the amount of data that can be read or written per unit of time. For example, a mount point could be classified as reading at most 1% of the data under the mount point in 1 hour. Such a rule could prevent downloaded code from rapidly scanning the user’s complete data store.
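To make the “read-some” semantics concrete, the following minimal Python sketch models a per-mount-point read budget using a fixed time window. It is only an illustration of the idea, not the modified NFS server code; the `ReadLimiter` class, its field names and the fixed-window accounting are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ReadLimiter:
    """Fixed-window read budget for one mount point (hypothetical sketch)."""

    total_bytes: int                 # size of the data under the mount point
    limit_percent: int = 1           # at most 1% of the data ...
    window_seconds: float = 3600.0   # ... per hour
    window_start: float = 0.0
    used: int = 0

    @property
    def budget(self) -> int:
        """Bytes that may be read within a single window."""
        return self.total_bytes * self.limit_percent // 100

    def allow(self, nbytes: int, now: float) -> bool:
        """Return True and charge the budget if the read fits in the window."""
        if now - self.window_start >= self.window_seconds:
            # A new window begins: reset the counter.
            self.window_start = now
            self.used = 0
        if self.used + nbytes > self.budget:
            return False
        self.used += nbytes
        return True
```

With a 1,000,000-byte data store, this limiter would permit 10,000 bytes of reads per hour and refuse further reads until the next window begins. A write limit could be modeled the same way with a separate counter.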

Figure 2 shows an example of an /etc/exports file with read and write limits. The first line indicates that the client at IP Address foo can read bar.

This required …describe patch

A rate limit would still allow malware to read very slowly through your data store, but it presents another useful hurdle.

Read and write limits are one example of a richer set of mount point permissions that can be used to help protect against attack. Append-only permissions could be used to prevent removal or corruption of existing data. For example, a directory containing photos could be mounted append-only in one virtual machine appliance, allowing it to add photos but not to delete existing photos. To delete photos, the same directory could either be temporarily mounted with expanded permissions or another virtual machine with expanded permissions could be used. Another example would be restricting the size or file extension of files that are created (e.g. no “exe” files). We did not implement these additional permission types, but they would each be a relatively straightforward addition to what we have already implemented.
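Since these additional permission types were not implemented, the following Python sketch is purely illustrative of how an append-only flag and a file extension restriction might be checked. The `MountPolicy` class, the `is_allowed` function and the operation names are hypothetical and do not correspond to NFS procedure names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MountPolicy:
    """Hypothetical extra permissions attached to one mount point."""

    append_only: bool = False
    blocked_extensions: frozenset = frozenset()


def is_allowed(policy: MountPolicy, op: str, path: str) -> bool:
    """Decide whether a request is permitted under the mount policy.

    `op` is one of "create", "append", "overwrite" or "delete"
    (names chosen for this sketch).
    """
    # Append-only mounts may add data but never change or remove it.
    if policy.append_only and op in ("overwrite", "delete"):
        return False
    # Refuse to create files with a blocked extension, ignoring case.
    if op == "create":
        ext = PurePosixPath(path).suffix.lower().lstrip(".")
        if ext in policy.blocked_extensions:
            return False
    return True
```

For the photo directory example, a policy with `append_only=True` would accept the creation of a new photo while refusing a request to delete an existing one.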

One benefit of the file server virtual machine is that it contains exactly that irreplaceable personal data that is most important to back up regularly. In this way, our architecture helps facilitate efficient back-up of user data.

2.3. Virtual Machine Appliances

Virtual machine appliances isolate system state much like the file system virtual machine isolates personal data. Each virtual machine appliance contains a base OS and any number of user level applications, from desktop productivity software to server software. They can have network interfaces on the physical network, allowing communication with the outside world. They can also have network interfaces on the local virtual network over which they can mount subsets of personal data from the file server virtual machine.

There can be multiple mount points from the file system virtual machine into a client. Each mount point can have different permissions to allow finer-grained control over the allowable access patterns. For example, in a single virtual machine you might mount your email BLAH and your

Just as the file server virtual machine facilitates the backup of personal data, virtual machine appliances facilitate the backup of system data. System data, however, changes at a different pace than personal data. Specifically, changes to system data can be clearly identified (e.g. when a new application, upgrade or patch is applied). Also, since system data is recoverable from public sources, it can be less crucial that we capture all the changes to system state.

In our prototype, we save known-good checkpoints of each virtual machine appliance. One important use of a known-good checkpoint is restoring a compromised virtual machine appliance from a trusted snapshot. Any changes made within the virtual machine appliance since the checkpoint would be lost, but changes to personal data mounted from the file server machine would be preserved. In this way, personal data does not become an automatic casualty of the process of restoring a compromised system.

The file system virtual machine is hardened against attack by carefully controlling the software that is run within it. Virtual machine appliances, however, will continue to run an unpredictable mix of user applications including some high-risk applications. As a result, they may be susceptible to attack through an open network port running a vulnerable service or through a user-initiated download such as email or web content.

Compromised virtual machine appliances can often be automatically detected by the intrusion detection system running on the base machine. In our prototype, we have demonstrated that we can detect an attack when a Snort rule is triggered and, in response, stop and checkpoint the compromised virtual machine as well as restart a known good checkpoint of the same machine. This process is nearly instantaneous, requiring only sufficient time to move the failed system image to a well-known location and move a copy of a trusted snapshot into place. Changes in the system configuration made after the trusted image was created would be lost, as would any information stored directly in the corrupted guest. However, the checkpoint image would provide an immediately functional computing platform that would mount the user’s data store from the file system virtual machine.

We do this with a combination of Snort, Snort rules and shell scripts.
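The image-swapping step of this recovery process can be sketched as follows. This Python fragment is an illustration only, not the shell scripts used in the prototype; the `restore_appliance` function and its parameters are hypothetical, and stopping and starting the guest is left to caller-supplied functions because those commands depend on the virtual machine monitor in use.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore_appliance(live_image: Path, trusted_snapshot: Path,
                      quarantine_dir: Path,
                      stop_vm: Callable[[], None],
                      start_vm: Callable[[], None]) -> Path:
    """Swap a compromised image for a copy of a trusted snapshot.

    Returns the path where the compromised image was saved for analysis.
    """
    stop_vm()  # hypervisor-specific; supplied by the caller
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    saved = quarantine_dir / live_image.name
    # Move the failed image to a well-known location for later analysis.
    shutil.move(str(live_image), str(saved))
    # Put a copy of the trusted snapshot in place; the original is kept.
    shutil.copy2(trusted_snapshot, live_image)
    start_vm()  # hypervisor-specific; supplied by the caller
    return saved
```

Copying rather than moving the trusted snapshot keeps the original available for any later restoration.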

To prevent future attacks, the trusted image should also be updated to patch the exploited vulnerability. Thus the user is notified when an automatic restoration action takes place. Analysis of the corrupted image and/or secure logs collected by the virtual machine monitor [King03b] could provide clues to what needs to be modified.

The corrupted image can be saved or shipped to a system administrator for analysis and even possible recovery of data stored inside. During this analysis and recovery process, the user would still have a functional computing platform with access to the majority of their data. This is a significant improvement over the extended down time that is often required when restoring a compromised system today.

A restarted system will still have the vulnerability that was originally exploited. Therefore, we limit the number of automatic restarts. For example, after three restarts of a given image, any further compromise will result in stopping the virtual machine and checkpointing, but not in restarting the “trusted” snapshot. Another option would be to leave the virtual machine running but to remove its access to the physical network and the local virtual network. This is similar to turning off or disabling the network port for a machine demonstrating signs of compromise until a system administrator can resolve the problem.
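The restart limit can be expressed as a small decision function. The sketch below is a hypothetical Python illustration of the policy just described; the function name and action labels are assumptions, and the limit of three is the example value from the text.

```python
MAX_AUTOMATIC_RESTARTS = 3  # the example limit given in the text


def respond_to_compromise(restarts_so_far: int) -> list:
    """Return the actions to take when an appliance shows signs of compromise.

    The appliance is always stopped and checkpointed; the trusted snapshot
    is restarted only while the restart limit has not been reached.
    """
    actions = ["stop", "checkpoint"]
    if restarts_so_far < MAX_AUTOMATIC_RESTARTS:
        actions.append("restart_trusted_snapshot")
    return actions
```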

It is worth noting that users can also trigger the restoration process manually if they suspect a compromise. Users could also use the restoration process to roll back the state of a virtual machine appliance for any other reason (e.g. they installed a piece of software and simply don’t want to keep it in the system).

Similarly, the restoration process can be used to recover from accidental system corruption, e.g. from a routine patch or upgrade that introduced instability into the system. Many users are rightfully nervous about possible system corruption from routine software updates. In addition to the promise of safer and better operation, any update represents a non-negligible chance of system corruption.

Checkpoints allow regular upgrades to a virtual machine appliance to be tested without risk. You can apply a patch or an upgrade to software in the system secure in the knowledge that if it causes instability you can return to the last checkpoint. Many users do not regularly apply patches and system upgrades because of the risk of instability. Stable checkpoints would encourage users to be compliant with upgrade requests by allowing them to easily experiment with the upgraded image. Reducing the risk of regular upgrades and patches is another subtle way in which virtual machine appliances enhance system security.

Another crucial benefit of virtual machine appliances is that they not only provide rapid recovery from attack by restoring a known good snapshot, but they also provide rapid first-time installation of software systems. Anyone who has struggled for hours to install and configure software that is already running on another machine will appreciate this benefit. Preconfigured virtual machines with fully functional web servers, database servers, etc. would save new users hours of headaches assembling and installing all the dependencies. This is similar to the benefits of LiveCDs that allow users to experiment with fully configured versions of software, but without the drawbacks of slow, removable, unmodifiable media.

To quantify this benefit, Table 1 lists the time it took us to install a variety of software. The measurements listed reflect local experiments installing software when the user had already successfully installed the software at least once before. The time it takes a new user to install this software could be significantly higher, as they frequently run into problems that can delay them for hours or even days (witness the many installation FAQs and installation questions posted to newsgroups across the Internet).

The times in Table 1 can also be considered a measure of the time saved whenever a checkpoint of a virtual machine appliance is used to recover from attack. Each time a virtual machine appliance is recovered from a known good state, this is a lower bound on the time saved in reinstallation. If it has been some time since the user installed the software, the time savings are likely to be even higher, as they must spend time gathering the installation materials and possibly stumbling into some of the same errors a new user would.

We measured the times in Table 1 locally, but in retrospect, we would love to see average expert and novice install times routinely listed for all software. Such metrics could be reported by software developers and/or by third-party evaluators.

Software | Installation Time for an Experienced User (minutes)
Base Windows Desktop Install |
Windows Desktop Install with an array of user level software |
Base Linux Desktop Install (RedHat) |
Base Linux Desktop Install (Ubuntu) |
Linux Base Installation with Apache Web Server |
Linux Base Installation with mySql |
Linux Base Installation with sendmail |

Checkpoints can also be used to transfer working system images from one physical host to another. For example, users can move copies of a working virtual machine appliance onto both their laptop and desktop machines. As another example, a completely configured FTP server virtual machine appliance could be duplicated and transferred to any other system: transferred from one machine to another for the same user, transferred from one user to another or even downloaded from a web site.

In many ways, we see checkpoints of virtual machine appliances as a new model for software distribution. Installing and configuring server software is often difficult and time-consuming. A pre-configured virtual machine appliance could be delivered to a user with well-defined resource requirements and connections to the rest of the system, including the characteristics of any mount points into the user’s data store.

The term “appliance” implies a well-defined purpose, well-defined connections to the rest of the world and a minimum of unexpected side effects. Physical appliances typically specify their resource requirements and can be replaced with an equivalent model if they malfunction. In the case of a virtual machine appliance, a user would load it on their system and plug it into their data store by mapping its defined mount points to the exports from the local file system virtual machine. If the virtual machine appliance is attacked or malfunctions, it would be straightforward to replace it with a new functional equivalent without losing your personal data.

This would provide a new platform for value added services including configuration, testing and characterization of virtual machine appliances. Those who produce virtual machine appliances could compete to produce appliances that have the right combinations of features, that are easy to “plug in”, that have a good track record of being resistant to attacks, that use fewer system resources or that set and respect tight bounds on their expected behavior. Appliances that reliably provide the advertised service without violating their resource requirements would have value to users. In fact, we expect that most users would like to view computers as appliances that reliably perform the advertised actions without unanticipated side effects. Virtual machine appliances would be a good step in this direction. Validation of these virtual machine appliances may also be more straightforward because they would be designed to run in isolation from other user level applications.

Virtual machine appliances are particularly attractive in the context of open source software because any number of applications could be distributed together in a virtual machine appliance without concern for the licensing requirements of each individual software package. This could be a significant hurdle for commercial software, however, without new licensing models. Similarly, developers of open source software could distribute virtual machine appliances with a complete development environment including source code with all the proper libraries required for compilation and software to support debugging and testing.

The number and type of applications in each virtual machine appliance can be tailored to the usage requirements and desired level of security. At one extreme, there could be only one virtual machine appliance containing all the software normally installed on a user’s base machine. At the other extreme, there could be many virtual machine appliances, each with a subset of the user’s software.

Multiple virtual machine appliances allow finer-grained control over resources required, expected behavior and the subset of personal data accessed. For example, a web server virtual machine appliance may be given read-only access to the content it is serving and may be prevented from establishing outgoing network connections. Thus even if the web server is attacked, the damage done to the user’s system is minimized. The attacker would also be prevented from harvesting information from the rest of the user’s data store, and their ability to use the system as a launching pad for other attacks would be diminished.

When each virtual machine appliance has a small number of applications, it is easier to characterize expected behavior, which in turn makes it easier for intrusion detection software running on the base machine to watch for signs of a compromised system. It is also easier to configure the virtual machine appliance with the smallest set of rights to personal data that is necessary to accomplish the task.

However, each additional virtual machine appliance requires additional memory when executing and additional disk space to store the operating system and other common files. Multiple virtual machine appliances also make it more difficult to share data between applications. For these reasons, it is best to group as many applications with similar requirements together as possible.

Taken to the extreme, however, isolation could mean a separate virtual machine for each application. We are not advocating this extreme. It is easier for users when they can exchange data between applications, and many applications with similar resource, data and security requirements can and should be grouped together. In our experience with our prototype, we have found that a good strategy is to isolate those applications with special security needs. For example, applications that are commonly attacked (e.g. server software such as web servers or database servers) are good candidates for their own virtual machine appliance. Similarly, applications requiring access to sensitive personal data such as financial data are also good candidates for their own virtual machine appliance.

One of the more difficult cases is applications such as web browsing and email access. These applications are common sources of attacks. However, they are also applications that users like to have tightly integrated with their personal data. For applications such as these, there is a range of solutions along a spectrum of security and ease of use. For example, we have found it effective to support web browsing in multiple virtual machines: one with rights to access the personal data store and one with no such rights.

The base machine creates a set of resource limits for each virtual machine appliance in several ways. First, the base machine can allocate a limited amount of system resources such as memory, disk space or even CPU time to each guest. Second, the base machine can restrict access to the local virtual network and/or the physical network connection. In either case, access can be denied completely or simply restricted through firewall rules. Third, the intrusion detection system running on the base machine can monitor the behavior of the guest for both attack signatures and otherwise “innocent” looking traffic that is simply unexpected given the purpose of the virtual machine appliance. For example, you might not expect to see any outgoing network connections from a web server virtual machine.

These limits can be thought of as a contract of sorts with the virtual machine appliance. When a virtual machine appliance is loaded on the system, a contract is established that limits it to its expected set of behaviors. Virtual machine appliances whose expected behavior is well characterized would make it easier to detect signs of attack or compromise and would therefore be more valuable to users. Accomplishing the required functionality under a more restricted contract would be another aspect of a high quality virtual machine appliance.

Contracts fix a fundamental problem with running new applications. Applications typically run with a user’s full rights, but there is no method for holding them accountable for doing only what is advertised. This leads to Trojan horse exploits in which a piece of software claims to accomplish a particular desired task but also has malicious unadvertised effects. Contracts are similar to running software with a reduced set of privileges, as with chroot or jail.

In our prototype, these contracts are expressed through a combination of Snort rules and limits imposed by the virtual machine monitor. In the future, we would love to see a unified contract language that could be used to express all aspects of the contract. Such a contract could be inspected by the user and then loaded with the virtual machine appliance onto the user’s system. Tools that made the creation, inspection and validation of these contracts easier for users and developers would be a helpful addition to such a system.
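As one way to picture such a unified contract, the Python sketch below gathers the three kinds of limits discussed above (resource allocations, network access and expected traffic) into a single record and checks observed behavior against it. It is a hypothetical illustration, not a proposed contract language; every field and function name is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Hypothetical contract for one virtual machine appliance."""

    memory_mb: int                      # resource limit set by the base machine
    disk_gb: int
    physical_network: bool              # may it reach the outside world?
    local_virtual_network: bool         # may it reach the file server?
    allowed_outgoing_ports: frozenset = frozenset()
    mounts: dict = field(default_factory=dict)  # export path -> "ro" or "rw"


def violations(contract: Contract, memory_mb: int, outgoing_ports) -> list:
    """List the ways observed behavior departs from the contract."""
    found = []
    if memory_mb > contract.memory_mb:
        found.append("memory")
    unexpected = set(outgoing_ports) - contract.allowed_outgoing_ports
    if unexpected:
        ports = ", ".join(str(p) for p in sorted(unexpected))
        found.append("outgoing ports: " + ports)
    return found
```

A web server appliance, for instance, could carry a contract with a read-only mount of its content and an empty set of allowed outgoing ports, so that any outgoing connection is reported as a violation.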

3. Comparison to Other Solutions

In this section, we compare our architecture to several existing solutions such as regular full backups of a complete system, network booting combined with importing data from an independent physical file server and system reset facilities such as DeepFreeze or Stateless Linux.

A full backup can be made in various ways, some more focused on the needs of personal data and some on the needs of system data. For example, burning data to DVD or other removable media produces a portable backup of personal data, but the backup is typically not bootable, so it does not quickly restore the system to a bootable state.

Making a ghost image is a good way to restore the system to a fully bootable state, but it is harder to do incremental backups, and it is expensive to make a full backup of old and new personal data, system data and everything else on each backup.

Ghost images are also often not portable across machines (they depend on the CPU platform, the devices supported, etc.). An image can be restored to the same physical machine, but it is hard to send a compromised machine out for recovery and/or analysis while restoring to another platform. Virtual machines abstract these underlying details, so they are portable across machines. We used the same VMWare client images on systems with both Linux and Windows as the base OS.

In the case of ghost images, there is also the difficulty of verifying full backups without trashing the existing system (or having a second duplicate system on hand!).

One can craft a combination of full ghost images plus incremental backups, but each layer makes the backups more difficult to manage.

Protecting data in a file system virtual machine does not remove the need for regular backups to protect against hardware failure and similar risks, but it does streamline the process by allowing backup efforts to focus on the irreplaceable personal data rather than on the recoverable system data. It also allows backup efforts to be customized to the differing needs of system data and personal data.

Tailor backup strategy to the differing needs of personal data versus system data.

System data changes more slowly and predictably, while user data is more dynamic and unpredictable in its rate of change. To be clear, there may be new system data written every day (log files, for example), but these are not really user-visible changes, that is, changes that are important to preserve in the case of a compromised machine.

In addition, there is a mismatch between the overall rate of change in system data and the user-visible rate of change. A large percentage of the data written in the average computer system is not directly related to user data (e.g. logs of system activity or writes to the page file). However, it is produced constantly even when the system is otherwise idle. This activity is of no interest to most users as long as the system continues to function. If a month’s worth of such activity were lost, users would be perfectly happy as long as the system was returned to an internally consistent and functioning state.

User data is exactly the opposite. Overall, user data changes at a slower, human-driven pace. However, the rate of user-visible change is high. For example, a user may add only 1 page of text to a report in an 8-hour workday, but the loss of that one day of data would be immediately visible. Fortunately, this means that efforts to protect user data can be effective even if targeted at a small percentage of overall data.

TODO quantitative data on backups - references for how few people do it or expense/time involved

If the user does make routine backups of their personal data, the FS-VM can also make that process more efficient. The FS-VM contains exactly the data that should be backed up. Even if the user does not make routine backups, this configuration prevents loss of user data simply because it is intermingled with high-risk system data.

Why is this better than backing up the whole system? 1) The ability to catch an attack with Snort. If the attack is caught by an inbound virus scanner, it can be prevented; if it is not, we can look for outbound attack behavior and then restart from a known good state or just pull the VM's network access.

For system data, ease of checkpoint and restore. Appliances: the ability to try out new appliances or roll back to old ones.

This is much like bringing the benefits of a managed LAN (file server, net boot server, firewall) to a personal computer. It brings the benefits of network servers and separation of responsibilities to people with just one computer, not a whole network.

Net boot and mount from separate FS machine

Versus DeepFreeze or other system reset facilities: these are great for environments where you want a fresh start every time. Here we want to be able to make changes (download new applications and have them stay there, for example) but isolate user data. If a VM with a new application dies, you can reinstall from the last known base.

TODO compare to DeepFreeze or other system reset facilities. TODO stateless Linux clients (Red Hat Fedora project)? Network booting.

4. Protection Against Attacks

To assess how well our architecture prevents and helps recover from attacks, we began by examining recent CERT advisories. We found that there have been 9 advisories posted since April 21, 2004. These 9 attacks are described briefly in Table X.

CERT attacks – how we defend against them

In addition to the CERT advisories, we also examined several well-known attacks including CodeRed, MyDoom and Blaster. These attacks are briefly described in Table Y.

Other kinds of attacks we can defend against

For example, these mount points can specify the allowable read or write rate or prevent files with certain properties from being created.
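The following is a minimal sketch of how such a mount point policy could be checked, assuming a one-second sliding window for the write rate and a file extension as the restricted property. The class `MountPolicy` and its methods are hypothetical names used for illustration; they do not describe the interface of our modified NFS server.

```python
import os
from collections import deque


class MountPolicy:
    """Hypothetical per-mount-point policy: a write-rate cap plus blocked file types."""

    def __init__(self, max_write_bytes_per_sec, blocked_extensions=()):
        self.max_write_bytes_per_sec = max_write_bytes_per_sec
        self.blocked_extensions = {e.lower() for e in blocked_extensions}
        self._recent = deque()  # (timestamp, nbytes) of writes admitted in the last second

    def may_create(self, path):
        """Refuse to create files whose extension is on the blocked list."""
        ext = os.path.splitext(path)[1].lower()
        return ext not in self.blocked_extensions

    def may_write(self, nbytes, now):
        """Admit a write only if it keeps the last second under the byte cap."""
        # Forget writes that have fallen out of the one-second window.
        while self._recent and now - self._recent[0][0] >= 1.0:
            self._recent.popleft()
        used = sum(n for _, n in self._recent)
        if used + nbytes > self.max_write_bytes_per_sec:
            return False
        self._recent.append((now, nbytes))
        return True
```

The caller passes the current time explicitly (for example from `time.monotonic()`), which keeps the policy easy to test. A read-rate cap could be added in the same way.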

Action item would be to group these into categories of what fixes them

Write restriction (latest CERT advisory): a restriction on executables arriving through email and file sharing. Some versions of Outlook do not allow saving executables, but many email clients do not offer this and many people do not have it set; it also does not let you view HTML, so many people turn it off because it is too restrictive. We are not preventing people from saving and running executables, just from saving them to the FS-VM. We let people try them, and if something goes wrong they can reset the APP-VM to a known good state. This is a much safer and easier user experience, a safe "playpen".

read restriction latest CERT restriction on read all snort rule/restart/pull network CodeRed (2/9/05) Blaster

Leslie - MyDoom (Bagle like this?): lots of versions. It changes the registry to run itself when the system restarts and puts copies of itself in shared directories. What does it do? Mass emailing. Snort could catch it and we could go back to a known good state that wouldn't have the registry modifications. Can't NFS mount the registry. Teatime prevents registry changes? Spybot? MyDoom doesn't do anything; it just mails copies of itself and takes up network bandwidth. It leaves a backdoor that allows later attacks to be run from this client, and it overwrites the hosts file so users can't access antivirus or Windows Update sites.

Kevin - CodeRed1 and CodeRed2: CodeRed does a little defacing of the main page of web sites (little writing). Days 1-19 of the month: generates a list of IPs (CodeRed1.0 uses a static list of IPs, CodeRed1.1 uses random generation of IPs); IIS exploit, some kind of server attack. Days 21-29: DOS attack on popular sites. CodeRed1 stays in memory. CodeRed2 writes an executable; we could do blocking of executables and limits on outside connections. For CodeRed1, a reboot would kill it off and it won't come back unless reinfected.

Watch the Snort logs and, if an attack is detected, restart the VM. Only restart N times, and then alert the user that this Snort rule constantly fails. Alternatively, restart the VM without a bridged connection, or possibly pause the VM; this is the equivalent of pulling a machine's network connection.
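The restart-then-isolate response described above can be sketched as a small decision function. This is an illustration under stated assumptions: the class name, the action strings and the idea of counting restarts per (VM, rule) pair are ours for this sketch, and the code neither parses real Snort logs nor drives a real virtual machine monitor.

```python
from collections import Counter

# Hypothetical action names; a real deployment would map these to monitor commands.
RESTART = "restart_from_known_good"
ISOLATE = "disconnect_network_and_alert_user"


class RestartPolicy:
    """Restart a VM at most max_restarts times per Snort rule, then isolate it."""

    def __init__(self, max_restarts):
        self.max_restarts = max_restarts
        self._restarts = Counter()  # (vm, rule_id) -> restarts performed so far

    def on_alert(self, vm, rule_id):
        """Decide what to do when Snort rule rule_id fires for traffic from vm."""
        key = (vm, rule_id)
        if self._restarts[key] < self.max_restarts:
            self._restarts[key] += 1
            return RESTART
        # The same rule keeps firing after N restarts: stop restarting,
        # cut the VM off from the network and tell the user.
        return ISOLATE
```

Counting per rule means that a VM which has exhausted its restarts for one attack signature is still restarted normally when a different rule fires.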

Patty – Blaster: randomly picked an IP address and then exploited an RPC flaw (port 135). A Snort rule finds it and pulls its network access. Automatically run the patch on this system??

5. Overhead of Private File Server and Virtual Machine Appliances

Clearly, this architecture imposes a tax on system performance. It introduces overhead, but given the power of modern PCs, many users will have resources to spare (disk space, CPU speed, memory), and many users will be willing to pay for the protection of their personal data and rapid recovery from attack.

There are many VM monitors, and a comparison of them all is beyond the scope of this paper. We experimented with VMWare and Xen (why?).

Start with Xen numbers on ITL – iozone read and write and freebench

Compare that to Xen on versus planet lab

Then examine VMWare numbers with a Windows base on the ITL machines – iozone read and write and freebench; VMWare with a Linux base on the ITL machines (replace all this with the numbers from Jason’s machines if those are better).

Dbench?

Details of experiments – machine specs, version numbers of VMWare and Xen, version numbers of iozone and parameters to iozone, version numbers of freebench and dbench, versions of Linux and Windows on the base and in guests, which NFS server and client versions, how many runs of each experiment

What do we know from Xen paper and Freenix paper?

Conclusions - Does a powerful vs less powerful machine matter? Does the base OS matter? Does the guest OS matter? Does the VM system matter, Xen vs VMWare (yes, from other paper)? Does NFS vs AFS matter? Break down base OS/local FS to guest OS/NFS/AFS in FS-VM with guest OS/local FS.

Conclusions – little to no overhead for freebench, overhead of up to 30% for I/O-intensive workloads under VMWare. Overhead under Xen is much lower; however, Windows is the most attacked platform, so VMWare is crucial. Even at higher overheads, it is still worth it for many users (who prefer a stable but slower appliance to constant worry about attack).

VMWare is robust and it supports Windows VMs; Xen supports XenoLinux today and has plans for XenoWindows. For Windows VMs, is VMWare the only choice? Win4Lin, Win, Crossover Office

What's wrong with VMWare? 1) We would prefer a base OS that is a simple hardened VM manager; VMWare is not that. VMWare ESX - is it that?

Xen has lower overhead but does not support Windows. Since the majority of high-impact worms and viruses target Windows systems, it is especially important to support Windows guests. However, there is ongoing work on XenoWindows, that is, the ability to support Windows guests on Xen.

Ways to further reduce or avoid overhead – spectrum of security and recoverability versus overhead

Another good strategy: if we find that it is expensive to go to the NFS/AFS server and cheaper to stay within a VM, then we are better off the more data is treated as system data (kept local), paying the expense only on mounted "user data".

Some longer-range plans include examining file traces to quantify the percentage of file system traffic directed to system data versus user data on a set of desktop systems. The lower the percentage of traffic directed to user data, the easier it will be to efficiently protect the user data without significant overhead.
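A minimal sketch of the trace analysis we have in mind is shown below. It assumes a trace reduced to (path, bytes transferred) pairs and a list of directory prefixes that would be mounted from the FS-VM; both the trace format and the function name are hypothetical.

```python
def user_data_fraction(trace, user_prefixes):
    """Return the fraction of traced bytes that touch paths under user_prefixes."""
    user_bytes = 0
    total_bytes = 0
    for path, nbytes in trace:
        total_bytes += nbytes
        for prefix in user_prefixes:
            root = prefix.rstrip("/")
            # Match the directory itself or anything beneath it,
            # so that /homework is not mistaken for /home.
            if path == root or path.startswith(root + "/"):
                user_bytes += nbytes
                break
    return user_bytes / total_bytes if total_bytes else 0.0
```

For a trace in which 100 of 1000 bytes go to files under `/home`, the function reports a fraction of 0.1, meaning that only a tenth of the traffic would pay the cost of crossing into the FS-VM.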

Also, not all “unrecoverable” data needs to be classified as personal data and stored in the file server virtual machine. For example, web server logs may be useful and interesting but not worth preserving in case of attack.

6. Related Work

Virtual machine technology has been available for over 30 years on mainframes [VM370, Goldberg74], but it is relatively new for commodity personal computers [Denali02, VMWare, Xen03]. Today, virtual machine technology on commodity hardware has many applications including providing multiple operating system platforms on the same physical machine, building efficient honeypot machines and serving as a stable platform for OS development. As the computing power and storage capacities of commodity platforms increase, additional applications of virtual machine technology are being explored.

Several systems have used virtual machine technology to enhance system security and fault tolerance. Bressoud and Schneider developed fault-tolerant systems using virtual machine technology to replicate the state of a primary system to a backup system [Bressoud96]. Dunlap et al. used virtual machines to provide secure logging and replay [Dunlap02]. King and Chen used virtual machine technology and secure logging to determine the cause of an attack after it has occurred. Reed et al. used virtual machine technology to bring untrusted code safely into a shared computing environment.

We focus on the related problem of rapid system restoration and protection of user data. We are unaware of another system that has separated user data and system data in the way we are proposing and optimized the handling of each to provide rapid system restoration after an attack.

We propose the use of virtual machine technology to isolate and protect user data from the rest of the system and to provide rapid restoration of system state. As the computing power and storage capacities of commodity platforms increase, virtual machine technology is also being used to provide enhanced system services such as secure logging [Chen01] and backtracking of intrusions [King03b].

We propose using virtual machine technology to provide rapid restoration of a compromised system. Recently, virtual machine technology has been used to identify the specific vulnerabilities that allowed an attack to succeed so that similar attacks can be prevented in the recovered system. We apply virtual machine technology to a related problem – that of rapid recovery from an attack.

TODO on summary of other VM systems besides VMWare

Qemu and Bochs

7. Future Work

Improvements to base systems – nice GUIs for easily manipulating VMs, a unified system to express all elements of the contract between the base machine and a VM, and tools to help create, inspect and validate these contracts

Right now this is not for the novice user, but it could be. The VMWare GUI is pretty easy to use; it could integrate some of these configurations by default. Right now they don't give you a choice of a VM with an interface on a local virtual network and on the real network; you have to configure that as a custom setup. This creates a market for easy-to-use appliance VMs: sell your VM to others if it is easy to use and the "plugs" are well defined.

More benchmarks, especially Windows system benchmarks

We would also like to investigate logging data modifications in the file system virtual machine on a per-client basis. Logging data modifications would allow the FS-VM to roll back the changes of a compromised VM when suspicious behavior is detected.

In addition, the FS-VM could log modifications to the user’s data store. As in a log-structured file system, the data written would immediately be visible in the file system; however, earlier versions of the file system could be recovered if garbage collection of old data is delayed. Such a delay in garbage collection would allow the user or the intrusion detection system time to detect the attack. Once detected, the log could be truncated at a point before the attack began. The length of the log could be based on the amount of time required to detect an attack. Such a delay would also provide an undo facility to recover from accidental destruction of user data. The maintenance VM could be used to control rollback of the FS-VM.
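To make the idea concrete, the sketch below models the data store as an append-only log of writes with delayed garbage collection and truncation on rollback. It is an in-memory illustration only; the class `VersionedStore` and its methods are hypothetical, and timestamps are assumed to be non-decreasing.

```python
class VersionedStore:
    """Hypothetical append-only write log with rollback and delayed garbage collection."""

    def __init__(self):
        self._log = []  # list of (timestamp, path, data), oldest first

    def write(self, timestamp, path, data):
        """Append a new version; it becomes visible to readers immediately."""
        self._log.append((timestamp, path, data))

    def read(self, path):
        """Return the most recent version of path, or None if it was never written."""
        for _, logged_path, data in reversed(self._log):
            if logged_path == path:
                return data
        return None

    def rollback(self, before):
        """Truncate the log, discarding every write made at or after `before`."""
        kept = [entry for entry in self._log if entry[0] < before]
        discarded = len(self._log) - len(kept)
        self._log = kept
        return discarded

    def collect_garbage(self, older_than):
        """Drop superseded versions written before `older_than`.

        The newest version of each path is always kept, so delaying this call
        is what preserves the window in which rollback is possible.
        """
        latest = {}
        for index, (_, path, _) in enumerate(self._log):
            latest[path] = index
        self._log = [
            entry for index, entry in enumerate(self._log)
            if entry[0] >= older_than or latest[entry[1]] == index
        ]
```

If a report is written at times 1 and 2 and then corrupted at time 5, calling `rollback(before=5)` makes the version from time 2 current again. The argument passed to `collect_garbage` plays the role of the detection window discussed above.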

8. Conclusions

List all the ways we help improve security – ability to characterize and hold virtual machine appliances to their resource limits, finer-grained control of subsets of personal data, reducing the risk of regular patches and upgrades, collecting personal data together so that resources for regular backup can be focused on the unrecoverable user data, automatic restart of a compromised appliance to a known good state, ability to send checkpoints of compromised machines for analysis and recovery in parallel with running at least temporarily on the last known good snapshot, ability to hold personal data in a virtual machine that is not even directly accessible from remote machines without at least two separate successful exploits, one against the hardened file system virtual machine

Other benefits – ability to share working virtual machine appliances across physical machines, thus facilitating a new software distribution method with many advantages including saving users the headaches of initial installation and configuration

Recovering a compromised computer system can be a painful process, often involving reinstalling the operating system and user applications. In addition, any personal data stored on the machine is often lost unless a recent backup exists. We describe the fundamental differences between user data and system data, which lead to different approaches to protecting and recovering each class of data after an attack.

We have quantified overhead in a prototype system and described ways to further improve the prototype for both ease of use and further reductions in overhead.

9. References

[Bressoud96] T. Bressoud and F. Schneider. Hypervisor-based fault tolerance. ACM Transactions on Computer Systems, 14(1):80-107, February 1996.

[Chen01] P. Chen and B. Noble. When virtual is better than real. Proceedings of the 2001 Workshop on Hot Topics in Operating Systems (HotOS), p. 133-138, May 2001.

[CDD+04] B. Clark, T. Deshane, E. Dow, S. Evanchik, M. Finlayson, J. Herne, J. Matthews. Xen and the Art of Repeated Research. 2004 USENIX Annual Technical Conference FREENIX Track, June 2004.

[Denali02] A. Whitaker, M. Shaw, S. Gribble. Scale and Performance in the Denali Isolation Kernel. Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), ACM Operating Systems Review, Winter 2002 Special Issue, pages 195-210, Boston, MA, USA, December 2002.

[Disco97] E. Bugnion, S. Devine, K. Govil, M. Rosenblum. Disco: Running Commodity Operating Systems on Scalable Multiprocessors. ACM Transactions on Computer Systems, Vol. 15, No. 4, 1997, pp. 412-447.

[Dunlap02] G. Dunlap, S. King, S. Cinar, M. Basrai, P. Chen. ReVirt: Enabling Intrusion Detection Analysis through Virtual Machine Logging and Replay. Proceedings of the 2002 Symposium on Operating Systems Design and Implementation (OSDI), p. 211-224, December 2002.

[Eust99] K.F. Eustice. A universal information appliance. IBM Systems Journal, Volume 38, Number 4, 1999.

[iozone]

[King03a] S. King, G. Dunlap, P. Chen. Operating System Support for Virtual Machines. Proceedings of the 2003 USENIX Technical Conference, June 2003.

[King03b] S. King and P. Chen. Backtracking Intrusions. Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), p. 223-236, October 2003.

[Labtram]

[LKML03] K. Fraser. Post to Linux Kernel Mailing List, October 3 2003, URL http://www.ussg.iu.edu/hypermail/linux/kernel/0310.0/0550.html accessed December 2003.

[Norm98] D. Norman. The Invisible Computer: Why Good Products Can Fail, the Personal Computer Is So Complex, and Information Appliances Are the Solution. MIT Press, 1998.

[Dike00] J. Dike. A User-mode Port of the Linux Kernel. Proceedings of the 4th Annual Linux Showcase & Conference (ALS 2000), page 63, 2000.

[Dike01] J. Dike. User-mode Linux. Proceedings of the 5th Annual Linux Showcase & Conference, Oakland CA (ALS 2001). pp 3-14, 2001.

[Dike02] J. Dike. Making Linux Safe for Virtual Machines. Proceedings of the 2002 Ottawa Linux Symposium (OLS), June 2002.

[Goldberg74] R. Goldberg. Survey of Virtual Machine Research. IEEE Computer, p. 34-45, June 1974.

[Spafford89] E. Spafford. Crisis and Aftermath. Communications of the ACM, 32(6):678-687, June 1989.

[VM370] R. Creasy. The Origin of the VM/370 Time-Sharing System. IBM Journal of Research and Development. Vol. 25, Number 5. Page 483. Published 1981.

[VMWARE] Vmware, URL http://www.vmware.com accessed December 2003.

[Xen99] D. Reed, I. Pratt, P. Menage, S. Early, and N. Stratford. Xenoservers: Accounted Execution of Untrusted Code. Proceedings of the 7th Workshop on Hot Topics in Operating Systems, 1999.

[Xen03] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield. Xen and the Art of Virtualization. Proceedings of the nineteenth ACM symposium on Operating systems principles, pp 164-177, Bolton Landing, NY, USA, 2003.

[Xen03a] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, E. Kotsovinos, A. Madhavapeddy, R. Neugebauer, I. Pratt and A. Warfield. Xen 2002. Technical Report UCAM-CL-TR-553, January 2003.

[Xen03b] K. Fraser, S. Hand, T. Harris, I. Leslie, and I. Pratt. The Xenoserver Computing Infrastructure. Technical Report UCAM-CL-TR-552, University of Cambridge, Computer Laboratory, Jan. 2003.

[Xen03c] S. Hand, T. Harris, E. Kotsovinos, and I. Pratt. Controlling the XenoServer Open Platform, April 2003.

[ZPERF] S. Thoss, Linux on zSeries Performance Update Session 9390. URL http://linuxvm.org/present/SHARE101/S9390a.pdf accessed December 2003.

[dbench]

[freebench]

10. OUTTAKES

Figure 1: First System Configuration

2.1 Identifying User Data and System Data

User data housed in the FS-VM would typically include a user’s home directory or documents directory. Many applications are already configured to store data in these locations. User data could also include other locations in which applications store user data (e.g. mailboxes, calendars, archives of scanned photos). Subsets of the user’s data store could be mounted into the PLATFORM-VM at different locations and with different permissions.

System data would include the operating system image and related files as well as all installed software. System data would typically also include any information written by the OS or user applications such as logs or paging files.

Clearly, there is flexibility in this boundary between user and system data. For example, if a user runs a web server, the content directory could be considered personal user data and mounted read-only from the FS-VM. However, if the user has a static copy of that data within the FS-VM, she might simply place a copy in the PLATFORM-VM instead. Similarly, web server logs may be considered system data. However, if the user wanted to prevent the loss of these logs in case of attack, a mount point could be established into the FS-VM.

2.2 Key Differences Between System Data and User Data

This separation of user data from system data is motivated by several key differences between system data and user data. First, system data is more predictable and thus is easier to attack. Second, system data is often more valuable to an attacker and thus more attractive to attack. Third, system data can be restored from public sources and thus is less critical to protect. These differences lead to different approaches to data protection and restoration.

2.2.1 System data is more predictable and thus is easier to attack.

From the earliest worms [Spafford89], attackers have exploited the homogeneity or monoculture of system data. Attackers can target a small number of operating systems or commonly used pieces of user-level software (e.g. web servers, email readers). The attackers themselves have easy access to the actual binaries and typical configurations. If they discover vulnerabilities when probing their own test system, the chances are excellent that the exact same vulnerabilities will exist on many other systems.

User data, on the other hand, is more unpredictable in its contents, name and location. There are exceptions to this rule. For example, some malware has exploited the homogeneity of address books in Microsoft Outlook. However, for the most part, user data is significantly less predictable and thus harder to exploit in an automated fashion than system data.

Malicious code or data can still be introduced into the user’s data store. For example, a virus may be written to a user’s email log. However, a virus scan program could be run against the user’s data store to clean it without loss of data, and if the PLATFORM-VM was compromised it could be restored.

2.2.2 System data is often more valuable to an attacker and thus more attractive to attack.

As a general rule, user data is also less valuable to attackers. For example, a report may represent weeks or months of careful work for a user, but to an attacker it is of no value. Similarly, a video of a special event may be priceless to a user, but irrelevant to an attacker.

System data, on the other hand, is valuable because once compromised it can provide the attacker with resources such as an idle CPU, free disk space or an under-utilized network connection. Attackers can use these resources for many purposes, such as to store and serve illegal content or as the launching pad for additional attacks.

Once a system is compromised, attackers may be able to log in at will and search through user data in a less automated fashion. In this way, attackers may locate valuable user data such as credit card numbers or other financial data. However, the difficulty of automation has to date limited the scope of such attacks.

User data is rarely the primary target of an attack. However, it is often a casualty of attacks on system software. Isolating user data from system data makes it less likely that user data will be lost simply because malfunctioning system software makes it inaccessible.
