Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, July 12, 2009

Moving to the Net: Encrypted Execution for User Code on a Hosting Site

I'm gravitating towards establishing at least 2 net.presences - 1 on a shared hosting site, the other on something like Amazon ECC where I get to administer the (virtual) machine. Working on the assumption that it is unlikely that both will go down simultaneously. (The sysadmins of the shared hosting site are more likey than I am to respond quickly to some spreading net.malware that requires immediate patching.)

Plus, of course, the usual personally owned computers/servers.

The prospect of moving my "home" off computers I own to computers that hosting service provides is a bit worrisome. E.g. would I/should I trust storing my TurboTax on a hosting site? Running Gnu Cash?

Put that way, it is more attractive to use net.storage like Amazon S3 for encrypted storage, but to always try to do the secure manipulation on computers I own.

However, just think of how already all my financial information is spread across various provider websites.

Anyway... Thinking about virtual hosting with Amazon S3... How could you have enough trust to do sensitive work on such a system? Or, rather, could we build systems where the hosting sysadmins would not be able to look at the data their hosting clients are working with?

It's a lot like DRM, Digital Rights Management, the sort of thing that motivated LT. However, instead of trying to prevent the user and owner of a PC from accessing unencrypted data on his own PC, what I would like to do in the hosting space is prevent the provider of the hosting service from accessing unencrypted data that belongs to the person renting the hosting service.

Let's imagine that this could be done. In some ways it might prevent one of the IMHO big advantages of centralized hosting, which is that you could have a sysadmin scanning for problems. The sysadmin might still be able to look for usage patterns that indicate a malware breakin, but the sysadmin would certainly have an easier time of it if he could look at the data.

It would also get in the way of another of the IMHO big advantages of centralized hosting: data de-duplication. (That seems to be the new term for what I have mainly been calling content based filesystems, inspired by rsync, Monotone, git, etc.)

E.g. I might reasonably want to store a copy of every piece of software I have ever bought on my net.storage-system. E.g. I might want to store every blamed copy of Windows I have ever had a license too. So that I can restore to the PCs that they run (ran) on, whenever... Subject to licensing, of course. Now say that the host is a big site, like Google or Yahoo or Amazon. Millions of Windows users might be doing the same thing. The opportunities for data de-duplication are large. But if each user's data is separately encrypted, deduplication would not work.

These arguments lead me to suspect that completely encrypting all information, in a way that prevents the host from looking at it, is undesirable. At the very least, optionally encrypting on a file by file basis would be desirable. E.g. I might encryt my TurboTax files, but not my saved copy of Office 95.

Okay, okay... now, can we think of ways of preventing the sysadmins managing a hosting service from having full access to at least some user data?

Start off with encrypting data on disk. Encrypt it in main memory. Encrypt it into the cache hierarchy.

Decrypt only when moving between cache and registers. (Or possibly in the innermost levels of the cache hierarchy (L0, L1).)

How do you prevent the OS from looking at the unencrypted register state? On an interrupt, re-encrypt the registers as they are saved to memory.

Note that this means that you probably won't do "efficient" interrupts, that leave as many registers as possible live. You have to save them.... or, at least, you might encrypt them in-situ. Or have hardware "lock" them in-situ, and then do a lazy save to the encrypted area. But, you can't let the host look at the unencrypted registers.

Debuggers would have to be disabled while manipulating encrypted content. Or... you could allow debugging, but "lock" registers containing encrypted content. The debugger would either not be allowed to look at such registers, or might be allowed to look at "obfuscated" contents. I say "obfuscated" because, while it might be challenging to encrypt a 64B cache line, it will be almost impossible to encrypt 64bit (32bit, or 8bit) registers. Despite these attempts, the debugger can probably infer register contents (for all bits do: jmp if set). So, while I can imagine ways of making it harder for a debugger to observe sensitive data, at the very least debugging would be a very high bandwidth covert channel. Just like reading the time or reading a performance counter, debugging should be a facility that requires special permission.

Besides, why would a hosting sysadmin want to debug a user's application? The only legitimate reasons I can imagine involve trying to figure out if a user has been compromised by malware. Some intrusion detection systems singlestep execute in a debugger.

So, I am formulating a vision where the OS may run user tasks, but the user tasks may establish encryption based security that prevents the OS from looking at the user task data. There is some sort of key inside the processor that the OS cannot look at. This key is used to encrypt and decrypt data as it flows between processor registers and cache/memory. Registers are saved/restored encryted. Hardware probably defines a save area for all register state. Registers may be locked against the OS, and/or for lazy encrypted save/restore.

System calls? The classic UNIX system calls - read, write, etc. - do not require the OS to interpret the memory contents. Thedy can handle encrypted as well as unencrypted data.

Bootstrapping? How do you bootstrap this? How do you start a user process, and give it a private key, in a manner that the OS cannot see? Basically, you will have to have a small trusted kernel, trusted bootstrap. It doesn't have to be done once and only once: I believe there are already many patents on late secure bootstrap (with me as one of the inventors). So, the untrusted OS can be running, and can invoke a small trusted kernel. This small trusted kernel must have its own private key(s), and can lookup tables of user private keys it maintains on disk, and/or obtain such keys across the network, after the appropriate secure handshake and communication. For my usage model - wanting to execute my stuff at a hosting site - I would probably prefer the "access across the network" approach. However, if I wanted to run cron jobs on the host, host-side storage of user keys might be necessary. It would be necessary for the user to trust this kernel, even though the user did not trust the OS. But this is the same issue as for DRM.

This trusted kernel could be microcode. But it could just as well be user code, with the trusted key hardwired in.

This is all very much like DRM. But it differs in emphasis: in DRM, you typically only want one or a few applications - the audio or video player - to be protected from everything else running on a system. Whereas here I have described how to run user applications on a hosting site, protected from the host OS (and any other software). In this usage model it is desirable to be able to protect almost any application from the OS. The DRM audio and video subsystem can be a special case; but this hosting application wants to truly be part of the OS. Or, rather - it wants to NOT be poart of the OS, but wants to be a facility available to most processes and users under the OS.

This works for generic user processes on a generic OS, e.g. a UNIX user account. It is not just limited to virtual machines.

Can the OS intercede, e.g. by munging with the interrupt return IP or virtual memory? While we probably should not prevent the OS from such manipulations, we can prevent the OS from gaining access to sensitive data. The state and/or code being returned to can be authenticated. Untrusted code can be prevented from ever using the trusted keys.

It would be straightforward to similarly protect the code from the OS. But, while DRM may want that, I think that is much less necessary for this hosting usage model. Generic code, manipulating sensitive data. E.g. I want "cat" and "grep" to be securable. I envisage something like

sensitive-data gnu-cach dump account | sensitive-data grep my-hotel-bill > file-whose-sensitive-stuff-I-can-only-view-on-my-client

or

sensitive-data gnu-cach dump account | sensitive-data grep my-hotel-bill | sensitive-data -stdout unencrypted more