Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, May 11, 2017

Yes, really, I did (predict IME security bugs)

I usually try to resist the temptation to say "I told them so".  Saying "I told you so" doesn't make you popular with the people you told.  It is better to move forward and get things done, with the cooperation of those people, than it is to be proven right but shunned.

But in th case of the Intel IME / IAMT / ARC embedded CPU inside the master CPU/chipset, I am going to give in.  Because, yes, really, I did predict such security bugs. 

Darn...  I wrote a moderately long diatribe.  But loxt it because "Blogger is not a content management system, and does not proviode veersioning."  (I could swear blogger used to version - perhaps I am misremembering Google docs.)


Briefly:

Many folks believe that an "assistant" embedded processor is the answer to every problem in computer architecture: I/O; DMA; block memory copies; ... manageability; security.

Problem: 

The "assistant" embedded processor often has a less evolved security architecture than the "master" CPU.  Processor hardware, OS, SW.   Often the reason that the assistant embedded processor is more efficient is that it has thrown out advanced security. E.g. virtual memory page protection. E.g. privilege levels.

Oftentimes, the people advocating such "assistant" embedded processors say "It doesn't matter, we are only using it in narrowly constrained ways.  We can prove the code secure."  Yeah, right.

Over time, such "assistant" processors have added more and more security features.

But even when the assistant embedded processor has security features comparable to the master CPU, it still sucks that it is different.

By construction there will be more code in such a system to code review and security audit.  Even if functional code is the same, there is boot code.

Worse, the assistant and master may have comparable but slightly different security models.  E.g. one may have page permissions RWX-RW-RX but not execute only or read-only, the other may have all the relevat permissions RWX-RW-RX-R-X. Just a small difference, but it can matter.

Worse yet, there is a limited supply of experts for code reviews and security audits.  Even for one system, typically the master.   Even more so if they must be expert in both master and assistant processors.  Moreover, even if you can find such experts, it is damned hard to keep both of them in your head at the same time. It is common to say "oh, that's okay on x86 but not on ARC (or MIPS, or ARM, or your favorite embedded processors)", and later realize that you have it wrong.  Or vice versa.

Worse yet if simuilar but slightly different OSes and SW librraies are available on master and assistant.

Why heterogenous, given risks?

Given the security risks of heterogenous systems with different master and assistant processors, why do them?

Apart from the "my embedded processor is inherently more efficient than your master CPU" wars - which are usually wrong, although possibly true in certain instances (certainly for specialized I/O processors such as GPUs and DSPs) - the biggest reason to have embedded processors separate from the master CPU is:

OS independence.

If the embedded processor provides a high-enough-level API that it can be used without substantial modification for many different OS clients, then you may have less code to maintain (and code review, and security audit).

Furthermore, the very functions that you wish to provide using the assistant embedded processor may need to be protected from the OS.  E.g scanning for malware.

Plus, in a minor way, the fact that the assistant embedded processor is running a different ISA than the mass-market main x86 CPU makes it a little bit les likely that Joe Random Cracker has the skillz to break in.  But I would not count on that.  (Besides, you can get ISA encoding diversity whike preserving ISA semantic consistency. See below.)


Mitigations:

Dedicate parts of Multiprocessors: I like the way that IBM mainframes do manageability: in an LPAR, typically one of the mainframe CPUs is reserved to run the manageability tasks.  Same CPU architecture.  Sometimes even the same OS architecture, although running with hypervisor (MVME) privilege.

You could have different microarchitectures with same ISA and privilege architecture.   But many companies cannot afford to create such a diversity of microarchitectures, while customers may want to license only one instance of the popular but most expensive. (IMHO licensing should encourage such consistent, rather than discourage.)

Shared memory is a security hole:   We live in a world of web services, SAAS (Software As A Service) - get used to it. If the heterogenous "assistants" in your SOC interface via message passing, or even as if over a local TCP/IP network (hopefully a high performance, NATed, local network) - well, at least you are in the range of things that we should know how to secure.  Even though we often fail.

Trouble is, this only works for services that can be decoupled via message passing.  It doesn't work for a service that requires access by the assistant embedded processor to master CPU memory, e.g. for virus scans of active DRAM, or for DRAM deduplication.  But it can work well enough for things like virus scanning of data as it flows from NIC to ME to CPU. If it flowed that way...

Message passing not fast enough?  We know how to make it faster.  Zero copy tricks.   Direct modification of master CPU page tables.   Hardware TLB shootdown mechanisms.  Easier when all parties have similar virtual memory architecture.
      Trouble is, most modern CPUs can only vaguely hope to accelerate message passing interfaces - via a "Smart" embedded assistant processor.
      In some ways I advocate standardizing message passing, possibly using risky shared memory tricks - so as to reduice the need to use such shared memory tricks to interface to other "Smart" devices.

Even higher performance message passing directly accessing processor registers.   But that is even harder to do heterogenously, although I can imagine APIs.

But...  the infamous Intel AMT bug occurred because the "smart" embedded processor was running a poorly secured webserver.  Yep... there's no helping it.   Although we should know how to secure a webserver, we keep f**king up.
      But...  a bug in an SOC embedded webserver woud not have been so bad if isolated to a subsystem that could only access 1 NIC in a multi-NIC system. It is the fact that the subsystem that was compromised has such global access, to everything, more than an OS or hypervisor - that is the really bad thing.

IMHO the Principle of Least Privilege is not just a good idea - it should be the law.  As in, if a company designs a system with blatant disregard for the PLP, and if somebody sustains losses as a result of a security compromise, then they should be liable. And more.
     All of these mitigations are really just ways of saying "Principle of Least Privilege".
 
If you have to allow the "smart" assistant processors shared memory access to master CPU memory: Hardware level firewalls. Standard control and status register space and enumeration, so that you can verify that they are correctly configured.   Unchangeable hardware IDs as well as software IDs on bus transactions, so that you can easily create invariants like "the network packet filter is never allowed to directly access the audio microphones".  (Of course, you might want to allow the NIC to access the microphones for ultra-low-power voice conferencing. And the packet filter, for NSA-like stuff.)

Hardware Page Protection Keys:  associated with physical addresses, or at least the addresses seen on the bus. In addition to virtual address based protection.

Memory Encryption:  may not prevent access by malware running on an embedded processor to CPU storage, or vice versa.  But may prevent secrets leaking, and prevent malware crafting attack packets, whether code or data, to compromise further.\

I suppose that I should also mention capability systems for privilege management, both OS/SW, but possibly also hardware to shared memory. Fine grain.   But I butted my head against that wall, most recently with what became Intel MPX.

Conclusion

"Smart" assistant embedded processors have security risks.  

We really want to prevent "smart" assistant embedded processors from being "smart-ass".  Or, worse, "evil genius".




No comments: