Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, July 27, 2009

Error Handling

I've been thinking a lot about the examples Andrei Alexandrescu used in his talks about the D programming language.

Briefly, Andrei pointed out how the "Hello World" examples - the simplest possible programs - in nearly all popular programming language books are not correct, or at least are not correct if an error occurs. They do not check the return value of functions like printf, or they do not arrange for the OS to see an appropriate error code if the program fails.

Andrei argues that a programming language should make it as easy as possible to write correct code, complete with error handling.

The D programming language has constructs such as scope(exit/success/failure) and the now ubiquitous try/catch to make resource management in the presence of exceptions easier. But overall, the D strategy is to have errors be reported by exceptions. A lazy programmer, writing the simplest possible program, may assume success; the exceptions will get thrown by library routines on an error, and the default exception handler will propagate the error correctly into the OS's error exit codes, etc.

Comprehensive error detection should be accomplished with the simplest possible code, without having to clutter the code with error handling like if (syscall_return_code != 0) .... Comprehensive error handling still requires a higher level of expertise, but there, too, D's new features may help.
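For example, the simplest D program I can write should, under this strategy, already be correct in this sense (my sketch, not one of Andrei's examples):

import std.stdio;

void main() {
    // If writeln fails - say stdout is a broken pipe - it throws;
    // the uncaught exception propagates out of main, and the default
    // handler makes the program exit with a nonzero status, so the OS
    // sees the failure.
    writeln("Hello, world!");
}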

---

I think this is all good.

However, I think that scattering try/catch all over the place is quite ugly, leading to cluttered code. Yes, RAII and scope(exit) will help. But occasionally they are not the right thing.

I've written libraries that use exception throwing as their error reporting strategy. (I often throw string error messages, and stack up the error messages from most abstract to most detailed.)


I've written test suites for such libraries with exception based error reporting. They often look like long lists of test cases like the following:

expected_exception_caught = 0;
try {
    test_function_1(a,b,c);
}
catch (...) {
    assert("expected this exception");
    expected_exception_caught = 1;
}
if( ! expected_exception_caught ) {
    assert( ! "did not get expected exception" );
}


I will often use macros to make this more compact. Or test suite lists, with an exception expected flag.
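In D, the same compactness ought to be achievable without macros. Something like the following sketch, with a hypothetical throws() helper taking a delegate (test_function_1, a, b, c as in the snippet above):

// Hypothetical helper: run the code, report whether it threw.
bool throws(void delegate() code) {
    try {
        code();
    } catch (Exception e) {
        return true;
    }
    return false;
}

unittest {
    assert( throws({ test_function_1(a,b,c); }), "expected this exception" );
}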




I have had reason to compare such code to code using traditional C style error codes. When a function must return an error code, its API becomes clumsy, because it must arrange for the real return value to be returned some other way, typically via a pointer to a return area.

However, the code that exercises the error-return-code libraries often looks cleaner than the try/catch code.



I'd like the best of both worlds. Conceptually, return a tuple (error_code, return_value). (For that matter, multiple return values are nice.)

However, allow the programmer to omit the capture of the error_code return value. If not captured, throw an exception.

Also, possibly, to prevent the programmer from just capturing an error_code return value but doing nothing with it, require a captured error_code to be "consumed" - explicitly marked "I have handled this error."

Possibly something like:

{
    (Error_Code error_code, char* string) = test_function_1(a,b,c);
    assert( error_code && "error was expected" );
}
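A rough sketch of the "must be consumed" part, as a library type rather than a language feature (names hypothetical, and a real version would have to worry about copying):

struct ErrorCode {
    int  code;
    bool consumed = false;

    void consume() { consumed = true; }   // "I have handled this error"

    ~this() {
        // An error that was neither zero nor explicitly consumed is itself a bug.
        assert(consumed || code == 0, "error code was never handled");
    }
}

Throwing from the destructor instead of asserting would give exactly the "omit the capture, get an exception" behaviour, but throwing from destructors has its own problems.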






Manager vs. Maker's Schedule

I found this on slashdot:

http://tech.slashdot.org/story/09/07/27/1528201/Managers-Schedule-vs-Makers-Schedule

http://www.paulgraham.com/makersschedule.html

Briefly:

Managers schedule their days in 1 hour chunks.

Makers schedule in chunks of half a day, at least.

A 1 hour meeting in the middle of the morning can blow away an entire half-day, for a Maker.

My team members - the interns and NCGs who have worked with and for me - will vouch that this applies to me. It applies especially to pair programming.

My coping strategy: I often block out 10-11am and 2-4pm, and then accept only one meeting in each morning or afternoon. It still results in fragmented time, but it is better than nothing.

Thursday, July 16, 2009

Andrei Alexandrescu: the Case for the D programming language

=== Andrei Alexandrescu, the Case for D ===

Attended a talk by Andrei Alexandrescu, the famous C++ author. E.g. "Modern C++ Design". Basically, the guy who wrote the book on many fancy C++ template metaprogramming techniques.

Talk arranged by Gyuszi Suto, Intel DTS. Also attending: Scott Meyers, Effective C++. I did not know he lived in the Portland area.
I used to correspond with Andrei, but had never met him in person. I had never corresponded with or met Scott.

Alexandrescu's talk was on his work with Walter Bright on the D programming language.

Many neat ideas about programming. In many ways, a nice replacement for C++. One might say, C++ done (almost) right.

Since you can look at the slides, and/or at the webpages, my blog will mainly be on my reactions.

---+ Introduction

I very much liked the introduction "Why all programming language book versions of Hello World are incorrect". Yes, really. Even though Andrei made mild fun of me while doing so.

Brief: K&R hello world does not return correct error codes, whether success or failure. C++, Perl, Python, Java, similarly broken.

D tries to make the easiest code to write also be correct: handle errors, e.g. by throwing exceptions, implicitly caught outside main, the usual path for library error reporting; e.g. multithread-correct by default.

---+ Initialization

Everything is initialized (unless the programmer says not to, by saying "int x = void").

It was not clear from the talk what the default initialization is. Just forcing initialization to zero is better than nothing, but not great.

It took me a surprisingly long time to find this, but see http://www.digitalmars.com/d/2.0/type.html.

The default initializer is nearly always 0. Except for floating point, where it is a signalling NaN. And char, where it is 0xFF, 0xFFFF, or 0x0000FFFF, depending on whether it is char, wchar, or dchar.
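To make that concrete (my reading of the spec page above):

void example() {
    int    i;        // default initialized to 0
    double d;        // default initialized to NaN
    char   c;        // default initialized to 0xFF, an invalid character
    int    u = void; // explicitly uninitialized, as mentioned above

    assert(i == 0);
    assert(d != d);  // NaN is the only value that compares unequal to itself
    assert(c == 0xFF);
}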

Enum default initializer is the value of the first member of the enum. (Glew: usually a good idea, but may not be if you want the enum to interface to a hardware register. I wonder if this is just the default default initializer, and if you can override the default default initializer with an explicit default initializer for the enum. Tongue twister.)

I found this post by Walter Bright, D's original designer, explaining: http://dobbscodetalk.com/index.php?option=com_myblog&show=Signaling-NaNs-Rise-Again.html&Itemid=29

D has a feature where all variables are initialized to a default value if no explicit initializer is given. Floating point variables are default initialized to a quiet NaN. Don suggested that instead, the default initializer should be a signaling NaN. Not only that, he submitted patches to the compiler source to do it. Even more significantly, others chimed in wanting it.

Signaling NaNs now play in D the role they were originally created for -- to make the use of uninitialized floating-point variables trivial to track down. They eradicate the nightmares you get in floating-point code when code fails intermittently. The instant recognition of how useful this can be indicates a high level of awareness of numerics in the D community.

OK, so FP are initialized to signalling NaNs. This is good. Although maybe not so good for computer architectures that have deprecated NaN support.

Initializing FP to signalling NaN is safest.

Initializing integer types to 0 is better than nothing. But even this can give errors. Initializing to something that signals would be nice. But there is no standard for signalling for integer types. I have created template classes that have a valid bit, but it is unreasonable to make that the default. I guess that 0 is as good as can be done in the present state of the world.

I have asked the compiler guy on some project to give me a compiler switch to change all integer types like int to MyTemplate, e.g. Valid, or to change all pointer types like char* to CheckedPointer. But I think that this would disagree with D's "no name hijacking, ever" principle.

You can initialize struct/class members in place:

class C {
    int a = 6;
}

Yay!!!!

---+ Shell Integration

I like.... I miss Pop-2, Pop-4.

No eval. I guess I am not surprised... although eval is powerful.

---+ System Level Features.

Real pointers, real addresses. Can forge pointers.... "forging" means something different to capability-based security people.

Modular approach to safety: code compiled with module(safe) cannot do things like forging pointers; code compiled with module(system) can. Mark Charney and I bounced words at each other like embeddable, subsettable, defeaturable. Mark recently changed a big tool from C++ to C to avoid the overhead of the C++ run-time. I chatted with BSD kernel guys on the same topic. Real systems programming languages allow stuff to be stripped out, like exception handling, like GC.

"Standard library is trusted": this almost set off red flags and alarms for me as a security guy, since trusting printf like code was one of the classic Burrough's era security bugs. But I realized that Alexandrescu's "trusted", from the D language perspective, and my perspective frm the point of view of real hardware security, is very different. Basically, even though print is "trusted" as a library, I am still not going to implement printf as a system call, with the character formatting in the kernel.

---+ Large-Scale Features

True module system. Semantics independent of inclusion order. No name hijacking, ever.
I asked how Michael Feathers' tricks, in Working Effectively with Legacy Code, to get at the seams of legacy code, e.g. by #defines, or linker scripts, would work. I understand the motivation for no hijacking. However, I suspect that a more structured way of hijacking might be desirable. Not necessarily order dependent. But more like "In module Foo, please intercept all calls to module Bar, and route them through module BarWithDebugging".
Andrei also kept on about how each module is a file. Since I'm a guy who spent days figuring out how I could cram multiple Perl packages into the same file, I am not so sure I like this module=file enforcement.
I do like how D allows the header to be created automatically from a module. And/or allows separate headers and modules, and enforces compatibility. Avoids writing code twice, but permits it if you really want that. (Again: I often create "header only libraries", like Boost. And having created the interface generator for iHDL, I hate writing code twice, even declaration/definition.)

Contracts. Yay! I thought at first it was only a subset of Eiffel contracts, but it appears complete.
Scott Meyers asked Andrei a gotcha question. D has "protected". But apparently the class invariants are not imposed before/after protected methods. Sounds like it will get fixed.

Code annotated with keyword unittest can be embedded, and run before main.
Nice. But I started getting afraid of creeping featurism.
D defines lots of keywords. At the very least, I want to give up on bare keywords - require keywords to be distinguished by XML tags. I'm into post-ASCII languages.

const and immutable.
const = "this function in this thread will not modify this const parameter reference. But other threads may be."
immutable = "guaranteed that no function in no thread anywhere will modify"
Afterwards talked with DT guys and Scott Myers about how multithreading means that const optimizations are unsafe. This is not news to me.
Quibbles: immutable seems less string to me that const. Maybe "const const"? Keyword proliferation.
void print(const char[] msg) { ... } - const bridges mutable and immutable, same function handles mutable and immutable arrays. (Similar transitivity for shared)
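A concrete sketch of that bridging:

void print(const char[] msg) { /* ... */ }

void example() {
    char[] mutableMsg   = "hello".dup;  // mutable copy
    string immutableMsg = "world";      // string is immutable(char)[]

    print(mutableMsg);    // fine
    print(immutableMsg);  // also fine: const accepts both
}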

pure functions - good
Andrei had a contrived example of how you could declare a function pure even if it had local variables. Yawn. I think this must matter to functional purists. But to practical folk, this is obvious.
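E.g. something like this is still pure (my own trivial reconstruction, not Andrei's slide):

pure int sumOfSquares(int n) {
    int total = 0;               // local, mutable state - still pure
    foreach (i; 1 .. n + 1)
        total += i * i;
    return total;
}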

---+ Programming Styles

Imperative (OOP), functional, generics. Not purist. Not "everything is a ..."

Hash tables are built in or standard.
TBD: optimize ...

---++ struct vs class

structs have value semantics (just like int)
No struct polymorphism.
All other class-like amenities - constructors, op overloading.

classes have ref semantics. => inheritance and dynamic polymorphism
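The value-vs-reference distinction in a nutshell (my example, not from the talk):

struct SPoint { int x; }
class  CPoint { int x; }

void example() {
    SPoint s1;
    auto   s2 = s1;        // copies the value
    s2.x = 1;
    assert(s1.x == 0);     // s1 is unaffected

    CPoint c1 = new CPoint;
    auto   c2 = c1;        // copies the reference
    c2.x = 1;
    assert(c1.x == 1);     // same object
}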

Q: I'm assuming that you can pass structs by ref or pointer... And can similarly pass class objects by value. But the latter may not be true.

---++ Object Oriented

Less runtime reflection than C++ or Java.

Q: is there any compile time reflection or introspection? I.e. can I automatically create a struct printer?

Multiple subtypes, but not inheritance.

---++ Ranges, not Iterators

I understand motivation for
foreach (i; 2 .. n) result *= i;

although it is unfortunate, since many parallel programming types like me would like to use "foreach" to indicate visiting in no particular order.

---++ Generic Programming

I find the syntax

auto min(L, R)(L lhs, R rhs) {
    return rhs < lhs ? rhs : lhs;
}

auto a = min(x, y);

I would like to understand the rationale.

Ahhh.... the (L,R) are the "template" type parameters. Now I get it.

Andrei dwelt on the rationale for not doing
auto min(L lhs, R rhs)

Syntactic ambiguity.

static if

If a regular function is evaluated in a compile time context, the compiler attempts immediate interpretation.
I meant to ask: compiler people have long known how to do this, but have had a problem: cross compilation. E.g. evaluating sin(1.0) may give different results on the machine the compiler is running on than on the target. Heck, even integers, since #bits may vary. I meant to ask if D solved this, a la Java "compile once, run everywhere".
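For reference, the two features look roughly like this (my sketch):

int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

enum fact5 = factorial(5);   // compile-time context, so the compiler interprets factorial (CTFE)
static assert(fact5 == 120);

struct Buffer(int N) {
    static if (N <= 256)
        ubyte[N] data;       // small: store in place
    else
        ubyte[]  data;       // large: use a dynamic array
}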

BTW, Scott Meyers asked if D detected integer overflow. No. I wonder if it would be legal to create a compiler that detected overflow. In C/C++, too much code depends on overflow, e.g. shifting bits off the top.

---++ Functional Programming

Andrei wants to "erase off the face of the Earth" fib(n)=fib(n-1)+fib(n-2)

He went into a 3-slide jibe about expressing it as a loop rather than functionally.

Since I have seen how Donald Knuth has an optimization, that *should* be in modern compilers, to do this, I was unimpressed.

This, and Andrei's rant about how "pure" functions can have temporary variables, leads me to suspect that he has engaged in debates with functional programming (Haskell) bigots.

Hmmm... seems most people haven't seen Knuth's recursive to iterative transformation.

Donald Knuth. "Structured Programming with go to Statements." Computing Surveys 6 (4): 261–301. doi:10.1145/356635.356640. http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

Heck, that wasn't too hard to find. It was in wikipedia!

Revisiting Knuth's paper, I am reminded that he did not actually do Fibonacci. He eliminated the tail recursion completely, but for the interior recursion he left in a stack. So you would have to go beyond Knuth's paper to convert Fibonacci to iterative, O(n) time, O(1) space. Methinks it should be doable, although I wonder if it can be done without symbolic execution.
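For the record, the loop form Andrei prefers is the obvious one - the interesting question is whether a compiler could derive it automatically:

ulong fib(uint n) {
    ulong prev = 0, cur = 1;
    foreach (i; 0 .. n) {
        immutable next = prev + cur;
        prev = cur;
        cur  = next;
    }
    return prev;   // fib(0) = 0, fib(1) = 1, ...
}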

---++ Scope statement

void transmogrify() {
    string tempfile = "delete.me";
    scope(exit) {
        if (exists(tempfile))
            remove(tempfile);
    }
    auto f = File(tempfile, "rw");
    ... use file f ...
}

Like C++ RAII

void transmogrify() {
    string tempfile = "delete.me";
    class FileCleanup {
        string tempfile;
    public:
        FileCleanup(string tempfile) { this->tempfile = tempfile; }
        ~FileCleanup() { if (exists(tempfile)) remove(tempfile); }
    } file_cleanup(tempfile);
    auto f = File(tempfile, "rw");
    ... use file f ...
}

Except that with scope you can say "I am running this cleanup for any reason / because of an error / on normal success".

OK, that wins. But.... are D's scopes first class? I have a library of C++ classes that are designed to be used as RAII objects.

By the way.... if a function or class defined locally refers to a variable in the surrounding context, and its lifetime survives the context, it can be converted implicitly to a closure. While this is great, it also sounds like a big source of bugs, performance problems, and memory leaks.
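E.g. something like (my sketch):

int delegate() makeCounter() {
    int count = 0;                   // local to makeCounter...
    return () { return ++count; };   // ...but captured by the delegate, which outlives the call
}

void example() {
    auto next = makeCounter();
    assert(next() == 1);
    assert(next() == 2);             // count was implicitly moved to the heap: a closure
}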

---++ Concurrency

D has a PGAS-like memory model. (PGAS = partitioned global address space, or, as I like to say, private/global address space. See http://en.wikipedia.org/wiki/Partitioned_global_address_space)

By default all variables are private. Duplicated in all threads. Thread local storage. Even globals.
Hmm, interesting: this pretty much requires a base register for TLS if you want to access (private) globals. I wonder where we will get that base register?

But you can explicitly add a shared keyword.

shared is transitive. Like const leads to C++ const-ipation, shared will have similar issues. D may ameliorate them via its more convenient generic support.
I think that any transitive feature suffers const-ipation. Transitive shared or volatile const-ipation. Sounds disgusting.

Q: how do you say "void f(int a, int b)", where all combinations of shared and private int a and b are allowed?

Sure, *some* folks will just say "cast to shared". Shared thread-safe code should be able to manipulate private variables, right?

Except... in PGAS supercomputer land, private and shared may use different instruction sets. Truly. In fact, one of my goals is to allow the same instructions to be used for shared and private.

Andrei did not talk about the memory ordering model. Heck, if you define a memory ordering model, on a weakly ordered machine even ordinary instructions used to access shared data must have fences added all over. Perhaps this is why we should encourage transitive shared const-ipation.

I liked Andrei's last slide:

Process-level concurrency with OS-grade isolation: safe and robust, but heavyweight
Andrei's aside: UNIX: it worked for years.

Thread-level concurrency with shared memory: fast, but fraught with peril
This is almost as pithy as Ashleisha Joshi's "Shared memory is a security hole".

Typesystem-level isolation: safe, robust, and lightweight
Although it may lead to Transitive shared or volatile const-ipation


---+ Conclusion

D is definitely interesting. Whether or not it will take off, I do not know. But it certainly has some nice features. If it were ubiquitous, I would use it.

My biggest issues with D, apart from the fact that it is not ubiquitous (which is chicken and egg), are:

a) Keyword proliferation. I've said before: I am ready to transition into languages with XML tags around keywords.

b) Embeddability/subsettability/defeaturability. I would like to be able to, e.g. turn off the GC, or turn off the exception handling. I like the idea of a high level systems programming language, but the OS kernel programmers I know don't even use C++. They use C. It is not clear that D solves all the problems that prevent so many OSes from using C++ in the kernel.





Wednesday, July 15, 2009

Calendars and Phones

A friend just got an iPhone. Primary uses, apart from as a cell phone and music player: PDA stuff like calendaring.

Now she keeps her calendar on her iPhone. Both personal/family, and work related.

Her employer uses Outlook Exchange Calendaring. But they do not allow calendars to be synchronized with personal calendars outside of work. So my friend simply does not use the company calendar. She says that co-workers complain when they arrange meetings at times that appear to be free on her company calendar (which is empty) but which she cannot make because of conflicts. But she says that it is so much more important to her to be able to manage her personal and family life in the same place as her work meetings that she is willing to put up with the loss of the shared work calendar.

Perhaps I should mention that she works, not exactly part-time, but very flexible hours. She is constantly managing family commitments, getting kids to events, as well as weekend and evening work assignments. Her day is not neatly partitioned into work and non-work.

MORAL:
  • People want both personal and work calendars on their devices like phones.
  • Corporate security rules get in the way.

Restart Time Matters

This morning, after a 6-7am meeting, my work laptop PC hung up. Restarting it highlights another performance issue.

The original hang was not bad, but annoying. The Nvidia nView Desktop Manager, which I use to change my screen settings - e.g. to go into and out of reduced resolution, so that the folks in Israel could see my slides - left a "turd" on my screen. A "just wait" box was thrown up, and stayed around for 15 minutes, after the Nvidia tool was closed down, after I left to do something else for a while. Past experience has shown that occasionally just waiting a few minutes leads to such turds being cleaned up - but not this time.

(I suspect that I could have killed rundll32.exe from the process (not task) manager, but I am always nervous about killing processes when I do not know exactly what they do. Especially something like rundll32.exe, which gets used by many facilities.)

I could have left this turd around and kept going. But it is annoying to have a small rectangular area on your screen that won't redraw. So I decided to restart my PC, in the hopes that (a) that would clean up the turd, and (b) I could go and have breakfast while waiting for the restart.

So, I hit restart. Waited a bit. Walked away.

After my beans and tomatoes on toast (there, that proves I'm a Brit, even if my accent is more American than Canadian after 24 years in the US), I went back to check. I expected that I might have to type in my hard disk password (for PGP Full Disk Encryption).

But, surprise! Actually, no surprise, just disappointment. Windows' restart had hung, waiting for me to say "OK" to killing some application. There was a flurry of such kill dialog boxes, some with timeouts, but enough without timeouts that I had to sit and wait. The rundll32.exe that I suspect was used by Nvidia, that I suspect was causing the problem, was one of them.

OK, answer all of these dialog boxes. Now, surely, I can walk away...

Nope. Come back another 15 minutes later, to see a blank screen. Not dead - when I moved the mouse, and typed ctl-alt-del, suddenly things started proceeding to shutdown again. But, it certainly was hung waiting for something to be rejiggered or unblocked.

Finally, it reboots. Watch the reboot. Type in the PGP full disk encryption password.

What a loss! What a lost opportunity! If I could have overlapped the reboot with eating breakfast, I would have had half an hour more for work.

MORAL: even the time to restart matters. Automation, to allow a completely automatic reboot without having to type in passwords, or anticipation - having the reboot subsystem ask, in advance, "Do you want to end all processes that aren't responsive?", rather than waiting to ask me about them one by one. Automation and anticipation, even of something like the reboot process, would be a good thing.

At work we often say things like "Surely people don't care about the speed of rebooting after a Machine Check". Wrong! Maybe we don't care about it as *much* as we care about normal operation. But the performance (and automation) of anything we do a lot matters. And, on Wintel PCs, rebooting happens quite a lot.

--

There may be another moral here. I like the idea of microrebooting. Perhaps I could microreboot the display manager, and/or otherwise tell the display manager that the non-responsive process's window should not be redrawn.

Sunday, July 12, 2009

Moving to the Net: Encrypted Execution for User Code on a Hosting Site

I'm gravitating towards establishing at least 2 net.presences - 1 on a shared hosting site, the other on something like Amazon EC2 where I get to administer the (virtual) machine. Working on the assumption that it is unlikely that both will go down simultaneously. (The sysadmins of the shared hosting site are more likely than I am to respond quickly to some spreading net.malware that requires immediate patching.)

Plus, of course, the usual personally owned computers/servers.

The prospect of moving my "home" off computers I own to computers that hosting service provides is a bit worrisome. E.g. would I/should I trust storing my TurboTax on a hosting site? Running Gnu Cash?

Put that way, it is more attractive to use net.storage like Amazon S3 for encrypted storage, but to always try to do the secure manipulation on computers I own.

However, just think of how already all my financial information is spread across various provider websites.

Anyway... Thinking about virtual hosting with Amazon S3... How could you have enough trust to do sensitive work on such a system? Or, rather, could we build systems where the hosting sysadmins would not be able to look at the data their hosting clients are working with?

It's a lot like DRM, Digital Rights Management, the sort of thing that motivated LT. However, instead of trying to prevent the user and owner of a PC from accessing unencrypted data on his own PC, what I would like to do in the hosting space is prevent the provider of the hosting service from accessing unencrypted data that belongs to the person renting the hosting service.

Let's imagine that this could be done. In some ways it might prevent one of the IMHO big advantages of centralized hosting, which is that you could have a sysadmin scanning for problems. The sysadmin might still be able to look for usage patterns that indicate a malware breakin, but the sysadmin would certainly have an easier time of it if he could look at the data.

It would also get in the way of another of the IMHO big advantages of centralized hosting: data de-duplication. (That seems to be the new term for what I have mainly been calling content based filesystems, inspired by rsync, Monotone, git, etc.)

E.g. I might reasonably want to store a copy of every piece of software I have ever bought on my net.storage-system. E.g. I might want to store every blamed copy of Windows I have ever had a license to. So that I can restore to the PCs that they run (ran) on, whenever... Subject to licensing, of course. Now say that the host is a big site, like Google or Yahoo or Amazon. Millions of Windows users might be doing the same thing. The opportunities for data de-duplication are large. But if each user's data is separately encrypted, deduplication would not work.

These arguments lead me to suspect that completely encrypting all information, in a way that prevents the host from looking at it, is undesirable. At the very least, optionally encrypting on a file by file basis would be desirable. E.g. I might encrypt my TurboTax files, but not my saved copy of Office 95.

Okay, okay... now, can we think of ways of preventing the sysadmins managing a hosting service from having full access to at least some user data?

Start off with encrypting data on disk. Encrypt it in main memory. Encrypt it into the cache hierarchy.

Decrypt only when moving between cache and registers. (Or possibly in the innermost levels of the cache hierarchy (L0, L1).)

How do you prevent the OS from looking at the unencrypted register state? On an interrupt, re-encrypt the registers as they are saved to memory.

Note that this means that you probably won't do "efficient" interrupts, that leave as many registers as possible live. You have to save them.... or, at least, you might encrypt them in-situ. Or have hardware "lock" them in-situ, and then do a lazy save to the encrypted area. But, you can't let the host look at the unencrypted registers.

Debuggers would have to be disabled while manipulating encrypted content. Or... you could allow debugging, but "lock" registers containing encrypted content. The debugger would either not be allowed to look at such registers, or might be allowed to look at "obfuscated" contents. I say "obfuscated" because, while it might be challenging to encrypt a 64B cache line, it will be almost impossible to encrypt 64bit (32bit, or 8bit) registers. Despite these attempts, the debugger can probably infer register contents (for all bits do: jmp if set). So, while I can imagine ways of making it harder for a debugger to observe sensitive data, at the very least debugging would be a very high bandwidth covert channel. Just like reading the time or reading a performance counter, debugging should be a facility that requires special permission.

Besides, why would a hosting sysadmin want to debug a user's application? The only legitimate reasons I can imagine involve trying to figure out if a user has been compromised by malware. Some intrusion detection systems singlestep execute in a debugger.

So, I am formulating a vision where the OS may run user tasks, but the user tasks may establish encryption based security that prevents the OS from looking at the user task data. There is some sort of key inside the processor that the OS cannot look at. This key is used to encrypt and decrypt data as it flows between processor registers and cache/memory. Registers are saved/restored encrypted. Hardware probably defines a save area for all register state. Registers may be locked against the OS, and/or for lazy encrypted save/restore.

System calls? The classic UNIX system calls - read, write, etc. - do not require the OS to interpret the memory contents. They can handle encrypted as well as unencrypted data.

Bootstrapping? How do you bootstrap this? How do you start a user process, and give it a private key, in a manner that the OS cannot see? Basically, you will have to have a small trusted kernel, trusted bootstrap. It doesn't have to be done once and only once: I believe there are already many patents on late secure bootstrap (with me as one of the inventors). So, the untrusted OS can be running, and can invoke a small trusted kernel. This small trusted kernel must have its own private key(s), and can lookup tables of user private keys it maintains on disk, and/or obtain such keys across the network, after the appropriate secure handshake and communication. For my usage model - wanting to execute my stuff at a hosting site - I would probably prefer the "access across the network" approach. However, if I wanted to run cron jobs on the host, host-side storage of user keys might be necessary. It would be necessary for the user to trust this kernel, even though the user did not trust the OS. But this is the same issue as for DRM.

This trusted kernel could be microcode. But it could just as well be user code, with the trusted key hardwired in.

This is all very much like DRM. But it differs in emphasis: in DRM, you typically only want one or a few applications - the audio or video player - to be protected from everything else running on a system. Whereas here I have described how to run user applications on a hosting site, protected from the host OS (and any other software). In this usage model it is desirable to be able to protect almost any application from the OS. The DRM audio and video subsystem can be a special case; but this hosting application wants to truly be part of the OS. Or, rather - it wants to NOT be part of the OS, but wants to be a facility available to most processes and users under the OS.

This works for generic user processes on a generic OS, e.g. a UNIX user account. It is not just limited to virtual machines.

Can the OS intercede, e.g. by munging with the interrupt return IP or virtual memory? While we probably should not prevent the OS from such manipulations, we can prevent the OS from gaining access to sensitive data. The state and/or code being returned to can be authenticated. Untrusted code can be prevented from ever using the trusted keys.

It would be straightforward to similarly protect the code from the OS. But, while DRM may want that, I think that is much less necessary for this hosting usage model. Generic code, manipulating sensitive data. E.g. I want "cat" and "grep" to be securable. I envisage something like

sensitive-data gnu-cash dump account | sensitive-data grep my-hotel-bill > file-whose-sensitive-stuff-I-can-only-view-on-my-client

or

sensitive-data gnu-cash dump account | sensitive-data grep my-hotel-bill | sensitive-data -stdout unencrypted more

Friday, July 10, 2009

Yahoo Geocities closing - Don't Trust Free Websites

Yahoo Geocities is closing down.

For several years this was the home of my resume, and other stuff: http://www.geocities.com/andrew_f_glew.

Yahoo gives you the option of paying $5/month for webhosting, or saving your files. Yahoo requires you to save the files one at a time, clicking Save As on a menu. There is no ability to download all of your files as a tarball or other archive. There is no WebDav access, or other filesystem-like access. I.e. there is no easy path to automatically transferring a large website.

Fortunately, I did not have many files on Yahoo.

Even more fortunately, I abandoned Yahoo Geocities a few months ago, and moved all of my stuff to Google Docs. http://docs.google.com/Doc?id=dcxddbtr_6dvpxg2cj&hl=en

However, there is no guarantee that Google Docs will remain free, or even available, forever.

Google Docs doesn't appear to have an easy way of downloading all files. Who was it that provided such a facility? One of the wiki sites that closed down?

Coincidentally, I have been shopping for a hopefully more permanent home for me and my files on the web. I've been looking at Amazon S3 and EC2. Yes, I am willing to pay.

Amazon EC2 is good for virtual hosting. However, while I like being root, I would also like an ordinary user account on some UNIX system. I would like somebody else to be responsible for keeping up to date on security patches. Ideally I would like to run a wiki on such an account, but I want the wiki to run, not even as me, but as a Mini-Me, with even less privilege.

I guess I want it all: the convenience of having someone else sysadmin, but with the ability to run a small number of personal web services in deprivileged accounts.

For that matter, I'd like the hosting system to run a non-standard OS like FreeBSD, and a non-standard instruction set. ARM? Power? Is it security through obscurity?

As for my free websites:

* I learned that I still had a Yahoo mail account from long ago. Once again, no bulk download, unless I upgrade to pay, at which point I get POP.

* Google Docs - no automated access

* Google Mail - I have IMAP and POP access. I think I better start backing it up better.

Backing it up to S3? Wherever.

Just like there is a period of recent history that was lost, because information stopped being recorded on paper and started being recorded on quickly obsolete digital formats such as tape, floppies, etc., now we are traversing the stage where history is lost because of service providers closing down. One can only hope that we will come out the other end with storage as a commodity, with standard procedures for migration.