Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, July 16, 2009

Andrei Alexandrescu: the Case for the D programming language

=== Andrei Alexandrescu, the Case for D ===

Attended a talk by Andrei Alexandrescu, the famous C++ author. E.g. "Modern C++ Programming". Basically, the guy who wrote the book on many fancy C++ template metaprogramming techniques.

Talk arranged by Gyuszi Suto, Intel DTS. Also attending: Scott Meyers, Effective C++. I did not know he lived in the Portland area.
I used to correspond with Andrei. Had never met in person. Had never corresponded or met Scott.

Alexandrescu's talk was on his work with Walter Bright on the D programming language.

Many neat ideas about programming. In many ways, a nice replacement for C++. One might say, C++ done (almost) right.

Since you can look at the slides, and/or at the webpages, my blog will mainly be on my reactions.

---+ Introduction

I very much liked the introduction "Why all programming language book versions of Hello World are incorrect". Yes, really. Even though Andrei made mild fun of me while doing so.

Brief: K&R hello world does not return corect error codes, whether success or failure. C++, Perl, Python, Java, similarly broken.

D tries to make the easiest code to write also correct. Handle errors. E.g. throwing exceptions, implicitly caught outside main, usual path for library error reporting. E.g. multithread correct by default.

---+ Initialization

Everything is initialized (unless the programmer says not to, by saying "int x = void").

Unclear what default initialization is. Just forcing initialize to zero is better than nothing, but not great.

It took me a surprisngly long time to find this, but http://www.digitalmars.com/d/2.0/type.html.

The default initializer is nearly always 0. Except for floating point, where it is a signalling NaN. And char, where it is 0xFF, 0xFFFF, or 0x0000FFFF, depending.

Enum default initializer is the value of the first member of the enum. (Glew: usually a good idea, but may not be if you want the eum to interface to a hardwsare register. I wonder if this is the default default initializer, but if yiu can overide the default default initializer with an explicit default initializer for enum. Tongue twister.)

I found this post by Walter Bright, D's original designer, explaining: http://dobbscodetalk.com/index.php?option=com_myblog&show=Signaling-NaNs-Rise-Again.html&Itemid=29

D has a feature where all variables are initialized to a default value if no explicit initializer is given. Floating point variables are default initialized to a quiet NaN. Don suggested that instead, the default initializer should be a signaling NaN. Not only that, he submitted patches to the compiler source to do it. Even more significantly, others chimed in wanting it.

Signaling NaNs now play in D the role they were originally created for -- to make the use of uninitialized floating-point variables trivial to track down. They eradicate the nightmares you get in floating-point code when code fails intermittently. The instant recognition of how useful this can be indicates a high level of awareness of numerics in the D community.

OK, so FP are initialized to signalling NaNs. This is good. Although maybe not so good for computer architectures that have deprecated NaN support.

Initializing FP to signalling NaN is safest.

Initializing integer types to 0 is better than nothing. But even this can give errors. Initializing to something that signals wiould be nice. Buit there is no standard for signalling for integer types. I have created template classes that have a valid bit, but it is unreasonable to make that default. I guess that 0 is as good as can be done in present state of the world.

I asked the compiler guy for some project to give me a compiler switch to change all integer types like int to MyTemplate. E.g. Valid. Or to change all pointer types char* to CheckedPointer. But I think that this would disagree with D's "no name hijacking, ever" principle.

You can initialize struct/class members in place:

class C {
int a = 6;
}

Yay!!!!

---+ Shell Integration

I like.... I miss Pop-2, Pop-4.

No eval. I guess I am not surprised... although eval is powerful.

---+ System Level Features.

Real pointers, real addresses. Can forge pointers.... "forgiong" means something different to capability based security people.

Modular approach to safety: code compiled with module(safe) cannot do thins like forging pointers; code compiled with module(system) can. Mark Charney and I bounced words at each other like embeddable, subsettable, defeaturable. Mark recently changed a big tool from C++ to C to avoid the overhead of C++ run-time I chatted with BSD kernel guys on same topic. Real systems programming languages allow stuff to be stripped out, like exception handling, like GC.

"Standard library is trusted": this almost set off red flags and alarms for me as a security guy, since trusting printf like code was one of the classic Burrough's era security bugs. But I realized that Alexandrescu's "trusted", from the D language perspective, and my perspective frm the point of view of real hardware security, is very different. Basically, even though print is "trusted" as a library, I am still not going to implement printf as a system call, with the character formatting in the kernel.

---+ Large-Scale Features

True module system. Semantics independent of inclusion order. No name hijacking, ever.
I asked how Michael Feathers' tricks, in Working with Legacy code, to get at seams of legacy, e.g. by #defines, or linker scripts, would work. I understand the motivation for no hijacking. However, I suspect that a more structured way of hijacking might be desirable. Not necessarily order dependent. But more like "In module Foo, please intercept all calls to module Bar, and route them through module BarWithDebugging".
Andrei also kept on about how each module is a file. Since I'm a guty who spent days figuring out how I could cram multiple Perl packages into the same file, I am not so sure I like this module=file enforcement.
I do like how D allows the header to be created automatically from a module. And/or allows separate headers and modules, and enforces compatibility. Avoids writing code twice, but permits if you really want that. (Again: I often create "header only libraries", like Boost. And created the interface generator for iHDL, I hate writing code twice, even declaration/definition.)

Contracts. Yay! I thought at first it was only a subset of Eiffel contracts, but it appears complete.
Scott Myers asked Andrei a gotcha question. D has "protected". But apparently the class invariants were not imposed before/after protected methods. Sounds like will get fixed.

Code annotated with keyword unittest can be embedded, and run before main.
Nice. But I started getting afraid of creeping featurism.
D defines lots of keywords. At the very least, I want to give up on keywords - require keywords to be distinguished by XML. I'm into post-Ascii languages.

const and immutable.
const = "this function in this thread will not modify this const parameter reference. But other threads may be."
immutable = "guaranteed that no function in no thread anywhere will modify"
Afterwards talked with DT guys and Scott Myers about how multithreading means that const optimizations are unsafe. This is not news to me.
Quibbles: immutable seems less string to me that const. Maybe "const const"? Keyword proliferation.
void print(const char[] msg) { ... } - const bridges mutable and immutable, same function handles mutable and immutable arrays. (Similar transitivity for shared)

pure functions - good
Andrei had a contrived example of how you could declare a function pure even if it had local variables. Yawn. I think this must matter to functional purists. But to practical folk, this is obvious.

---+ Programming Styles

Imperative (OOP), functional, generics. Not purist. Not "everything is a ..."

Hash tables are built in or standard.
TBD: optimize ...

---++ struct vs class

struct have value semantics (just like int)
No struct polymorphism.
All other class-like amenities - constructors, op overloading.

classes have ref semantics. => inheritance and dynamic polymorphism

Q: I'm assuming that you can pass structs by ref or pointer... And can similarly pass class objects by value. But the latter may not be true.

---++ Object Oriented

Less runtime reflection than C++ or Java.

Q: is there any compile time reflection or introspection? I.e. can I automatically create a struct printer?

Multiple subtypes, but not inheritamce.

---++ Ranges, not Iterators

I understand motivation for
foreach (i; 2 .. n) result *= i;

although it is unfortunate, since many parallel programming types like me would like to use "foreach" to indicate visiting in no particular order.

---++ Generic Programming

I find the syntax

auto min(L, R)(L lhs, R rhs) {
return rhs < a =" min(x,">

I would like to understand rationale.

Ahhh.... the (L,R) are the "template" type parameters. Now I get it.

Andrei dwelt on the rational for not doing
auto min(L lhs, R rhs)

Syntactic ambiguity.

static if

If a regular function is evaluated in a compile time context, the compiler attempts immediate interpretation.
I meant to ask: compiler people have long known how to do this, but have had a problem: cross compilation. E.g. evaluating sin(1.0) may give different results on machine the compiler is running on than on the target. Heck, even integers, since #birs may vary. I meant to ask if D solved this, a la Java "compile once, run everywhere".

BTW, Scott Meyers asked if D detected integer overflow. No. I wonder if it would be legal to create a compiler than detected overflow. In C/C++, too much code depends on overflow, e.g. shifting bits off top.

---++ Functional Programming

Andrei wants to "erase off the face of the Earth" fib(n)=fib(n-1)+fib(n-2)

Went into a 3 slide jibe about expressing it as a loop rather than functional.

Since I have seen how Donald Knuth has an optimization, that *should* be in modern compilers, to do this, I was unimpressed.

This, and Andrei's rant about how "pure" functions can have temporary variables, leads me to suspect that he has engaged in debates with functional programming, Haskell, bigots.

Hmmm... seems most people haven't seen Knuth's recursive to iterative transformation.

Donald Knuth. Structured Programming with go-to statements. Computing Surveys 6 (4): 261–301. doi:10.1145/356635.356640. http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf. http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

Heck, that wasn't too hard to find. It was in wikipedia!

Revisiting Knuth's paper, I am reminded that he did not actually do Fibonacci. While he eliminated the tail recursion completely, he eliminated the interior recursion, but left in a stack. So you would have to go beyond Knuth's paper to convert Fibonacci to iterative, O(n) time, O(1) space. Methinks it should be doable, although I wonder if it can be done without symbolic execution.

---++ Scope statement

void transmogrify() {
string tempfile = "delete.me";
scope(exit) {
if (exists(tempfile))
remove(tempfile);
}
auto f = File(tempfile, "rw");
... use file f ...
}

Like C++ RAII

void transmogrify() {
string tempfile = "delete.me";
class FileCleanup {
public: FileCleanup(tempfile) { this->tempfile = tempfile; }
public: ~FileCleanup() { if (exists(tempfile)) remove(tempfile); }
} file_cleanup(tempfile);
auto f = File(tempfile, "rw");
... use file f ...
}

Except that you can say "If I am calling this destructor for any reason/because of an error/normal success".

OK, that wins. But.... are D's scopes first class? I have a library of C++ classes that are designed to be used as RAII objects.

By the way.... if a function or class defined locally refers to a variable in the surrounding context, and the lifetime survives the context, it can be converted implicitly to a closure. While this is great, it also sounds like a big source of bugs, performance, and memory leaks.

---++ Concurrency

D has a PGAS-like memory model. (PGAS = partitioned global address space, or, as I like to say, privae/global address space. See http://en.wikipedia.org/wiki/Partitioned_global_address_space)

By default all variables are private. Duplicated in all threads. Thread local storage. Even globals.
Hmm, interesting: this pretty much requires a base register for TLS if you want to access (private) globals. I wonder where we will get that base register?

But you can explicitly add a shared keyword.

shared is transitive. Like const leads to C++ const-ipation, shared will have similar issues. D may ameliorate them via its more convenient generic support.
I think that any transitive feature suffers const-ipation. Transitive shared or volatile const-ipation. Sounds disgusting.

Q: how do you say "void f(int a, int b)", where all combinations of shared and private int a and b are allowed?

Sure, *some* folks will just say "cast tio shared". Shared thread-safe code should be able to manipulate private variables, right?

Except... in PGA supercomputer land private and shared may use differeht instruction sets. Truly. In fact,. one of my goals is to allow the same instructions to be used for shared and private.

Andrei did not talk abot the memory ordering model. Heck, if you define a memory ordering model, on a weakly ordered machine even ordinary instructions used to access shared data must have fences added all over. Perhaps this is why we should encourage transitive shared const-ipation.

I liked Andrei's last slide:

Process-level concurrency with OS-grade isolation: safe and robust, but heavyweight
Andrei's aside: UNIX: it worked for years.

Thread-level concurrency with shared memory: fast, but fraught with peril
This is almost as pithy as Ashleisha Joshi's "Shared memory is a security hole".

Typesystem-level isolation: safe, robust, and lightweight
Although it may lead to Transitive shared or volatile const-ipation


---+ Conclusion

D is definitely interesting. Whether or not it will take off, I do not know. But it certainly has some nice features. If irt were ubiquitous, I would use it.

My biggest issues with D, apart from the fact that it is not ubiquitous (which is chicken ad egg) are

a) Keyword proliferation. I've said before: I am ready to transition into languages with XML tags around keywords.

b) Embeddability/subsettability/defeaturability. I would like to be able to, e.g. turn off the GC, or turn off the exception handling. I like the idea of a high level systems programming language, but the OS kernel programmers I know don't even use C++. They use C. It is not clear that D solves all the problems that prevent so many OSes from using C++ in the kernel.