The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Saturday, June 20, 2009

Blogging from ISCA: AMAS-BT keynote

I'll be at ISCA the next few days.

Today: AMAS-BT workshop.

Antonio G. keynote.

Unfortunately, Antonio G and I share the same initials, AG. I will refer to him as Antonio, and to myself as me or AFG.

Pollack's Rule: I suppose I should be gratified that one of my laws, perf = sqrt(area), is now widespread. I am somewhat chagrined that my old boss, Fred Pollack, has his name associated with it. He publicized it in some keynotes.

Somehow perf = sqrt(power) has also crept in. And Antonio is multiplying the effects. This was not part of my, or Fred's, formulation, which was perf = sqrt(area). We often assume that power, at least leakage, is 1:1 with area, which would imply perf = sqrt(power). But I do not think that this needs to be the case. Leakage may be negligible, and active power also seems to be proportional to sqrt(area). Or even less. This implies that perf = sqrt(active power), or less.

Antonio says that multicore => 1:1 perf increases. 2 cores => 2x parallelism. Q: is this correct? The old rule of thumb is that for MP, too, perf = sqrt(#processors).
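
To make the question concrete, here is a rough Python sketch of the three rules of thumb in play: one big core scaling as sqrt(area), multicore scaling 1:1, and multicore scaling as sqrt(#processors). The function names and the area budget of 4 units are illustrative assumptions, not anything from the talk.

```python
import math

def single_core_perf(area):
    """Pollack's Rule: performance of one core grows as sqrt(area)."""
    return math.sqrt(area)

def multicore_perf_linear(n_cores, area_per_core=1.0):
    """Optimistic claim: n cores give n times one core's performance."""
    return n_cores * single_core_perf(area_per_core)

def multicore_perf_sqrt(n_cores, area_per_core=1.0):
    """Old MP rule of thumb: performance grows as sqrt(#processors)."""
    return math.sqrt(n_cores) * single_core_perf(area_per_core)

budget = 4.0  # hypothetical area budget, in units of one small core
print(single_core_perf(budget))   # one big core using the whole budget
print(multicore_perf_linear(4))   # four small cores, 1:1 scaling
print(multicore_perf_sqrt(4))     # four small cores, sqrt scaling
```

Under the 1:1 assumption the four small cores win. Under the sqrt(#processors) rule they only tie the one big core for the same area, which is why the question matters.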

Antonio gives EPI (energy per instruction) = Vdd**2 * Cdyn + Leakage. Handwaves leakage. Says Vdd cannot be lowered. (Me: is this true? Differential signalling?) So argues about Cdyn.
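
A minimal sketch of that formula in Python, to show why the argument falls on Cdyn once Vdd is taken as fixed. The function name and the numbers are made-up illustrative values in arbitrary units, not figures from the talk.

```python
def energy_per_instruction(vdd, cdyn, leakage=0.0):
    """EPI = Vdd**2 * Cdyn + Leakage, all in arbitrary units."""
    return vdd ** 2 * cdyn + leakage

# With Vdd fixed, halving Cdyn halves the dynamic term.
print(energy_per_instruction(vdd=1.0, cdyn=2.0))
print(energy_per_instruction(vdd=1.0, cdyn=1.0))

# If Vdd could be lowered, the quadratic term would help more:
# halving Vdd cuts the dynamic term to a quarter.
print(energy_per_instruction(vdd=0.5, cdyn=2.0))
```

Leakage defaults to zero here only to mirror the handwave; pass a nonzero value to see how it adds a floor that neither knob removes.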

Guest ISA / Host ISA. Me: although part of the story, the real challenge is not BT from ISA to ISA. The real challenges are (a) coming up with a host uarch and ISA that makes sense - that would make sense even if compatibility were not a requirement. (b) Minimizing the cost of dynamic instrumentation and optimization.

I.e. the basic host ISA and uarch must make sense, irrespective of the guest ISA.

With an exception: possibly the host ISA is big, with lots of hint bits. The guest ISA may be small and compact. In this case, perhaps the act of binary translation itself helps. The guest ISA may be considered to be a compact form of the host ISA, with just the semantics. The host ISA may be considered to be an expanded form. The host ISA may be considered to be a cache of performance annotations to the guest ISA.

Antonio: memory checkpointing. AFG comment: easier to BT single-threaded or message-passing programs, harder for shared memory.

Antonio: adapting hardware. Resizing, power gating. AFG: hardware can do similar adaptation. Dynamic software adaptation must have larger time constants.

Antonio: BT advantages include compatibility, both over time and across different microarchitectures (which he calls scalability). He notes that forward compatibility is especially interesting. E.g. old binaries taking advantage of new hardware features, like longer vector registers.

Pardo asked about soft real-time workloads, given the variability introduced by dynamic systems.
