The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, February 22, 2022

RISC-V extension names considered harmful

This is a special case of "stupid, non-human-friendly names considered harmful".

RISC-V, like any instruction set, has extensions. The extensions need names. The current standard names are stupid and human unfriendly.

For example, I just now saw email that talks about "working through the process to move Svnapot aka the former Zsn to public review". I am the guy who originated the RISC-V NAPOT proposal for large pages. *I* had to do a double take to parse Svnapot, but at least "NAPOT" appeared in there somewhere, after I mentally parsed Svnapot into S.v.napot, where S = supervisor, v = virtual memory, ...

The earlier name, Zsn, I could never remember, and, again, I contributed the term NAPOT to the virtual memory discussion.

Stupidly short and obscure names cause friction. They waste mind share.

Other RISC-V extension names include:





For some of these you can guess what they apply to.

AG comment: have both verbose and compact names, e.g.

Z-atomic-memory-operations ==> Zam

 and so on


Did I mention that one of the reasons given for short, obscure extension names was that they needed to fit within 8 (or whatever) characters in the compiler's command-line processing?

Do It (All) Upfront vs Do It As Needed

Two different styles  of getting work done:

When I started as a professional programmer, e.g. writing C code for PC hardware, one of my "superpowers" was that I might find a hardware manual for a device, e.g. a UART, and write a fairly complete implementation, e.g. a header file that had #defines for all of the encodings used to write the control registers of that device.

I.e. I did it all upfront. Actually not all, but I did a lot upfront.

This served several purposes:

  •  it was a good way for me to study the hardware/software interface of the device that I was using
  •  compared to only implementing what I needed at the moment, it meant that when I realized I needed something else I usually did not have to stop to look at the manual. I could usually just pick the typical name that I would use out of midair, code it, and it would work
BTW I conjecture that one of the reasons I did things like this upfront was that before I was a professional programmer, when I was a programmer on duty at the University helpdesk or a lab administrator, the code that I wrote was often used by lots of people: several classes of students, or all of the computer users at the engineering RJE sites that I used. It wasn't just a question of what I knew was going to be used in advance. It was a question of being able to head off problems for my users.

During my Master's degree, when I implemented an instruction set decoder generator (IIRC for MIPS), I went out of my way to implement almost all of the instructions in the instruction set, not just the instructions that I needed for my simulator project. This allowed me to use randomized instruction testing, and also to test my decoder tools against all of the instructions, and against all of the libraries on the systems I could find.

This do-it-upfront approach served me in good stead for many years.

On the other hand, it probably cost me an extra 6 months finishing my Master's degree. :-( But the extra knowledge I acquired was probably part of the reason Intel hired me as a computer architect NCG (new college graduate) :-)

I remember being absolutely flabbergasted when I learned that Intel's own simulators were not capable of decoding all of the instructions in the machine: I was told that they only added them as they were needed by performance studies. Note that this was not when I first arrived at Intel, but after I left for graduate school again and then returned to Intel. I.e. this incremental approach was adopted while I was away.

Obviously this was the right way to do things, wasn't it? Obviously Intel was successful... Although let me note that this was the time when Intel fell behind the competition.

I give more credit to the examples of XP (extreme programming) and agile programming: work incrementally, writing tests in advance. These caused me to think twice, and many more times, about my do-it-upfront attitude.

One of the big things I did learn from Intel and from XP/agile is the cost of testing. When I wrote a complete header file from a manual page, I usually wasn't writing a test for every individual feature. Often writing the test takes 2 or 3 or many more times longer than writing the simple interface.

But note what I said about a reasonably complete implementation of an instruction set decoder allowing randomized testing, and testing using all of the real code that you can find. Those were tests that were easy to write, and, for the most part, it was easy to get the code to handle the random tests correctly.

Importantly, the way that I chose to implement my instruction set decoder generator, which also inspired Intel's current microcode approach, is decentralized and pattern driven. It makes it very easy to add or remove arbitrary instruction encodings. The human programmer (me) doesn't write code that looks at bitfields one at a time. The human programmer writes patterns directly derived from the instruction set tables, and has a program that generates, from the patterns, the (hopefully optimized) code that looks at the bitfields. Independent patterns make it easy to add and subtract things on the fly, whereas the centralized code of an older-style instruction set decoder needs to be modified as arbitrary things are added or subtracted.
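To make the pattern-driven idea concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not my actual decoder generator and not Intel's microcode tooling: each pattern is a 32-bit string transcribed from an instruction set table (three real RV32I encodings, with `.` as a don't-care bit), a tiny "generator" compiles each string into a mask/match pair, and the decoder simply tests every pattern. A real generator would emit optimized decision-tree code rather than this linear scan. The names `compile_pattern`, `PATTERNS`, and `decode` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Pattern(NamedTuple):
    name: str
    mask: int   # 1 bits mark the bits this pattern cares about
    match: int  # required values of those bits


def compile_pattern(name: str, bits: str) -> Pattern:
    """Compile a table-style bit string (MSB first, '.' = don't care) to mask/match."""
    bits = bits.replace("_", "")  # '_' only separates fields for readability
    if len(bits) != 32:
        raise ValueError(f"{name}: expected 32 bits, got {len(bits)}")
    mask = match = 0
    for ch in bits:
        mask <<= 1
        match <<= 1
        if ch in "01":
            mask |= 1
            match |= int(ch)
        elif ch != ".":
            raise ValueError(f"{name}: bad character {ch!r}")
    return Pattern(name, mask, match)


# One independent line per instruction, copied from the ISA tables.
# Fields (MSB to LSB): funct7 (or imm[11:0] for I-type), rs2, rs1, funct3, rd, opcode.
PATTERNS: List[Pattern] = [
    compile_pattern("add",  "0000000_....._....._000_....._0110011"),
    compile_pattern("sub",  "0100000_....._....._000_....._0110011"),
    compile_pattern("addi", "............_....._000_....._0010011"),
]


def decode(word: int, patterns: List[Pattern] = PATTERNS) -> Optional[str]:
    """Return the name of the single matching pattern, or None if none match."""
    hits = [p.name for p in patterns if (word & p.mask) == p.match]
    if len(hits) > 1:
        raise ValueError(f"ambiguous encoding {word:#010x}: {hits}")
    return hits[0] if hits else None
```

Adding or removing an instruction is one line in `PATTERNS`, with no centralized decoding code to edit; the ambiguity check catches a new pattern that overlaps an existing one.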

So my decentralized approach makes it easy to disable particular test cases, and therefore easy to enable the general case and disable only the cases that are too hard to test at any point in time.

But conversely, my decentralized approach also makes it easy to add things only as needed, which is what those later Intel simulators did. And as long as the decoders were complete, i.e. as long as they did not incorrectly report that an unimplemented instruction was something else, they would at least catch the unimplemented things.

What's the cost?

Well, my upfront approach often cost me time upfront, but helped me become an expert.

The do-it-later approach might have reduced time upfront. But sometimes it produced really surprising errors, like telling you that there was an unimplemented instruction when you knew very well that it was a fully implemented instruction in the real hardware, requiring the user, who was not a simulator expert, to learn what the simulator was not implementing. It also led to project scheduling surprises: you were implementing a feature, running your traces/performance benchmarks/workloads, and all of a sudden a hitherto unimplemented instruction was discovered, one that you had not planned on spending the time to implement.

There needs to be a balance here between upfront cost and avoiding surprises downstream.

For the instruction set simulator/decoder generator example, I think the balance might be along the lines of
  •  completely decoding, so that you can provide a full disassembly of all instructions
  •  but not necessarily implementing them, if the implementation is too complicated
Although for an OOO micro-dataflow simulator, I think it would also be reasonable to provide all dataflow inputs and outputs, such that you could correctly implement the dataflow, albeit not necessarily with reasonable latency characteristics, and then to have the simulator count and flag how many such inaccurately modeled instructions are present.
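A minimal Python sketch of that last suggestion, under loudly stated assumptions: the input is a trace of already-decoded instructions as (opcode, destination, sources), the opcodes and the latency table are hypothetical, execution resources are unlimited (a pure dataflow-limited schedule, not a full OOO pipeline model), and instructions whose latency is not modeled get a made-up placeholder latency. Dataflow is tracked for every known opcode; opcodes with guessed timing are counted as approximate, and opcodes the decoder does not know at all are counted separately, so neither kind of gap can hide.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical latency table. Every opcode the decoder knows is listed;
# None means "dataflow is modeled, latency is not modeled yet".
LATENCY: Dict[str, Optional[int]] = {
    "add": 1,
    "mul": 3,
    "aesenc": None,
}
PLACEHOLDER_LATENCY = 1  # made-up stand-in for unmodeled timing

# Trace entry: (opcode, destination register, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]


def dataflow_schedule(trace: Iterable[Instr]) -> Tuple[int, Counter, Counter]:
    """Dataflow-limited schedule with unlimited resources: each result is ready
    `latency` cycles after its last source is ready."""
    ready: Dict[str, int] = {}        # register -> cycle its value becomes ready
    approximate: Counter = Counter()  # known opcodes whose latency was guessed
    unknown: Counter = Counter()      # opcodes the decoder does not recognize
    for op, dst, srcs in trace:
        if op not in LATENCY:
            # No dataflow information at all: count it and skip it.
            unknown[op] += 1
            continue
        latency = LATENCY[op]
        if latency is None:
            approximate[op] += 1
            latency = PLACEHOLDER_LATENCY
        start = max((ready.get(src, 0) for src in srcs), default=0)
        ready[dst] = start + latency
    critical_path = max(ready.values(), default=0)
    return critical_path, approximate, unknown
```

For example, a chain mul → add → aesenc, each consuming the previous result, followed by an unknown `vfoo`, gives a 5-cycle critical path (3 + 1 + placeholder 1), one approximate `aesenc`, and one unknown `vfoo`.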


The fellow who espouses only doing stuff as needed will often win out in the corporate rat race: managers always like the lowest work estimates.

And although I mentioned that XP/agile tends to lead to incremental work, XP/agile also encourages refactoring, to reduce technical debt.

Doing stuff upfront reduces technical debt, but of course only if you actually use the stuff that you did upfront.

Deferring stuff to do as needed: well, that's the definition of technical debt, assuming you actually need it.

Another note: doing stuff upfront often exposes excess complexity, whereas defining stuff, e.g. the actual instructions, but not yet implementing it in your actual performance-accurate simulator, sweeps the complexity under the carpet. IIRC many of the unimplemented features were things like security, virtualization, and encrypted memory: supposedly not performance critical, and therefore supposedly not belonging in the performance simulator.


The above thoughts are me reflecting retrospectively on computer architecture simulators, a very large part of my career.

However, they were inspired today by my working on speech commands so that I can control my computer without exacerbating my computeritis hand pain.

Unfortunately, the standard way to have a speech recognition program like Nuance Dragon control other applications is to emulate keyboard and mouse events.

Note that I do not call this the BKM, because it is not the best known method: the BKM is for the application to have a command language, like GNU Emacs. But that is not the state of the art in speech recognition.

Unfortunately, I have to use other applications in addition to GNU Emacs, applications like Microsoft OneNote and Outlook and ...

Usually when I start using an application via speech recognition, I do several things:
  •  I start by googling for keyboard shortcuts for the application in question. When I find them, it is usually a fairly simple set of text edits to convert them into a primitive command set.
  •  I start using things, and over time I observe the need for new commands.
  •  One of the key things is to record the commands that I want to say but which fail, i.e. commands that do not exist yet.
  •  When I would like to have a speech command to do something that is currently on a menu tree, I will typically take screenshots of the menu tree, and devise commands appropriate to each of the leaves.
Note: to some extent, speech recognition systems like Dragon and Microsoft's allow you to navigate menu trees by speech. But frequently you can say things much faster and more conveniently, and perhaps more importantly more memorably and more naturally, with appropriate speech commands rather than navigating "menu 1/menu 2/menu 3 ...".
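The "fairly simple set of text edits" in the first bullet can also be scripted. This is a minimal Python sketch assuming a hypothetical input format, one shortcut per line as keys, a tab, then a description, as you might paste from a shortcut table. The output is a neutral phrase-to-keys mapping, not any particular speech product's command syntax, which would need a further translation step.

```python
import re
from typing import Dict


def shortcuts_to_commands(text: str) -> Dict[str, str]:
    """Turn lines like 'Ctrl+B<TAB>Bold' into {'bold': 'ctrl+b'}.

    Lines without a tab are ignored (headings, blank lines, notes).
    """
    commands: Dict[str, str] = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue
        keys, description = line.split("\t", 1)
        # Spoken phrase: lowercase, punctuation replaced by spaces, whitespace collapsed.
        phrase = " ".join(re.sub(r"[^a-z0-9]+", " ", description.lower()).split())
        if phrase and keys.strip():
            commands[phrase] = keys.strip().lower()
    return commands
```

The generated phrases are only a primitive starting point; the commands I actually try to say, and fail, are what drive the later renaming and synonyms.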

So I have been using speech commands for Microsoft OneNote for quite some time. But today I realized that I had not yet implemented speech commands for things like drawing:
  •  e.g. I said "... insert line ..." and there was no insert line command
  •  so I had to navigate through the menus: "drawing > shapes > use arrow keys to select the graphic line shape"
This annoyed me. I was a little bit pissed off that I had not implemented this upfront.

Of course, when I implemented "... draw line", I also implemented all 16 items on the drawing shape menu, with appropriate aliases or synonyms, e.g. "insert diamond/rhombus/lozenge ...".

Do I wish I had done this upfront? Now, yes I do. However, at the time I implemented the earlier sets of OneNote commands, I remember getting exhausted. So perhaps it was appropriate.

I also remember, and regret to admit, that I still do not have a really good way of automating the testing of my speech commands for OneNote. I have much better automation for my GNU Emacs speech commands, but not so much for the majority of Microsoft applications. That was one of the reasons why it was exhausting. As I said above, implementing a command is often a lot easier than testing it.

Why am I writing this?

Certainly not to say that I know the best known method, whether to do it all upfront or do it incrementally.

Musing to myself about how to better implement some of these things:
  •  I'm really glad that I'm using speech recognition. It is very much helping my computeritis pain.
  •  But I sometimes despair when I need to write a whole new set of commands.
  •  I would like to accelerate the process.
Thinking that perhaps this "defer subtree" approach is reasonable: do all of a particular level of the menu tree upfront, but defer the subtrees until needed.
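The "defer subtree" idea can be sketched in Python as follows, assuming a hypothetical menu tree represented as nested dicts (the labels are made up, not OneNote's actual menus) and leaving out how a menu path is actually executed. Every item at one level gets a command upfront: leaves get one command per spoken form, including synonyms, while submenus only get a command that navigates to them and are recorded as deferred, to be expanded the first time I need something inside.

```python
from typing import Dict, List, Optional, Tuple

# A menu maps each item label to None (a leaf that performs an action)
# or to another menu (a submenu).
Menu = Dict[str, Optional[dict]]


def commands_for_level(
    path: List[str],
    menu: Menu,
    synonyms: Optional[Dict[str, List[str]]] = None,
) -> Tuple[Dict[str, List[str]], List[List[str]]]:
    """Do one level of the menu tree upfront, deferring its subtrees.

    Returns (commands, deferred): commands maps a spoken phrase to the menu
    path to navigate; deferred lists submenus whose leaves get commands later.
    """
    synonyms = synonyms or {}
    commands: Dict[str, List[str]] = {}
    deferred: List[List[str]] = []
    for label, child in menu.items():
        target = path + [label]
        if child is None:
            # Leaf: one command per spoken form (the label plus any synonyms).
            for phrase in [label.lower()] + [s.lower() for s in synonyms.get(label, [])]:
                commands[phrase] = target
        else:
            # Submenu: only a command that opens it; its leaves are deferred.
            commands[f"{label.lower()} menu"] = target
            deferred.append(target)
    return commands, deferred
```

When a deferred submenu is first needed, calling `commands_for_level` on it with its own path does that subtree's level upfront in turn.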

#Speech-recognition  #work-scheduling #HFIM