Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, March 04, 2012

IBM z196 high word facility


http://semipublic.comp-arch.net/wiki/High-word_facility

== Discussion ==

The [[high-word facility]] is somewhat intermediate between [[overlapping registers]] and [[extending registers]].

In terms of dataflow, a large 64-bit register assuredly contains at least two separate pieces whose dataflow must be tracked.
This means that a natural out-of-order (OOO) implementation of the 16 64-bit registers would have to track 32 separate 32-bit words.
Writing to a 64-bit register would update the renaming pointers for both halves, etc.
(Note that I say "natural". Alternate implementations are possible, e.g. at the cost of [[partial register stalls]].)
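To make the dataflow point concrete, here is a minimal Python sketch of a hypothetical rename map with one entry per 32-bit half, i.e. 32 entries for the 16 GPRs. The class name, method names, and the never-reusing tag allocator are illustrative assumptions, not a description of any real z196 pipeline. A 64-bit write renames both halves to one new tag; a 32-bit write renames only one half, so a later 64-bit read may depend on two different producers.

```python
class HalfRenameTable:
    """Hypothetical rename map with one entry per 32-bit half of 16 64-bit GPRs."""

    NUM_GPRS = 16
    HALVES = ("hi", "lo")

    def __init__(self):
        # Map (architectural register, half) -> physical tag.
        # Start with 32 distinct tags, one per 32-bit half.
        self.map = {}
        for r in range(self.NUM_GPRS):
            for half in self.HALVES:
                self.map[(r, half)] = len(self.map)
        self.next_tag = len(self.map)  # simplistic allocator: tags are never reused

    def _alloc(self):
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def write64(self, r):
        """A full 64-bit result renames both halves to the same new tag."""
        tag = self._alloc()
        for half in self.HALVES:
            self.map[(r, half)] = tag
        return tag

    def write32(self, r, half):
        """A 32-bit result (e.g. a high-word instruction) renames only one half."""
        tag = self._alloc()
        self.map[(r, half)] = tag
        return tag

    def sources64(self, r):
        """The producer tags a 64-bit read of r must wait for: one or two."""
        return {self.map[(r, half)] for half in self.HALVES}
```

After `write64(5)` followed by `write32(5, "hi")`, `sources64(5)` returns two tags. That merge is what the "natural" implementation must handle; an implementation with partial register stalls would instead wait.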

However, the [[high-word facility]] uses the high word
without increasing the size of the register number field in the instruction encoding.

IBM's own literature implies that it was led to do this
because 16 GPRs were not enough.

The question is whether one should consider something like the high-word facility,
generalized perhaps to allow access not just to the high word but to different parts of overlapped registers,
e.g. 32-bit scalar registers extended to 128-bit packed SIMD vector registers.

One might argue that, while 16 registers is not enough, 32 or 64 is more than enough. So why bother?

Note that there are two levels. Level 0 simply provides accessors, e.g. access to the X, Y, Z, or W channel of a 128-bit wide quantity.
Level 1 provides operations, not just accessors.
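The difference between the two levels can be sketched in Python using the X/Y/Z/W example. This assumes X is the least significant 32-bit channel (an assumption; lane ordering varies between architectures), and all helper names are hypothetical. Level 0 only moves a channel into or out of a 128-bit value; Level 1 computes directly on channels, much as AHHLR adds a high word to a low word.

```python
MASK32 = 0xFFFF_FFFF
# Assumed channel order: x is the least significant 32 bits, w the most significant.
CHANNELS = {"x": 0, "y": 1, "z": 2, "w": 3}


def get_channel(v128, ch):
    """Level 0 accessor: read one 32-bit channel of a 128-bit value."""
    shift = 32 * CHANNELS[ch]
    return (v128 >> shift) & MASK32


def set_channel(v128, ch, value):
    """Level 0 accessor: replace one 32-bit channel, keeping the other three."""
    shift = 32 * CHANNELS[ch]
    cleared = v128 & ~(MASK32 << shift)
    return cleared | ((value & MASK32) << shift)


def add_channels(a, a_ch, b, b_ch, dst, dst_ch):
    """Level 1 operation: dst.dst_ch = a.a_ch + b.b_ch (mod 2**32)."""
    total = (get_channel(a, a_ch) + get_channel(b, b_ch)) & MASK32
    return set_channel(dst, dst_ch, total)
```

With only Level 0, the same addition needs an extract, a full-width add, and an insert; Level 1 folds the channel selection into the operation itself.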

== High word facility ==


The IBM z196 processor added the [[high-word facility]] or [[high word extension]]
to the [[IBM System z Mainframe ISA]], circa 2010.

Since its introduction in 1964, System/360 and all of its successors have provided 16 general purpose registers.
Many programs were constrained by this rather small number of registers.

When the registers were extended to 64 bits, the upper 32 bits of the 64 bit registers became underused.
(In IBM parlance, bits 0-31 of the GPRs are the upper half, the most significant, and bits 32-63 are the lower half, the least significant.)
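IBM numbers bits from the most significant end, the reverse of the LSB-0 convention used by most other architectures. A small Python sketch of the conversion (the function names are hypothetical, for illustration only) shows that IBM bits 0-31 select the high word and bits 32-63 the low word.

```python
def ibm_bit_to_lsb0(i, width=64):
    """Convert IBM numbering (bit 0 = most significant) to LSB-0 numbering."""
    assert 0 <= i < width
    return width - 1 - i


def ibm_bits(value, first, last, width=64):
    """Extract IBM bits first..last inclusive (first <= last) as an integer."""
    assert 0 <= first <= last < width
    # IBM bit 'last' is the least significant bit of the field.
    shift = ibm_bit_to_lsb0(last, width)
    length = last - first + 1
    return (value >> shift) & ((1 << length) - 1)
```

For example, for the register value `0x1122334455667788`, `ibm_bits(value, 0, 31)` is the high word `0x11223344` and `ibm_bits(value, 32, 63)` is the low word `0x55667788`.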

Many programs only needed 32 bit instructions and addresses;
indeed, many programs were limited to 32 bit addresses.
And even programs that use 64 bit instructions and addresses do not use them everywhere.
E.g. in C parlance, there may be 32 bit ints in a 64 bit register.
(Also, for that matter, 16 bit shorts and 8 bit chars or bytes.)

But even these 32-bit programs can benefit from an extra 16 32-bit registers' worth of storage
in the upper halves of the 64-bit registers.

So, this is what the [[high-word facility]] does: it provides a limited set of instructions that use the upper 32 bit halves of the 64 bit registers.
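As a minimal sketch of that storage view (a hypothetical Python model, not IBM's implementation), the 16 64-bit GPRs can be treated as 32 independent 32-bit slots, provided each half can be written without disturbing the other. Classic 32-bit z/Architecture instructions already leave bits 0-31 unchanged when they write the low half; the high-word facility supplies the corresponding operations for the high half.

```python
MASK32 = 0xFFFF_FFFF


class GPRFile:
    """Hypothetical model: 16 64-bit GPRs viewed as 32 independent 32-bit slots."""

    def __init__(self):
        self.regs = [0] * 16

    def read_hi(self, r):
        return (self.regs[r] >> 32) & MASK32

    def read_lo(self, r):
        return self.regs[r] & MASK32

    def write_hi(self, r, value):
        # Replace IBM bits 0-31; bits 32-63 are preserved.
        self.regs[r] = ((value & MASK32) << 32) | self.read_lo(r)

    def write_lo(self, r, value):
        # Replace IBM bits 32-63; bits 0-31 are preserved.
        self.regs[r] = (self.read_hi(r) << 32) | (value & MASK32)
```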

== List of Instructions ==

* ADD
** ADD HIGH, [[AHHHR]]: r_hi = r_hi + r_hi
** ADD HIGH, [[AHHLR]]: r_hi = r_hi + r_lo
*** Presenter comments "should perhaps be called ADD HIGH AND LOW".
** ADD HIGH IMMEDIATE, [[AIH]]: r_hi += imm32
* [[ADD LOGICAL]]
** ADD LOGICAL HIGH, [[ALHHHR]]: r_hi = r_hi + r_hi
** ADD LOGICAL HIGH, [[ALHHLR]]: r_hi = r_hi + r_lo
** ADD LOGICAL WITH SIGNED IMMEDIATE HIGH, [[ALSIH]]: r_hi += imm32
** ADD LOGICAL WITH SIGNED IMMEDIATE HIGH, [[ALSIHN]]: r_hi += imm32, condition code unchanged
***  [[ALSIHN]] is like [[ALSIH]], but is an example of [[instructions that do not change condition codes]].

* BRANCH RELATIVE ON COUNT HIGH, [[BRCTH]]: if( --r_hi) goto target

* COMPARE
** COMPARE HIGH, [[CHHR]]: r_hi ? r_hi
** COMPARE HIGH, [[CHLR]]: r_hi ? r_lo
** COMPARE HIGH, [[CHF]]: r_hi ? S20
*** F/S20 is a storage operand, i.e. in memory specified by an addressing mode.
** COMPARE IMMEDIATE HIGH, [[CIH]]: r_hi ? imm32

* COMPARE LOGICAL HIGH: in HH, HL, HF and IH flavors

* LOADs:
** LOAD BYTE HIGH (signed/unsigned (LOGICAL))
** LOAD HALFWORD HIGH (signed/unsigned (LOGICAL))
** LOAD HIGH

* ROTATE THEN INSERT SELECTED BITS HIGH/LOW
** flavours to add 32 to bit indices

* STORE HIGH: 8, 16, and 32 bit flavors

* SUBTRACT HIGH: signed/unsigned, HHH and HHL flavors
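The semantics of a few of the instructions above can be sketched as a tiny Python model. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, the R1/R2/R3 operand roles are assumed to follow the Principles of Operation (e.g. AHHLR puts R2's high word plus R3's low word into R1's high word), signed adds set the usual condition code (0 zero, 1 negative, 2 positive, 3 overflow), and program interruptions such as fixed-point overflow are ignored. The point is that each instruction reads and writes only the selected 32-bit halves.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(v):
    """Interpret the low 32 bits of v as a two's-complement integer."""
    v &= MASK32
    return v - (1 << 32) if v & 0x8000_0000 else v


class HighWordCPU:
    """Hypothetical, simplified model: 16 64-bit GPRs plus a 2-bit condition code."""

    def __init__(self):
        self.regs = [0] * 16
        self.cc = 0

    def hi(self, r):
        return (self.regs[r] >> 32) & MASK32

    def lo(self, r):
        return self.regs[r] & MASK32

    def set_hi(self, r, v):
        # Only the high word changes; the low word is preserved.
        self.regs[r] = ((v & MASK32) << 32) | self.lo(r)

    def _signed_add_hi(self, r1, a, b):
        # Signed 32-bit add into r1's high word; overflow wraps and sets CC 3.
        exact = to_signed32(a) + to_signed32(b)
        result = to_signed32(exact)
        self.set_hi(r1, result)
        if exact != result:
            self.cc = 3
        elif result == 0:
            self.cc = 0
        elif result < 0:
            self.cc = 1
        else:
            self.cc = 2

    def ahhhr(self, r1, r2, r3):  # r1_hi = r2_hi + r3_hi
        self._signed_add_hi(r1, self.hi(r2), self.hi(r3))

    def ahhlr(self, r1, r2, r3):  # r1_hi = r2_hi + r3_lo
        self._signed_add_hi(r1, self.hi(r2), self.lo(r3))

    def aih(self, r1, imm32):  # r1_hi += signed 32-bit immediate
        self._signed_add_hi(r1, self.hi(r1), imm32)

    def alsihn(self, r1, imm32):  # r1_hi += imm32; condition code untouched
        self.set_hi(r1, self.hi(r1) + imm32)

    def brcth(self, r1):  # decrement r1_hi; returns True if the branch is taken
        self.set_hi(r1, self.hi(r1) - 1)
        return self.hi(r1) != 0
```

Note how `alsihn` never touches `cc`, so it can presumably update a counter between a compare and the branch that consumes its condition code.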

The IBM slide set that I got this from says:


    "Also note that, of necessity, certain characters in the mnemonics have become a bit overloaded. The
    rookie programmer will likely find using the high-word facility challenging. We hope the benefits will
    be worth it."

GLEW COMMENT: it is sad when mnemonics become an obstacle to use.


== References ==



    Many references, scattered.
    * New CPU Facilities in the IBM zEnterprise 196, share.confex.com, 4 August 2010, http://share.confex.com/share/115/webprogram/Handout/Session7034/New%20CPU%20Facilities%20in%20the%20z196.pdf
    * IBM z/Architecture Principles of Operation (SA22-7832-08), August 2010, http://www.ibm.com/servers/resourcelink/lib03010.nsf/0/B9DE5F05A9D57819852571C500428F9A/$File/SA22-7832-08.pdf (requires registration)

