Krazy Glew's Blog: Wednesday, July 06, 2011

Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Wednesday, July 06, 2011

Address Generation Writeback

http://semipublic.comp-arch.net/wiki/Address_Generation_Writeback

= Hardware Motivation for Address Register Modifying Addressing Modes =

Apart from the software datastructure motivation,
there is a hardware motivation for addressing modes such as pre-increment or decrement.

In more generality - addressing modes that calculate an address, and then save the results of that calculation
in a register - typically one of the registers involved in the addressing mode.

Consider [[Mitch Alsup's Favorite Addressing Mode]],
the x86's [[Base+Scaled-Index+Offset]].

A succession of such instructions might look like:

;an [[RMW]]
r1 := load( M[ rB+rO*4+offset1 ] )
r1 := ... some calculation involving r1 ...
store( M[ rB+rO*4+offset1 ] := r1 )

or

; nearby loads
r1 := load( M[ rB+rO*4+offset1 ] )
r2 := load( M[ rB+rO*4+offset2 ] )
r3 := load( M[ rB+rO*4+offset3 ] )

Notice the possibility of [[CSE (Common Subexpression Elimination)]]:

For the [[RMW]] case, this is straightforward:

rAtmp := [[lea]]( M[ rB+rO*4+offset1 ] )
r1 := load( rAtmp ] )
r1 := ... some calculation involving r1 ...
store( M[ rAtmp ] := r1 )

Is this a win? It depends:
* in terms of performance, on a classic RISC, probably note: the extra instruction adds latency, probably costs a cycle.
* in terms of power, possibly

For the nearby load case, we could do:

rAtmp := lea( M[ rB+rO*4 ] )
r1 := load( M[ rAtmp+offset1 ] )
r2 := load( M[ rAtmp*4+offset2 ] )
r3 := load( M[ rAtmp*4+offset3 ] )

or

rAtmp := lea( M[ rB+rO*4+offset1 ] )
r1 := load( M[ rAtmp ] )
r2 := load( M[ rAtmp*4+(offset2-offset1) ] )
r3 := load( M[ rAtmp*4+(offset3-offset1) ] )

Is this a win? Again, it depends:
* probably not in performance
* possibly in terms of power.

As an aside, let me note that if the base address rB+rO*4 is aligned, then the subsequent addresses could use [[OR indexing]] rather than [[ADD indexing]]. Which may further save power. At the cost of yet another piece of crap in the instruction set.
To avoid conundrums like this - is it better to [[CSE]] the addressing mode or not?
- processors like the [[AMD 29K]]
abandoned addressing modes except for [[absolute addressing mode]] and [[register indirect addressing mode]]
- so to form any nomn-trivial address you had to save the results to a register, and thereby were
encouraged to CSE it whenever possible.

Generalized pre-increment/decrement is a less RISCy approach to this.
Instead of encouraging CSE via separate instructions,
it encourages CSE by making it "free" to save the result ofthe addressing mode calculation.

:Let's call this [[Address Generation Saving]], or, if it modifies a register used in the addressing mode, [[Address Register Modification]]:

In our examples:

;RMW
rAtmp := [[lea]]( M[ rB+rO*4+offset1 ] )
r1 := load( rAtmp := rB+rO*4+offset1 ] )
r1 := ... some calculation involving r1 ...
store( M[ rAtmp ] := r1 )

; nearby load
r1 := load( M[ rAtnp :=rB+rO*4+offset1 ] )
r2 := load( M[ rAtmp*4+(offset2-offset1) ] )
r3 := load( M[ rAtmp*4+(offset3-offset1) ] )

Note that the nearby case that changes the offsets is simpler than the nearby case that saves only part ofthe address calculation:

;Using C's comma expression:
r1 := load( M[ (rAtmp := rB+rO*4, rAtmp+offset1) ] )
r2 := load( M[ rAtmp*4+offset2 ] )
r3 := load( M[ rAtmp*4+offset3 ] )

:: Again, [[OR indexing]] may benefit, for addressing mode [[(Base+Scaled-Index)}Offset]]

What is the probem with this?
* Writeback ports

Such an [[Address Generation Saving]] requires an extra writeback port, in addition to the result of an instruction like load.

For this reason, some instruction sets have proposed to NOT have [[address generation writeback]] for instructions that write other results, such as load,
but only to have [[address generation writeback]] for instructions that do not have a normal writeback, such as a store.

: Glew opinion: workable, minor benefit, but ugly.

Except that pre-increment/decre more general form:

Post-increment and decrement goes further, generalized in this way:
then the address calculation looks like

{ old := address_reg; address_reg := new_address; return old }

Not only does this require a writeback port for the updated address register,
but it also requires two paths from the AGU or RF to where the address is used
- one for the address, the other for the updated address_reg.

OVERALL OBSERVATION: addressing modes begin the slippery slope to VLIW.

Krazy Glew's Blog

Disclaimer

Wednesday, July 06, 2011

Address Generation Writeback

Blog Archive

Labels

Search This Blog

Followers

About Me

Links to Me