Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Sunday, February 06, 2011

Managing branch prediction history: copying short history versus pointing to large history

http://semipublic.comp-arch.net/wiki/Managing_branch_prediction_history:_copying_short_history_versus_pointing_to_large_history

When [[branch prediction history]] was short, 8-12 bits, and global,
then it was not unreasonable to manage the history by copying it.

E.g. in one simulator I actually arranged so that a branch [[uop]]
wrote  back, as its result
* an indication of whether it was mispredicted or not
* the taken [[target IP]]
* the branch predictor history to be restored on a branch misprediction.

I.e. the branch prediction history was propagated, in this simulator, from the [[instruction fetch front end]]
across the scheduler to execution, and back again.

While simple, this involves a lot of unnecessary data movement - both for the history, but also for the [[target IP]].
Most machines of my acquaintance create a [[branch information table (BIT)]], holding information for branches in flight.
This avoids copying the history from the front end to execution and back again,
but nevertheless may involve making copies of  the history.


    I confess an ulterior motive for copying the history and other branch information around: this naturally leads to the number of branches in flight scaling with the window size. Many early OOO designs were crippled by supported too few branches in flight.


Making copies of the history seems silly if, as in a TNT history, they differ only by 2 bits:
 new_history := (old_history << 1) | new_branch_taken_or_not


    (Note: when shifting a branch history, we will often say h<


But making copies seems to be required if we want to be able to restore to any mispredicted branch point, i.e. if we want to do [[instantaneous versus incremental branch misprediction repair]].

Making copies works well enough if the branch prediction history is small enough - 8 bits, etc.
But by the time we are talking about 16 bit histories and 32 or 64 bit IPs, we are talking about quite a few bits.

Furthermore, in the  late 1990s and early 2000s branch predictors arose with  hitherto unseen long histories,
such as Seznec's OGEHL predictor (with multiple history lengths 9, 2, 4, 8, 16, 32, 64, 128, ...).
Copying around a 128 bit history, even to a [[BIT]], is wasteful;
copying around the even larger 1000+ bit histories that have been proposed is even worse.

Hence the interest in pointing to a position in a branch prediction history,
rather than copying the entire history.
On a branch misprediction one would restore the pointer, rather than overwriting the  history with a savedcopy.

This would  be straightforward for a long [[TNT]] history, since only 1 bit depends on any branch.
You would simply keep a history of total length TL=PHL+BIF, the sum of predictor history length plus branches in flight.
On a misprediction you would restore the pointer in this circular buffer.
(Or equivalently shift the buffer- I suspect that shifting is too power hungry.)

This could scale to almost any length of history, potentially thousands of bits.

Unfortunately, modern [[stew]] histories are more complicated than [[TNT]] histories.
The [[branch IP]], or even both [[from IP]] and [[to IP]], may be [[hashed, e.g. XORed]] into the stew.
This means that several of the youngest bits in the history may change on every branch.
Simply restoring a pointer will not suffice.

TBD: explain stew management in more detail.

Simple strategy:  constrain the stew to have only the N youngest bits affected by the most recent branch.  Bits TL..N are unaffected  by the most recent branch, except for shifting.
One can then keep a copy of the parts of the history affected by recent branches, the N youngest bits, and a pointer that locates the older bits.

= See Also =

* [[How to use a really long predictor history]]

No comments: