The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, February 21, 2011

Pattern History Invariant Branch Prediction

    Taking a vacation day - actually, a so-called "floating holiday" on today, USA President's Day - so that I could be with my daughter, who is off school, and so that I could pick up my wife at the airport.

{{Terminology Term}}
[[Category:Branch Prediction]]

= Terminology =

Many branch predictors have a property that I call [[Pattern History Invariant Branch Prediction]].
By this I mean that the cannot change their prediction for a branch, e.g. from taken to not-taken, or vice versa, without having an intervening misprediction.

    Okay, I admit: "Pattern History Invariant" is a bad name. It is pompous, and not very intuitive. I welcome proposal for a different name. Something like "Branch predictor that can't change it's mind without a misprediction"?
      == Bad Joke == Oooh, how about "Can't change its mind without a fight"? I'm tempted to call it a Scottish, Irish, or a "My Wife" branch predictor - but I'll get in trouble for any and all of these, and only mention them because I am humor impaired.

= Why does this matter? =

== [[Timing Invariant Branch Prediction]] ==

Pattern history invariance can be a useful property, because it makes it easier to build
[[Timing Invariant Branch Prediction]].
Which has several good properties, such as making val

In particular, it means that delaying updating the pattern tables used to make predictions until retirement will NOT result in any more mispredictions than would otherwise occur
- any such mispredictions would already be on the wrong path, younger than a misprediction that would occur no matter what.

Delaying the pattern table update from, e.g. branch execution to retirement has the possible benefit that [[wrong path]] execution will not corrupt the branch predictor.

However, it may be pointed out that it is possible that there will be a performance loss because of other benefits of execution on the multiply speculative wrong path that speculatively updating a branch pattern history table may bring.
But to get this, you must have several nested effects:
* there must be a first branch misprediction pending
* a younger branch may update the predictor tables
* this may result in a misprediction underneath the first branch misprediction
* which may itself update the predictor tables
** possibly eliminating a misprediction when execution resumes after the first branch misprediction is repaired
** possibly producing instruction prefetch and other [[good wrong path effects]]

I.e. with [[Pattern History Invariant Branch Prediction]] there must be a misprediction to result in a change of prediction.
However, that misprediction may itself occur, and be repaired, speculatively.
And that misprediction may have other [[good wrong path effects]].

== [[BTB Unrolling]] ==

[[Pattern History Invariant Branch Prediction]]
also enables [[BTB Unrolling]]:
give a current (IP,history) tuple
and a separate pattern table used to make a prediction
one can "unroll" the predictor to produce a trace
of multiple, N, branch (from,to) addresses.
And such a trace unrolling is guaranteed to be as accurate as if the predictor were accessed one branch at a time.

= Examples of Pattern History Invariance =

= Examples of non-Pattern History Invariance =

I was tempted to say
    The most powerful example of a non-pattern history invariant branch predictor is a [[loop predictor]]. E.g. it may predict that the loop closing branch is taken 99 times in a row, and then not-taken the 100th time. Even more advanced loop predictors, such as predicting that if the last few loops were executed were 67,66,65, times, then the next will be executed 64 times, are even less pattern history invariant.
But this is not true: such a loop predictor will always make the same prediction, if the current loop count
is considered to be part of the history that used, along with the branch in question, to index
the pattern tables.

It is necessary to work ahrder to contrive branch predictors that are not pattern history invariant.

TBD: I believe that certain McFarling-style branch predictors may break the pattern history invariance property.
TBD: get the details right.

In particular, in a multi-predictor system such as McFarling,
if you update the predictors and choosers that are NOT being used as well as the predictors and choosers that are being used,
you can "mask" a misprediction in a second predictor,
if a first predictor was being used.
If the choice of which predictor should be used can then change in the pattern table so that
when the same (IP,history) pair is used at a later time the second predictor wins...

= See Also =

* [[Branch predictor state update]]
* [[Timing Invariant Branch Prediction]]

Timing Invariant Branch Prediction

{{Terminology Term}}
[[Category:Branch Prediction]]

By [[Timing Invariant Branch Prediction]] I mean branch prediction that is independent of timing.

E.g. a branch predictor design that, if you slowed down the clock or changed the pipeline, e.g. by inserting idle pipestages,
would not vary.

Such invariance is very convenient
(1) for validation
(2) for separation of concerns - it allows you to change the pipeline without worrying about its effect on branch prediction accuracy, etc.

However, many microarchitectures do not adhere to this design principle.

Tweaks and adjustments to the branch prediction microarchitecture may be necessary to attain [[Timing Invariant Branch Prediction]].
For example
* Per-branch history often leads to timing dependent branch prediction, which can be remedied by [[path history]]
* Updating the branch history tables can lead to timing dependent branch prediction

= Per-branch history often leads to timing dependent branch prediction =

For example, if you are using a [[per-branch history]] such as a [[TNT branch history]],
in many pipelines there are several clocks between instruction fetch,
i.e. delivering an instruction pointer to the instruction cache and branch predictor,
and instruction decode.
This means that an instruction fetch/branch prediction time you do not know where the branch instructions are;
you only know where the branch instructions are at decode time.

Therefore, unless you stall,
branch prediction may be using a [[per-branch history]] that is several cycles out of date,
and which may miss several branches
from what it would be ideally using if branches could be identified instantaneously.
Moreover, how many branches are missing may depend on timing, such as [[I$]] misses, pipeline stalls, etc.

== Path history enables timing independent branch prediction ==

If it is a problem to have timing dependent branch prediction
caused by per-branch history, this can be assuaged by [[path history]] branch prediction.

Instruction fetch does not necessarily know where the branches are. However, it necessarily does know the sequence of instruction fetch addresses.
If it is possible to create a path history suitable for use in a branch predictor,
e.g. by XORing the instruction fetch pointers,
then this [[path history]] is accurate and timing invariant.
Since XOR hashing is fast, this probably can be acheived.

However, XORing all fetch IPs may not be the best [[path history]] to use in branch prediction.
Creating a hash, probably XOR based, of [[branch from/to addresses]],
suffices to describe the path - although it losses information about branches between instruction fetch blocks.
Hashing such a [[branch from/to addresses]] [[path history]] with the current instruction fetch pointer
is about as good as you can do at instruction fetch,
without identifying individual instructions.

== Combining ... ==

[[Path history]] based branch predictors are usually reported as being more accurate than [[per-branch history TNT]],
so the pipeline adjustments above may help performance as well as providing [[Timing Invariant Branch Prediction]].

However, if they do not, you can obtain a hybrid that provides many of the benefits of [[timing invariant branch prediction]]
along with the possible improved accuracy of [[per-branch history]]:
* use [[path history]] at instruction fetch
* use [[per-branch history]] at the decoder, in a form of [[late-pipestage branch prediction]]

This gives you timing invariance,
but it also gives the [[decoder branch predictor]] the chance to make corrections to the earlier branch prediction.

= Branch History Update Time =

Q: when should the prediction tables, the [[pattern history table (PHT)]], also sometimes called the [[branch histogram]] or [[branch history table (BHT)]],
be updated?

At execution time, or at retirement time.

Updating at retirement time enables [[Timing Invariant Branch Prediction]]

Updating at execution time may cause only minor issues on an in-order machine.
On an out-of-order machine, however, updates may be done out of order.
In either case they may be done speculatively,
for branches that will not actually be retired because of earlier misspeculations.

Furthermore there arises the question of what history or [[stew]] is used to update the pattern history table.
If every branch at execution carries its history or stew with it, no problem.
But if a big complicated history is maintained only at retirement,
some processor designs have updated the pattern table for branches at execution
with a history corresponding to a position in the instruction stream several cycles before the branch.
Not necessarily a consistent number of cycles, either.

Updating the prediction tables a tretirement time seems to avoid these issues.

TBD: performance cost

TBD: latencies of table update - immaterial if [[pattern history invariant branch prediction]].

= Conclusion =

Is [[Timing Invariant Branch Prediction]] an absolutely vital design goal?

Not necessarily - if performance is increased by timing variant branch prediction, so be it.

However, [[Timing Invariant Branch Prediction]] is definitely a nice thing to have:
* it makes validation much easier
* it makes the design more robust, less fragile, less likely to break if you have to add a pipestage or stall late in the design cycle.
* it is usually associated with higher performance rather than lower performance branch prediction algorithms
* it can usually be achieved by a hybrid predictor design.

It is my experience that most timing dependent branch predictors
happened by accident, rather than by design:
* naive designers building a [[per-branch history]] out of a textbook
* naive extension of in-order designs to out-of-order exacerbating unnecessary timing dependence

[[Timing Invariant Branch Prediction]] is not necessarily a must-have,
but it is always a good thing to keep in mind when designing
your branch predictor and your microarchitecture/pipeline.
It is a pity to lose its benefits due to ignorance rather than deliberation.