The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Monday, September 05, 2011

Bit Interleaving, Instruction Set Support for


= Instruction Set Support for Bit Interleaving =

== What is Bit Interleaving? ==

[[Bit interleaving]] is a very common form of multi-operand bit permutation.
It has been important enough to occasionally warrant special instruction set support.

2-way bit interleaving of A and B;

for i = 0 to N do
result.bit[2*i] := A.bit[i]
result.bit[2*i+1] := B.bit[i]

3-way bit interleaving off A, B, C:

for i = 0 to N do
result.bit[3*i] := A.bit[i]
result.bit[3*i+1] := B.bit[i]
result.bit[3*i+2] := C.bit[i]

M-way bit interleaving of A[i]

for i = 0 to N do
for j = 0 to M do
result.bit[N*i+j] := A[j].bit[i]

Note that since bit interleaving combines multiple operands into a single operand, issues of overflow may arise.
For example, the result is often used in address arithmetic,
in which case the inputs (aka the coordinates)
are probably not full width addresses.
The bit interleaving may be used in only a subrange of the input bits.

== Why Bit Interleaving? ==

See [[http://en.wikipedia.org/wiki/Morton_number_(number_theory) Wikipedia Morton numbers]].

Bit interleaving is often used as a component of hash functions.

In particular, bit interleaving of 2D and 3D coordinates in computer graphics
often produces hash functions that have better cache locality than uninterleaved coordinates used to index an array.

== Bit Interleaving Instructions ==

Intel Larrabee has 1:1 and 2:1 interleave instructions, in both scalar and vector form:

vbitinterleave11pi: 1:1 bit-interleave vectors
vbitinterleave21pi: 2:1 bit-interleave vectors

bitinterleave11: 1:1 bit-interleave scalar
bitinterleave21: 2:1 bit-interleave scalar

(See [[http://drdobbs.com/architecture-and-design/216402188?pgno=5 Mike Abrash's Dr. Dobb's article]] which comments that
"is useful for generating swizzled addresses, particularly in conjunction with vinsertfield, for example in preparation for texture sample fetches (volume textures in the case of vbitinterleave21pi".)

The 1:1 interleaving instruction obviously accomplishes 2D interleaving,
i.e. the interleaving of two coordinates for a 2D system.

The 2:1 interleaving instruction accoplishes 3D interleaving.
Or, rather, first one interleaves two coordinates, say X and Y, using 1:1,
and then one interleaves the third coordinate, Z, using 2:1.

4:1 interleaving is accomplished by two 1:1 interleaves.
However, operations on objects and spaces of more than 1D, 2D, or 3D are much less common.

== Expand (Interleave with 0) ==

Although 1:1 and 2:1 interleaving "fit" into commion instruction sets,
higher degrees of interleaving may not - too many operands may be required.

Some support may be afforded by bit expansion,
or interleaving with 0.
This takes 2 (or 3) inputs:
* the bitvector to which 0s will be inserted between elements
* the number of 0 bits to be inserted between elements
* possibly an offset to cause the result to be aligned appropriately for the last step.

The last step would be ORing the results together.
(Or equivalently, ANDing).