Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Thursday, March 31, 2011

Hardware data structures

Computer software education (at least, if not the whole field) was greatly influenced by Niklaus Wirth's 1976 book
"Algorithms + Data Structures = Programs".
This popularized the importance of a programmer or software practitioner having a toolkit of datastructures,
and algorithms that employ or operate on such datastructures.

The situation in computer hardware is somewhat inchoate, less well defined.
Nevertheless the same [[hardware data structures]]
crop up over and over again in different applications.

I am not really sure what "algorithms" would be in the hardware version of "Algorithms + Data Structures = Programs".
Nor, for that matter, what "programs" would be.
Modules? Devices?
Sure, hardware implements algorithms, but it is hard to tease apart the concept of algorithms and data structures.
Moderately hard for software; especially so for hardware.

However, we should always keep in mind that hardware "algorithms" are just like software algorithms,
and data structures, except that implementation in hardware has diofferent constraint:
* size is limited - software often treats program size and data memory suze as infinite
* communications is restricted: whereas in software any module can talk to any other module or object, in hardware only a limited number of nearby communications are cheap.

== List of [[hardware data structures]] ==
* [[Arrays]]
** [[RAMs]] versus [[CAMs]]
** [[Encoded CAMs versus Decoded CAMs]]
-
* [[CAM]]s
** [[Equality match CAM]]s
** [[Priority CAM]]s
** [[Greater than CAM]]s
** [[Range CAM]]s
** [[Bitmask CAM]]s, a more general case of [[Ternary CAM]]s
-
* [[Pipelines]] - I'm not sure that these should be considered a [[hardware data structure]]
** [[Stalling pipelines]]
** [[Stall-free pipelines]]
*** [[Draining pipelines]]
*** [[Replay pipelines]]
** [[Bubble collapsing pipelines]]
-
* [[Queues and Buffers]]
** [[Collapsing queues]]
-
* [[Circular arrays]]
-
* [[FIFOs]]
** [[True data movement FIFOs versus array-based circular buffers]]
** [[Synchronizers (asynchronous logic)]]
*** [[Synchronization and clock domain crossing FIFOs]]
**** [[AMD synchronization FIFO]]
**** [[Intel Bubble Generating FIFO (BGF)]]
-
** [[Alignment issues for queues, buffers, and pipelines]]
** [[Holes in queues, buffers, and pipelines]]
-
[[Caches]]
* [[Fully associative in-array CAM tag matching]]
* [[Separate tag and data arrays]]
* [[N-way associative tag matching outside array]]
** [[index sequential]] - tag match, and then index data array
** [[index parallel]] - read out tags and data in parallel, then do a late mux
** [[in-array tag matching for N-way associative arrays]]
-
* [[Bloom Filters]]
** [[M of N Bloom filter]]
** [[N x 1 of M Bloom filter]]
** [[Pipelined Bloom filter]]
** [[Counted Bloom filter]]
-
* [[Arrays of 2-bit saturating counters]] crop up again and again, in predictors and elsewhere
-
* [[Permutation bit matrix]]es
-
* Schedulers
** [[Pickers versus packers]]
** [[Prioritizers]]
*** [[Carry chain prioritizer]]
**** [[Pick most extreme on one side]] - [[pick oldest (or youngest)]]
**** [[Pick most extreme on both sides]] - [[pick oldest and youngest]]
*** [[CAM matching prioritizer]]
**** [[Pick N-th]] - [[pick N-th oldest]]
*** [[Prioritizers after permutation bit matrix]]




In the aftermath of [[Mead and Conway]]'s "VLSI revolution"
there was a flurry of work on arbitrary hardware datastructures,
often 1:1 corresponding to software datastructures.
These include
* [[hardware sorting networks]]
* [[hardware heap sorting structures]]
* [[true hardware stack]]s
* [[true hardware FIFO]]s
Unfortunately, many of these have not proved practical in most computer applications.

== Duality between Hardware and Software Data Structures ==

We have long observed a regular pattern of [[duality between hardware and software data structures]].

This can be observed
* When implementing a hardware microarchitecture inside a software simulator.
* When converting an algorithm implemented in software into hardware
Oftentimes the most direct implementatin of a hardware data structure in software is stupidly slow, but there are regular patterns of good equivalences.

Such duals include

Hardware Software Comments

Equality match CAM

Hash table


Range or Greater Than CAM

No good SW equivalent


[[PLA]]s

[[Lookup table (LUT)]]

[[HW]] can handle irregular lookup tables:
different granularity of LUTs for different ranges.
[[SW]] must

The duality is useful,
not just for implementing HW in SW or vice versa,
but also because trends in VLSI scaling have meant, more and more, that the "traditional" hardware datastructures
must be foregone.
E.g. limiting the number of bits on a bit-line in a [[RAM array]] or a [[CAM match bitline]]
often means that that hash table like approaches should be used.

== Dual or equivalent implementations for [[hardware data structures]] ==

* [[Arrays are equivalent to multiple flopped registers outputting to a MUX]]

== Other [[hardware data structures]] ==

The ShuffleBox: a rectangular array of bits that can store, XOR and rotate all bits in all directions.
* Hala A. Farouk
* An efficient hardware data structure for cryptographic applications
* http://www.universitypress.org.uk/journals/cc/cc-10.pdf

=== Shifters ===

Various shift implementations - [[barrel shifter]]s,
[[cascaded multistage shifter]]s
- "feel" to me like the hardware instantiation of an algorithm more than a data structure,
since they are stateless.

Perhaps I should limit the concept of a [[hardware data structure]] to something that can store state for retrieval later.

= References =
* http://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_%3D_Programs
* Wirth, Niklaus (1976) (in English). Algorithms + Data Structures = Programs. Prentice Hall. ISBN 978-0-13-022418-7. 0130224189.

Intel BGF mentioned in:
* ftp://download.intel.com/design/processor/datashts/320835.pdf

* http://en.wikipedia.org/wiki/Mead_%26_Conway_revolution