The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Iagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Tuesday, March 06, 2012

malloc and transactyional memory (TM)


Dynamic memory allocation such as malloc is a classic example of why one may want memory accesses that can leave a transaxction.

Consider a simple mallocator, that allocates from a single queue or arena.
This might mean that multiple transactions, otherwise completely independent
might appear to interfere with each other through malloc.

(This interference might take several forms.
E.g. if the transactions are first performed totally independently, they may allocate the same memory.
E.g. or if the transactions malloc separate areas, but sequentially dependent, rollback of one might require rollback of the other.
(Although even this is an advanced TM topic, if the footprints overlap.))

Certainly, there can be malloc algorithms that minimize this - e.g. multiple areas, increasing the chance that different transactions might not interfere.

Or... the topic of this page:  permit malloc to exit the transaction
- so the [[parallel or concurrent]] mallocs may be properly synchronized, and receive independent memory addresses.

Q: what happens when a transaction aborts?
* Possibly could free the memory in the abort handler.  But this becomes less hardware, and more software TM.  Or at least a hybrid.
* Let garbage collection recover the memory speculative allocated within an aborted transaction.

(Q: what about [[deterministic finalization and aborted transactions]]?)

= Related Topics =

* [[speculative multithreading and memory allocation]]:
** [[malloc and speculative multithreading]]
** [[stack allocation and speculative multithreading]]

Pseudo-atomic - atomic operations that can fail


[[Pseudo-atomic]] is a term that I (Glew) have coined to refer to atomic operations that can fail to be atomic, such as:
* [[Load-linked/store-conditional (LL/SC)]]
* even [[hardware transactional memory]]
** such as [[Intel Transactional Synchronization Extensions (TSX)]]'s [[Restricted Transactional Memory (RTM)]], where "the [[XBEGIN]] instruction takes an operand that provides a relative offset to the fallback instruction address if the RTM region could not be successfully executed transactionally."

Q: what does IBM transactional memory provide? Is it pseudo-atomic, or does it provide guarantees?

I.e. these [[pseudo-atomic]] operations do not guarantee completion.
At least not at the instruction level.
While it is possible to imagine implementations that detect use of pseudo-atomic instruction sequences
and provide guarantees that certain code sequences will eventually complete.
such mechanisms are
(1) not necessarily architectural
and (2) more complicated that non-pseudo[-atomic instructions.

E.g. for [[LL/SC]] hardware could "pick up" the instructions that lie between the load-linked and the store-conditional,
and map them onto a vocabulary of atomic instructions such as [[fetch-and-op]] that is supported by the memory subsystem.
Similarly, [[LL/SC]] might be implemented using [[transactional memory (TM)]].

Intel's TSX documentation says

    RTM instructions do not have any data memory location associated with them. While
    the hardware provides no guarantees as to whether an RTM region will ever successfully
    commit transactionally, most transactions that follow the recommended guidelines
    (See Section 8.3.8) are expected to successfully commit transactionally.
    However, programmers must always provide an alternative code sequence in the fallback
    path to guarantee forward progress. This may be as simple as acquiring a lock
    and executing the specified code region non-transactionally. Further, a transaction
    that always aborts on a given implementation may complete transactionally on a
    future implementation. Therefore, programmers must ensure the code paths for the
    transactional region and the alternative code sequence are functionally tested.

GLEW OPINION: requiring alternate code paths has historically been a bad idea.
E.g. [[Intel Intanium ALAT]]. Now [[RTM (Restricted Transactional Memory)]]

Why, then, provide pseudo-atomicity?

* Pseudo-atomic operations allow complicated atomic operations to be built up out of simpler
* Plus, of course, it is easier than providing real atomicity.  Most of the time it works.  Most of the time may be good enough for many people, who may not care if it occasionally crashes when the seldom used alternate path is exercised.

New(ish) IBM z196 synchronization instructions


= New IBM z196 synchronization instructions =

The IBM z196 adds new [[synchronization instructions]] to the [[IBM System z Mainframe ISA]].

Augmenting the legacy instructions

The reference [share] comments "there is no need for a COMPARE AND SWAP loop to perform these operations!"
- their exclamation mark!
This suggests the motivation for these instructions
- while [[compare and swap]] is one of the most powerful synchronization instructions,
it is not necessarily efficient.
Atomic operations such as [[atomic add to memory]] can perform in one instruction,
without a loop, things that would require looping for [[compare and swap]], [[test-and-set]], and [[load-linked store-conditional]].
Looping that may require special mechanisms to guarantee forward progress.

The z196 [[interlocked-access facility]] instructions include

New atomic instructions:

    The "LOAD AND ..." part of these instructions' names is a bit misleading.
    These are really just [[fetch-and-op]] instructions to memory.
    Fetching the old value, performing the operation with a register input,
    storing the new value so produced,
    and returning the old value in a register.

    Flavours: add signed/unsigned, logical and/or/xor, 32/64 bits wide.

An interesting instruction that I feel must be called [[pseudo-atomic]], in much the same way [[LL/SC]] is [[pseudo-atomic]]:

    [[LOAD PAIR DISJOINT]] loads two separate, non-contiguous, memory locations (each of which must be naturally aligned),
    into an [[even/odd register pair]].
    It sets condition codes to indicate whether the fetch was atomic,
    or whether some other CPU or channel managed to sneak in a store between them.
    (The language in the [share] presentation suggests that the condition codes are only set if a store actually occurred,
    not "may have occurred" - since if thye latter, a correct implementation might always returned "failed to be atomic".)

    GLEW COMMENT: although undoubtedly most of the time such actions are atomic, it is not clear to me that there is any guarantee of forward progress,
    for a loop around [[LOAD PAIR DISJOINT]] that spins until atomicity is observed.

    I almost suspect that there may be plans to eventually provide some such guarantees,
    but that the detail of such guarantees does not want to be cast in concrete architecture at the time of introduction.

In addition, the existing instructions that perform [[add immediate to memory]], in signed/unsigned and 32b/64b forms, are declared to be atomic
when "the storage operand is aligned on an [[integral boundary]]" - IBM speak for [[natural alignment]].

    GLEW COMMENT: it is obviously desirable to be able to have a fetch-and-add immediate to memory, to avoid having to blow a register on the operand for the atomic modification.

    It is a bit unusual to be able to extend an existing instruction in this way.  If these existing instructions were in wide use, one would expect existing code to be somewhat slowed down.
    However (1) I suspect the trend towards simpler OOO instructions has already made these "add to memory" instructions slower,
    while on the other hand (2) advanced implementations make such atomic instructions neglibly slower than their non-atomic versions.

= [[Atomicity is always relative]] =

IBM literature says: "as observed by other CPUs and the channel subsystem, ... appear to occur atomically".

IBM also uses the phrase "block-concurrent interlocked update".  "Block concurrent" is a special IBM term related to memory ordering, that says that all bytes are accessed atomically as observed by other CPUs. However, they may be observed a byte at a time by channel programs... but "interlocked" means that channel program requests cannot be inserted between the bytes.
Bottom line: atomixc wrt CPUs and I/O channels.

= Reference =

Many references, scattered.

: New CPU Facilities in the IBM zEnterprise 196, share.confex.com, 4 August 2010, http://share.confex.com/share/115/webprogram/Handout/Session7034/New%20CPU%20Facilities%20in%20the%20z196.pdf

Sounds like a Fraudster Phone Call

I just received a phone call whose caller ID says "Credit Service", 11:59am, 1-701-661-1003.

Recorded message saying something like "This is your credit card company." Note: they did not say what company, just a generic phrase like "your credit card company".

Going on "There is no reason to be alarmed. You are eligible for a special reduction in interest rate to 6.9%. You must act quickly, because this special offer expires soon. Press 1 if you want to receive this special lower interest rate."

When I pressed 1, after some hold music, eventually I got what sounded like a human. He said something that again sounded, to my recollection, like "This is the charge department".

At this point I said "Hold on, you guys cold called me, so I need you to tell me what company you are, and how I can verify..." And they hung up.


Now, I must be careful, since I am reporting the exact caller ID as reported by my phone, including the phone number, but since my notes above as to what they and I said are only an approximate recollection.

I don't record all of my telephone calls. At least, not at this time.

If this was a legitimate business call, at the very least they exhibited bad customer service.

However, this is also exactly the sort of thing that a fraud operation might do: try to fool people into giving out account numbers over the phone, etc.


I wonder if there are an police systems to report this sort of thing to.

Oregon: http://www.doj.state.or.us/finfraud/engexplanation.shtml

Wish there was a national service.  FBI has a webpage, but looks like it is for actual fraud only, not suspicion.