The basics of multithreading explained

SystemLogic has published a very extensive article on the theory behind multithreading. As you know, adding a second processor to a system never automatically doubles its performance. The reason is the range of problems that both software and hardware run into in an SMP environment, such as cache behavior and latencies. One solution that keeps gaining ground is to build a single chip that behaves like multiple processors. According to the latest rumors, Intel will introduce something like this in its Xeon processors in the course of 2002, but other companies such as IBM and Compaq are already working on it intensively. The article discusses the various methods in detail, along with the disadvantages this technology entails.

Although it is generally fair to say that software must be written to be multithreaded to take advantage of an SMP or SMT system, that is not always true. A number of algorithms have been devised that analyze a program while it is running and, where possible, split it into multiple threads. This can make even better use of the available resources of a single-chip multithreading processor. Although one of the originators of this technique works at Intel, it is not clear whether Jackson Technology uses this kind of trick.

One of these methods is so-called slipstreaming, in which the processor executes each program twice at the same time. One stream always runs ahead, so that the second stream knows where the code is going before it actually executes it. That is useful, because the second stream can then remove useless code at that point. The same code is then also removed from the main stream, and as a result a single-threaded program can run up to 50% faster on a multithreading processor without any extra programming. Although this all sounds very advanced, it probably won't be too long before these kinds of ideas find their way into mainstream computers:

CMP uses two (or more) smaller cores to increase functional unit efficiency (removing horizontal waste), and has appeared in new processors such as the Sun MAJC architecture and the IBM POWER4; the SledgeHammer from AMD should use it too. CMT and FMT both use the ability to switch rapidly between threads to hide memory latencies and decrease vertical waste. CMT can be found in the MAJC architecture, and FMT in the Tera supercomputing architecture. SMT operates by running any thread, in any functional unit, on any clock, thus removing both horizontal and vertical waste, and will be found in the Alpha 21464, and possibly in a later incarnation of the P4 architecture. DMT and Slipstreaming processors both aim at increasing single-thread performance.
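To make the slipstreaming idea a little more concrete, here is a minimal toy sketch in Python. It is not how any real processor works: the instruction format, the helper names `run` and `find_ineffectual`, and the choice to treat "useless code" as register writes that are overwritten before anything reads them are all assumptions made for illustration. It only shows that a reduced stream can skip such instructions and still arrive at the same final state.

```python
# Toy model of slipstreaming. The (dst, op, a, b) instruction format and
# both helper functions are hypothetical, invented for this illustration.

def value(regs, operand):
    """Operands are either register names (str) or integer constants."""
    return regs[operand] if isinstance(operand, str) else operand

def run(program, skip=frozenset()):
    """Execute the program, skipping the given instruction indices.
    Returns the final register state and the number of instructions executed."""
    regs, executed = {}, 0
    for index, (dst, op, a, b) in enumerate(program):
        if index in skip:
            continue
        if op == "mov":
            regs[dst] = value(regs, a)
        elif op == "add":
            regs[dst] = value(regs, a) + value(regs, b)
        elif op == "mul":
            regs[dst] = value(regs, a) * value(regs, b)
        else:
            raise ValueError(f"unknown op {op!r}")
        executed += 1
    return regs, executed

def find_ineffectual(program):
    """Walk backwards and collect writes whose result is never used.
    Assumption: every register's final value counts as observable."""
    live = {dst for dst, _, _, _ in program}
    dead = set()
    for index in range(len(program) - 1, -1, -1):
        dst, _, a, b = program[index]
        if dst not in live:
            dead.add(index)      # overwritten later without being read
            continue
        live.discard(dst)        # this write satisfies the later reads
        live.update(x for x in (a, b) if isinstance(x, str))
    return dead

program = [
    ("r1", "mov", 5, None),
    ("r2", "mov", 7, None),      # only read by the dead instruction below
    ("r3", "add", "r1", "r2"),   # overwritten by the last instruction, never read
    ("r2", "mul", "r1", 2),
    ("r3", "add", "r2", 1),
]

full_regs, full_count = run(program)
dead = find_ineffectual(program)
reduced_regs, reduced_count = run(program, skip=dead)

print("skipped instruction indices:", sorted(dead))
print("same final state:", full_regs == reduced_regs)
print("instructions executed:", full_count, "vs", reduced_count)
```

In this sketch the analysis is done up front on a finished program; the technique described in the article does its analysis while the program is running, which this toy does not attempt to model.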

Though the many varied forms of multithreading take very different approaches, the goal is the same: higher real-world throughput. All of these techniques allow additional functional units to be added to a processor, and show something more akin to returns to scale than to diminishing returns. While previous processors tended not to go far beyond 4 functional units per processor, due to diminishing returns, there are now techniques available which allow more units to be added with increased efficiency. We shall see some strange architectures in the future…
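The horizontal and vertical waste mentioned above can be illustrated with a small bookkeeping sketch in Python. The usual reading is that vertical waste is a cycle in which nothing issues at all, while horizontal waste is the unused issue slots in a cycle where at least one instruction does issue. Everything else here is an assumption for illustration: the issue width of 4, the made-up per-cycle "ready instruction" counts for two threads, and the very idealized models of fine-grained (FMT) and simultaneous (SMT) multithreading.

```python
# Slot accounting for a hypothetical 4-wide processor. The traces below are
# invented numbers, not measurements from any real chip.

def waste(issued_per_cycle, width):
    """Split all issue slots into used, horizontally wasted and vertically wasted."""
    used = sum(issued_per_cycle)
    vertical = sum(width for n in issued_per_cycle if n == 0)
    horizontal = sum(width - n for n in issued_per_cycle if n > 0)
    return {"used": used, "horizontal": horizontal, "vertical": vertical}

def fmt_issue(threads, width):
    """Idealized FMT: each cycle exactly one thread issues, here the one
    with the most ready instructions."""
    return [min(width, max(ready)) for ready in zip(*threads)]

def smt_issue(threads, width):
    """Idealized SMT: each cycle, slots are filled from all threads together.
    Ready instructions beyond the issue width are simply ignored."""
    return [min(width, sum(ready)) for ready in zip(*threads)]

WIDTH = 4
thread_a = [2, 0, 1, 0, 3, 0, 1, 2]   # ready instructions per cycle (made up)
thread_b = [1, 2, 0, 0, 1, 2, 2, 0]

single = waste(thread_a, WIDTH)
fmt = waste(fmt_issue([thread_a, thread_b], WIDTH), WIDTH)
smt = waste(smt_issue([thread_a, thread_b], WIDTH), WIDTH)

for name, result in (("single thread", single), ("FMT", fmt), ("SMT", smt)):
    total = sum(result.values())
    print(f"{name:13s} {result}  utilization {result['used'] / total:.0%}")
```

With these invented traces, switching between threads mainly recovers whole idle cycles, while mixing threads within a cycle also fills part of the partially used ones, which is the distinction the quoted passage draws. The exact numbers mean nothing beyond this example.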