Energy-Exposed Instruction Sets

Mark Hampton, Ronny Krashinsky, Emmett Witchel, Krste Asanovic

Modern instruction set architecture (ISA) styles such as RISC or VLIW are based on extensive research into the effects of instruction set design on performance, and provide a purely performance-oriented hardware-software interface. These ISAs avoid providing alternate ways to perform the same task unless doing so increases performance significantly. Implementations of these ISAs perform many energy-consuming microarchitectural operations during the execution of each user-level instruction, and these operations dominate total power dissipation. For example, when executing an integer add instruction on a simple RISC processor, only around 5% of the total energy consumption is due to the adder circuitry itself. The rest is dissipated by structures such as cache tag and data arrays, TLBs, register files, pipeline registers, and pipeline control logic. For a complex out-of-order superscalar processor, the energy consumed by the adder is an even smaller fraction of the total. Modern machine pipelines have been refined to the point where most of this additional microarchitectural work is performed in a pipelined or parallel manner that does not affect the throughput or user-visible latency of a "simple" add instruction. Because their performance effects can be hidden, there is no incentive to expose these constituent micro-operations in a purely performance-oriented hardware-software interface, and so their energy consumption is hidden from software.

Within the SCALE project, we are proposing new energy-exposed instruction sets [1, 4]. An energy-exposed instruction set provides software with alternative methods of executing an operation, possibly with the same performance, but where greater compile-time knowledge can be used to deactivate unnecessary portions of the machine microarchitecture. One example we have developed is tag-unchecked loads and stores [3]. We provide two types of load instruction, one that checks cache tags and one that does not. Both types of load take the same amount of time to execute, but where software is certain that there will be a cache hit, it can use the tag-unchecked version to save energy. Because there is no performance difference between the two versions, there would be no incentive to expose the alternative mechanism in a purely performance-oriented hardware-software interface. We have implemented C and Java compilers that eliminate up to 76% of all tag checks in benchmark programs [3].
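
The following C sketch illustrates the idea with a small direct-mapped cache model that offers both access paths. The function names, cache geometry, and energy counter are hypothetical stand-ins chosen for illustration; they are not the SCALE ISA encodings or the direct-addressed cache mechanism described in [3].

    /* Minimal sketch of the tag-unchecked load idea, assuming a
     * direct-mapped cache model.  All names are hypothetical
     * illustrations, not the actual SCALE ISA or microarchitecture. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define LINES      64
    #define LINE_BYTES 32

    typedef struct {
        uint32_t tag;
        uint8_t  data[LINE_BYTES];
        int      valid;
    } cache_line_t;

    static cache_line_t cache[LINES];
    static long tag_checks;        /* proxy for tag-array access energy */

    /* Conventional load: reads the tag array and compares before using data. */
    static uint8_t load_checked(uint32_t addr, const uint8_t *mem)
    {
        uint32_t idx = (addr / LINE_BYTES) % LINES;
        uint32_t tag = addr / (LINE_BYTES * LINES);
        tag_checks++;                                   /* tag array activated */
        if (!cache[idx].valid || cache[idx].tag != tag) {   /* miss: refill */
            memcpy(cache[idx].data, mem + (addr & ~(LINE_BYTES - 1)), LINE_BYTES);
            cache[idx].tag = tag;
            cache[idx].valid = 1;
        }
        return cache[idx].data[addr % LINE_BYTES];
    }

    /* Tag-unchecked load: software guarantees the line is resident, so the
     * tag array is never read.  Same latency, lower energy. */
    static uint8_t load_unchecked(uint32_t addr)
    {
        uint32_t idx = (addr / LINE_BYTES) % LINES;
        return cache[idx].data[addr % LINE_BYTES];
    }

    int main(void)
    {
        uint8_t mem[4096];
        for (int i = 0; i < 4096; i++) mem[i] = (uint8_t)i;

        /* First access to a line must check tags (and may refill the line). */
        uint8_t a = load_checked(100, mem);
        /* A later access to the same line can skip the tag check when the
         * compiler can prove the line has not been evicted in between. */
        uint8_t b = load_unchecked(101);

        printf("a=%d b=%d tag_checks=%ld\n", a, b, tag_checks);
        return 0;
    }

In this model both loads return the same data with the same latency; the unchecked path simply never touches the tag-check counter, mirroring the energy saving in the real design.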

The major drawback with any scheme that exposes more machine state to software, such as tag-unchecked loads, is the handling of exceptions. Exception management is one of the main contributors to design complexity and run-time energy dissipation in modern machines. Highly parallel and out-of-order implementations must buffer temporary results to ensure that earlier instructions have cleared all exception checks before committing changes to software-visible architectural state. To reduce exception management energy, we introduce software restart markers [2, 4], which allow the compiler to annotate the points at which it requires precise exception behavior. As execution passes each restart marker, the machine saves the program counter. If an exception subsequently occurs, program execution resumes at the last saved program counter. The hardware makes no other effort to save machine state, relying on software to ensure that the code executed between markers is idempotent, i.e., can be re-executed multiple times without changing the final program results. Our results with C and Java compilers show that even simple local compiler analyses can reduce the number of precise exception points by a factor of three. More importantly, this technique allows additional machine state to be made visible between restart points, enabling the introduction of more energy-exposed features without incurring additional exception management costs.
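
The C sketch below illustrates what the idempotence requirement means for code between restart markers. The RESTART_MARKER() macro is a hypothetical placeholder for the ISA annotation, not the actual SCALE encoding; the point is that the second routine can be re-executed from the last marker any number of times without changing the final result, whereas the first cannot.

    /* Sketch of the idempotence requirement for software restart markers.
     * RESTART_MARKER() stands in for the hypothetical ISA annotation that
     * saves the restart PC; it is not the actual SCALE encoding. */
    #include <stdio.h>

    #define N 8
    #define RESTART_MARKER()   /* machine saves PC here; no other state saved */

    /* NOT restartable between markers: if an exception forces re-execution
     * from the start of the region, elements already doubled would be
     * doubled again. */
    void scale_in_place(int *a)
    {
        for (int i = 0; i < N; i++)
            a[i] = a[i] * 2;       /* destructive update of its own input */
    }

    /* Restartable: the whole region can re-execute any number of times and
     * still produce the same final result, because the outputs are disjoint
     * from the inputs. */
    void scale_idempotent(const int *a, int *b)
    {
        RESTART_MARKER();
        for (int i = 0; i < N; i++)
            b[i] = a[i] * 2;       /* b depends only on the unmodified a */
        RESTART_MARKER();
    }

    int main(void)
    {
        int a[N] = {1, 2, 3, 4, 5, 6, 7, 8}, b[N];
        scale_idempotent(a, b);    /* safe to restart anywhere in the region */
        scale_in_place(a);         /* would need a precise exception point */
        for (int i = 0; i < N; i++) printf("%d %d\n", a[i], b[i]);
        return 0;
    }

A compiler targeting restart markers would either generate code in the second form or fall back to requesting precise exception behavior around destructive updates.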

One example use of software restart markers is in a hybrid architecture we have devised, which adds software-visible accumulators to a conventional RISC architecture [2, 4]. Many register values in a computation are short-lived, being produced by one instruction, consumed by the next instruction, and then never used again. The compiler can exploit register lifetime information to allocate these intermediate values to the accumulators, avoiding register file reads and writes. The accumulators are visible to software only between software restart markers, to avoid the need to save their values across exceptions. For C and Java programs, we have implemented energy-conscious compiler passes that eliminate a third of all register file writes [2, 4].
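
The following C sketch models the register-file traffic for d = a*b + c under two schedules: a conventional one in which the short-lived product passes through the register file, and one in which it lives in an accumulator. The counters and the pseudo-assembly in the comments are illustrative assumptions only, not the SCALE instruction encoding.

    /* Sketch of how allocating a short-lived value to a software-visible
     * accumulator avoids register-file traffic.  The "accumulator" and the
     * counters below are an illustrative model, not the SCALE datapath. */
    #include <stdio.h>

    static long rf_reads, rf_writes;    /* proxies for register-file energy */

    static int rf_read(int v)  { rf_reads++;  return v; }
    static int rf_write(int v) { rf_writes++; return v; }

    /* Conventional RISC schedule for d = a*b + c: the short-lived product
     * is written to the register file and immediately read back. */
    static int muladd_rf(int a, int b, int c)
    {
        int t = rf_write(rf_read(a) * rf_read(b));   /* mul r_t, r_a, r_b */
        int d = rf_write(rf_read(t) + rf_read(c));   /* add r_d, r_t, r_c */
        return d;
    }

    /* Accumulator schedule: the product stays in an accumulator that is only
     * architecturally visible between restart markers, so it never touches
     * the register file. */
    static int muladd_acc(int a, int b, int c)
    {
        int acc = rf_read(a) * rf_read(b);           /* mul acc, r_a, r_b */
        int d   = rf_write(acc + rf_read(c));        /* add r_d, acc, r_c */
        return d;
    }

    int main(void)
    {
        rf_reads = rf_writes = 0;
        muladd_rf(3, 4, 5);
        printf("RF schedule:  %ld reads, %ld writes\n", rf_reads, rf_writes);

        rf_reads = rf_writes = 0;
        muladd_acc(3, 4, 5);
        printf("Acc schedule: %ld reads, %ld writes\n", rf_reads, rf_writes);
        return 0;
    }

Run as written, the accumulator schedule performs one fewer register-file read and one fewer write for this expression, which is the effect the compiler passes exploit across whole programs.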

Status (September 2005)

We are currently developing the SCALE processor, which includes an energy-exposed instruction set with software restart markers and software-visible accumulator registers.

Publications

[1] "Energy-Exposed Instruction Set Architectures", Krste Asanovic, Work In Progress Session, HPCA-6, Toulouse, France, January 2000. (PDF abstract, PDF slides)
[2] "Exposing Datapath Elements to Reduce Microprocessor Energy Consumption", Mark Hampton, S.M. Thesis, Massachusetts Institute of Technology, June 2001. (PDF)
[3] "Direct Addressed Caches for Reduced Power Consumption", Emmett Witchel, Sam Larsen, C. Scott Ananian, and Krste Asanovic, 34th International Symposium on Microarchitecture (MICRO-34), Austin, TX, December 2001. (PDF paper, PDF slides)
[4] "Energy-Exposed Instruction Sets", Krste Asanovic, Mark Hampton, Ronny Krashinsky, and Emmett Witchel, Power Aware Computing, Robert Graybill and Rami Melhem (Eds.), Kluwer Academic/Plenum Publishers, June 2002. (PDF paper, Book Website)

Funding

We gratefully acknowledge the past and present sponsors of this work, including NSF, DARPA, Epoch-IT, and Infineon.