Low-Power Microprocessor Design

Power consumption has become one of the primary design constraints for all types of microprocessors. We have been developing techniques that combine new circuit designs and microarchitectural algorithms to reduce both switching and leakage power in the components that dominate energy consumption, including flip-flops, caches, datapaths, and register files.

Flip-flops and latches, along with the clock networks that drive them, consume a significant fraction of total power in any synchronous digital system. We have developed activity-sensitive selection of flip-flops and latches, which uses local signal activity to determine the lowest-energy structure to use at each point in a circuit [10, 29]. The activity-sensitive approach gives large power savings over conventional transistor sizing. We have also investigated the effect of circuit loading on the energy-delay behavior of flip-flops, finding that both absolute load and electrical effort can change the optimal choice of flip-flop design [11]. We are currently experimenting with skewed static logic design styles [26], and we have invented the double-pulsed set-conditional-reset flip-flop (DPSCRFF) [13], the fastest static flip-flop known at the time of its publication, whose properties make it well suited to driving skewed logic.
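The selection idea can be illustrated with a minimal sketch. It assumes a hypothetical linear energy model in which each candidate storage element pays a fixed per-cycle clock energy plus an extra energy only in cycles where its stored value changes. The candidate names and energy numbers are made up for illustration and are not taken from [10] or [29].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageElement:
    """Hypothetical flip-flop/latch candidate with a linear energy model."""
    name: str
    clock_energy: float   # energy paid every cycle, data-independent (arbitrary units)
    toggle_energy: float  # extra energy paid in cycles where the stored value changes

    def energy_per_cycle(self, activity: float) -> float:
        # activity: probability that the stored value changes in a given cycle
        return self.clock_energy + activity * self.toggle_energy


def select_element(candidates, activity):
    """Pick the candidate with the lowest expected energy at this node's activity."""
    return min(candidates, key=lambda e: e.energy_per_cycle(activity))


# Illustrative candidates: one with a light clock load but costly transitions,
# one with a heavier clock load but cheap transitions.
CANDIDATES = [
    StorageElement("light_clock", clock_energy=10.0, toggle_energy=40.0),
    StorageElement("cheap_toggle", clock_energy=20.0, toggle_energy=10.0),
]

if __name__ == "__main__":
    for activity in (0.05, 0.2, 0.5, 0.9):
        print(activity, select_element(CANDIDATES, activity).name)
```

With these numbers the two models cross at an activity of 1/3 (10 + 40a = 20 + 10a), so quiet nodes get light_clock and busy nodes get cheap_toggle. In a real flow the per-node activity would come from simulating representative workloads.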

Caches consume a significant fraction of total microprocessor power, particularly in embedded processors with simpler in-order architectures. Our research has focused in particular on highly associative caches [7], which can be very energy-efficient and offer the potential for flexible reconfiguration. Many applications have working sets that vary during execution and often need only a subset of the available cache space. Our cache resizing techniques shrink the active portion of the processor cache to lower switching and leakage energy with minimal impact on miss rate [16]. To reduce the bitline energy of each access, we have developed dynamic compression combined with word-line gating for both instruction fetches [1] and data accesses [6]. Tag search can be especially energy-intensive in an associative cache, and we have developed schemes to reduce tag-match energy for both instruction fetches [9] and data accesses [12]. We are continuing research into reducing the energy of the large on-chip cache hierarchies expected to be standard on future microprocessors. We have also recently developed a cache energy-performance model for rapid exploration of the low-power cache design space [19].
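To make the bitline-energy argument concrete, here is a minimal sketch of a zero-compression cost model. It assumes one extra zero-indicator bit per byte: the indicator is always accessed, and a byte's eight data bitlines are exercised only when that byte is nonzero. This is a simplified counting model in the spirit of [6], not the circuit described there, and the function names are hypothetical.

```python
def word_bytes(word: int, width_bytes: int = 4) -> list:
    """Split a word into its bytes, least-significant byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(width_bytes)]


def bits_accessed(word: int, width_bytes: int = 4, compressed: bool = True) -> int:
    """Count bit cells whose bitlines are exercised when accessing `word`.

    Uncompressed: every data bit is accessed.
    Compressed (assumed scheme): one zero-indicator bit per byte is always
    accessed, plus the 8 data bits of each nonzero byte.
    """
    if not compressed:
        return 8 * width_bytes
    total = 0
    for b in word_bytes(word, width_bytes):
        total += 1          # zero-indicator bit
        if b != 0:
            total += 8      # data bits are accessed only for nonzero bytes
    return total


def average_savings(trace, width_bytes: int = 4) -> float:
    """Fraction of bitline activity saved over a trace of accessed words."""
    base = sum(bits_accessed(w, width_bytes, compressed=False) for w in trace)
    comp = sum(bits_accessed(w, width_bytes, compressed=True) for w in trace)
    return 1.0 - comp / base
```

Under this model a small value such as 0x00000005 touches 12 bit cells instead of 32, while a dense word such as 0x12345678 costs 36 because of the indicator bits, so the payoff depends on how often the workload's bytes are zero.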

Register files are another significant source of power consumption in both simple embedded processors and high-end superscalar processors. We have investigated the relative benefits of various techniques for reducing register file energy, particularly how these energy-saving techniques interact [2, 5]. In modern superscalar processors, large, heavily multiported register files have a significant impact on area, power, and instruction latency. We have been developing control schemes for bank-interleaved register files that can reduce the area of a multiported register file by a factor of three with minimal impact on processor performance [17]. We are extending this work to simultaneous multithreaded (SMT) processors, which have extreme register file capacity requirements [23]. We have also developed RingScalar, a new complexity-effective superscalar microarchitecture based on extensive banking [27, 28].
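Banking saves area by giving each bank only a few ports, at the cost of occasional bank conflicts when several operands that live in the same bank are needed in the same cycle. The sketch below counts those conflicts under a deliberately simple model. The interleaving function, port counts, and function names are assumptions for illustration; the sketch does not model the control schemes of [17] or [23].

```python
import math
from collections import Counter


def read_cycles(register_reads, num_banks: int = 4, ports_per_bank: int = 1) -> int:
    """Cycles needed to service one group of register reads on a banked file.

    Assumptions (hypothetical, for illustration):
      * register r lives in bank r % num_banks (low-order-bit interleaving);
      * repeated reads of the same register share one port access;
      * each bank services at most ports_per_bank distinct reads per cycle.
    """
    per_bank = Counter(r % num_banks for r in set(register_reads))
    if not per_bank:
        return 0
    return max(math.ceil(n / ports_per_bank) for n in per_bank.values())


def conflict_rate(read_groups, num_banks: int = 4, ports_per_bank: int = 1) -> float:
    """Fraction of read groups that need more than one cycle (a bank conflict)."""
    groups = list(read_groups)
    if not groups:
        return 0.0
    conflicts = sum(1 for g in groups if read_cycles(g, num_banks, ports_per_bank) > 1)
    return conflicts / len(groups)
```

For example, reads of registers 1, 2, 3, and 4 fall in four different banks and complete in one cycle, whereas reads of registers 1, 5, and 9 all map to bank 1 and need three cycles with one port per bank. A real design must decide how to handle such conflicts without stalling the pipeline too often.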

We have also studied entire datapaths to find opportunities for power savings. We designed a complete low-power datapath for a MIPS RISC processor in 0.25 µm CMOS [4] to serve as a testbed for our datapath studies. To gather detailed energy-performance data for whole processor layouts running large benchmark applications, we developed SyCHOSys, a detailed cycle-accurate power-performance simulation framework written in C++ [3, 8]. SyCHOSys runs seven orders of magnitude faster than SPICE while measuring the energy on each wire in the processor layout to within 10% of SPICE. More recently, we have investigated the power consumption of floating-point units [18].
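Per-wire energy accounting of this kind is commonly built on the switched-capacitance model, in which each transition on a wire of capacitance C dissipates about 0.5 * C * Vdd^2. The following minimal sketch applies that model to a trace of per-cycle bus values. It illustrates the general technique only; the function name and interface are hypothetical and are not SyCHOSys's.

```python
def bus_switching_energy(values, wire_caps_f, vdd):
    """Dynamic energy (joules) dissipated on a group of wires over a trace.

    values: per-cycle integer values, bit i driving wire i
    wire_caps_f: capacitance of each wire in farads (e.g. from layout extraction)
    vdd: supply voltage in volts
    Each transition of wire i is charged 0.5 * C_i * Vdd**2.
    """
    energy = 0.0
    for prev, curr in zip(values, values[1:]):
        changed = prev ^ curr  # bits that toggled between consecutive cycles
        for i, cap in enumerate(wire_caps_f):
            if (changed >> i) & 1:
                energy += 0.5 * cap * vdd ** 2
    return energy


if __name__ == "__main__":
    # Two wires of 1 fF and 2 fF at 1.0 V: both toggle, then only wire 1 toggles.
    print(bus_switching_energy([0b00, 0b11, 0b01], [1e-15, 2e-15], vdd=1.0))
```

The example trace costs 2.5 fJ in total: 1.5 fJ when both wires toggle and 1 fJ when only the 2 fF wire toggles. Because the energy depends on which wires actually switch, this style of accounting needs the cycle-by-cycle values that a cycle-accurate simulator provides.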

As CMOS technology scales into the sub-100 nm regime, static leakage current has emerged as a major contributor to system power consumption. Leakage current is concentrated in critical-path circuitry, particularly once circuit design techniques (such as high-threshold-voltage transistors) have been applied to reduce leakage in non-critical paths, where the extra delay can be tolerated. To reduce leakage in critical-path components, we exploit the fact that many critical paths are activated only infrequently. We are developing fine-grain dynamic leakage reduction techniques that turn off small pieces of an active processor for short periods of time. We have developed families of leakage-biased circuits that use leakage currents themselves to bias internal nodes into low-leakage states, reducing the energy and delay overheads of switching into a low-leakage mode [14, 15]. Effective use of these circuits requires microarchitectural changes that lengthen the time circuit blocks can sleep without affecting performance and that give sufficient advance notice that a block must be powered up before use. We are continuing to experiment with circuits and microarchitectures that reduce leakage, including dynamically resizable CMOS circuits that change transistor size on the fly depending on whether a circuit will be on the critical path [21].
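Whether putting a block to sleep pays off depends on how long it stays idle compared with the energy spent entering and leaving the low-leakage state, and on whether it can be woken in time. The sketch below captures that trade-off under a simple per-cycle energy model. The model, parameter names, and function names are illustrative assumptions, not the circuits or policies of [14, 15, 21].

```python
import math


def break_even_cycles(transition_energy, leak_active, leak_sleep):
    """Shortest idle interval (in cycles) for which sleeping saves energy.

    transition_energy: energy to enter and then leave the low-leakage state
    leak_active, leak_sleep: leakage energy per cycle when awake / asleep
    """
    saved_per_cycle = leak_active - leak_sleep
    if saved_per_cycle <= 0:
        return math.inf  # sleeping never pays off
    return transition_energy / saved_per_cycle


def should_sleep(idle_cycles, notice_cycles, wakeup_cycles,
                 transition_energy, leak_active, leak_sleep):
    """Sleep only if the idle interval beats break-even and the block is told
    early enough (notice_cycles >= wakeup_cycles) to be awake when needed."""
    long_enough = idle_cycles > break_even_cycles(
        transition_energy, leak_active, leak_sleep)
    woken_in_time = notice_cycles >= wakeup_cycles
    return long_enough and woken_in_time
```

For example, with a transition cost of 100 units and leakage of 2.0 versus 0.5 units per cycle, sleeping pays off only for idle intervals longer than about 67 cycles. Reducing the transition overhead shortens this break-even interval, so shorter idle periods can be exploited.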

The dramatic rise in processor power consumption has made thermal management a key design challenge. In particular, some processor components, such as register files and execution units, can have over twenty times the power density of cooler components on the same die. Because silicon's limited thermal conductivity cannot spread this heat quickly enough, hot spots form on the die. We have been investigating the use of activity migration [20] to mitigate the hot-spot problem. Activity migration reduces peak die temperature by moving computation between redundant units, allowing one unit to cool down while the other takes over.
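The benefit of migration can be seen in a toy thermal model. The sketch assumes each of two identical, redundant units behaves as a lumped thermal RC circuit integrated with forward Euler steps, with all of the power dissipated in whichever unit is active; migration simply swaps the active unit every few steps. The parameters are arbitrary, migration overhead is ignored, and this is not the thermal model used in [20].

```python
def peak_temperature(steps, migrate_every=None, power=10.0, r_th=1.0,
                     c_th=5.0, t_amb=0.0, dt=0.1):
    """Peak temperature of two redundant units over `steps` time steps.

    Each unit follows C * dT/dt = P - (T - T_amb) / R (lumped RC model).
    If migrate_every is set, the active unit is swapped every that many steps.
    """
    temps = [t_amb, t_amb]
    active = 0
    peak = t_amb
    for step in range(steps):
        if migrate_every and step > 0 and step % migrate_every == 0:
            active = 1 - active  # hand the work to the other (cooler) unit
        for i in range(2):
            p = power if i == active else 0.0
            temps[i] += dt * (p - (temps[i] - t_amb) / r_th) / c_th
        peak = max(peak, temps[0], temps[1])
    return peak


if __name__ == "__main__":
    print("no migration:       ", peak_temperature(2000))
    print("swap every 10 steps:", peak_temperature(2000, migrate_every=10))
```

Without migration, the active unit approaches the steady-state rise P * R = 10. Swapping every 10 steps, which is short compared with the thermal time constant R * C (50 steps here), keeps the peak at roughly 5.5, because each unit spends half its time cooling.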

To help understand the possibilities for power reduction, we have been analyzing optimal pipelining and communication strategies for deep submicron designs [22, 24, 25].

Publications

[1] "Reducing Instruction Cache Energy Using Gated Wordlines", Mukaya Panich, M.Eng. Thesis, Massachusetts Institute of Technology, August 1999.
[2] "Energy-Efficient Register File Design", Jessica Tseng, S.M. Thesis, Massachusetts Institute of Technology, December 1999.
[3] "SyCHOSys: Compiled Energy-Performance Cycle Simulation", Ronny Krashinsky, Seongmoo Heo, Michael Zhang, and Krste Asanovic, Workshop on Complexity-Effective Design, ISCA-27, Vancouver, BC, Canada, June 2000.
[4] "A Low-power 32-bit Datapath Design", Seongmoo Heo, S.M. Thesis, Massachusetts Institute of Technology, August 2000.
[5] "Energy-Efficient Register Access", Jessica Tseng and Krste Asanovic, XIII Symposium on Integrated Circuits and System Design (SBCCI 2000), Manaus, Amazonas, Brazil, September 2000.
[6] "Dynamic Zero Compression for Cache Energy Reduction", Luis Villa, Michael Zhang, and Krste Asanovic, 33rd International Symposium on Microarchitecture (MICRO-33), Monterey, CA, December 2000.
[7] "Highly-Associative Caches for Low-Power Processors", Michael Zhang and Krste Asanovic, Kool Chips Workshop, MICRO-33, Monterey, CA, December 2000.
[8] "Microprocessor Energy Characterization and Optimization through Fast, Accurate, and Flexible Simulation", Ronny Krashinsky, S.M. Thesis, Massachusetts Institute of Technology, May 2001.
[9] "Way Memoization to Reduce Fetch Energy in Instruction Caches", Albert Ma, Michael Zhang, and Krste Asanovic, Workshop on Complexity-Effective Design, ISCA-28, Goteborg, Sweden, June 2001.
[10] "Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy", Seongmoo Heo, Ronny Krashinsky, and Krste Asanovic, 19th Conference on Advanced Research in VLSI (ARVLSI'01), Salt Lake City, UT, March 2001.
[11] "Load-Sensitive Flip-Flop Characterization", Seongmoo Heo and Krste Asanovic, IEEE Workshop on VLSI, Orlando, FL, April 2001.
[12] "Direct Addressed Caches for Reduced Power Consumption", Emmett Witchel, Sam Larsen, C. Scott Ananian, and Krste Asanovic, 34th International Symposium on Microarchitecture (MICRO-34), Austin, TX, December 2001.
[13] "A Double-Pulsed Set-Conditional-Reset Flip-Flop", Albert Ma and Krste Asanovic, MIT LCS Technical Report, MIT-LCS-TR-844, May 2002.
[14] "Dynamic Fine-Grain Leakage Reduction Using Leakage-Biased Bitlines", Seongmoo Heo, Kenneth Barr, Mark Hampton, and Krste Asanovic, 29th International Symposium on Computer Architecture (ISCA-29), Anchorage, AK, May 2002.
[15] "Leakage-Biased Domino Circuits for Dynamic Fine-Grain Leakage Reduction", Seongmoo Heo and Krste Asanovic, VLSI Circuits Symposium, Honolulu, HI, June 2002.
[16] "Miss Tags for Fine-Grain CAM-Tag Cache Resizing", Michael Zhang and Krste Asanovic, International Symposium on Low Power Electronics and Design (ISLPED'02), Monterey, CA, August 2002.
[17] "Banked Multiported Register Files for High-Frequency Superscalar Microprocessors", Jessica Tseng and Krste Asanovic, 30th International Symposium on Computer Architecture (ISCA-30), San Diego, CA, June 2003.
[18] "Low-Power Single-Precision IEEE Floating-Point Unit", Sheetal Jain, M.Eng. Thesis, Massachusetts Institute of Technology, May 2003.
[19] "ZOOM: A Performance-Energy Cache Simulator", Regina Sam, M.Eng. Thesis, Massachusetts Institute of Technology, May 2003.
[20] "Reducing Power Density through Activity Migration", Seongmoo Heo, Ken Barr, and Krste Asanovic, International Symposium on Low Power Electronics and Design (ISLPED'03), Seoul, Korea, August 2003.
[21] "Dynamically Resizable Static CMOS Logic for Fine-Grain Leakage Reduction", Seongmoo Heo and Krste Asanovic, MIT LCS Technical Report, MIT-LCS-TR-957, July 2004.
[22] "Power-Optimal Pipelining in Deep Submicron Technology", Seongmoo Heo and Krste Asanovic, International Symposium on Low Power Electronics and Design (ISLPED'04), Newport Beach, CA, August 2004.
[23] "A Speculative Control Scheme for an Energy-Efficient Banked Register File", Jessica H. Tseng and Krste Asanovic, IEEE Transactions on Computers, 54(6):741-751, June 2005.
[24] "Replacing Global Wires with an On-Chip Network: A Power Analysis", Seongmoo Heo and Krste Asanovic, International Symposium on Low Power Electronics and Design (ISLPED'05), San Diego, CA, August 2005.
[25] "Optimal Digital System Design in Deep Submicron Technology", Seongmoo Heo, Ph.D. Thesis, Massachusetts Institute of Technology, January 2006.
[26] "Circuits for High-Performance Low-Power VLSI Logic", Albert Ma, Ph.D. Thesis, Massachusetts Institute of Technology, May 2006.
[27] "Banked Microarchitectures for Complexity-Effective Superscalar Microprocessors", Jessica H. Tseng, Ph.D. Thesis, Massachusetts Institute of Technology, May 2006.
[28] "RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture", Jessica H. Tseng and Krste Asanovic, MIT CSAIL Technical Report, MIT-CSAIL-TR-2006-066, September 2006.
[29] "Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy", Seongmoo Heo, Ronny Krashinsky, and Krste Asanovic, IEEE Transactions on VLSI Systems, 15(9), September 2007.

Funding

We gratefully thank the past and present sponsors of this work, including DARPA, NSF, CMI, IBM, and Intel.