# **UC Santa Cruz**

# **UC Santa Cruz Previously Published Works**

# **Title**

Current-Mode Clock Distribution

# **Permalink**

https://escholarship.org/uc/item/49n227wd

# **Authors**

Islam, Riadul; Guthaus, Matthew

# **Publication Date**

2014-06-01

Peer reviewed

# Current-Mode Clock Distribution

Riadul Islam, Matthew R. Guthaus

Department of CE, University of California Santa Cruz, Santa Cruz, CA 95064

{rislam,mrg}@ucsc.edu

Abstract—We propose a new paradigm for clock distribution that uses current, rather than voltage, to distribute a global clock signal with reduced power consumption. While current-mode (CM) signaling has been used in one-to-one signals, this is the first usage in a one-to-many clock distribution network. To accomplish this, we create a new high-performance current-mode pulsed flipflop (CMPFF) using a representative 45nm CMOS technology. When the CMPFF is combined with a CM transmitter, the first CM clock distribution network exhibits 45.2% lower average power compared to traditional voltage mode clocks.

#### I. INTRODUCTION

Portable electronic devices require long battery lifetimes, which can only be obtained by using low-power components. Recently, low-power design has become critical in synchronous Application-Specific Integrated Circuits (ASICs) and Systems-on-Chip (SoCs) because interconnect in scaled technologies consumes an increasingly significant share of the power. Researchers have demonstrated that the major consumers of this power are global buses, clock distribution networks (CDNs), and synchronous signals in general [1].

In addition to power, interconnect delay poses a major obstacle to high-frequency operation. Technology scaling reduces transistor and local interconnect delay while increasing global interconnect delay [2]. Moreover, conventional CDN structures are becoming increasingly difficult to use in multi-GHz ICs because skew, jitter, and variability often grow in proportion to the clock latency, which is large in these structures [3].

Prior to and in early CMOS technologies, current-mode (CM) logic was an attractive high-speed signaling scheme [4]. CM logic, however, consumes significant static power to offer these high speeds. Because of this, standard CMOS voltage-mode (VM) signaling has been the *de facto* standard logic family for several decades.

Low-swing and current-mode signaling, however, are highly attractive solutions to help address the interconnect power and variability problems [1], [2], [5], [6]. Static power is typically more significant than dynamic power in CM signaling. However, this static power is often significantly less than VM dynamic power in global CM interconnects while latency is also improved. CM signaling schemes can also offer higher reliability since they are less susceptible to single-event transient upsets due to the absence of buffers with source/drain diffusion areas that can be hit by high-energy particles.

Previous CM schemes have been used for long global wires or, more commonly, off-chip signals. Standard logic signals, however, have remained VM to benefit from the low static power of CMOS logic. In our proposed scheme, it is not practical to make each individual point-to-point segment of the CDN CM, but the clock signal should still benefit from the power and reliability advantages of CM signaling. Instead, the power savings are maximized by creating a high-fanout symmetric (H-tree) distribution that feeds many CM flip-flop (FF) receivers. Logic signals on the FF receivers retain VM compatibility with low-power CMOS logic in the remainder of the chip.

In this paper, we present the first true CM CDN and a new CM pulsed D-type FF where the clock (CLK) input is a CM receiver and the data input (D) and output (Q) are VM. In particular, the key contributions of this paper are:

- The first demonstration of a CM clocked FF.
- The first demonstration of a symmetric H-tree CM CDN using the CM FF.
- The effective integration of the CM FF with VM CMOS logic.

The rest of the paper is organized as follows: Section II gives a brief overview of some existing CM signaling schemes. Section III proposes our CM FF and CDN. Section IV compares our new FF and CDN with existing schemes. Finally, Section V concludes the paper.

## II. OVERVIEW OF EXISTING CM SIGNALING SCHEMES

In a CM signaling scheme, a transmitter (Tx) uses a VM input signal to transmit a current with minimal voltage swing into an interconnect (transmission line), while a receiver (Rx) converts the current back into a full-swing output voltage. The representative CM scheme in Figure 1 uses a CMOS inverter as the Tx, while the Rx is based on a transimpedance amp [7]. This scheme improves delay over VM schemes, but the Rx voltage swings around a common-mode voltage ( $V_{CM}$ ), and any  $V_{CM}$  shift would cause a large CDN skew [8].



Fig. 1. Previous CM schemes used an expensive transimpedance amp Rx which could result in significant skew due to  $V_{CM}$  shift if applied to CDNs [7].

Other researchers have used a dynamic over-driving Tx with a strong and a weak driver alongside a low-gain inverter amp Rx and a controlled current source that addresses the previous  $V_{CM}$  problem [2]. However, this scheme results in rise- and fall-time mismatch at the output [6], which can be problematic in CDNs. Variation-tolerant CM signaling schemes have used a CM Tx with corner-aware bias circuitry [6]. In this scheme, the inverter amp Rx circuit provides a low-impedance path to ground and holds the terminal point at the switching threshold. However, this comes at the expense of large static and dynamic power compared to the other CM techniques, which makes it unattractive relative to existing VM clock signaling.

### III. CURRENT-MODE CLOCKING

All of the previous CM signaling schemes perform current-to-voltage conversion and then use the buffered VM clock signal. However, driving the lowest level of a CDN with a full-swing voltage results in large dynamic power in addition to significant buffer area to drive the clock pin capacitances. Our CM scheme is highly integrated into the FFs that directly receive the CM signal to reduce overall power consumption and silicon area.

## A. Current-Mode Pulsed Flip-Flop (CMPFF)

Figure 2 shows our proposed CM Pulsed Flip-Flop (CMPFF) and the simulation results. The CMPFF uses an input current-comparator (CC) stage, a register stage, and a static storage cell. The CC stage compares the input push-pull current with a reference current and conditionally amplifies the clock to a full-swing voltage pulse that triggers the data to latch at the register stage. The feedback pulsed FF is in stark contrast to the previous CM schemes which utilized expensive Rx circuits and buffers to drive final FFs.

The choice of push-pull current enables a simple Tx circuit (discussed further in Section III-B) while maintaining a constant (or at least low-swing) bias voltage on the CDN interconnect. The CMPFF in Figure 2(a) is sensitive only to the unidirectional push current, which provides the positive-edge-triggered operation of the FF. This design is easily modified, using a complementary current comparator, into a negative-edge-triggered FF that uses the pull current.

In the input stage, the reference voltage generator (M1-M2) creates a reference current (Iref1) that is mirrored by M5 and generates I1. Similarly, the M3-M4 pair creates the FF reference current (Iref2) which is combined with the input current (i\_in); this current is then mirrored by M6 to I2.

It is possible to replace the reference voltage for M5 with a global reference which would increase robustness by reducing transistor mismatch between FFs. This would also save two transistors per FF and reduce static power with a negligible performance penalty. However, the reference would require global distribution and consume valuable metal routing resources which is thought to outweigh the benefits.

The mirrored currents I1 and I2 are compared using the inverting amp (A1) at node B and further extended to a CMOS logic level at node C by another inverting amp (A2).



(a) The proposed CMPFF uses a current comparator and a feedback connection to generate a voltage pulse that triggers a register stage to store data in the storage cell.



(b) Simulation waveforms confirm the internal current-to-voltage pulse generation (clk\_p) that triggers input data capture.

Fig. 2. Proposed CMPFF and simulation results.

The inverter pair (X1-X2) generates the required voltage pulse duration before the feedback connection in M7.

The feedback connection from the generated voltage pulse through M7 quickly pulls down the current-comparator node B, which makes it possible to generate a short voltage pulse (i.e., less than a 50% duty cycle of the input CLK) and results in fewer transistors in the register stage.

The register stage is similar to a single-phase register [9], but requires fewer transistors and has a reduced clock load compared to other pulsed FFs. The current-generated voltage pulse (clk\_p) triggers storing data in the output storage cell.

The sizing of M7 is critical to the voltage pulse; we use a minimum-sized NMOS transistor with unity aspect ratio. The width of the generated voltage pulse (clk\_p) is also sensitive to the width and amplitude of the input current (i\_in). The amplitude of i\_in strongly affects the FF performance by changing the operating point of M6 and adding extra delay to the generated clk\_p signal. To achieve minimum C-to-Q delay, the ideal input current has a  $\pm 2.3 \mu A$  amplitude and a 70ps pulse width.
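The edge-triggered behavior described in this subsection can be illustrated with a purely behavioral sketch. It is an abstraction under stated assumptions, not a model of the transistor-level circuit: the function names and the `threshold_uA` parameter are hypothetical, and the comparator, pulse generator, and register stage are collapsed into a single threshold test on the push current.

```python
# Behavioral sketch only: the threshold and all names are illustrative
# assumptions, not values or signals taken from the transistor-level design.

def cmpff_step(i_in_uA, d, q_prev, threshold_uA=1.0):
    """Return (clk_p, q) for one time sample.

    clk_p is asserted only for push (positive) current above the threshold,
    which mimics positive-edge-triggered operation; pull (negative) current
    is ignored. While clk_p is high the data input is captured, otherwise
    the storage cell holds its previous value.
    """
    clk_p = i_in_uA > threshold_uA
    q = d if clk_p else q_prev
    return clk_p, q


def simulate(i_in_trace_uA, d_trace, q0=0):
    """Run the model over equal-length current and data traces."""
    q = q0
    q_trace = []
    for i_in, d in zip(i_in_trace_uA, d_trace):
        _, q = cmpff_step(i_in, d, q)
        q_trace.append(q)
    return q_trace


# Push pulse (+2.3 uA), idle, pull pulse (-2.3 uA), idle, push pulse, idle.
current = [0.0, 2.3, 0.0, -2.3, 0.0, 2.3, 0.0]
data = [1, 1, 0, 0, 0, 0, 1]
q_trace = simulate(current, data)
print(q_trace)
```

In this sketch the output changes only on the two push pulses; the pull pulse leaves the stored value untouched, which corresponds to the positive-edge-only sensitivity described above.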

## B. Current-Mode Transmitter and Distribution

In order to integrate the CMPFF, we need a reliable Tx that can provide a push-pull current into the clock network and distribute the required amount of current to each CMPFF. Our proposed CM CDN with Tx, interconnect, and the CMPFF is shown in Figure 3(a). The Tx receives a traditional voltage CLK from a PLL/clock divider at the root of the H-tree network and supplies a pulsed current to the interconnect, which is held at a near-constant voltage. The clock distribution is a symmetric H-tree with equal impedances in each branch so that current is distributed equally to each CMPFF leaf node.

The pulsed current Tx in Figure 3(a) is similar to previous Tx circuits [2], [6], but we have used a NAND-NOR design. The NAND gate uses the CLK signal and a delayed inverted CLK signal, clkb, as inputs to generate a short negative pulse that briefly turns on M1. Hence, the PMOS transistor briefly sources charge from the supply while the NMOS is off. Similarly, the NOR gate uses the negative edge of the CLK and clkb signals to briefly turn on M2. Hence, the NMOS transistor briefly sinks current while M1 is off. The non-overlapping signals from the NAND and NOR gates eliminate short-circuit current in the Tx.
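The gate-level behavior of the NAND-NOR pulse generator can be checked with a small logic simulation. This is a minimal sketch under idealized assumptions (zero gate delay, a discrete-time clock, and a hypothetical `delay` parameter expressed in samples); it is not derived from the SPICE netlist.

```python
def tx_gate_signals(clk, delay):
    """Logic-level model of the NAND-NOR pulse generator.

    clk   : list of 0/1 samples of the voltage-mode clock
    delay : delay of the inverted clock in samples (illustrative parameter)
    Returns (pmos_gate, nmos_gate), the gate signals of M1 and M2.
    """
    n = len(clk)
    # clkb is the inverted clock delayed by `delay` samples; before the first
    # delayed sample is available we assume the clock was previously low.
    clkb = [1 - (clk[t - delay] if t >= delay else 0) for t in range(n)]
    pmos_gate = [1 - (c & cb) for c, cb in zip(clk, clkb)]  # NAND output
    nmos_gate = [1 - (c | cb) for c, cb in zip(clk, clkb)]  # NOR output
    return pmos_gate, nmos_gate


clk = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
pmos_gate, nmos_gate = tx_gate_signals(clk, delay=2)

m1_on = [g == 0 for g in pmos_gate]  # PMOS conducts when its gate is low
m2_on = [g == 1 for g in nmos_gate]  # NMOS conducts when its gate is high
print(pmos_gate)
print(nmos_gate)
```

With these inputs, M1 conducts for two samples after the rising clock edge and M2 for two samples after the falling edge, and the two devices are never on at the same time, which is the condition for no short-circuit current.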

The Tx M1 and M2 device sizes are adjusted to supply/sink the required charge into the CDN. The root wires of the CDN carry current that is distributed to all branches, so the sizing of the CDN wires is critical for both performance and reliability. If the resistance of the wire is too high, the current waveform magnitude and period will be distorted and degrade the performance of the CMPFFs. The wire width must also account for electromigration effects while carrying the total current needed to drive all the FFs with the required current amplitude and duration.
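As a rough illustration of why the root wires are the critical ones, the sketch below estimates the peak current in each level of an ideal tree. It assumes a lossless network in which the current divides equally at every two-way split and no current is lost to wire capacitance; the function name is hypothetical, and the per-FF amplitude defaults to the $2.3 \mu A$ value quoted in Section III-A.

```python
def branch_currents_uA(n_sinks, i_ff_uA=2.3):
    """Peak current per wire at each level of an ideal binary-splitting tree.

    Index 0 is the root wire, and the last entry is the wire feeding a single
    sink. Assumes equal splitting and no losses (an idealization).
    """
    currents = []
    wires = 1
    while wires <= n_sinks:
        currents.append(n_sinks * i_ff_uA / wires)
        wires *= 2
    return currents


levels = branch_currents_uA(4)
print(levels)  # root wire first, leaf wire last
root_1024_uA = branch_currents_uA(1024)[0]
print(root_1024_uA)
```

Under these assumptions the root current scales linearly with the number of sinks, while each leaf wire always carries the single-FF amplitude.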



(a) The proposed CM Tx and CDN convert a VM input signal to a push-pull current with minimal interconnect voltage swing and distribute current equally to the CMPFFs.



(b) Simulation waveforms confirm a VM input is converted to a constant CDN voltage and a representative push-pull current at each CMPFF.

Fig. 3. Proposed CM-CDN and simulation results.

TABLE I

| FF type  | Normalized area | C-Q delay (ps) | $t_s$ (ps) | $t_h$ (ps) | Normalized power (static + dynamic), 1.5 GHz | 2 GHz | 3 GHz | 4 GHz | 5 GHz |
|----------|-----------------|----------------|------------|------------|----------------------------------------------|-------|-------|-------|-------|
| MS DFF   | 1               | 37             | 21         | 5          | 1                                            | 1     | 1     | 1     | 1     |
| Tra. PFF | 1.49            | 75.5           | -46        | 87         | 1.61                                         | 1.57  | 1.41  | 1.4   | 1.4   |
| CMPFF    | 1.45            | 45             | -30        | 71         | 4.08                                         | 3.37  | 2.47  | 1.91  | 1.61  |

#### IV. EXPERIMENTS

#### A. Experimental Setup

We implemented our proposed CMPFF, a traditional VM master-slave DFF (MS DFF), and a traditional VM pulsed FF (Tra. PFF) [10] in a representative 45nm CMOS technology [11]. Each FF is compatible with a standard cell library height of 12 horizontal M2 tracks. The layout areas, maximum clock-to-Q (C-Q) delay, setup times  $(t_s)$ , hold times  $(t_h)$ , and total power are listed in Table I. The performance of the FFs was evaluated using post-layout SPICE simulation at clock frequencies from 1.5GHz to 5GHz and a 1V supply voltage. The power figures assume input data at 100% activity and a load of four minimum-size inverters.

In order to validate the functionality of the CM Tx and the proposed CMPFF in a CDN, we implemented a symmetric H-tree network spanning  $1.2mm \times 1.2mm$ . Each branch of the clock tree is modeled as a lumped 3-component  $\Pi$ -model, and the branches are then connected together to form a distributed CDN model. The interconnect unit capacitance and resistance values are from the 2009-2010 ISPD Clock Synthesis contest for 45nm CMOS technology [12]. The functional simulation results with the resulting output current are shown in Figure 3(b).
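The lumped 3-component $\Pi$-model of a branch can be sketched as follows. This is a generic illustration: the function name is hypothetical, and the unit resistance and capacitance in the example are placeholder numbers chosen for easy arithmetic, not the ISPD contest values used in the experiments.

```python
def pi_segment(length_um, r_per_um, c_per_um):
    """Lumped 3-component Pi model of one wire branch.

    Returns (c_near, r_series, c_far): half of the total wire capacitance at
    each end and the total wire resistance in series between them.
    """
    r_total = r_per_um * length_um
    c_total = c_per_um * length_um
    return c_total / 2.0, r_total, c_total / 2.0


# Placeholder unit values (ohm/um and fF/um), NOT the contest parameters.
c_near, r_series, c_far = pi_segment(length_um=100.0, r_per_um=0.5, c_per_um=0.25)
print(c_near, r_series, c_far)
```

Chaining one such segment per H-tree branch, with the far-end node of a parent feeding the near-end nodes of its children, yields the distributed CDN model described above.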

## B. CMPFF Analysis

The CMPFF consumes 2.9% less silicon area than the Tra. PFF and uses 25 transistors (including the local reference generator), while the traditional MS DFF and Tra. PFF use 20 and 26 transistors, respectively.

The C-Q delays of the FFs are measured under relaxed timing conditions: the data is stable well before the arrival of the clock edge. This applies both to the rising edge of the VM signal and to the current pulse for the CM clock. Table I shows the maximum C-Q delay over both high-to-low and low-to-high Q transitions. The CMPFF has a lower C-Q delay than the Tra. PFF (45ps versus 75.5ps) and is only slightly slower than the MS DFF (37ps).

We also measured the  $t_s$  and  $t_h$  times for each FF. These use the common definition as the time margin that causes a C-Q delay increase of 10% beyond nominal. The  $t_s$  and  $t_h$  of the CMPFF are -30ps and 71ps, respectively. The setup time of the CMPFF is 51ps lower than that of the traditional MS DFF (21ps), a reduction equal to 2.43x the MS DFF setup time.

Table I presents the total power, including both static and dynamic components. At low frequencies, the CMPFF consumes more power than the Tra. PFF and MS DFF due to its high static power overhead. However, the dynamic power of the CMPFF increases with frequency at a slower rate than that of the Tra. PFF. At high frequencies, the power consumption of the CMPFF is comparable to that of the Tra. PFF.
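The trends discussed in this subsection can be reproduced directly from the Table I entries. The short script below only rearranges the tabulated numbers; the variable names are chosen here for illustration.

```python
# Values copied from Table I (power is normalized to the MS DFF).
freqs_ghz = [1.5, 2, 3, 4, 5]
tra_pff_power = [1.61, 1.57, 1.41, 1.4, 1.4]
cmpff_power = [4.08, 3.37, 2.47, 1.91, 1.61]

# CMPFF power relative to the traditional pulsed FF at each frequency.
power_ratio = [c / t for c, t in zip(cmpff_power, tra_pff_power)]
for f, r in zip(freqs_ghz, power_ratio):
    print(f"{f} GHz: CMPFF / Tra. PFF power = {r:.2f}")

# Setup-time comparison against the MS DFF (values in ps).
ts_ps = {"MS DFF": 21, "Tra. PFF": -46, "CMPFF": -30}
setup_reduction_ps = ts_ps["MS DFF"] - ts_ps["CMPFF"]
setup_reduction_ratio = setup_reduction_ps / ts_ps["MS DFF"]
print(setup_reduction_ps, round(setup_reduction_ratio, 2))
```

The power ratio falls monotonically with frequency and reaches 1.15 at 5GHz, which is the sense in which the two pulsed FFs become comparable at high frequencies.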

TABLE II: Our CM CDN saves 45.2% power on average compared to a VM CDN at a 3 GHz CLK. VM signaling power is normalized with respect to CM signaling.

| # sinks             | Chip edge (mm) | VM CDN | VM FFs | CM CDN | CM FFs | VM Tot | CM Tot | % saving |
|---------------------|----------------|--------|--------|--------|--------|--------|--------|----------|
| 4                   | 0.48           | 4.93   | 0.572  | 1      | 1      | 1.58   | 1      | 36.77    |
| 16                  | 0.96           | 7.80   | 0.572  | 1      | 1      | 1.87   | 1      | 46.43    |
| 64                  | 1.92           | 9.64   | 0.572  | 1      | 1      | 1.89   | 1      | 46.95    |
| 256                 | 3.84           | 9.67   | 0.572  | 1      | 1      | 1.90   | 1      | 47.47    |
| 1024                | 7.69           | 28.36  | 0.572  | 1      | 1      | 1.94   | 1      | 48.45    |
| Average savings (%) |                |        |        |        |        |        |        | 45.2     |

The FF power, however, does not represent the overall power consumption of a CDN. In the next section, we show that the power savings in the CDN is worth the increase in CMPFF total power despite the additional static power.

#### C. Current-Mode CDN Analysis

Total system power consumption of a CDN includes both the CDN interconnect power and the FF power consumption. In a VM CDN, the dynamic switching power of the interconnect and clock load capacitances along with clock buffers dominate the power consumption. In a CM CDN, the power due to small fluctuations in  $V_{CM}$  and the Tx power contribute, but the static power of the CMPFF dominates. In both cases, the number of sinks and chip dimensions increase the total power consumption.

We use the same H-tree model in both the CM and VM CDN, but buffers drive the VM CDN instead of the CM Tx circuit. The VM buffered network is optimized for an output clock signal with less than 20ps slew at 3GHz. The VM CDN considers only the Tra. PFF [10] and not the MS DFF.

Table II shows the power breakdown of the VM and CM CDNs from simulation with a typical clock at 3GHz. Our CM CDN consumes less power than the VM CDN for all CDN sizes. This is because of the large dynamic power caused by the full voltage swing  $(0\text{-to-}V_{dd})$  in the VM CDN, whereas the CM CDN has negligible voltage swing, as shown in Figure 3(b).
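The role of the voltage swing can be made explicit with the standard first-order expression for interconnect switching power. This is a textbook approximation rather than an equation extracted from the simulations; $\alpha$ (switching activity), $C$ (switched wire and load capacitance), $V_{swing}$ (signal swing on the wire), and $f$ (clock frequency) are symbols introduced here for illustration.

$$P_{dyn} \approx \alpha \, C \, V_{swing} \, V_{dd} \, f$$

In a VM CDN, $V_{swing} = V_{dd}$, so the expression reduces to $\alpha C V_{dd}^2 f$ and grows with the total switched capacitance. In the CM CDN, $V_{swing}$ is close to zero, which keeps this term small regardless of the wire capacitance.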

As expected, the power of the CMPFFs is higher than in the VM case, but this is a fixed ratio. The VM interconnect power dominates the CM FF power even at small H-tree sizes. The real advantage, however, is that the CM CDN power does not grow with CDN size the way the VM CDN power does. Since the fluctuation of  $V_{CM}$  is relatively small, the dynamic power consumption of the CM CDN is negligible. At 3GHz in particular, the CM CDN system exhibits 36.8% to 48.5% total power savings for 4 to 1024 sinks. As previously suggested by Table I, we can save more power at high frequencies due to the relative power consumptions of the Tra. PFF and CMPFF.
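The savings column and the reported average can be cross-checked from the normalized totals in Table II. The sketch below assumes the saving is defined as $1 - P_{CM}/P_{VM}$ on the total power, which is consistent with the tabulated values up to the rounding of the VM totals; the variable names are illustrative.

```python
# Normalized totals and reported savings copied from Table II (3 GHz clock).
vm_total = {4: 1.58, 16: 1.87, 64: 1.89, 256: 1.90, 1024: 1.94}
reported_saving = {4: 36.77, 16: 46.43, 64: 46.95, 256: 47.47, 1024: 48.45}


def saving_percent(vm_tot, cm_tot=1.0):
    """Power saving of CM relative to VM, in percent."""
    return (1.0 - cm_tot / vm_tot) * 100.0


recomputed = {n: saving_percent(v) for n, v in vm_total.items()}
average_reported = sum(reported_saving.values()) / len(reported_saving)

for n in vm_total:
    print(f"{n:5d} sinks: recomputed {recomputed[n]:.2f}%, reported {reported_saving[n]:.2f}%")
print(f"average of reported savings: {average_reported:.1f}%")
```

The recomputed values differ from the reported ones only in the second decimal place because the VM totals in the table are rounded, and the average of the reported savings matches the 45.2% figure.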

We used homogeneous wire sizing from the root to each sink and verified that the maximum current density of the CM CDN in the root wire is  $2kA/cm^2$ , less than the  $5kA/cm^2$  of the VM CDN. This more than satisfies the ITRS suggestion that current density be limited to  $1.5MA/cm^2$ . Therefore, electromigration is not a problem for the demonstrated sizes.

In order to measure the noise immunity, we compare crosstalk noise simulations for both CM and VM. In scaled technologies, traditional VM schemes are most susceptible when the aggressors are  $180^{\circ}$  out of phase with the victim line. We mimic the worst-case crosstalk by considering 3 parallel interconnects (5mm long) driven by variable-impedance drivers/buffers (VM). Each 5mm interconnect line was buffered/segmented every 1mm. In this case, simulation shows that the victim line delay can increase by up to 35%. In the CM design, the two aggressors are driven by VM buffers, while the victim line is driven by a CM Tx. Simulations suggest that the CM scheme exhibits a negligible performance penalty and more robustness to noise because the CM victim line has a much larger capacitance without buffering. This means that the relatively short neighboring VM aggressor lines have less crosstalk coupling and therefore less influence on the CM delay.

#### V. CONCLUSION

In this paper, we presented the first true CM FF and its use in a fully CM CDN. The proposed CMPFF has a 40% lower C-Q delay (45ps versus 75.5ps), requires similar silicon area, and consumes only 15% more power than a traditional PFF at 5GHz. Better yet, the CMPFF enables a 45.2% power reduction on average when used in a CM CDN compared to conventional VM CDNs. The CMPFF also eliminates the need for the complex CM Rx circuitry and/or local VM buffers used in previously proposed CM signaling schemes.

#### ACKNOWLEDGMENTS

This work was supported in part by the National Science Foundation under grant CCF-1053838.

### REFERENCES

- [1] H. Zhang, G. Varghese, and J. M. Rabaey, "Low swing on-chip signaling techniques: effectiveness and robustness," *TVLSI*, vol. 8, no. 3, pp. 264–272, Jun 2000.
- [2] A. Katoch, H. Veendrick, and E. Seevinck, "High speed current-mode signaling circuits for on-chip interconnects," in *ISCAS*, May 2005, pp. 4138–4141.
- [3] M. R. Guthaus, G. Wilke, and R. Reis, "Revisiting automated physical synthesis of high-performance clock networks," *ACM TODAES*, vol. 18, no. 2, pp. 31:1–31:27, Apr. 2013.
- [4] M. Yamashina and H. Yamada, "An MOS current mode logic (MCML) circuit for low-power sub-GHz processors," *IEICE Transactions on Electronics*, vol. E75-C, no. 10, pp. 1181–1187, 1992.
- [5] E. Seevinck, P. J. V. Beers, and H. Ontrop, "Current-mode techniques for high-speed VLSI circuits with application to current sense amplifier for CMOS SRAM's," *JSSC*, vol. 26, no. 4, pp. 525–536, Apr 1991.
- [6] M. Dave, M. Jain, S. Baghini, and D. Sharma, "A variation tolerant current-mode signaling scheme for on-chip interconnects," *IEEE TVLSI*, vol. PP, no. 99, pp. 1–12, Jan 2012.
- [7] A. Narasimhan, S. Divekar, P. Elakkumanan, and R. Sridhar, "A low-power current-mode clock distribution scheme for multi-GHz NoC-based SoCs," in *VLSI Design*, Jan 2005, pp. 130–135.
- [8] N. K. Kancharapu *et al.*, "A low-power low-skew current-mode clock distribution network in 90nm CMOS technology," in *ISVLSI*, Jul 2011, pp. 132–137.
- [9] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 1, pp. 62–70, 1989.
- [10] S. Kozu *et al.*, "A 100 MHz, 0.4 W RISC processor with 200 MHz multiply adder, using pulse-register technique," in *ISSCC*, 1996, pp. 140–141.
- [11] NCSU, "FreePDK45," http://www.eda.ncsu.edu/wiki/FreePDK45.
- [12] C. N. Sze *et al.*, "Clocking and the ISPD'09 clock synthesis contest," in *ISPD*, Mar 2009, pp. 149–150.